SAS中文论坛

 找回密码
 立即注册

扫一扫,访问微社区

查看: 4900|回复: 0
打印 上一主题 下一主题

Fitting Cox Model Using PROC PHREG and Beyond in SAS

[复制链接]

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
楼主
 楼主| 发表于 2011-10-15 07:44:38 | 只看该作者

Fitting Cox Model Using PROC PHREG and Beyond in SAS

From LCChien's blog on blogspot

<a href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf"><!-- m --><a class="postlink" href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf">http://support.sas.com/resources/papers ... 6-2009.pdf</a><!-- m --></a><br /><br />Cox PH 模型在倖存分析(survival analysis)被廣泛的使用。一篇發表在SAS GLOBAL FORUM 2009的技術文件解說了在 SAS 底下如何用 Cox PH 模型來計算一些重要估計量的方法,並且提供一個方便的巨集程式來簡化繁複的程式寫作和運行。<br /><br /><a name='more'></a><b><span class="Apple-style-span" style="font-size: large;">C-index</span></b><br />C-index原本是在ROC curve裡面用來解釋預測鑒別能力的測量指標,但後來在Cox PH model裡也可以應用。在倖存分析裡,C-index可以解釋為一個機率,是從某個事件(如:死亡)來的一個病人,他會比非該事件內的任何一個病人都擁有較高的機會會發生該事件。以下列程式來說明如何用 SAS 算出 C-index:<br /><pre><code>proc phreg data=sample;<br />  id idn;<br />  model combdays*combfv(0)=mihx diabhx lowef;<br />  <b><span class="Apple-style-span" style="color: red;">output out=obs survival=surv;</span></b><br />run;</code></pre><br />這個PROC PHREG程序並沒有什麼特別,使用者唯一需要額外做的就是利用一個ODS OUTPUT指令把個體的ID, 觀測到的倖存時間以及倖存函數估計值另存一個新檔(ods)。接著,用下面兩到程序去整理出一個concordance data。<br /><pre><code>proc sql;<br />      create table allset as<br />      select idn_j, y_j, x_j, idn as idn_i, surv as y_i, combdays as x_i<br />      from evtset, obs<br />      where idn_j&lt;&gt;idn;<br />quit;<br />data concord;<br />      set allset;<br />      if (x_i<x_j and="" y_i="">y_j) or (x_i&gt;x_j and y_i<y_j) code="" concord="1;" else="" run;<="" then=""></y_j)></x_j></code></pre><br />然後用下面這一串程式直接把C-index及其95%信賴區間該算出來:<br /><pre><code>data _null_;<br />    set concord end=eof;<br />    retain nch ndh;<br />    if _N_=1 then do;<br />      nch=0;<br />      ndh=0;<br />      end;<br />    if concord=1 then nch+1;<br />    if concord=0 then ndh+1;<br />    if eof=1 then do;<br />      call symput('ch',trim(left(nch)));<br />      call symput('dh',trim(left(ndh)));<br />      call symput('uspairs',trim(left(_n_)));<br />      end;<br />run;<br />data _null_;<br />     set sample end=eof;<br />     if eof=1 then call symput('totobs',trim(left(_n_)));<br />run;<br />%put &amp;ch &amp;dh &amp;uspairs &amp;totobs;<br />data calculat;<br />    ch=input("&amp;ch",12.0);<br />    dh=input("&amp;dh",12.0);<br />    uspairs=input("&amp;uspairs",12.0);<br />    totobs=input("&amp;totobs",10.0);<br />    pc=ch/(totobs*(totobs-1));<br />    pd=dh/(totobs*(totobs-1));<br />    c_hat=pc/(pc+pd);<br />    w=(2*1.96**2)/(totobs*(pc+pd));<br />    low_ci_w=((w+2*c_hat)/(2*(1+w)))-(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w))));<br />    upper_ci_w=((w+2*c_hat)/(2*(1+w)))+(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w))));<br />run;</code></pre><br />結果如下圖所示:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s1600/sugi1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="116" src="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s640/sugi1.png" width="640" /></a></div><br /><b><span class="Apple-style-span" style="font-size: large;">Calculating adjusted survival rates using corrected group prognosis method</span></b><br />作者自製了一段巨集程式 %CGPM_adj 可將倖存機率以及倖存曲線圖直接輸出。其原始碼如下:<br /><pre><code>*****************************************************************************;<br />* Program: CGPM_adj.sas;<br />* Purpose: Calculate the unadjusted and adjusted survival rates and plot the<br />*survival curves using corrected group prognosis method;<br />*****************************************************************************;<br />*----------------------------------------------------------------------------;<br />* 1. All variables used in the model have to be binary variable.<br />*Categorical and numeric variables need to be re-coded as dummy<br />*variable to meet this requirement;<br />* 2. The maximum number of variables in the model is limited to 20;<br />*(not including the control variable). But this number can be<br />*easily expanded as needed.;<br />* 3. The analysis dataset for modeling was assign to be in SAS<br />*default library WORK. ;<br />*----------------------------------------------------------------------------;<br />%macro CGPM_adj(outputpath, figname, tbname, dsn, covars, ctrlvar, outcom, cnsrval=0, timevar, rlalpha=0.05);<br />*----------------------------------------------------------------------------;<br />* Calculate unadjusted survival rate;<br />*----------------------------------------------------------------------------;<br />*create dataset for controlled variable;<br />proc sql;<br />      create table ctrlset as<br />      select distinct &amp;ctrlvar<br />      from &amp;dsn<br />      order by &amp;ctrlvar;<br />quit;<br />*get the unadjusted survival rates for controlled variable;<br />proc phreg data=&amp;dsn noprint;<br />      model &amp;timevar * &amp;outcom (&amp;cnsrval) = &amp;ctrlvar;<br />      baseline covariates=ctrlset out=unadjset survival=survival / nomean;<br />run;<br /><br />*----------------------------------------------------------------------------;<br />* Calculate adjusted survival rates using corrected group prognosis method;<br />*----------------------------------------------------------------------------;<br />*create identifier set for all possible combinations of covariates;<br />data _null_;<br />      cnt=1+count("&amp;covars",' ');<br />      call symputx('varcnt', cnt);<br />run;<br />%put &amp;varcnt;<br /><br />%macro idx;<br />      %let i=1;<br />      %do %until(%scan(&amp;covars,&amp;i,' ')=%str());<br />      %local varnam;<br />            %let varnam=%scan(&amp;covars,&amp;i,' ');<br />            xp=&amp;i+1;<br />            if &amp;varnam=1 then substr(idx,xp,1)=1;<br />      %let i=%eval(&amp;i+1);<br />      %end;<br />%mend;<br />%macro idxrv;<br />      %let i=1;<br />      %do %until(%scan(&amp;covars,&amp;i,' ')=%str());<br />      %local varnam;<br />            %let varnam=%scan(&amp;covars,&amp;i,' ');<br />            xp=&amp;i+1;<br />            if substr(idx,xp,1)='1' then &amp;varnam=1; else &amp;varnam=0;<br />      %let i=%eval(&amp;i+1);<br />      %end;<br />%mend;<br /><br />data modelset;<br />      set &amp;dsn;<br />      idx=left(put(10**(input("&amp;varcnt",2.0)),21.));<br />      %idx;<br />      keep idx &amp;ctrlvar &amp;covars &amp;timevar &amp;outcom;<br />run;<br />proc freq data=modelset noprint;<br />      tables idx / out=idx;<br />run;<br />data popset (drop=percent xp);<br />      if _n_=1 then do until(last);<br />            set idx end=last;<br />            &amp;ctrlvar=0;<br />            %idxrv;<br />            output;<br />            end;<br />      set idx;<br />            &amp;ctrlvar=1;<br />            %idxrv;<br />            output;<br />run;<br />ods output censoredsummary=censum parameterEstimates=parasum;<br />proc phreg data=modelset;<br />      id idx;<br />      model &amp;timevar * &amp;outcom (&amp;cnsrval) = &amp;ctrlvar &amp;covars / rl alpha=&amp;rlalpha;<br />      baseline covariates=popset out=adjset0 survival=survival / nomean;<br />run;<br />ods output close;<br />proc sort data=popset out=count (keep=idx count);<br />by idx;<br />run;<br />proc sort data=adjset0;<br />by idx;<br />run;<br />data adjset1;<br />      merge adjset0 count;<br />      by idx;<br />run;<br />proc sort data=adjset1;<br />      by &amp;ctrlvar &amp;timevar;<br />run;<br />data adjset;<br />      set adjset1;<br />      by &amp;ctrlvar &amp;timevar;<br />            if first.&amp;timevar then frqsum=0;<br />            frqsum+count;<br />            if first.&amp;timevar then sursum=0;<br />            sursum+survival*count;<br />      if last.&amp;timevar;<br />      adjsurv=sursum/frqsum;<br />      keep &amp;ctrlvar &amp;timevar adjsurv;<br />run;<br />data allsurv;<br />      length group $22;<br />      set adjset (in=a rename=(adjsurv=survival)) unadjset (in=b);<br />      if a=1 and &amp;ctrlvar=0 then group='Adjusted'||' '||"&amp;ctrlvar"||'=0';<br />     else if a=1 and &amp;ctrlvar=1 then group='Adjusted'||' '||"&amp;ctrlvar"||'=1';<br />      else if b=1 and &amp;ctrlvar=0 then group='Unadjusted'||'<br />'||"&amp;ctrlvar"||'=0';<br />      else if b=1 and &amp;ctrlvar=1 then group='Unadjusted'||'<br />'||"&amp;ctrlvar"||'=1';<br />      eventrate=1-survival;<br />run;<br />data summary;<br />      set allsurv;<br />      by group;<br />      if last.group;<br />run;<br />*----------------------------------------------------------------------------;<br />* You can change the code below to modify the title and graph axis scale etc.;<br />*----------------------------------------------------------------------------;<br />goption reset=all device=emf gsfname=grfa hsize=6.0 vsize=4 ftext='Arial/bo' htext=11pt display;<br />filename grfa "&amp;outputpath.\&amp;figname..emf";<br /><br />symbol1 i=steplj v=none l=1 ci=green w=1;<br />symbol2 i=steplj v=none l=1 ci=black w=1;<br />symbol3 i=steplj v=none l=3 ci=green w=1;<br />symbol4 i=steplj v=none l=3 ci=black w=1;<br />axis1 offset=(0.5 cm,0.5 cm);<br /><br />axis2 label=(angle=90 j=c 'Survival') order=(0.8 to 1 by 0.05) offset=(0.0cm,0.0 cm);<br />   legend1 across=2<br />   mode=share<br />   position=(bottom left inside)<br />   label=none<br />   value=(j=l h=10pt)<br />   offset=(0.5cm, 0cm);<br />proc gplot data=allsurv;<br />   plot survival * &amp;timevar = group / legend=legend1 haxis=axis1 vaxis=axis2;<br />   label eventrate='Survival Rate' &amp;timevar='Days After Randomization';<br />run;<br />quit;<br />title;<br />footnote;<br />ods rtf style=journal file="&amp;outputpath.\&amp;tbname..rtf" bodytitle<br />startpage=no;<br />proc print data=parasum noobs;<br />title "Analysis of Maximum Likelihood Estimates (alpha=&amp;rlalpha for CI)";<br />run;<br />proc print data=summary noobs;<br />var group survival eventrate;<br />title 'Estimated survival and event rates at the time of last event observed';<br />run;<br />ods rtf close;<br />%mend;</code></pre><br />巨集參數定義如下:<br /><br /><ul><li>outputpath=報表與圖形輸出的路徑&nbsp;</li><li>figname=圖形檔名</li><li>tbname=報表檔名</li><li>dsn=存放在SAS裡面的資料檔名</li><li>covars=所有預測變數的名稱,不包含控制變數的名稱</li><li>ctrlvar=控制變數的名稱</li><li>outcom=輸出變數(事件)的名稱</li><li>cnsrval=設限的代碼,預設值為0</li><li>timevar=時間變數名稱</li><li>rlalpha=顯著水準alpha,預設值為0.05</li></ul><br />承上面的範例,巨集程式執行的寫法如下:<br /><pre><code>%CGPM_adj(outputpath=L:\SGF2009\Examples, <br />          figname=Adjdiab_fig, <br />          tbname=adjdiab_summary, <br />          dsn=sample, <br />          covars=age65 lowef chfhx, <br />          ctrlvar=diabhx, <br />          outcom=deathfv, <br />          cnsrval=0, <br />          timevar=deathdays, <br />          rlalpha=0.05);</code></pre><br />結果如下:<br /><a href="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s1600/sugi2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="316" src="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s640/sugi2.png" width="640" /></a><a href="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s1600/sugi3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="408" src="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s640/sugi3.png" width="640" /></a><br /><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b><br /><b><span class="Apple-style-span" style="font-size: large;">Presenting the effect of a continuous covariate on estimated survival</span></b><br />當模型內包含連續型變數時,需要額外多兩個步驟來計算n-year倖存率。首先,用PROC PHREG程序把基底函數(baseline function)等一干相關的估計量另存成兩個新檔 yrout &amp; sample(紅色部分):<br /><pre><code>proc phreg data=sample;<br />model combdays*combfv(0)=age male;<br /><b><span class="Apple-style-span" style="color: red;">baseline out=yrout lower=low_ci upper=up_ci survival=surv covariates=sample/alpha=0.05 nomean;</span></b><br />run;</code></pre><br />再依照想要分類的變數(male),去和倖存時間變數去排序(combdays)後再另存新檔(maxday):<br /><pre><code>proc sort data=sample out=maxday (where=(combfv=1) keep=idn male combfv combdays);<br />      by male combdays;<br />run;</code></pre><br />然後利用下面兩個資料步驟程序去計算男人和女人的5-year倖存率:<br /><pre><code>data _null_;<br />      set maxday;<br />      by male;<br />      if male=0 and last.male then call symput('lastday_f',combdays);<br />      if male=1 and last.male then call symput('lastday_m',combdays);<br />run;<br />%put &amp;lastday_f &amp;lastday_m;<br />data yrest;<br />      set yrout;<br />      evtrate=(1-surv)*100;<br />      evtlow=(1-low_ci)*100;<br />      evtup=(1-up_ci)*100;<br />      if combdays=&amp;lastday_m and male=1 then output;<br />      if combdays=&amp;lastday_f and male=0 then output;<br />run;</code></pre><br />最後畫出的圖形如下所示:<br /><a href="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s1600/sugi4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="362" src="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s640/sugi4.png" width="640" /></a><br />(PS) 原文內並沒有附此圖的繪製程式。<br /><br /><b><span class="Apple-style-span" style="font-size: large;">Contact information</span></b> <br />Lea Liu, &nbsp;Maryland Medical Research Institute<br />600 Wyndhurst Ave. Baltimore, MD 21210<br />(410) 435-4200, &nbsp;Email: <!-- e --><a href="mailto:lliu@mmri.org">lliu@mmri.org</a><!-- e -->, &nbsp;Web: <!-- w --><a class="postlink" href="http://www.mmri.org">www.mmri.org</a><!-- w --><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6268919072942670865-4224919921175314040?l=sugiclub.blogspot.com' alt='' /></div>
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|小黑屋|手机版|Archiver|SAS中文论坛  

GMT+8, 2025-5-7 23:18 , Processed in 0.092961 second(s), 20 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.

快速回复 返回顶部 返回列表