|
楼主

楼主 |
发表于 2011-10-15 07:44:38
|
只看该作者
Fitting Cox Model Using PROC PHREG and Beyond in SAS
From LCChien's blog on blogspot
<a href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf"><!-- m --><a class="postlink" href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf">http://support.sas.com/resources/papers ... 6-2009.pdf</a><!-- m --></a><br /><br />Cox PH 模型在倖存分析(survival analysis)被廣泛的使用。一篇發表在SAS GLOBAL FORUM 2009的技術文件解說了在 SAS 底下如何用 Cox PH 模型來計算一些重要估計量的方法,並且提供一個方便的巨集程式來簡化繁複的程式寫作和運行。<br /><br /><a name='more'></a><b><span class="Apple-style-span" style="font-size: large;">C-index</span></b><br />C-index原本是在ROC curve裡面用來解釋預測鑒別能力的測量指標,但後來在Cox PH model裡也可以應用。在倖存分析裡,C-index可以解釋為一個機率,是從某個事件(如:死亡)來的一個病人,他會比非該事件內的任何一個病人都擁有較高的機會會發生該事件。以下列程式來說明如何用 SAS 算出 C-index:<br /><pre><code>proc phreg data=sample;<br /> id idn;<br /> model combdays*combfv(0)=mihx diabhx lowef;<br /> <b><span class="Apple-style-span" style="color: red;">output out=obs survival=surv;</span></b><br />run;</code></pre><br />這個PROC PHREG程序並沒有什麼特別,使用者唯一需要額外做的就是利用一個ODS OUTPUT指令把個體的ID, 觀測到的倖存時間以及倖存函數估計值另存一個新檔(ods)。接著,用下面兩到程序去整理出一個concordance data。<br /><pre><code>proc sql;<br /> create table allset as<br /> select idn_j, y_j, x_j, idn as idn_i, surv as y_i, combdays as x_i<br /> from evtset, obs<br /> where idn_j<>idn;<br />quit;<br />data concord;<br /> set allset;<br /> if (x_i<x_j and="" y_i="">y_j) or (x_i>x_j and y_i<y_j) code="" concord="1;" else="" run;<="" then=""></y_j)></x_j></code></pre><br />然後用下面這一串程式直接把C-index及其95%信賴區間該算出來:<br /><pre><code>data _null_;<br /> set concord end=eof;<br /> retain nch ndh;<br /> if _N_=1 then do;<br /> nch=0;<br /> ndh=0;<br /> end;<br /> if concord=1 then nch+1;<br /> if concord=0 then ndh+1;<br /> if eof=1 then do;<br /> call symput('ch',trim(left(nch)));<br /> call symput('dh',trim(left(ndh)));<br /> call symput('uspairs',trim(left(_n_)));<br /> end;<br />run;<br />data _null_;<br /> set sample end=eof;<br /> if eof=1 then call symput('totobs',trim(left(_n_)));<br />run;<br />%put &ch &dh &uspairs &totobs;<br />data calculat;<br /> ch=input("&ch",12.0);<br /> dh=input("&dh",12.0);<br /> uspairs=input("&uspairs",12.0);<br /> totobs=input("&totobs",10.0);<br /> pc=ch/(totobs*(totobs-1));<br /> pd=dh/(totobs*(totobs-1));<br /> c_hat=pc/(pc+pd);<br /> w=(2*1.96**2)/(totobs*(pc+pd));<br /> low_ci_w=((w+2*c_hat)/(2*(1+w)))-(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w))));<br /> upper_ci_w=((w+2*c_hat)/(2*(1+w)))+(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w))));<br />run;</code></pre><br />結果如下圖所示:<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s1600/sugi1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="116" src="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s640/sugi1.png" width="640" /></a></div><br /><b><span class="Apple-style-span" style="font-size: large;">Calculating adjusted survival rates using corrected group prognosis method</span></b><br />作者自製了一段巨集程式 %CGPM_adj 可將倖存機率以及倖存曲線圖直接輸出。其原始碼如下:<br /><pre><code>*****************************************************************************;<br />* Program: CGPM_adj.sas;<br />* Purpose: Calculate the unadjusted and adjusted survival rates and plot the<br />*survival curves using corrected group prognosis method;<br />*****************************************************************************;<br />*----------------------------------------------------------------------------;<br />* 1. All variables used in the model have to be binary variable.<br />*Categorical and numeric variables need to be re-coded as dummy<br />*variable to meet this requirement;<br />* 2. The maximum number of variables in the model is limited to 20;<br />*(not including the control variable). But this number can be<br />*easily expanded as needed.;<br />* 3. The analysis dataset for modeling was assign to be in SAS<br />*default library WORK. ;<br />*----------------------------------------------------------------------------;<br />%macro CGPM_adj(outputpath, figname, tbname, dsn, covars, ctrlvar, outcom, cnsrval=0, timevar, rlalpha=0.05);<br />*----------------------------------------------------------------------------;<br />* Calculate unadjusted survival rate;<br />*----------------------------------------------------------------------------;<br />*create dataset for controlled variable;<br />proc sql;<br /> create table ctrlset as<br /> select distinct &ctrlvar<br /> from &dsn<br /> order by &ctrlvar;<br />quit;<br />*get the unadjusted survival rates for controlled variable;<br />proc phreg data=&dsn noprint;<br /> model &timevar * &outcom (&cnsrval) = &ctrlvar;<br /> baseline covariates=ctrlset out=unadjset survival=survival / nomean;<br />run;<br /><br />*----------------------------------------------------------------------------;<br />* Calculate adjusted survival rates using corrected group prognosis method;<br />*----------------------------------------------------------------------------;<br />*create identifier set for all possible combinations of covariates;<br />data _null_;<br /> cnt=1+count("&covars",' ');<br /> call symputx('varcnt', cnt);<br />run;<br />%put &varcnt;<br /><br />%macro idx;<br /> %let i=1;<br /> %do %until(%scan(&covars,&i,' ')=%str());<br /> %local varnam;<br /> %let varnam=%scan(&covars,&i,' ');<br /> xp=&i+1;<br /> if &varnam=1 then substr(idx,xp,1)=1;<br /> %let i=%eval(&i+1);<br /> %end;<br />%mend;<br />%macro idxrv;<br /> %let i=1;<br /> %do %until(%scan(&covars,&i,' ')=%str());<br /> %local varnam;<br /> %let varnam=%scan(&covars,&i,' ');<br /> xp=&i+1;<br /> if substr(idx,xp,1)='1' then &varnam=1; else &varnam=0;<br /> %let i=%eval(&i+1);<br /> %end;<br />%mend;<br /><br />data modelset;<br /> set &dsn;<br /> idx=left(put(10**(input("&varcnt",2.0)),21.));<br /> %idx;<br /> keep idx &ctrlvar &covars &timevar &outcom;<br />run;<br />proc freq data=modelset noprint;<br /> tables idx / out=idx;<br />run;<br />data popset (drop=percent xp);<br /> if _n_=1 then do until(last);<br /> set idx end=last;<br /> &ctrlvar=0;<br /> %idxrv;<br /> output;<br /> end;<br /> set idx;<br /> &ctrlvar=1;<br /> %idxrv;<br /> output;<br />run;<br />ods output censoredsummary=censum parameterEstimates=parasum;<br />proc phreg data=modelset;<br /> id idx;<br /> model &timevar * &outcom (&cnsrval) = &ctrlvar &covars / rl alpha=&rlalpha;<br /> baseline covariates=popset out=adjset0 survival=survival / nomean;<br />run;<br />ods output close;<br />proc sort data=popset out=count (keep=idx count);<br />by idx;<br />run;<br />proc sort data=adjset0;<br />by idx;<br />run;<br />data adjset1;<br /> merge adjset0 count;<br /> by idx;<br />run;<br />proc sort data=adjset1;<br /> by &ctrlvar &timevar;<br />run;<br />data adjset;<br /> set adjset1;<br /> by &ctrlvar &timevar;<br /> if first.&timevar then frqsum=0;<br /> frqsum+count;<br /> if first.&timevar then sursum=0;<br /> sursum+survival*count;<br /> if last.&timevar;<br /> adjsurv=sursum/frqsum;<br /> keep &ctrlvar &timevar adjsurv;<br />run;<br />data allsurv;<br /> length group $22;<br /> set adjset (in=a rename=(adjsurv=survival)) unadjset (in=b);<br /> if a=1 and &ctrlvar=0 then group='Adjusted'||' '||"&ctrlvar"||'=0';<br /> else if a=1 and &ctrlvar=1 then group='Adjusted'||' '||"&ctrlvar"||'=1';<br /> else if b=1 and &ctrlvar=0 then group='Unadjusted'||'<br />'||"&ctrlvar"||'=0';<br /> else if b=1 and &ctrlvar=1 then group='Unadjusted'||'<br />'||"&ctrlvar"||'=1';<br /> eventrate=1-survival;<br />run;<br />data summary;<br /> set allsurv;<br /> by group;<br /> if last.group;<br />run;<br />*----------------------------------------------------------------------------;<br />* You can change the code below to modify the title and graph axis scale etc.;<br />*----------------------------------------------------------------------------;<br />goption reset=all device=emf gsfname=grfa hsize=6.0 vsize=4 ftext='Arial/bo' htext=11pt display;<br />filename grfa "&outputpath.\&figname..emf";<br /><br />symbol1 i=steplj v=none l=1 ci=green w=1;<br />symbol2 i=steplj v=none l=1 ci=black w=1;<br />symbol3 i=steplj v=none l=3 ci=green w=1;<br />symbol4 i=steplj v=none l=3 ci=black w=1;<br />axis1 offset=(0.5 cm,0.5 cm);<br /><br />axis2 label=(angle=90 j=c 'Survival') order=(0.8 to 1 by 0.05) offset=(0.0cm,0.0 cm);<br /> legend1 across=2<br /> mode=share<br /> position=(bottom left inside)<br /> label=none<br /> value=(j=l h=10pt)<br /> offset=(0.5cm, 0cm);<br />proc gplot data=allsurv;<br /> plot survival * &timevar = group / legend=legend1 haxis=axis1 vaxis=axis2;<br /> label eventrate='Survival Rate' &timevar='Days After Randomization';<br />run;<br />quit;<br />title;<br />footnote;<br />ods rtf style=journal file="&outputpath.\&tbname..rtf" bodytitle<br />startpage=no;<br />proc print data=parasum noobs;<br />title "Analysis of Maximum Likelihood Estimates (alpha=&rlalpha for CI)";<br />run;<br />proc print data=summary noobs;<br />var group survival eventrate;<br />title 'Estimated survival and event rates at the time of last event observed';<br />run;<br />ods rtf close;<br />%mend;</code></pre><br />巨集參數定義如下:<br /><br /><ul><li>outputpath=報表與圖形輸出的路徑 </li><li>figname=圖形檔名</li><li>tbname=報表檔名</li><li>dsn=存放在SAS裡面的資料檔名</li><li>covars=所有預測變數的名稱,不包含控制變數的名稱</li><li>ctrlvar=控制變數的名稱</li><li>outcom=輸出變數(事件)的名稱</li><li>cnsrval=設限的代碼,預設值為0</li><li>timevar=時間變數名稱</li><li>rlalpha=顯著水準alpha,預設值為0.05</li></ul><br />承上面的範例,巨集程式執行的寫法如下:<br /><pre><code>%CGPM_adj(outputpath=L:\SGF2009\Examples, <br /> figname=Adjdiab_fig, <br /> tbname=adjdiab_summary, <br /> dsn=sample, <br /> covars=age65 lowef chfhx, <br /> ctrlvar=diabhx, <br /> outcom=deathfv, <br /> cnsrval=0, <br /> timevar=deathdays, <br /> rlalpha=0.05);</code></pre><br />結果如下:<br /><a href="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s1600/sugi2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="316" src="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s640/sugi2.png" width="640" /></a><a href="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s1600/sugi3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="408" src="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s640/sugi3.png" width="640" /></a><br /><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b><br /><b><span class="Apple-style-span" style="font-size: large;">Presenting the effect of a continuous covariate on estimated survival</span></b><br />當模型內包含連續型變數時,需要額外多兩個步驟來計算n-year倖存率。首先,用PROC PHREG程序把基底函數(baseline function)等一干相關的估計量另存成兩個新檔 yrout & sample(紅色部分):<br /><pre><code>proc phreg data=sample;<br />model combdays*combfv(0)=age male;<br /><b><span class="Apple-style-span" style="color: red;">baseline out=yrout lower=low_ci upper=up_ci survival=surv covariates=sample/alpha=0.05 nomean;</span></b><br />run;</code></pre><br />再依照想要分類的變數(male),去和倖存時間變數去排序(combdays)後再另存新檔(maxday):<br /><pre><code>proc sort data=sample out=maxday (where=(combfv=1) keep=idn male combfv combdays);<br /> by male combdays;<br />run;</code></pre><br />然後利用下面兩個資料步驟程序去計算男人和女人的5-year倖存率:<br /><pre><code>data _null_;<br /> set maxday;<br /> by male;<br /> if male=0 and last.male then call symput('lastday_f',combdays);<br /> if male=1 and last.male then call symput('lastday_m',combdays);<br />run;<br />%put &lastday_f &lastday_m;<br />data yrest;<br /> set yrout;<br /> evtrate=(1-surv)*100;<br /> evtlow=(1-low_ci)*100;<br /> evtup=(1-up_ci)*100;<br /> if combdays=&lastday_m and male=1 then output;<br /> if combdays=&lastday_f and male=0 then output;<br />run;</code></pre><br />最後畫出的圖形如下所示:<br /><a href="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s1600/sugi4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="362" src="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s640/sugi4.png" width="640" /></a><br />(PS) 原文內並沒有附此圖的繪製程式。<br /><br /><b><span class="Apple-style-span" style="font-size: large;">Contact information</span></b> <br />Lea Liu, Maryland Medical Research Institute<br />600 Wyndhurst Ave. Baltimore, MD 21210<br />(410) 435-4200, Email: <!-- e --><a href="mailto:lliu@mmri.org">lliu@mmri.org</a><!-- e -->, Web: <!-- w --><a class="postlink" href="http://www.mmri.org">www.mmri.org</a><!-- w --><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6268919072942670865-4224919921175314040?l=sugiclub.blogspot.com' alt='' /></div> |
|