Fitting Cox Model Using PROC PHREG and Beyond in SAS

shiyiming · 发表于 2011-10-15 07:44:38

From LCChien's blog on blogspot

<a href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf"><a class="postlink" href="http://support.sas.com/resources/papers/proceedings09/236-2009.pdf">http://support.sas.com/resources/papers ... 6-2009.pdf</a></a> Cox PH 模型在倖存分析（survival analysis）被廣泛的使用。一篇發表在SAS GLOBAL FORUM 2009的技術文件解說了在 SAS 底下如何用 Cox PH 模型來計算一些重要估計量的方法，並且提供一個方便的巨集程式來簡化繁複的程式寫作和運行。 <a name='more'></a>C-index C-index原本是在ROC curve裡面用來解釋預測鑒別能力的測量指標，但後來在Cox PH model裡也可以應用。在倖存分析裡，C-index可以解釋為一個機率，是從某個事件(如：死亡)來的一個病人，他會比非該事件內的任何一個病人都擁有較高的機會會發生該事件。以下列程式來說明如何用 SAS 算出 C-index： <pre><code>proc phreg data=sample; id idn; model combdays*combfv(0)=mihx diabhx lowef; output out=obs survival=surv; run;</code></pre> 這個PROC PHREG程序並沒有什麼特別，使用者唯一需要額外做的就是利用一個ODS OUTPUT指令把個體的ID, 觀測到的倖存時間以及倖存函數估計值另存一個新檔(ods)。接著，用下面兩到程序去整理出一個concordance data。 <pre><code>proc sql; create table allset as select idn_j, y_j, x_j, idn as idn_i, surv as y_i, combdays as x_i from evtset, obs where idn_j<>idn; quit; data concord; set allset; if (x_i<x_j and="" y_i="">y_j) or (x_i>x_j and y_i<y_j) code="" concord="1;" else="" run;<="" then=""></y_j)></x_j></code></pre> 然後用下面這一串程式直接把C-index及其95%信賴區間該算出來： <pre><code>data _null_; set concord end=eof; retain nch ndh; if _N_=1 then do; nch=0; ndh=0; end; if concord=1 then nch+1; if concord=0 then ndh+1; if eof=1 then do; call symput('ch',trim(left(nch))); call symput('dh',trim(left(ndh))); call symput('uspairs',trim(left(_n_))); end; run; data _null_; set sample end=eof; if eof=1 then call symput('totobs',trim(left(_n_))); run; %put &ch &dh &uspairs &totobs; data calculat; ch=input("&ch",12.0); dh=input("&dh",12.0); uspairs=input("&uspairs",12.0); totobs=input("&totobs",10.0); pc=ch/(totobs*(totobs-1)); pd=dh/(totobs*(totobs-1)); c_hat=pc/(pc+pd); w=(2*1.96**2)/(totobs*(pc+pd)); low_ci_w=((w+2*c_hat)/(2*(1+w)))-(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w)))); upper_ci_w=((w+2*c_hat)/(2*(1+w)))+(sqrt((w**2+4*w*c_hat*(1-c_hat))/(2*(1+w)))); run;</code></pre> 結果如下圖所示： <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s1600/sugi1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="116" src="http://4.bp.blogspot.com/-OQR68PHZBt0/Tpi7LuG6WDI/AAAAAAAAGWE/x52sAZY4WsQ/s640/sugi1.png" width="640" /></a></div> Calculating adjusted survival rates using corrected group prognosis method 作者自製了一段巨集程式 %CGPM_adj 可將倖存機率以及倖存曲線圖直接輸出。其原始碼如下： <pre><code>*****************************************************************************; * Program: CGPM_adj.sas; * Purpose: Calculate the unadjusted and adjusted survival rates and plot the *survival curves using corrected group prognosis method; *****************************************************************************; *----------------------------------------------------------------------------; * 1. All variables used in the model have to be binary variable. *Categorical and numeric variables need to be re-coded as dummy *variable to meet this requirement; * 2. The maximum number of variables in the model is limited to 20; *(not including the control variable). But this number can be *easily expanded as needed.; * 3. The analysis dataset for modeling was assign to be in SAS *default library WORK. ; *----------------------------------------------------------------------------; %macro CGPM_adj(outputpath, figname, tbname, dsn, covars, ctrlvar, outcom, cnsrval=0, timevar, rlalpha=0.05); *----------------------------------------------------------------------------; * Calculate unadjusted survival rate; *----------------------------------------------------------------------------; *create dataset for controlled variable; proc sql; create table ctrlset as select distinct &ctrlvar from &dsn order by &ctrlvar; quit; *get the unadjusted survival rates for controlled variable; proc phreg data=&dsn noprint; model &timevar * &outcom (&cnsrval) = &ctrlvar; baseline covariates=ctrlset out=unadjset survival=survival / nomean; run; *----------------------------------------------------------------------------; * Calculate adjusted survival rates using corrected group prognosis method; *----------------------------------------------------------------------------; *create identifier set for all possible combinations of covariates; data _null_; cnt=1+count("&covars",' '); call symputx('varcnt', cnt); run; %put &varcnt; %macro idx; %let i=1; %do %until(%scan(&covars,&i,' ')=%str()); %local varnam; %let varnam=%scan(&covars,&i,' '); xp=&i+1; if &varnam=1 then substr(idx,xp,1)=1; %let i=%eval(&i+1); %end; %mend; %macro idxrv; %let i=1; %do %until(%scan(&covars,&i,' ')=%str()); %local varnam; %let varnam=%scan(&covars,&i,' '); xp=&i+1; if substr(idx,xp,1)='1' then &varnam=1; else &varnam=0; %let i=%eval(&i+1); %end; %mend; data modelset; set &dsn; idx=left(put(10**(input("&varcnt",2.0)),21.)); %idx; keep idx &ctrlvar &covars &timevar &outcom; run; proc freq data=modelset noprint; tables idx / out=idx; run; data popset (drop=percent xp); if _n_=1 then do until(last); set idx end=last; &ctrlvar=0; %idxrv; output; end; set idx; &ctrlvar=1; %idxrv; output; run; ods output censoredsummary=censum parameterEstimates=parasum; proc phreg data=modelset; id idx; model &timevar * &outcom (&cnsrval) = &ctrlvar &covars / rl alpha=&rlalpha; baseline covariates=popset out=adjset0 survival=survival / nomean; run; ods output close; proc sort data=popset out=count (keep=idx count); by idx; run; proc sort data=adjset0; by idx; run; data adjset1; merge adjset0 count; by idx; run; proc sort data=adjset1; by &ctrlvar &timevar; run; data adjset; set adjset1; by &ctrlvar &timevar; if first.&timevar then frqsum=0; frqsum+count; if first.&timevar then sursum=0; sursum+survival*count; if last.&timevar; adjsurv=sursum/frqsum; keep &ctrlvar &timevar adjsurv; run; data allsurv; length group $22; set adjset (in=a rename=(adjsurv=survival)) unadjset (in=b); if a=1 and &ctrlvar=0 then group='Adjusted'||' '||"&ctrlvar"||'=0'; else if a=1 and &ctrlvar=1 then group='Adjusted'||' '||"&ctrlvar"||'=1'; else if b=1 and &ctrlvar=0 then group='Unadjusted'||' '||"&ctrlvar"||'=0'; else if b=1 and &ctrlvar=1 then group='Unadjusted'||' '||"&ctrlvar"||'=1'; eventrate=1-survival; run; data summary; set allsurv; by group; if last.group; run; *----------------------------------------------------------------------------; * You can change the code below to modify the title and graph axis scale etc.; *----------------------------------------------------------------------------; goption reset=all device=emf gsfname=grfa hsize=6.0 vsize=4 ftext='Arial/bo' htext=11pt display; filename grfa "&outputpath.\&figname..emf"; symbol1 i=steplj v=none l=1 ci=green w=1; symbol2 i=steplj v=none l=1 ci=black w=1; symbol3 i=steplj v=none l=3 ci=green w=1; symbol4 i=steplj v=none l=3 ci=black w=1; axis1 offset=(0.5 cm,0.5 cm); axis2 label=(angle=90 j=c 'Survival') order=(0.8 to 1 by 0.05) offset=(0.0cm,0.0 cm); legend1 across=2 mode=share position=(bottom left inside) label=none value=(j=l h=10pt) offset=(0.5cm, 0cm); proc gplot data=allsurv; plot survival * &timevar = group / legend=legend1 haxis=axis1 vaxis=axis2; label eventrate='Survival Rate' &timevar='Days After Randomization'; run; quit; title; footnote; ods rtf style=journal file="&outputpath.\&tbname..rtf" bodytitle startpage=no; proc print data=parasum noobs; title "Analysis of Maximum Likelihood Estimates (alpha=&rlalpha for CI)"; run; proc print data=summary noobs; var group survival eventrate; title 'Estimated survival and event rates at the time of last event observed'; run; ods rtf close; %mend;</code></pre> 巨集參數定義如下： <ul><li>outputpath=報表與圖形輸出的路徑 </li><li>figname=圖形檔名</li><li>tbname=報表檔名</li><li>dsn=存放在SAS裡面的資料檔名</li><li>covars=所有預測變數的名稱，不包含控制變數的名稱</li><li>ctrlvar=控制變數的名稱</li><li>outcom=輸出變數(事件)的名稱</li><li>cnsrval=設限的代碼，預設值為0</li><li>timevar=時間變數名稱</li><li>rlalpha=顯著水準alpha，預設值為0.05</li></ul> 承上面的範例，巨集程式執行的寫法如下： <pre><code>%CGPM_adj(outputpath=L:\SGF2009\Examples, figname=Adjdiab_fig, tbname=adjdiab_summary, dsn=sample, covars=age65 lowef chfhx, ctrlvar=diabhx, outcom=deathfv, cnsrval=0, timevar=deathdays, rlalpha=0.05);</code></pre> 結果如下： <a href="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s1600/sugi2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="316" src="http://3.bp.blogspot.com/-WP_3r9uILc0/Tpi7ODu2EII/AAAAAAAAGWM/k5vCMqlhcLA/s640/sugi2.png" width="640" /></a><a href="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s1600/sugi3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="408" src="http://4.bp.blogspot.com/-P4gYoTawjWk/Tpi7OppvWCI/AAAAAAAAGWU/K1g9zoCfTs8/s640/sugi3.png" width="640" /></a> Presenting the effect of a continuous covariate on estimated survival 當模型內包含連續型變數時，需要額外多兩個步驟來計算n-year倖存率。首先，用PROC PHREG程序把基底函數(baseline function)等一干相關的估計量另存成兩個新檔 yrout & sample(紅色部分)： <pre><code>proc phreg data=sample; model combdays*combfv(0)=age male; baseline out=yrout lower=low_ci upper=up_ci survival=surv covariates=sample/alpha=0.05 nomean; run;</code></pre> 再依照想要分類的變數(male)，去和倖存時間變數去排序(combdays)後再另存新檔(maxday)： <pre><code>proc sort data=sample out=maxday (where=(combfv=1) keep=idn male combfv combdays); by male combdays; run;</code></pre> 然後利用下面兩個資料步驟程序去計算男人和女人的5-year倖存率： <pre><code>data _null_; set maxday; by male; if male=0 and last.male then call symput('lastday_f',combdays); if male=1 and last.male then call symput('lastday_m',combdays); run; %put &lastday_f &lastday_m; data yrest; set yrout; evtrate=(1-surv)*100; evtlow=(1-low_ci)*100; evtup=(1-up_ci)*100; if combdays=&lastday_m and male=1 then output; if combdays=&lastday_f and male=0 then output; run;</code></pre> 最後畫出的圖形如下所示： <a href="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s1600/sugi4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="362" src="http://3.bp.blogspot.com/-Bc6oGdJryz0/Tpi7O0tnWXI/AAAAAAAAGWc/Absq1KL9l3E/s640/sugi4.png" width="640" /></a> (PS) 原文內並沒有附此圖的繪製程式。 Contact information Lea Liu,  Maryland Medical Research Institute 600 Wyndhurst Ave. Baltimore, MD 21210 (410) 435-4200,  Email: <a href="mailto:lliu@mmri.org">lliu@mmri.org</a>,  Web: <a class="postlink" href="http://www.mmri.org">www.mmri.org</a><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6268919072942670865-4224919921175314040?l=sugiclub.blogspot.com' alt='' /></div>

		自动登录	找回密码
密码			立即注册