SAS中文论坛

 找回密码
 立即注册

扫一扫,访问微社区

查看: 951|回复: 0
打印 上一主题 下一主题

Obtain Trace of the Projection Matrix in a Linear Regression

[复制链接]

49

主题

76

帖子

1462

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1462
楼主
 楼主| 发表于 2011-10-7 02:33:26 | 只看该作者

Obtain Trace of the Projection Matrix in a Linear Regression

From oloolo's blog on SasProgramming


<p><a href="http://feedads.g.doubleclick.net/~a/C77t6HZdvLjjnk7zHhV184SB3yQ/0/da"><img src="http://feedads.g.doubleclick.net/~a/C77t6HZdvLjjnk7zHhV184SB3yQ/0/di" border="0" ismap="true"></img></a><br/>
<a href="http://feedads.g.doubleclick.net/~a/C77t6HZdvLjjnk7zHhV184SB3yQ/1/da"><img src="http://feedads.g.doubleclick.net/~a/C77t6HZdvLjjnk7zHhV184SB3yQ/1/di" border="0" ismap="true"></img></a></p>Recently, I am working on coding in SAS for a set of regularized regressions and need to compute trace of the projection matrix:<br />
S=X(X'X + \lambda I)^{-1}X'.<br />
<br />
Trace of the project matrix S is a key concept in modern regression analysis. For example, the effective degree of freedom of the model in a regularized linear regression is trace(S)/N [1]. For another example, in approximating leave-one-out-cross-validation using GCV, trace(S)/N is a key component (formula 7.52 of [1]). <br />
<br />
Check the reference and the cited works therein for more information.<br />
<br />
Ref:<br />
1. Hastie et al., The Elements of Statistical Learning, 2nd Ed.<br />
<br />
<pre style="background-color: #ebebeb; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: #000001; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>
/* Use the SocEcon example data created in
   Example A.1: A TYPE=CORR Data Set Produced by PROC CORR
   on page 8153 of SAS/STAT 9.22 User Doc
*/
data SocEcon;
input Pop School Employ Services House;
datalines;
5700 12.8 2500 270 25000
1000 10.9 600 10 10000
3400 8.8 1000 10 9000
3800 13.6 1700 140 25000
4000 12.8 1600 140 25000
8200 8.3 2600 60 12000
1200 11.4 400 10 16000
9100 11.5 3300 60 14000
9900 12.5 3400 180 18000
9600 13.7 3600 390 25000
9600 9.6 3300 80 12000
9400 11.4 4000 100 13000
;
run;

%let depvar=HOUSE;
%let covars=pop school employ services;
%let lambda=1;
/* Below is the way to obtain trace(S), where S is the project matrix in a (regularized) linar regression.
   For further information, check pp.68, pp.153 of Elements of Statistical Learning,2nd Ed.
*/

/* For details about TYPE=SSCP special SAS data, consult:
  Appendix A: Special SAS Data Sets, pp.8159 of SAS/STAT 9.22 User's Guide
*/
proc corr data=SocEcon sscp out=xtx(where=(_TYPE_='SSCP'))  noprint;
     var &covars;
run;


data xtx2;
     set xtx;
  array _n{*} _numeric_;
  array _i{*} i1-i5 (5*0);
  do j=1 to 5;
     if j=_n_ then _i[_n_]=&lambda;
  else _i[j]=0;
  end;
  _n[_n_]=_n[_n_]+1;
  drop j;
run;

/* Obtain the inverse of (XTX+\lambda*I)
   Note that we explicitly specified Intercept term in the
   covariate list and fit a model without implicit intercept
   term in the model.
*/
proc reg data=xtx2  
         outest=S0(type=SSCP
                   drop=i1-i5 _MODEL_  _DEPVAR_  _RMSE_)
         singular=1E-17;
     model i1-i5 = Intercept &covars / noint   noprint;
run;quit;

data S0;
     set S0;
  length _NAME_ $8;
  _NAME_=cats('X_', _n_);
run;

proc score data=SocEcon  score=S0  out=XS0(keep=X_:)  type=parms;
     var &covars;
run;     

data XS0X;
     merge XS0  X;
  array _x1{*} X_:;
  array _x0{*} intercept pop school employ services;
  do i=1 to dim(_X1);
     _x1[i]=_x1[i]*_x0[i];
  end;
  rowSum=sum(of _x1[*]);
  keep rowSum;
run;
proc means data=XS0X  noprint;
     var rowSum;
  output out=trace  sum(rowSum)=Trace;
run;
</code></pre><br />
<br />
Verify the result using R:<br />
<br />
<pre style="background-color: #ebebeb; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: #000001; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>
> socecon<-read.csv('c:/socecon.csv', header=T)
> x<-as.matrix(cbind(1, socecon[,-5]))
> xtx<-t(x)%*%x
> phi<-xtx+diag(rep(1, 5))
>
> # method 1.
> S<-x%*%solve(phi)*x
> sum(S)
[1] 4.077865
> # method 2.
> S<-(x%*%solve(phi))%*%t(x)
> sum(diag(S))
[1] 4.077865
>
</code></pre><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/29815492-2415426124549242306?l=www.sas-programming.com' alt='' /></div><img src="http://feeds.feedburner.com/~r/SasProgramming/~4/YlSd0NVVhqA" height="1" width="1"/>
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

QQ|小黑屋|手机版|Archiver|SAS中文论坛  

GMT+8, 2025-5-7 22:57 , Processed in 0.127346 second(s), 20 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.

快速回复 返回顶部 返回列表