Posted on 2012-4-13 05:53:12
Correlations of three variables
From Dapangmao's blog on sas-analysis
<b>Question</b><br />
There is <a href="http://mathforum.org/library/drmath/view/62860.html">an interesting question</a> in statistics -- <br />
<i>“There are 3 random variables X, Y and Z. The correlation between X and Y is 0.8 and the</i><br />
<i>correlation between X and Z is 0.8. What is the maximum and minimum correlation between Y and Z?”</i><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-yt5Xp5amCzw/T4c94MyQQyI/AAAAAAAABBY/-aNUUFMwlyE/s1600/Presentation1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-yt5Xp5amCzw/T4c94MyQQyI/AAAAAAAABBY/-aNUUFMwlyE/s400/Presentation1.png" width="400" /></a></div><b>Solutions</b><br />
1. Geometric illustration<br />
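The value of corr(Y, Z) is the cosine of the angle between Y and Z when the centered variables are viewed as vectors. We already know corr(X, Y) and corr(X, Z), so Y and Z each make an angle of arccos(0.8), about 36.9 degrees, with X. In this particular case the angle between Y and Z can be zero, which means Y and Z point in the same direction and the maximum value of corr(Y, Z) is 1. The minimum value of corr(Y, Z) occurs at the largest possible angle between Y and Z, which is 2*arccos(0.8), about 73.7 degrees; its cosine is 2*(0.8**2) - 1 = 0.28. <br />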
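As a quick numerical check of the geometric argument, here is a minimal sketch in Python (not part of the original SAS workflow; the function name `corr_bounds_from_angles` is hypothetical). It treats each known correlation as the cosine of an angle to X and evaluates the cosine of the smallest and the largest possible angle between Y and Z. <br />

```python
import math

def corr_bounds_from_angles(r_xy, r_xz):
    """Return (lower, upper) bounds on corr(Y, Z) from the angles Y and Z make with X."""
    theta_y = math.acos(r_xy)  # angle between X and Y
    theta_z = math.acos(r_xz)  # angle between X and Z
    # Y and Z are closest when they lie on the same side of X,
    # and farthest apart when they lie on opposite sides of X.
    upper = math.cos(abs(theta_y - theta_z))
    lower = math.cos(theta_y + theta_z)
    return lower, upper

lower, upper = corr_bounds_from_angles(0.8, 0.8)
print(round(lower, 4), round(upper, 4))
```

For the question above, this should reproduce the bounds 0.28 and 1 up to floating-point rounding. <br />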
<br />
2. Positive semi-definiteness property of the correlation matrix<br />
Because a correlation matrix is positive semi-definite, its determinant is greater than or equal to zero. Writing this condition for the 3x3 correlation matrix of X, Y and Z gives a quadratic inequality in corr(Y, Z), and its two roots are the bounds: corr(Y, Z) ranges from 0.28 to 1. <br />
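The quadratic can be written out explicitly. In the derivation below, a = corr(X, Y), b = corr(X, Z) and x = corr(Y, Z), the same argument names that the `corrdet` function uses; the last line substitutes a = b = 0.8. <br />

```latex
\[
\det\begin{pmatrix} 1 & a & b \\ a & 1 & x \\ b & x & 1 \end{pmatrix}
  = -x^2 + 2abx - a^2 - b^2 + 1 \;\ge\; 0
\]
\[
ab - \sqrt{(1-a^2)(1-b^2)} \;\le\; x \;\le\; ab + \sqrt{(1-a^2)(1-b^2)}
\]
\[
a = b = 0.8:\quad 0.64 - 0.36 = 0.28 \;\le\; x \;\le\; 0.64 + 0.36 = 1
\]
```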
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc fcmp outlib=work.funcs.test1;
function corrdet(x, a, b);
return(-x**2 + 2*a*b*x - a**2 -b**2 +1);
endsub;
function solvecorr(ini, a, b);
array solvopts[5] initial abconv relconv
maxiter solvstat (.5 .001 1.0e-6 100);
initial = ini;
x = solve('corrdet', solvopts, 0, ., a, b);
return(x);
endsub;
quit;
options cmplib = work.funcs;
data one;
* Max value;
upper = solvecorr(1, 0.8, 0.8);
upper_check = corrdet(upper,0.8,0.8);
* Min value;
lower = solvecorr(-1, 0.8, 0.8);
lower_check = corrdet(lower,0.8,0.8);
run;
</code></pre><br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-eAW2gur0Ww8/T4c-lLNrwTI/AAAAAAAABBw/D-Bfw0rdsCM/s1600/SGRender3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://1.bp.blogspot.com/-eAW2gur0Ww8/T4c-lLNrwTI/AAAAAAAABBw/D-Bfw0rdsCM/s400/SGRender3.png" width="400" /></a></div><br />
<br />
<b>Generalization</b><br />
We can generalize the question to all possible values of corr(X, Y) and corr(X, Z). First we create two user-defined functions that return the maximum and the minimum values of corr(Y, Z) as the two roots of the quadratic. Then we draw the maximum and minimum values as two surfaces in the same plot. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc fcmp outlib = work.funcs.test2;
function upper(a, b);
/* x is the discriminant of the quadratic in corr(Y, Z) */
x = 4*(a**2)*(b**2) - 4*(a**2+b**2-1);
/* larger root: a*b + sqrt(x)/2 */
if x ge 0 then y = -0.5*(-sqrt(x) - 2*a*b);
else y = .;
return(y);
endsub;
function lower(a, b);
x = 4*(a**2)*(b**2) - 4*(a**2+b**2-1);
/* smaller root: a*b - sqrt(x)/2 */
if x ge 0 then y = -0.5*(sqrt(x) - 2*a*b);
else y = .;
return(y);
endsub;
quit;
data two;
do xy = -.99 to .99 by 0.01;
do xz = -.99 to .99 by 0.01;
upper = upper(xy, xz);
lower = lower(xy, xz);
output;
end;
end;
run;
proc template;
define statgraph surface001;
begingraph;
layout overlay3d / cube = false rotate = 150 tilt = 30
xaxisopts = (label="Correlation between X and Y")
yaxisopts = (label="Correlation between X and Z")
zaxisopts = (label="Boundaries of correlation between Y and Z") ;
surfaceplotparm x = xy y = xz z = upper;
surfaceplotparm x = xy y = xz z = lower;
endlayout;
endgraph;
end;
run;
proc sgrender data = two template = surface001;
run;
</code></pre>
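For readers without SAS, the closed-form bounds can also be sketched in Python. This is only an illustration under the same quadratic as above: the function name `corr_yz_bounds` and the sample inputs are hypothetical, and no plotting is attempted. <br />

```python
import math

def corr_yz_bounds(a, b):
    """Closed-form (lower, upper) bounds on corr(Y, Z), given a = corr(X, Y) and b = corr(X, Z)."""
    # Discriminant of -x**2 + 2*a*b*x - a**2 - b**2 + 1 = 0
    disc = 4 * (a ** 2) * (b ** 2) - 4 * (a ** 2 + b ** 2 - 1)
    if disc < 0:
        # Should not occur for |a|, |b| <= 1 except through rounding error
        return None
    half_width = 0.5 * math.sqrt(disc)
    return a * b - half_width, a * b + half_width

# A few sample pairs of known correlations
for a, b in [(0.8, 0.8), (0.0, 0.0), (0.5, -0.5)]:
    low, high = corr_yz_bounds(a, b)
    print(a, b, round(low, 4), round(high, 4))
```

When both known correlations are zero, the bounds widen to the full range from -1 to 1, which matches the intuition that X then tells us nothing about how Y and Z relate. <br />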