Posted on 2012-1-11 15:26:43
Benchmarking of the CUSUM function in SAS/IML
From Dapangmao's blog on sas-analysis
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-7ZSumDiRxpM/Tw0oNYlmCmI/AAAAAAAAA5Y/4Mm2JjvGUv4/s1600/SGPlot4.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="300" width="400" src="http://3.bp.blogspot.com/-7ZSumDiRxpM/Tw0oNYlmCmI/AAAAAAAAA5Y/4Mm2JjvGUv4/s400/SGPlot4.png" /></a></div><br />
Cumulative sums can be obtained in the SAS DATA step with the RETAIN statement. In the code below, the implicit loop of the DATA step carries the running total forward from one observation to the next and stores it in a new variable, Y.<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
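/* Simulate 1 million uniform random numbers */
data one;
   do i = 1 to 1e6;
      z = ranuni(0);
      output;
   end;
   drop i;
run;

/* Running total: Y keeps its value across iterations of the implicit loop */
data two;
   set one;
   retain y 0;
   y + z;   /* sum statement: adds Z to the running total Y */
run;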
</code></pre><br />
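The RETAIN statement is not strictly required here, because a sum statement such as <code>y + z;</code> automatically retains Y and initializes it to 0. As a quick sanity check (a minimal sketch; the data set name DEMO and its four input values are hypothetical), the running total can be inspected on a tiny input, where Y should read 1, 3, 6, 10:<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
/* Hypothetical tiny input: four values read from one data line */
data demo;
   input z @@;
   y + z;   /* sum statement: Y is retained and starts at 0 */
   datalines;
1 2 3 4
;
run;

/* Y should be 1, 3, 6, 10 */
proc print data = demo;
run;
</code></pre><br />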
The same logic can be implemented with an explicit DO loop in SAS/IML. However, SAS/IML also provides the <a href="http://support.sas.com/documentation/cdl/en/imlug/59656/HTML/default/langref_sect50.htm">CUSUM</a> function, which is designed specifically for cumulative sums. To compare the efficiency of the two methods, I ran an experiment: computing the cumulative sums of random vectors whose sizes increase from 1 million to 20 million in steps of 1 million. At every size tested, the CUSUM function was roughly 100 times faster than the raw DO loop, and the absolute gap widened as the vector grew. At 20 million elements, the DO loop took almost 40 seconds, while the CUSUM function needed only 0.35 seconds. The vectorized CUSUM function is therefore the better choice for this kind of job in SAS/IML.<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
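proc iml;
/* Vector sizes: 1 million to 20 million in steps of 1 million */
a = t(do(1e6, 2e7, 1e6));
timer = j(nrow(a), 2);
do p = 1 to nrow(a);
   n = a[p];
   z = t(ranuni(1:n));   /* n x 1 vector of uniform random numbers */
   /* Method 1: the vectorized CUSUM function */
   t0 = time();
   x = cusum(z);
   timer[p, 1] = time() - t0;
   /* Method 2: an explicit DO loop over a preallocated vector */
   y = j(n, 1, .);
   t0 = time();
   do i = 1 to n;
      if i = 1 then y[i] = z[i];
      else y[i] = y[i-1] + z[i];
   end;
   timer[p, 2] = time() - t0;
end;
/* COL1 = vector size, COL2 = CUSUM seconds, COL3 = DO loop seconds */
results = a || timer;
create _1 from results;
append from results;
close _1;
quit;

/* Reshape to long form: one row per (size, method) pair */
data _2;
   set _1;
   length test $20;
   label col1 = "Number of observations"
         time = "Time (seconds) for cumulative summation";
   test = "The CUSUM function"; time = col2; output;
   test = "DO loop";            time = col3; output;
   keep test time col1;
run;

/* One curve per method */
proc sgplot data = _2;
   series x = col1 y = time / curvelabel group = test;
   yaxis grid;
run;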
</code></pre>
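Before comparing timings, it can help to confirm that the two methods return identical results. A minimal sketch on a small, hypothetical vector (not part of the original benchmark) runs both CUSUM and the same DO loop recurrence and prints their largest absolute difference, which should be 0:<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
z = {1, 2, 3, 4};                 /* small hypothetical input */
x = cusum(z);                     /* vectorized: 1, 3, 6, 10 */
y = j(nrow(z), 1, .);
do i = 1 to nrow(z);              /* same recurrence as the benchmark loop */
   if i = 1 then y[i] = z[i];
   else y[i] = y[i-1] + z[i];
end;
maxdiff = max(abs(x - y));        /* 0 when both methods agree */
print x y maxdiff;
quit;
</code></pre><br />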