Original poster, posted on 2012-1-11 15:26:43
 
 
 
Benchmarking of the CUSUM function in SAS/IML
From Dapangmao's blog on sas-analysis 
 
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-7ZSumDiRxpM/Tw0oNYlmCmI/AAAAAAAAA5Y/4Mm2JjvGUv4/s1600/SGPlot4.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="300" width="400" src="http://3.bp.blogspot.com/-7ZSumDiRxpM/Tw0oNYlmCmI/AAAAAAAAA5Y/4Mm2JjvGUv4/s400/SGPlot4.png" /></a></div><br /> 
Cumulative sums can be computed in a SAS DATA step with the RETAIN statement. In the code below, the DATA step's implicit loop over the observations accumulates the running total in a new variable.  <br /> 
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code> 
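/* generate 1 million uniform random numbers; seed 0 uses the system clock */
data one; 
   do i = 1 to 1e6; 
      z = ranuni(0); 
      output; 
   end; 
   drop i; 
run; 
 
/* y holds the running total of z across observations */
data two; 
   set one; 
   retain y 0; 
   y + z; 
run; 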
</code></pre><br /> 
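As a small check of this logic, the sketch below runs the same sum statement on three hand-entered values. The data set names demo and demo_cum are made up for illustration. The explicit RETAIN statement is left out here because the sum statement already retains y across observations and starts it at 0.<br /> 
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code> 
/* hypothetical toy input: three values entered by hand */
data demo; 
   input z; 
   datalines; 
1
2
3
;
run; 
 
data demo_cum; 
   set demo; 
   y + z;   /* sum statement: y is retained and initialized to 0 */
run; 
 
/* y should read 1, 3, 6 */
proc print data = demo_cum; 
run; 
</code></pre><br /> 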
The same logic can be written as an explicit DO loop in SAS/IML. However, SAS/IML also provides the <a href="http://support.sas.com/documentation/cdl/en/imlug/59656/HTML/default/langref_sect50.htm">CUSUM</a> function, which is built specifically for cumulative sums. To compare the efficiency of the two methods, I ran an experiment: computing the cumulative sums of a random vector whose size grows from 1 million to 20 million elements in steps of 1 million. The results show that the CUSUM function is roughly 100 times faster than a raw DO loop throughout, and the absolute gap widens as the vector grows. At 20 million elements, the DO loop takes almost 40 seconds, while the CUSUM function needs only 0.35 seconds. The vectorized CUSUM function is therefore the better choice for this kind of job in SAS/IML.<br /> 
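Before timing anything, it may help to confirm on a tiny input that the two approaches agree. The following is only a sketch with a hand-picked four-element vector, not part of the original benchmark.<br /> 
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code> 
proc iml; 
   z = {1, 2, 3, 4};          /* small column vector chosen by hand */
 
   x = cusum(z);              /* vectorized: 1, 3, 6, 10 */
 
   y = j(nrow(z), 1, .);      /* explicit loop, same recurrence */
   y[1] = z[1]; 
   do i = 2 to nrow(z); 
      y[i] = y[i-1] + z[i]; 
   end; 
 
   print z x y;               /* columns x and y should match */
quit; 
</code></pre><br /> 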
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code> 
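proc iml; 
   /* vector sizes: 1 million to 20 million in steps of 1 million */
   a = t(do(1e6, 2e7, 1e6)); 
   /* column 1: CUSUM time, column 2: DO loop time */
   timer =  j(nrow(a), 2); 
   do p = 1 to nrow(a); 
      n = a[p]; 
      z = t(ranuni(1:n));   /* random column vector of length n */
 
      /* method 1: vectorized CUSUM function */
      t0 = time();  
         x = cusum(z); 
      timer[p, 1] = time() - t0; 
 
      /* method 2: element-by-element DO loop */
      y = j(n, 1, .); 
      t0 = time();  
         do i = 1 to n; 
            if i = 1 then y[i] = z[i]; 
            else y[i] = y[i-1] + z[i]; 
         end; 
      timer[p, 2] = time() - t0; 
   end; 
 
   /* write sizes and timings to data set _1 (columns COL1-COL3) */
   t =  a||timer; 
   create _1 from t; 
      append from t; 
   close _1; 
quit; 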
 
/* reshape to long form: one row per (size, method) pair for plotting */
data _2; 
   set _1; 
   length test $100.; 
   label col1 = "Number of observations"  
      time = "Time in seconds for cumulative summation"; 
   test = "The CUSUM function"; time = col2; output; 
   test = "DO loop"; time = col3; output; 
   keep test time col1; 
run; 
 
proc sgplot data = _2; 
   series x = col1 y = time / curvelabel group = test; 
   yaxis grid; 
run; 
</code></pre>
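To put a number on the speed-up at each vector size, one possible follow-up is to divide the DO loop time by the CUSUM time in the _1 data set written by the benchmark above. This is a sketch only: the data set name _3 and the variable ratio are made up for illustration, and the guard against a zero CUSUM time is an assumption in case a run finishes too quickly to register on the timer.<br /> 
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code> 
data _3; 
   set _1; 
   /* col2 = CUSUM time, col3 = DO loop time, as written by PROC IML */
   if col2 > 0 then ratio = col3 / col2; 
   label col1 = "Number of observations" 
      ratio = "DO loop time / CUSUM time"; 
run; 
 
proc print data = _3 label; 
   var col1 col2 col3 ratio; 
run; 
</code></pre>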