SAS Chinese Forum

6 ways to count odd numbers in SAS/IML

Posted on 2012-1-20 08:21:44

From Dapangmao's blog on sas-analysis

<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-x85NkefV9Qg/Txia7oD1aCI/AAAAAAAAA6Q/gxYBbLWi8go/s1600/SGPlot17.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-x85NkefV9Qg/Txia7oD1aCI/AAAAAAAAA6Q/gxYBbLWi8go/s400/SGPlot17.png" width="400" /></a></div><br />
SAS/IML offers many vector-wise subscripts, operators, and functions that simplify common tasks. <a href="http://blogs.sas.com/content/iml/2011/10/10/sasiml-tip-sheets/">A cheat sheet</a> about them can be found at Rick Wicklin’s blog.<br />
<br />
To try out these features (and their combinations), I designed a test that uses them to count the odd numbers in a random numeric sequence. In many programming languages the typical solution places a cumulative counter inside a loop, so I used a basic DO loop in SAS/IML as the benchmark. <br />
<br />
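Each method starts with the MOD function, which computes the remainder modulo 2: 1 for an odd number and 0 for an even one (the values here are positive integers). The simplest approach then uses the SUM function to add up those remainders, which gives the number of odd values. The subscript reduction operator [, +] serves the same purpose. The CHOOSE + SUM method closely resembles the DO loop and is a generalized form of the SUM-only method, although it adds overhead. The LOC function returns the indices of the elements that satisfy a condition, so it can subset a vector; here it is combined with two other functions, NCOL and COUNTMISS. <br />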
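<br />
As a minimal sketch of the four vectorized counts described above, the following applies them to a small hand-made vector. The vector and the variable names (odd, r_sum, r_sub, r_choose, r_loc) are illustrative assumptions, not part of the original benchmark. The sketch assumes positive integers, as in the benchmark, because MOD can return a negative remainder for negative inputs. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Hypothetical test vector with three odd values: 3, 5, 7 */
   x = {3 8 5 12 7 10};

   /* Remainder modulo 2: 1 marks an odd value, 0 an even one */
   odd = mod(x, 2);

   r_sum    = sum(odd);                   /* SUM function */
   r_sub    = odd[ , +];                  /* subscript reduction across columns */
   r_choose = sum(choose(odd, 1, 0));     /* CHOOSE maps nonzero to 1, zero to 0 */
   r_loc    = ncol(loc(odd = 1));         /* number of indices that LOC returns */

   /* All four counts should agree (3 for this vector) */
   print odd, r_sum r_sub r_choose r_loc;
quit;
</code></pre>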
<br />
Observations: <br />
1. All vector-wise methods beat the DO loop, especially on large data. The simpler the method, the faster it runs. <br />
2. The CHOOSE function is robust. For binary conditions, it can replace if-then-else statements inside a DO loop and is much more efficient.  Rick Wicklin has <a href="http://blogs.sas.com/content/iml/2011/08/15/complex-assignment-statements-choose-wisely/">an article</a> about this function. <br />
3. The LOC function is very handy and plays a role like the WHERE statement in SAS’s DATA step or the which()/subset() functions in R. <br />
4. The SUM function seems slightly faster than the subscript&nbsp;[, +]&nbsp;&nbsp;for a vector. <br />
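<br />
Observations 2 and 3 can be illustrated with a small sketch, assuming a hand-made vector and hypothetical variable names (flag_loop, flag_choose, idx, odd_values) that do not appear in the benchmark. It flags odd values once with an if-then-else inside a DO loop and once with a single CHOOSE call, then uses LOC to pull out the odd values, much as a WHERE statement would filter rows. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Hypothetical test vector */
   x = {3 8 5 12 7 10};

   /* Loop version: flag each element one at a time */
   flag_loop = j(1, ncol(x), 0);
   do i = 1 to ncol(x);
      if mod(x[i], 2) = 1 then flag_loop[i] = 1;
      else flag_loop[i] = 0;
   end;

   /* CHOOSE version: the same binary decision in one vectorized call */
   flag_choose = choose(mod(x, 2) = 1, 1, 0);

   /* LOC version: indices of the odd elements, then subset x with them.
      LOC returns an empty matrix when nothing matches, so guard the subset. */
   idx = loc(mod(x, 2) = 1);
   if ncol(idx) > 0 then odd_values = x[idx];

   print flag_loop, flag_choose, idx, odd_values;
quit;
</code></pre>
The two flag vectors should be identical. The full benchmark behind the observations follows. <br />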
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   a = t(do(1e6, 2e7, 1e6));
   timer = j(nrow(a), 6);
   do p = 1 to nrow(a);
      n = a[p];
      /* Simulate a numeric sequence */
      x = ceil(ranuni(1:n)*100000);
   
      /* 1 -- SUM function*/
      t0 = time();
         r1 = sum(mod(x, 2));
      timer[p, 1] = time() - t0;
   
      /* 2 -- Subscript + */         
      t0 = time();
         r2 = mod(x, 2)[ , +];
      timer[p, 2] = time() - t0;
   
      /* 3 -- SUM + CHOOSE functions*/  
      t0 = time();
         r3 = sum(choose(mod(x, 2), 1, 0));
      timer[p, 3] = time() - t0;
   
      /* 4 -- NCOL + LOC functions */        
      t0 = time();
         r4 = ncol(loc(mod(x, 2) = 1));
      timer[p, 4] = time() - t0;
   
      /* 5 -- DO loop */  
      t0 = time();
         r5 = 0;
         do i = 1 to ncol(x);
            if mod(x[i], 2) = 1 then r5 = r5 + 1;
         end;
      timer[p, 5] = time() - t0;
   
      /* 6 -- COUNTMISS + LOC functions (overwrites x, so it must run last) */        
      t0 = time();
         x[loc(mod(x, 2) = 1)] = .;
         r6 = countmiss(x);
      timer[p, 6] = time() - t0;
   
      /* Validate all results */
      print r1 r2 r3 r4 r5 r6;
   end;
   t =  a||timer;
   create _1 from t;
      append from t;
   close _1;
quit;

data _2;
   set _1;
   length test $100.;
   label col1 = "Number of observations"
      time = "Time by seconds to count odd numbers";
   test = "SUM function"; time = col2; output;
   test = "Subscript + "; time = col3; output;
   test = "SUM + CHOOSE functions"; time = col4; output;
   test = "NCOL + LOC functions"; time = col5; output;
   test = "DO loop"; time = col6; output;
   test = "COUNTMISS + LOC functions"; time = col7; output;
   keep test time col1;
run;

proc sgplot data = _2;
   series x = col1 y = time / curvelabel group = test;
   yaxis grid;
run;
</code></pre>