|
Posted on 2012-1-20 08:21:44
|
6 ways to count odd numbers in SAS/IML
From Dapangmao's blog on sas-analysis
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-x85NkefV9Qg/Txia7oD1aCI/AAAAAAAAA6Q/gxYBbLWi8go/s1600/SGPlot17.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-x85NkefV9Qg/Txia7oD1aCI/AAAAAAAAA6Q/gxYBbLWi8go/s400/SGPlot17.png" width="400" /></a></div><br />
SAS/IML has a number of vector-wise subscripts/operators/functions available, which can make many things easy. <a href="http://blogs.sas.com/content/iml/2011/10/10/sasiml-tip-sheets/">A cheat sheet</a> about them can be found at Rick Wicklin’s blog.<br />
<br />
To try out those features (and their combinations), I designed a test that uses them to count the odd numbers in a random numeric sequence. In many programming languages, the typical solution is a counter incremented inside a loop, so I used a basic DO loop in SAS/IML as the benchmark. <br />
<br />
Every method starts with the MOD function, which returns the remainder after division by 2: 1 for an odd number and 0 for an even one. The simplest approach is then to add up those remainders with the SUM function. A subscript reduction operator, such as [, +], serves the same purpose. The CHOOSE + SUM method closely resembles the DO loop and is a generalized form of the SUM-only method (at the cost of some extra overhead). The LOC function returns the indices of the elements that satisfy a condition, so it can subset or index a vector; here it is combined with two other functions, NCOL and COUNTMISS. <br />
<br />
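Before running the full benchmark, the four vectorized counts described above can be checked on a tiny vector. This is only a minimal sketch and not part of the original test: the six-element vector x and the helper name odd are made up for illustration. With three odd values in x, each of r1 to r4 should print as 3. <br />
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Hypothetical 1 x 6 row vector, small enough to check by hand */
   x = {3 8 5 10 7 2};
   /* 1 for odd elements, 0 for even ones */
   odd = mod(x, 2);
   /* SUM function: add up the remainders */
   r1 = sum(odd);
   /* Subscript reduction operator: sum across the columns */
   r2 = odd[ , +];
   /* CHOOSE maps a nonzero remainder to 1 and a zero remainder to 0 */
   r3 = sum(choose(odd, 1, 0));
   /* LOC returns the indices of the odd elements; NCOL counts them */
   r4 = ncol(loc(odd = 1));
   print r1 r2 r3 r4;
quit;
</code></pre>
<br />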
Observations: <br />
1. All vector-wise methods beat the DO loop, especially as the sequence gets longer. The simpler the method, the faster it runs. <br />
2. The CHOOSE function is robust. For binary conditions, it can replace IF-THEN/ELSE statements inside a DO loop and is much more efficient. Rick Wicklin has <a href="http://blogs.sas.com/content/iml/2011/08/15/complex-assignment-statements-choose-wisely/">an article</a> about this function. <br />
3. The LOC function is very handy and plays a role like the WHERE statement in SAS’s DATA step or the which()/subset() functions in R. <br />
4. The SUM function seems slightly faster than the subscript [, +] for a vector. <br />
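<br />
To make observation 2 concrete, here is a minimal sketch that builds the same 0/1 flag vector twice: once with IF-THEN/ELSE inside a DO loop and once with a single CHOOSE call. The vector x and the names flag1 and flag2 are hypothetical and chosen only for this illustration. Both printed vectors should be identical. <br />
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Hypothetical data for illustration */
   x = {3 8 5 10 7 2};
   /* Element-by-element version: IF-THEN/ELSE inside a DO loop */
   flag1 = j(1, ncol(x), 0);
   do i = 1 to ncol(x);
      if mod(x[i], 2) = 1 then flag1[i] = 1;
      else flag1[i] = 0;
   end;
   /* Vectorized version: one CHOOSE call over the whole vector */
   flag2 = choose(mod(x, 2) = 1, 1, 0);
   print flag1, flag2;
quit;
</code></pre>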
<br />
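Observation 3 can be sketched the same way. LOC returns the positions where a condition holds, and those positions can then be used as a subscript, much like a WHERE filter. The vector x and the names idx and odds are assumptions made for this example. Because LOC returns an empty matrix when nothing matches, the sketch checks NCOL before subscripting. <br />
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Hypothetical data for illustration */
   x = {3 8 5 10 7 2};
   /* Positions of the odd elements */
   idx = loc(mod(x, 2) = 1);
   /* Subscript with idx only if LOC found at least one match */
   if ncol(idx) ^= 0 then do;
      odds = x[idx];
      print idx, odds;
   end;
   else print "No odd numbers found";
quit;
</code></pre>
<br />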
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc iml;
   /* Sequence lengths to test: 1e6, 2e6, ..., 2e7 */
   a = t(do(1e6, 2e7, 1e6));
   /* One row per sequence length, one column per method */
   timer = j(nrow(a), 6);
   do p = 1 to nrow(a);
      n = a[p];
      /* Simulate a numeric sequence */
      x = ceil(ranuni(1:n)*100000);

      /* 1 -- SUM function */
      t0 = time();
      r1 = sum(mod(x, 2));
      timer[p, 1] = time() - t0;

      /* 2 -- Subscript + */
      t0 = time();
      r2 = mod(x, 2)[ , +];
      timer[p, 2] = time() - t0;

      /* 3 -- SUM + CHOOSE functions */
      t0 = time();
      r3 = sum(choose(mod(x, 2), 1, 0));
      timer[p, 3] = time() - t0;

      /* 4 -- NCOL + LOC functions */
      t0 = time();
      r4 = ncol(loc(mod(x, 2) = 1));
      timer[p, 4] = time() - t0;

      /* 5 -- DO loop */
      t0 = time();
      r5 = 0;
      do i = 1 to ncol(x);
         if mod(x[i], 2) = 1 then r5 = r5 + 1;
      end;
      timer[p, 5] = time() - t0;

      /* 6 -- COUNTMISS + LOC functions */
      /* This method overwrites the odd elements of x, so it must run last */
      t0 = time();
      x[loc(mod(x, 2) = 1)] = .;
      r6 = countmiss(x);
      timer[p, 6] = time() - t0;

      /* Validate all results: the six counts should match */
      print r1 r2 r3 r4 r5 r6;
   end;

   /* Save the sequence lengths and timings as data set _1 */
   t = a||timer;
   create _1 from t;
   append from t;
   close _1;
quit;

/* Reshape from one column per method to one row per method */
data _2;
   set _1;
   length test $100.;
   label col1 = "Number of observations"
         time = "Time in seconds to count odd numbers";
   test = "SUM function"; time = col2; output;
   test = "Subscript + "; time = col3; output;
   test = "SUM + CHOOSE functions"; time = col4; output;
   test = "NCOL + LOC functions"; time = col5; output;
   test = "DO loop"; time = col6; output;
   test = "COUNTMISS + LOC functions"; time = col7; output;
   keep test time col1;
run;

/* Plot elapsed time against sequence length, one curve per method */
proc sgplot data = _2;
   series x = col1 y = time / curvelabel group = test;
   yaxis grid;
run;
</code></pre> |
|