SAS Chinese Forum (SAS中文论坛)


Rick Wicklin’s 195th blog post

Posted on 2011-10-21 09:53:30


From Dapangmao's blog on sas-analysis

<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-Ysz2ZT7dMSg/TqB77JNnJpI/AAAAAAAAAy4/f2GePC1PpAI/s1600/New%2BMicrosoft%2BPowerPoint%2BPresentation.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-Ysz2ZT7dMSg/TqB77JNnJpI/AAAAAAAAAy4/f2GePC1PpAI/s400/New%2BMicrosoft%2BPowerPoint%2BPresentation.jpg" width="400" /></a></div>Today I ran a SAS routine to check the KPIs of a few websites I follow, and noticed by chance that the total number of posts on <a href="http://blogs.sas.com/content/iml/">Rick Wicklin’s blog</a> will soon reach 200. I have followed his blog since its creation, and that is an amazing number for a little more than one year. Rick is a unique blogger: he is a statistician who programs, a programmer who plots data, and a data analyst who writes well. For me, this is a good occasion to summarize what I have learned from his blog.<br />
<b>Data extracted from The Do Loop</b><br />
The official SAS blogs were restructured this summer. Since I could no longer find the XML feed button on the website, I wrote a program that extracts the data directly from the HTML pages to compute the KPIs. <a href="http://www.jiangtanghu.com/blog/2011/07/20/retrieve-blogs-using-sas/">Jiangtang Hu</a> also wrote a program to extract data from The Do Loop, and remarked that Rick is an incredibly productive writer. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
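/* Read PAGE index pages of The Do Loop and keep, for each post, the
   title, the posting date and the page-view count */
%macro extract(page = );
   options mlogic mprint;
   %do index = 1 %to &amp;page;
      filename raw url "http://blogs.sas.com/content/iml/page/&amp;index/";
      /* keep only the HTML lines that carry post information */
      data _tmp01;
        infile raw lrecl= 550 pad ;
        input record $550. ;
        if find(record, 'id="post') gt 0 or find(record, 'class="post') gt 0;
      run;
      /* the code assumes each post yields three matching lines;
         _J numbers the posts */
      data _tmp02;
         set _tmp01;
         _n + 1;
         _j = int((_n+2) / 3);
      run;
      /* one row per post: COL1-COL3 = title, date and page-view lines */
      proc transpose data=_tmp02 out=_tmp03;
         by _j;
         var record;
      run;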
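      data _&amp;index;
         set _tmp03;
         array out[3] $100 title time pageview;
         array in[3] col1-col3;
         length _str1 $40 _str2 $1;
         do i = 1 to 3;
            /* _STR1 / _STR2 = text just before / just after each field */
            /* title: text of the permalink anchor, up to the next tag */
            if i = 1 then do; _str1 = 'rel="bookmark"&gt;'; _str2 = '&lt;'; end;
            /* date: the original start marker was lost when this post was
               copied; DATE-MARKER is a placeholder for the text preceding
               the date in the page source (the date is assumed to end at
               the next tag) */
            if i = 2 then do; _str1 = 'DATE-MARKER'; _str2 = '&lt;'; end;
            /* page views: the number ends at the next blank */
            if i = 3 then do; _str1 = '="postviews"&gt;'; _str2 = ' '; end;
            _start = find(in[i], strip(_str1)) + length(strip(_str1));
            _end = find(in[i], _str2, _start);
            out[i] = substr(in[i], _start, _end - _start);
         end;
         drop _: col: i;
      run;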
   %end;
   data out;
       set %do n = 1 %to &amp;page;
               _&amp;n
           %end;;
   run;
   proc datasets nolist;
      delete _:;
   quit;
%mend;
%extract(page = 20);
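data out1;
   set out nobs = nobs;
   /* posts are listed newest first, so N counts them from the oldest one */
   j + 1;
   n = nobs - j + 1;
   length level $20;   /* filled in later, in OUT2 */
   label pageview1 = 'PAGEVIEW' time1 = 'TIME' n = 'TOTAL POSTS';
   pageview1 = input(pageview, 5.);
   /* a date of the form "Month DD, YYYY" becomes DDMonYYYY */
   _month = scan(time, 1);
   _date = scan(time, 2);
   _year = scan(time, 3);
   time1 = input(cats(_date, substr(_month, 1, 3), _year), date9.);
   weekday = weekday(time1);   /* 1 = Sunday, 2 = Monday, ..., 7 = Saturday */
   drop _:;
   format time1 date9.;
run;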
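
/* Added sanity check (not part of the original post): the parser assumes
   each post yields exactly three matching HTML lines, so a misaligned page
   would show up here as missing dates or page views. */
proc sql;
   select count(*) as posts,
          nmiss(time1) as missing_dates,
          nmiss(pageview1) as missing_views
   from out1;
quit;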

ods html style = htmlbluecml;
proc sql noprint;
   select count(*), sum(pageview1) into :nopost, :noview
   from out1;
quit;
/* dial 1: number of posts against the 200-post target */
proc gkpi mode=basic;
   dial actual = &amp;nopost bounds = (0 100 200 300 400) /
   target=200 nolowbound
   afont=(f="Garamond" height=.6cm)
   bfont=(f="Garamond" height=.7cm);
run;
/* dial 2: total page views */
proc gkpi mode=basic;
   dial actual = &amp;noview bounds = (0 2e4 4e4 6e4 8e4) /
   afont=(f="Garamond" height=.6cm)
   bfont=(f="Garamond" height=.7cm);
run;
quit;
</code></pre><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-cmHaA7iQ70Q/TqB874zewxI/AAAAAAAAAzE/lJYi1O_3AlQ/s1600/2011-10-20_135100.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="217" src="http://4.bp.blogspot.com/-cmHaA7iQ70Q/TqB874zewxI/AAAAAAAAAzE/lJYi1O_3AlQ/s400/2011-10-20_135100.jpg" width="400" /></a></div><b>What I learned</b><br />
I collected all 195 titles, merged or removed some words, and processed them with <a href="http://www.wordle.net/">Wordle</a>. As I expected, Rick’s blog is mainly about ‘Matrix’, ‘Statistics’ and ‘Data’. I found it interesting to learn how to write a ‘Function’ in SAS/IML, which takes a lot of programming skill. I also enjoyed his topics on ‘Simulating’ and ‘Computing’ with ‘Random’ numbers, and he has exciting articles on handling ‘Missing’ values and working with ‘Curve’s. <br />
<br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
data word_remove;
   input word : $15. @@;
   cards;
sas iml using use creating create proc blog versus
;;;

proc sql noprint;
   select quote(upcase(compress(word))) into :wordlist separated by ','
   from word_remove
;quit;

data _null_;
   set out(keep=title);
   /* merge plural/adjective forms so Wordle counts them as one word */
   title = tranwrd(upcase(title), 'MATRICES', 'MATRIX');
   title = tranwrd(title, 'FUNCTIONS', 'FUNCTION');
   title = tranwrd(title, 'STATISTICAL', 'STATISTICS');
   /* blank out every word listed in WORD_REMOVE */
   length i $15;
   do i = &amp;wordlist;
      title = tranwrd(title, compress(i), ' ');
   end;
   file 'c:\tmp\output1.txt';
   put title;
run;
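
/* Optional variant (a sketch, not the author's code): TRANWRD also removes
   matches inside longer words, e.g. USE inside CAUSE or PROC inside
   PROCESS. PRXCHANGE with \b word boundaries removes whole words only.
   The stop words are hard-coded here as a regex alternation, and the
   output file name is hypothetical. */
data _null_;
   set out(keep=title);
   length clean $100;
   clean = upcase(title);
   clean = tranwrd(clean, 'MATRICES', 'MATRIX');
   clean = tranwrd(clean, 'FUNCTIONS', 'FUNCTION');
   clean = tranwrd(clean, 'STATISTICAL', 'STATISTICS');
   clean = prxchange('s/\b(SAS|IML|USING|USE|CREATING|CREATE|PROC|BLOG|VERSUS)\b/ /', -1, clean);
   file 'c:\tmp\output2.txt';
   put clean;
run;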
</code></pre><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-ghES6kywLS4/TqB9LZhUh2I/AAAAAAAAAzQ/HTlqndfYlY0/s1600/SGPlot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://4.bp.blogspot.com/-ghES6kywLS4/TqB9LZhUh2I/AAAAAAAAAzQ/HTlqndfYlY0/s400/SGPlot.png" width="400" /></a></div><b>When the number reaches 200</b><br />
Apart from holidays (the gaps in the needle plot above), Rick keeps a steady pace of roughly three posts a week. <br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-g8o_LMsp4wY/TqB9W8s0elI/AAAAAAAAAzc/7CLGjPAuBAY/s1600/SGPlot1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-g8o_LMsp4wY/TqB9W8s0elI/AAAAAAAAAzc/7CLGjPAuBAY/s400/SGPlot1.png" width="400" /></a></div><br />
Not surprisingly, an OLS regression of the cumulative post count on the posting date gives a nearly straight line. Extrapolating it, the total should hit the 200 target very soon: in the week after next, I believe. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
proc sgplot data=out1;
   needle x = time1 y = n;
   yaxis grid max = 300;
run;
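
/* Added check (not in the original post): average posting rate over the
   period covered by OUT1, in posts per week, to put a number on the
   "roughly three posts a week" reading of the needle plot. */
proc sql;
   select count(*) as posts,
          (max(time1) - min(time1)) / 7 as weeks format=8.1,
          calculated posts / calculated weeks as posts_per_week format=8.2
   from out1;
quit;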

proc sgplot data = out1;
   reg x =time1 y = n;
   refline 200/ axis=y ;
   yaxis max = 300;
run;
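
/* Added sketch (not the author's code): solve the fitted line
   n = b0 + b1*time1 for n = 200 to get a projected date for post #200.
   OUTEST= stores the intercept in INTERCEPT and the slope in a variable
   named after the regressor (TIME1); EST is a hypothetical data set name.
   TIME1 is a SAS date, so the slope is in posts per day. */
proc reg data=out1 outest=est noprint;
   model n = time1;
run;
quit;

data _null_;
   set est;
   target = (200 - intercept) / time1;
   per_week = 7 * time1;
   put 'Fitted posting rate (posts/week): ' per_week 5.2;
   put 'Projected date of post #200: ' target date9.;
run;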
</code></pre><br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-1CCu8qPllPo/TqB9xmuAOhI/AAAAAAAAA0A/qGnPioCCKMo/s1600/SGPanel11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="http://1.bp.blogspot.com/-1CCu8qPllPo/TqB9xmuAOhI/AAAAAAAAA0A/qGnPioCCKMo/s400/SGPanel11.png" width="400" /></a></div><b>What SAS users want to know</b><br />
In my experience, page views in a web browser mostly originate from search engines, while regular readers tend to use feeds instead. So the page views recorded on The Do Loop website reflect what SAS users are trying to find. Rick follows <a href="http://blogs.sas.com/content/iml/2010/09/03/hello-world/">his pattern</a>: introductory tips on Monday, intermediate techniques on Wednesday, and topics for experienced programmers on Friday. If we separate the page-view trends by these three levels, we can see that the intermediate and advanced posts attract more page views than the basic ones. <br />
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
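data out2;
   set out1;
   /* WEEKDAY(): 2 = Monday, 3 = Tuesday, 4 = Wednesday;
      all other days, including Friday, count as advanced */
   if weekday = 2 then level = '1-Basic';
   else if weekday in (3, 4) then level = '2-Intermediate';
   else level = '3-Advanced';
   output;
   /* a second copy of every post feeds the combined panel */
   level = '4-Overall';
   output;
run;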

proc sgpanel data = out2;
   panelby level / spacing=5 columns = 2 rows = 2 novarname;
   series x = time1 y = pageview1;
   rowaxis grid; colaxis grid;
run;
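
/* Added sketch (not in the original post): count, mean and median page
   views per post at each level, to quantify the comparison shown in the
   panels above. */
proc means data=out2 n mean median maxdec=0;
   class level;
   var pageview1;
run;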
</code></pre><b>Conclusion</b><br />
I agree with <a href="http://blogs.sas.com/iml/index.php?/archives/124-Blogging,-Programming,-and-Johari-Windows.html">what Rick Wicklin said</a>: blogging helps us become more aware of what we know and what we don't know. I have benefited a lot from <a href="http://www.amazon.com/Statistical-Programming-SAS-IML-Software/dp/1607646633/ref=sr_1_1?ie=UTF8&amp;qid=1319141042&amp;sr=8-1">his book</a> and his resourceful blog over the past year. Cheers to Rick’s upcoming 200th post!