SAS中文论坛

Those most productive R developers

Posted on 2011-12-13 00:21:13

From Dapangmao's blog on sas-analysis

[Figure: histogram of the number of R packages maintained by each developer (Rlist.png)]

As of 2011-12-12, CRAN hosts 3,483 R packages. The growth in the number of R packages over the past years is fitted almost perfectly by a quadratic regression.

[Figure: growth in the number of R packages on CRAN over time (SGPlot1.png)]

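A quadratic trend like this can be checked with an ordinary least-squares fit of the package count on elapsed time and its square. The R sketch below is illustrative only and is not the author's code: `fit_quadratic_growth` is a hypothetical helper, and the counts are synthetic values generated from a known quadratic (not the CRAN figures), chosen so that the fit should recover the generating coefficients.

```r
# Hypothetical helper (not from the original post): fit
# count ~ elapsed + elapsed^2 by ordinary least squares,
# where elapsed is time measured from the first observation.
fit_quadratic_growth <- function(year, count) {
  elapsed <- year - min(year)
  fit <- lm(count ~ elapsed + I(elapsed^2))
  # R^2 computed directly from the residuals
  r2 <- 1 - sum(residuals(fit)^2) / sum((count - mean(count))^2)
  list(fit = fit, coefficients = coef(fit), r.squared = r2)
}

# Synthetic example (NOT real CRAN counts): an exact quadratic, so the
# fitted coefficients should be 100, 20 and 25 and R^2 should be 1.
years  <- 2001:2011
counts <- 100 + 20 * (years - 2001) + 25 * (years - 2001)^2
quad <- fit_quadratic_growth(years, counts)
quad$coefficients
quad$r.squared
```

With real data, the yearly CRAN package counts would replace the synthetic `counts`, and `r.squared` quantifies how closely the quadratic tracks them.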
I have always been curious about who maintains these packages, so I wrote an R script that extracts each package's header information from the CRAN website and stores it in a SQLite database. Most R developers maintain one to three packages, but some are remarkably productive. The top 15 maintainers, ranked by the number of packages they maintain, are listed below:

Rank  Maintainer                                           Packages
   1  Kurt Hornik <kurt.hornik at r-project.org>                 23
   2  Martin Maechler <maechler at stat.math.ethz.ch>            23
   3  Hadley Wickham <h.wickham at gmail.com>                    21
   4  Rmetrics Core Team <rmetrics-core at r-project.org>        19
   5  Achim Zeileis <achim.zeileis at r-project.org>             17
   6  Henrik Bengtsson <henrikb at braju.com>                    17
   7  Paul Gilbert <pgilbert.ttv9z at ncf.ca>                    17
   8  Brian Ripley <ripley at stats.ox.ac.uk>                    14
   9  Roger D. Peng <rpeng at jhsph.edu>                         13
  10  Torsten Hothorn <torsten.hothorn at r-project.org>         13
  11  Karline Soetaert <k.soetaert at nioo.knaw.nl>              12
  12  Philippe Grosjean <phgrosjean at sciviews.org>             12
  13  Robin K. S. Hankin <hankin.robin at gmail.com>             12
  14  Charles J. Geyer <charlie at stat.umn.edu>                 11
  15  Matthias Kohl <matthias.kohl at stamats.de>                11

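The ranking comes from grouping the scraped `Maintainer:` rows by maintainer and counting packages. The same tally, together with the share of maintainers who look after one to three packages, can be sketched in base R as follows. `summarise_maintainers` and the toy maintainer strings are hypothetical; with real data, the input would be the `Maintainer:` values pulled from the SQLite database, one per package.

```r
# Hypothetical helper (not from the original post): `maintainer` holds one
# maintainer string per package. Returns the `top` maintainers by package
# count and the share of maintainers who maintain 1-3 packages.
summarise_maintainers <- function(maintainer, top = 15) {
  per_maintainer <- sort(table(maintainer), decreasing = TRUE)
  list(
    top = head(per_maintainer, top),
    # every maintainer has at least one package, so only the upper bound matters
    share_1_to_3 = mean(per_maintainer <= 3)
  )
}

# Toy input (not real CRAN data): six packages, three maintainers
toy_maintainers <- c(rep("Alice <alice at example.org>", 4),
                     "Bob <bob at example.org>",
                     "Carol <carol at example.org>")
summary_toy <- summarise_maintainers(toy_maintainers)
summary_toy$top
summary_toy$share_1_to_3
```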
I am also interested in which R packages are the most important in terms of dependency relationships, and I hope to dig out some clues about that later.
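One possible starting point is to count reverse dependencies, that is, how many packages name a given package in their `Depends:` field (CRAN package pages list this field alongside `Maintainer:`). The base-R sketch below assumes the raw field text is available per package; `count_reverse_deps` and the toy input are hypothetical, and a fuller analysis would probably also consider the `Imports:` and `LinkingTo:` fields.

```r
# Hypothetical helper (not from the original post): count reverse
# dependencies from "Depends:"-style fields. `depends` is a named character
# vector mapping a package to its raw field text, e.g. "R (>= 2.10), MASS";
# NA means the package declares no dependencies.
count_reverse_deps <- function(depends) {
  deps <- unlist(lapply(depends, function(field) {
    if (is.na(field)) return(character(0))
    parts <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
    parts <- sub("[[:space:]]*\\(.*\\)$", "", parts)  # drop version constraints
    unique(parts[parts != "" & parts != "R"])         # R itself is not a CRAN package
  }), use.names = FALSE)
  sort(table(deps), decreasing = TRUE)
}

# Toy input (not real CRAN data)
toy_depends <- c(pkgA = "R (>= 2.10), MASS, lattice",
                 pkgB = "MASS",
                 pkgC = "R (>= 2.5.0), lattice (>= 0.18), MASS",
                 pkgD = NA)
rev_deps <- count_reverse_deps(toy_depends)
rev_deps
```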
```r
### An R script to extract R package information from CRAN and
### build a SQLite database, by hchao8@gmail.com
library(ggplot2)
library(XML)       # readHTMLTable()
library(reshape2)  # melt()
library(RSQLite)

# Create and connect to a SQLite database
conn <- dbConnect(SQLite(), dbname = "c:/Rpackage.db")

# Extract the names of the R packages available on CRAN
allPackageURL <-
  "http://cran.r-project.org/web/packages/available_packages_by_name.html"
allPackage <- as.character(na.omit(melt(readHTMLTable(allPackageURL))[, "V1"]))

# Extract each package's information from its CRAN page and store it in SQLite
for (i in seq_along(allPackage)) {
  packageName <- allPackage[i]
  packageURL <- paste0("http://cran.r-project.org/web/packages/", packageName,
                       "/index.html")
  y <- melt(readHTMLTable(packageURL))
  y$L1 <- packageName
  if (dbExistsTable(conn, "Rpackage")) {
    dbWriteTable(conn, "Rpackage", y, append = TRUE)
  } else {
    dbWriteTable(conn, "Rpackage", y)
  }
}

# Count packages per maintainer; dbGetQuery() also frees the result set
all <- dbGetQuery(conn, "
          select v2 as author, count(v2) as package
          from rpackage
          where v1 = 'Maintainer:'
          group by v2
          order by package desc;")

# Disconnect from the SQLite database
dbDisconnect(conn)

# Draw a histogram of the number of packages per maintainer
p <- ggplot(all, aes(x = package)) +
  geom_histogram(binwidth = 1) +
  labs(x = "R packages maintained by individual developer", y = "Frequency")
ggsave("c:/Rlist.png", p)

# Find the 15 most productive developers
head(all, 15)
```