Posted on 2011-12-13 00:21:13
The most productive R developers
From Dapangmao's blog on sas-analysis
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-vizqAPGWZHU/TuYmN4V4xLI/AAAAAAAAA44/M2IUCjG6kxk/s1600/Rlist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="226" src="http://4.bp.blogspot.com/-vizqAPGWZHU/TuYmN4V4xLI/AAAAAAAAA44/M2IUCjG6kxk/s400/Rlist.png" width="400" /></a></div><br />
<br />
As of 2011-12-12, there are 3,483 R packages on CRAN. The growth in the number of R packages over the past years is fitted almost perfectly by a quadratic regression.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-_ARgeYRPRmM/TuYl7sxPzXI/AAAAAAAAA4s/ml-bJamlnNk/s1600/SGPlot1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="http://2.bp.blogspot.com/-_ARgeYRPRmM/TuYl7sxPzXI/AAAAAAAAA4s/ml-bJamlnNk/s400/SGPlot1.png" width="400" /></a></div><br />
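For readers who want to try a quadratic fit in R, here is a minimal sketch. The plot above was not produced with this code, and the counts below are synthetic: they are generated from a made-up quadratic trend plus noise purely so that the example runs, and they are not real CRAN figures. The names growth, t and count are assumptions of this sketch.

```r
# Synthetic illustration only: these counts come from an invented quadratic
# trend plus random noise. They are NOT real CRAN package counts.
set.seed(1)
year  <- 2001:2011
t     <- year - min(year)          # years since the first observation
count <- 100 + 50 * t + 28 * t^2 + rnorm(length(t), sd = 20)
growth <- data.frame(t = t, count = count)

# Quadratic regression: count = b0 + b1 * t + b2 * t^2
fit <- lm(count ~ t + I(t^2), data = growth)

# Goodness of fit and a one-year-ahead extrapolation
summary(fit)$r.squared
predict(fit, newdata = data.frame(t = max(t) + 1))
```

With real data, one row per date and the package count on that date would take the place of the synthetic growth data frame.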
I have always been curious about who maintains those packages, so I wrote an R script that extracts each package's header information from CRAN’s website and stores it in a SQLite database. Most R developers maintain 1-3 packages, but some of them are really productive. The top 15 maintainers by number of packages are listed below:<br />
<br />
author package<br />
1 Kurt Hornik &lt;kurt.hornik at r-project.org&gt; 23<br />
2 Martin Maechler &lt;maechler at stat.math.ethz.ch&gt; 23<br />
3 Hadley Wickham &lt;h.wickham at gmail.com&gt; 21<br />
4 Rmetrics Core Team &lt;rmetrics-core at r-project.org&gt; 19<br />
5 Achim Zeileis &lt;achim.zeileis at r-project.org&gt; 17<br />
6 Henrik Bengtsson &lt;henrikb at braju.com&gt; 17<br />
7 Paul Gilbert &lt;pgilbert.ttv9z at ncf.ca&gt; 17<br />
8 Brian Ripley &lt;ripley at stats.ox.ac.uk&gt; 14<br />
9 Roger D. Peng &lt;rpeng at jhsph.edu&gt; 13<br />
10 Torsten Hothorn &lt;torsten.hothorn at r-project.org&gt; 13<br />
11 Karline Soetaert &lt;k.soetaert at nioo.knaw.nl&gt; 12<br />
12 Philippe Grosjean &lt;phgrosjean at sciviews.org&gt; 12<br />
13 Robin K. S. Hankin &lt;hankin.robin at gmail.com&gt; 12<br />
14 Charles J. Geyer &lt;charlie at stat.umn.edu&gt; 11<br />
15 Matthias Kohl &lt;matthias.kohl at stamats.de&gt; 11<br />
<br />
I am also interested in which R packages are the most significant in terms of their dependency relationships. I hope to dig up some clues about that later.<br />
<br />
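One possible starting point for the dependency question, offered as a rough sketch and not as the author's method: count how many packages list a given package in their Depends or Imports fields. The helper count_reverse_deps is hypothetical, and using the number of direct reverse dependencies as a proxy for "significance" is an assumption of this sketch. It expects a character matrix laid out like the result of available.packages().

```r
# Hypothetical helper: rank packages by how many other packages name them
# in the chosen dependency fields (direct dependencies only).
count_reverse_deps <- function(db, fields = c("Depends", "Imports")) {
  # Join the chosen fields into one comma-separated string per package
  raw <- apply(db[, fields, drop = FALSE], 1,
               function(x) paste(na.omit(x), collapse = ","))
  deps <- lapply(strsplit(raw, ","), function(d) {
    # Drop version requirements such as "(>= 2.10)" and surrounding blanks
    d <- trimws(sub("\\(.*\\)", "", d))
    # Ignore empty entries and the dependency on R itself
    unique(d[nzchar(d) & d != "R"])
  })
  counts <- sort(table(unlist(deps)), decreasing = TRUE)
  data.frame(package = names(counts), reverse_deps = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy input invented for illustration; not real CRAN metadata
toy <- rbind(
  c(Package = "a", Depends = "R (>= 2.10), MASS", Imports = NA),
  c(Package = "b", Depends = "MASS", Imports = "lattice (>= 0.20)"),
  c(Package = "c", Depends = NA, Imports = "MASS, lattice")
)
res <- count_reverse_deps(toy)
res
```

Against a live CRAN mirror, count_reverse_deps(available.packages()) should give the same kind of ranking for the real package list.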
<pre style="background-color: #ebebeb; border: 1px dashed rgb(153, 153, 153); color: #000001; font-size: 14px; line-height: 14px; overflow: auto; padding: 5px; width: 100%;"><code>
### An R script to extract R package information from CRAN and
### build a SQLite database, by hchao8@gmail.com
library(ggplot2)
library(reshape2) # provides melt(), which ggplot2 no longer attaches
library(XML)
library(RSQLite)
# Create and connect to a SQLite database
conn <- dbConnect(SQLite(), dbname = "c:/Rpackage.db")
# Extract the names of all R packages listed on CRAN
allPackageURL <-
  "http://cran.r-project.org/web/packages/available_packages_by_name.html"
allPackage <- na.omit(melt(readHTMLTable(allPackageURL))[, c("V1")])
# Extract each package's information from its CRAN page and store it in SQLite
for (i in seq_along(allPackage)) {
  packageName <- as.character(allPackage[i])
  packageURL <- paste("http://cran.r-project.org/web/packages/", packageName,
                      "/index.html", sep = "")
  y <- melt(readHTMLTable(packageURL))
  # Tag every row with the package it came from
  y$L1 <- packageName
  if (dbExistsTable(conn, "Rpackage")) {
    dbWriteTable(conn, "Rpackage", y, append = TRUE)
  } else {
    dbWriteTable(conn, "Rpackage", y)
  }
}
# Pull the maintainer information out of the SQLite database:
# one row per maintainer with the number of packages they maintain
all <- dbGetQuery(conn, "
  select v2 as author, count(v2) as package
  from rpackage
  where v1 = 'Maintainer:'
  group by v2
  order by package desc
;")
# Disconnect from the SQLite database
dbDisconnect(conn)
# Draw a histogram of packages per maintainer
qplot(package, data = all, binwidth = 1, ylab = "Frequency",
      xlab = "R packages maintained by individual developer")
ggsave("c:/Rlist.png")
# Find the 15 most productive developers
head(all, 15)
</code></pre>
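The query in the script groups on the full maintainer string, so a person who uses two e-mail addresses is counted as two maintainers. A minimal sketch of one way to regroup by name only follows. The helper count_by_name and the toy input are hypothetical and not part of the original script, and matching on the name alone is an assumption that would merge different people who share a name.

```r
# Hypothetical helper: count packages per maintainer after dropping the
# "<email>" part, so one person with several addresses is counted once.
count_by_name <- function(maintainers) {
  name <- trimws(sub("<[^>]*>", "", maintainers))
  counts <- sort(table(name), decreasing = TRUE)
  data.frame(author = names(counts), package = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy input invented for illustration; one entry per maintained package
toy <- c("Ann Example <ann at example.org>",
         "Ann Example <ann.example at example.com>",
         "Bob Sample <bob at example.net>")
by_name <- count_by_name(toy)
by_name

# Share of maintainers with at most three packages
mean(by_name$package <= 3)
```

With the data frame all from the script, count_by_name(rep(all$author, all$package)) would regroup the existing counts in the same way.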