The other day on the isostat mailing list Doug Andrews asked the following question:
Which R packages do you consider the most helpful and essential for undergrad stat ed? I ask in great part because it would help my local IT guru set up the way our network makes software available in our computer classrooms, but also just from curiosity.
Doug asked for a top 10 list, and a few people have already chimed in with great suggestions. I thought those not on the list might also have good ideas, so, with Doug’s permission, I’m reposting the question here.
Here is my top 10 (ok, 12) list: (Links go to vignettes or pages I find to be quickest / most useful references for those packages, but if you know of better resources, let me know and I’ll update.)
- [knitr](http://yihui.name/knitr/) / [rmarkdown](http://rmarkdown.rstudio.com/) - for reproducible data analysis with literate programming, great set of tools that students can use from day 1 in intro stats all the way through to writing their undergrad theses
- [dplyr](https://github.com/hadley/dplyr) - for most data manipulation tasks, with the added benefit of piping (via magrittr)
- [ggplot2](http://docs.ggplot2.org/current/) - easy faceting allows for graphing multivariate relationships more easily than with base R (lattice is also good for that, but IMO ggplot2 graphics look more modern and lattice has a much steeper learning curve)
- [openintro](https://cran.r-project.org/web/packages/openintro/openintro.pdf) - or packages that come with the textbooks you use, great for pulling up any dataset from the text and building on it in class (a new version coming soon to fully complement 3rd edition of OpenIntro Statistics)
- [mosaic](http://mosaic-web.org/r-packages/) - for consistent syntax for functions used in intro stat
- [googlesheets](https://github.com/jennybc/googlesheets) - for loading data directly from Google spreadsheets
- [lubridate](https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html) - if you ever need to work with any date fields
- [stringr](https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html) - for text parsing and manipulation
- [rvest](http://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/) - for scraping data off the web
- [readr](https://github.com/hadley/readr) / [data.table](https://github.com/Rdatatable/data.table/wiki) - for loading large datasets & default `stringsAsFactors = FALSE`
And the following suggestions from Randall Pruim complement this list nicely:
- [readxl](https://github.com/hadley/readxl) - for reading Excel data
- [tidyr](https://github.com/hadley/tidyr) - for converting between wide and long formats and for the very useful `extract_numeric()`
- [ggvis](http://ggvis.rstudio.com/) - ggplot2 “done right” and tuned for interactive graphics
- [htmlwidgets](http://www.htmlwidgets.org/) - this is actually a collection of packages for plots: see leaflet for maps and dygraphs for time series, for example
Note that most of these packages are for data manipulation and visualization. Which methods-specific packages are useful or essential for a particular undergraduate program will depend on the focus of that program. Some packages that have come up in the discussion so far are:
- [lme4](https://cran.r-project.org/web/packages/lme4/lme4.pdf) - for mixed models
- [pwr](https://cran.r-project.org/web/packages/pwr/pwr.pdf) - for showing sample size and power calculations
This blog post is meant to provide a space for continuing this discussion, so I’ll ask the question one more time: Which R packages do you consider the most helpful and essential for undergrad stat ed? Please add your responses to the comments.
PS: Thanks to Michael Lopez for suggesting that I post this list somewhere.

PPS: I should really be working on my fast-approaching JSM talk.