Differences between revisions 5 and 21 (spanning 16 versions)
Revision 5 as of 2010-07-26 19:00:46
Size: 1226
Editor: tracker
Comment:
Revision 21 as of 2017-04-24 13:57:43
Size: 7446
Editor: slate
Comment:
Line 9: Line 9:
Install these packages from CRAN. Always check the "install dependencies" box. Install these packages from CRAN (using the package manager GUI in R/RStudio, or `install.packages()`). If you're using the GUI, always check the "install dependencies" box.
Line 11: Line 11:
 * DPpackage
 * ggplot2
 * gsubfn
 * hexbin
 * languageR
 * lme4
 * MCMCglmm
 * multcomp
 * plyr
 * reshape
=== Hadleyverse ===

Hadley Wickham has done more than just about anyone to make R more powerful, expressive, and easy to use for common data analysis tasks. These are just the packages that are most often useful for the kind of stuff we do, but if there's a task you are frustrated by in R, Hadley's probably written a package to make it easier.

 * `ggplot2` — Data visualization using grammar of graphics.
 * `dplyr` — Data manipulation pipelines made easy. Noticeably distinct from its spiritual predecessor `plyr`. `dplyr` and `plyr` conflict so don't load both at the same time.
 * `tidyr` — Data cleaning and [[http://blog.rstudio.org/2014/07/22/introducing-tidyr/|tidying]], including reshaping from wide to long (`gather`) and long to wide (`spread`) (replaces `reshape`/`reshape2`). Also has very useful functions like `separate`, for splitting up columns with values like 'beach_b_10' into separate columns with 'beach', 'b', and '10'.
 * `devtools` — Automate common package development workflows. Most useful for [[http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/|writing custom packages]] but also provides idiot-proof installation of packages from source on github/bitbucket or arbitrary URLs via `devtools::install_github` etc. (see below).
 * `stringr` and `lubridate` — Process strings and dates/times with less pain.

=== Everything else ===

 * `knitr` — Literate programming for R. Mix code with text (markdown or LaTeX), and `knitr::knit` will run the code, format the output all purdy, and generate an HTML/PDF report. See [[LabmeetingSP13w13|notes from previous lab meeting on using knitr]].
 * `lme4` — Mixed effects modeling.
 * `multcomp` — Simultaneous tests and confidence intervals for multiple comparisons (e.g., post-hoc contrasts between factor levels).
 * `gsubfn` — More powerful string replacement.
 * `hexbin` — Tired of your boring old square bins? Try some exciting hexbins! Now with two extra sides!
 * `languageR` — Lots of language-specific datasets and code to go along with Baayen's book, "Analyzing Linguistic Data: A practical introduction to statistics".

=== For heavy Bayesian lifting ===

You don't need these unless you want to do any kind of Bayesian modeling.

 * `MCMCglmm` — Does what it says on the tin: Bayesian inference via MCMC for generalized linear mixed models. Much more flexible and powerful than `lme4`, but with a steep learning curve.
 * `rstan` — [[http://mc-stan.org/rstan.html|R interface]] to the [[http://mc-stan.org/|Stan modeling language]]. Good for very efficient sampling of hierarchical models. Doesn't exactly supersede JAGS/BUGS, which are often easier to use and more appropriate for simple models or models where you need to sample categorical variables. '''Note: this package builds a TON of stuff from source and takes a long time to install'''.
 * `glmer2stan` and `rethinking` from [[https://github.com/rmcelreath]]. The first compiles glmer-style mixed model formulas into Stan code (see [[https://hlplab.wordpress.com/2013/12/13/going-full-bayesian-with-mixed-effects-regression-models/|this blog post]]). The second is a more mature and flexible (and actively developed) package (and textbook) that includes `map2stan()` for compiling graphical model-style model specifications (like from JAGS/BUGS) into Stan code. Both need to be installed from Github, so use `devtools::install_github('rmcelreath/rethinking')` etc. (as discussed below).
 * `DPpackage` — Functions for Bayesian inference via simulation in nonparametric/semiparametric models (e.g. the eponymous Dirichlet Process or "DP").
 * `mvtnorm` — Multivariate normal and t distribution, probability, and sampling functions.
Line 23: Line 42:
{{{
install.packages(c("gsubfn","hexbin","languageR","lme4","MCMCglmm","multcomp","plyr","reshape"))
{{{#!highlight r numbers=disable
install.packages(c("tidyverse","devtools","DPpackage","gsubfn","hexbin","languageR","lme4","MCMCglmm","multcomp","ez"))
}}}
tidyverse includes: broom, dplyr, forcats, ggplot2, haven, httr, hms, jsonlite, lubridate, magrittr, modelr, purrr, readr, readxl, stringr, tibble, rvest, tidyr, xml2

== From github (source) ==

Sometimes a package isn't available on CRAN (usually temporarily) as a binary for your platform. Or it's not on CRAN at all, but is hosted on Github or Bitbucket or something. In both of these cases, you'll need to install from source.

=== One-time set up: developer tools ===

If you need to build a package from source, make sure you have the developer tools for your OS installed. For MacOS, they are available on the App Store if you have the most up-to-date version of MacOS, or from the [[https://developer.apple.com/downloads/index.action?q=xcode|developer site]] (where you'll need to register for a free account first) if you have anything less than the most up-to-date version. I ''think'' you only need to install the "Command Line Tools", not XCode itself.

You may also need Fortran, which you can install very easily using [[http://brew.sh/|homebrew]], e.g., `brew install gcc` (recommended; Homebrew ships `gfortran` as part of its `gcc` formula) or from the [[http://cran.r-project.org/bin/macosx/tools/|MacOS tools]] page on CRAN. [[http://scicomp.stackexchange.com/a/2470|This StackExchange answer]] is a good discussion of the pros and cons of various ways to install Fortran on MacOS.

The last thing you'll need is to install `devtools` with `install.packages('devtools')` in R.

=== Installing packages from source ===

Let's say I want to install `dplyr` from the github source. I google it and find that it's hosted at [[http://github.com/hadley/dplyr]]. Then, in R:

{{{#!highlight r numbers=disable
library(devtools)
devtools::install_github('hadley/dplyr')
Line 27: Line 68:
Piece of cake.

`devtools` includes a whole family of functions for installing source from pretty much anywhere you might find it. If, for instance, you want to install from the source archive on CRAN (e.g., [[http://cran.r-project.org/src/contrib/dplyr_0.4.1.tar.gz]]), you can use the `install_url` command:

{{{#!highlight r numbers=disable
library(devtools)
devtools::install_url('http://cran.r-project.org/src/contrib/dplyr_0.4.1.tar.gz')
}}}


Line 28: Line 80:
When you install those packages you will get this warning: ```Warning: dependencies ‘marray’, ‘affy’, ‘Biobase’, ‘Rgraphviz’, ‘’ are not available```. To fix it, install the standard packages from [http://www.bioconductor.org/docs/install/ Bioconductor] by doing the following at the R prompt:
Line 30: Line 81:
{{{ '''I have no idea whether this is still necessary but I'm leaving it here for posterity's sake — Dave'''

When you install the packages above, you may get this warning: ```Warning: dependencies ‘marray’, ‘affy’, ‘Biobase’, ‘Rgraphviz’, ‘’ are not available```. To fix it, install the standard packages from [[http://www.bioconductor.org/docs/install/|Bioconductor]] by doing the following at the R prompt:

{{{#!highlight r numbers=disable
Line 36: Line 91:
{{{ {{{#!highlight r numbers=disable
Line 39: Line 94:
adjusting the version number as appropriate. adjusting the version number as appropriate. (n.b. You must have Administrator privileges to install anything under `C:/Program Files/`)

R Packages

1. From CRAN

Install these packages from CRAN (using the package manager GUI in R/RStudio, or install.packages()). If you're using the GUI, always check the "install dependencies" box.

1.1. Hadleyverse

Hadley Wickham has done more than just about anyone to make R more powerful, expressive, and easy to use for common data analysis tasks. These are just the packages that are most often useful for the kind of stuff we do, but if there's a task you are frustrated by in R, Hadley's probably written a package to make it easier.

  • ggplot2 — Data visualization using grammar of graphics.

  • dplyr — Data manipulation pipelines made easy. Noticeably distinct from its spiritual predecessor plyr. dplyr and plyr conflict so don't load both at the same time.

  • tidyr — Data cleaning and tidying, including reshaping from wide to long (gather) and long to wide (spread) (replaces reshape/reshape2). Also has very useful functions like separate, for splitting up columns with values like 'beach_b_10' into separate columns with 'beach', 'b', and '10'.

  • devtools — Automate common package development workflows. Most useful for writing custom packages but also provides idiot-proof installation of packages from source on github/bitbucket or arbitrary URLs via devtools::install_github etc. (see below).

  • stringr and lubridate — Process strings and dates/times with less pain.
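
As a rough illustration of the tidyr reshaping functions mentioned in the list above, here is a minimal sketch. The data frame and its column names (`subject`, `score`, `word`, `onset`, `vot`) are made up for the example; only `gather`, `spread`, and `separate` come from tidyr itself.

```r
library(tidyr)

# Hypothetical wide data: one row per subject, one column per condition.
wide <- data.frame(
  subject    = c("s1", "s2"),
  beach_b_10 = c(0.42, 0.51),
  peach_p_20 = c(0.67, 0.73),
  stringsAsFactors = FALSE
)

# Wide to long: stack every column except `subject` into key/value pairs.
long <- gather(wide, key = "condition", value = "score", -subject)

# Split values like 'beach_b_10' into three separate columns.
long_split <- separate(long, condition,
                       into = c("word", "onset", "vot"), sep = "_")

# Long to wide: undo the gather() step.
wide_again <- spread(long, key = condition, value = score)
```

Note that `separate` leaves the new columns as character strings, so `vot` holds "10" rather than the number 10 unless you convert it afterwards.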

1.2. Everything else

  • knitr — Literate programming for R. Mix code with text (markdown or LaTeX), and knitr::knit will run the code, format the output all purdy, and generate an HTML/PDF report. See notes from previous lab meeting on using knitr.

  • lme4 — Mixed effects modeling.

  • multcomp — Simultaneous tests and confidence intervals for multiple comparisons (e.g., post-hoc contrasts between factor levels).

  • gsubfn — More powerful string replacement.

  • hexbin — Tired of your boring old square bins? Try some exciting hexbins! Now with two extra sides!

  • languageR — Lots of language-specific datasets and code to go along with Baayen's book, "Analyzing Linguistic Data: A practical introduction to statistics".

1.3. For heavy Bayesian lifting

You don't need these unless you want to do any kind of Bayesian modeling.

  • MCMCglmm — Does what it says on the tin: Bayesian inference via MCMC for generalized linear mixed models. Much more flexible and powerful than lme4, but with a steep learning curve.

  • rstan — R interface to the Stan modeling language. Good for very efficient sampling of hierarchical models. Doesn't exactly supersede JAGS/BUGS, which are often easier to use and more appropriate for simple models or models where you need to sample categorical variables. Note: this package builds a TON of stuff from source and takes a long time to install.

  • glmer2stan and rethinking from https://github.com/rmcelreath. The first compiles glmer-style mixed model formulas into Stan code (see this blog post). The second is a more mature and flexible (and actively developed) package (and textbook) that includes map2stan() for compiling graphical model-style model specifications (like from JAGS/BUGS) into Stan code. Both need to be installed from Github, so use devtools::install_github('rmcelreath/rethinking') etc. (as discussed below).

  • DPpackage — Functions for Bayesian inference via simulation in nonparametric/semiparametric models (e.g. the eponymous Dirichlet Process or "DP").

  • mvtnorm — Multivariate normal and t distribution, probability, and sampling functions.

A quicker way to do it is to copy and paste the following line at your R prompt:

install.packages(c("tidyverse","devtools","DPpackage","gsubfn","hexbin","languageR","lme4","MCMCglmm","multcomp","ez"))

tidyverse includes: broom, dplyr, forcats, ggplot2, haven, httr, hms, jsonlite, lubridate, magrittr, modelr, purrr, readr, readxl, stringr, tibble, rvest, tidyr, xml2
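
If you re-run the one-liner on a machine that already has some of these packages, it reinstalls everything. A possible alternative, sketched here under the assumption that you only want the missing ones, is to compare the list against what is already installed. The helper name `missing_packages` is made up for this example.

```r
# Packages named in the one-line install command above.
wanted <- c("tidyverse", "devtools", "DPpackage", "gsubfn", "hexbin",
            "languageR", "lme4", "MCMCglmm", "multcomp", "ez")

# Hypothetical helper: return the subset of `pkgs` not found in `installed`.
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)

# Only install when run at an interactive R prompt and something is missing.
# dependencies = TRUE also pulls in 'Suggests' for the named packages.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

The `interactive()` guard is a design choice for this sketch: it keeps the snippet from triggering downloads when sourced from a script.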

2. From github (source)

Sometimes a package isn't available on CRAN (usually temporarily) as a binary for your platform. Or it's not on CRAN at all, but is hosted on Github or Bitbucket or something. In both of these cases, you'll need to install from source.

2.1. One-time set up: developer tools

If you need to build a package from source, make sure you have the developer tools for your OS installed. For MacOS, they are available on the App Store if you have the most up-to-date version of MacOS, or from the developer site (where you'll need to register for a free account first) if you have anything less than the most up-to-date version. I think you only need to install the "Command Line Tools", not XCode itself.

You may also need Fortran, which you can install very easily using homebrew, e.g., brew install gcc (recommended; Homebrew ships gfortran as part of its gcc formula) or from the MacOS tools page on CRAN. This StackExchange answer is a good discussion of the pros and cons of various ways to install Fortran on MacOS.

The last thing you'll need is to install devtools with install.packages('devtools') in R.
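
One quick way to see whether the build tools are visible to R is to look for them on the search path. This is only a sketch: the tool names below are an assumption about a typical Mac or Linux setup (on MacOS, `gcc` is usually a front end to clang), and finding them does not guarantee that a given package will compile.

```r
# Tools a source build commonly needs; adjust the names for your platform.
tools <- c("make", "gcc", "gfortran")

# Sys.which() returns the full path of each tool, or "" if it is not found.
found <- unname(Sys.which(tools))

status <- data.frame(
  tool      = tools,
  path      = found,
  available = nzchar(found),
  stringsAsFactors = FALSE
)
print(status)
```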

2.2. Installing packages from source

Let's say I want to install dplyr from the github source. I google it and find that it's hosted at http://github.com/hadley/dplyr. Then, in R:

library(devtools)
devtools::install_github('hadley/dplyr')

Piece of cake.

devtools includes a whole family of functions for installing source from pretty much anywhere you might find it. If, for instance, you want to install from the source archive on CRAN (e.g., http://cran.r-project.org/src/contrib/dplyr_0.4.1.tar.gz), you can use the install_url command:

library(devtools)
devtools::install_url('http://cran.r-project.org/src/contrib/dplyr_0.4.1.tar.gz')
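
One caveat with hard-coded tarball URLs: CRAN keeps only the current release of a package directly under src/contrib/, and moves superseded releases to src/contrib/Archive/<package>/. The sketch below builds either form of URL; `cran_source_url` is a made-up helper, not part of devtools.

```r
# Hypothetical helper: build the URL of a CRAN source tarball.
# archived = TRUE points at src/contrib/Archive/<pkg>/ for older releases.
cran_source_url <- function(pkg, version, archived = FALSE,
                            base = "http://cran.r-project.org/src/contrib") {
  file <- sprintf("%s_%s.tar.gz", pkg, version)
  if (archived) {
    paste(base, "Archive", pkg, file, sep = "/")
  } else {
    paste(base, file, sep = "/")
  }
}

url <- cran_source_url("dplyr", "0.4.1", archived = TRUE)

# Only attempt the install at an interactive prompt with devtools available.
if (interactive() && requireNamespace("devtools", quietly = TRUE)) {
  devtools::install_url(url)
}
```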

3. From Bioconductor

I have no idea whether this is still necessary but I'm leaving it here for posterity's sake — Dave

When you install the packages above, you may get this warning: Warning: dependencies ‘marray’, ‘affy’, ‘Biobase’, ‘Rgraphviz’, ‘’ are not available. To fix it, install the standard packages from Bioconductor by doing the following at the R prompt:

source("http://bioconductor.org/biocLite.R")
biocLite(lib='/Library/Frameworks/R.framework/Resources/library/')

adjusting lib as appropriate for your OS. The example above is for Mac OS X. For Windows use:

biocLite(lib='C:\\Program Files\\R\\R-2.11.1\\library')

adjusting the version number as appropriate. (n.b. You must have Administrator privileges to install anything under C:/Program Files/)
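
Rather than typing the library path by hand, you can ask R where its default library lives and whether you are allowed to write to it. This is a sketch only; the final call assumes `biocLite` has already been loaded via the `source()` line above, and is skipped otherwise.

```r
# .Library holds the path of R's default (system) library on any OS.
lib <- .Library

# file.access() returns 0 when the requested access (mode 2 = write) is allowed.
can_write <- unname(file.access(lib, mode = 2) == 0)

message("Default library: ", lib)
message("Writable by current user: ", can_write)

# Only call biocLite if it has been loaded and the library is writable.
if (interactive() && exists("biocLite") && can_write) {
  biocLite(lib = lib)
}
```

If the library is not writable, that is the same situation as the Administrator-privileges note above.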

Rpackages (last edited 2017-05-02 14:41:12 by dhcp-10-5-7-149)
