Knitr: automatic report generation

Quickstart

If you don't have it already, download RStudio and watch the knitr video tutorial.

What does it do?

Knitr is an R package that lets you embed code (R, Python, etc.) directly inside documents (LaTeX, HTML, Markdown, etc.). At the most basic level, this means no more copying and pasting or retyping of generated quantities, no more \includegraphics{} and pdf() or ggsave() calls in R, and no more typing scads of LMER coefficients.

Knitr scans the document, extracts all the code chunks, evaluates them, formats the results in a nice way (that you can control), and inserts them back into the document. The result can then be compiled in whatever way is appropriate (pdflatex for a LaTeX document, or a web browser/markdown interpreter for HTML or markdown output).
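To make the extract/evaluate/re-insert cycle concrete, here is a minimal sketch that knits a tiny in-memory document through the text argument of knit(). The chunk label demo and the variable x are made up for illustration, and the chunk syntax itself is explained further down.

```r
library(knitr)

# A tiny Rnw-style document held in a character vector, one element per line.
# The chunk label 'demo' and the variable x are hypothetical.
doc <- c(
  "Some text before the chunk.",
  "<<demo>>=",
  "x <- 1 + 1",
  "@",
  "The value of x is \\Sexpr{x}."
)

# knit() evaluates the chunk and the inline expression, then returns the
# processed document (here, LaTeX source) instead of writing a file.
out <- knit(text = doc, quiet = TRUE)
cat(out, sep = "\n")
```

The printed result should contain the formatted chunk followed by the sentence with the inline value filled in.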

Knitr is based on Sweave, but improves on it in several ways. There's automatic caching (so that a document can be re-knit without re-running code that hasn't changed since the last knitting, like time-consuming data analysis code), more transparency (you can include all input and output in the final document, as if you had run the code chunks in an R console), pretty formatting of R code and results in LaTeX, and a more modular interface which is easy to hack and extend.

Why would I want to do that??

Generally, including the code which generated your output (figures, tables, p-values, etc.) makes it easier for other people (including future-you) to replicate and check the analyses, and I've found that it cuts down on my tendency to lose code.

  1. Flexibility. Including the code that produced the output in the output file itself makes it really easy to update analyses when more data becomes available, although you still would have to re-write the text if anything substantial changes :)

  2. Transparency. It also makes it easier for other people (especially including future-you) to understand and reproduce your analysis, and can help you check and correct problems or bugs in your analysis.

  3. Stability. Keeping everything together in one document also helps prevent lost code for generating figures/tables/analyses in a separate output document (like a previous manuscript).

It's also just really easy to do, especially if you're already using LaTeX. Even if you're not, you can create markdown documents which are easy to read and can be quickly converted into HTML.

How do I do it?

The quickest way to get started is to use RStudio. Once you've installed the knitr package, RStudio has built-in support for knitr processing of LaTeX, HTML, and Markdown. Open up preferences, click on 'Sweave', and change 'Weave Rnw files using ...' to knitr. There's also a video tutorial on the knitr homepage showing how to do this.

If you don't want to use RStudio, you can either call knit('filename.Rnw') directly in an R console, which will produce a .tex file, or you can use this shell script that I created which will automatically do that and then pdflatex the resulting .tex file: knit.sh
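If you'd rather stay inside R, the same two steps (knit, then pdflatex) can be sketched as below. This is not the contents of knit.sh, just a rough R equivalent; the helper name knit_to_pdf is made up, and it assumes a working LaTeX installation that R can find.

```r
library(knitr)

# Hypothetical helper: knit an .Rnw file to .tex, then compile that to PDF.
knit_to_pdf <- function(rnw_file) {
  tex_file <- knit(rnw_file)        # knit() returns the path of the .tex it wrote
  tools::texi2pdf(tex_file)         # runs LaTeX (and bibtex) as often as needed
  sub("\\.tex$", ".pdf", tex_file)  # name of the resulting PDF
}

# Usage, with a file of your own:
# knit_to_pdf("filename.Rnw")
```

knitr also ships a knit2pdf() function that wraps both steps in a single call, which may be all you need.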

I recommend using RStudio (or another IDE, like Emacs+ESS+AUCTeX), because you can run the R code in the console as you go, instead of having to knit/latex the whole document to see what happens.

For LaTeX+R, the basic (default) syntax is just like Sweave. R code blocks are inserted like so:

Here's some latex stuff.  Then, an R chunk: 
<<block-name, option1='value', option2=T, option3=4>>=
x <- rnorm(10, 0, 1)
print('hello, world!')
@
Now, back to more latex stuff.

You can also insert code inline, using the \Sexpr{} syntax:

This is latex stuff, but if I want to get the value of $x$ then I can say \Sexpr{x[1]} 

When such a document is 'knit', all of these code chunks will be pulled out, evaluated, and re-inserted, after proper formatting. The formatting (among other things) is controlled by chunk options, which go inside the delimiters <<>>=. Here are some of the ones I use most often (and they are all documented very well on the knitr page):

Caching

One of the best features of knitr is that it can cache the results of each chunk, which means that time-consuming chunks (like a sampler or a call to lmer) don't need to be re-run every time the document is knit. If a chunk has changed at all since the last time it was evaluated, it will be re-run, and the new results stored in the cache. Furthermore, if you set the chunk option autodep=TRUE, knitr will try to determine which other chunks each chunk depends on; if any of those chunks have changed in the meantime, the chunk will be re-run. You can also specify the dependencies manually, using the chunk option dependson=c('chunk1', 'chunk2').

Note that knitr defaults to not caching. You can turn on caching for a single chunk by setting cache=TRUE as a chunk option, or turn it on as a default by putting opts_chunk$set(cache=TRUE) in a code chunk and setting cache=FALSE for any chunks that you don't want cached.
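As a small sketch, the body of a setup chunk near the top of the document might look like this. autodep is knitr's chunk option for automatic dependency detection; the chunk labels in the comments are hypothetical.

```r
library(knitr)

# Cache every chunk by default and let knitr try to infer which chunks
# depend on which.
opts_chunk$set(cache = TRUE, autodep = TRUE)

# Inspect the defaults that are now in effect.
opts_chunk$get("cache")
opts_chunk$get("autodep")

# Individual chunks can still opt out, or declare dependencies by hand,
# in their own headers (labels here are made up):
#   <<quick-plot, cache=FALSE>>=
#   <<model-summary, dependson='fit-model'>>=
```

Options set in a chunk header override the defaults set with opts_chunk$set(), so a global cache=TRUE plus a few cache=FALSE exceptions works as described above.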

I've found that caching (especially using auto-dependency) can be a bit glitchy. If you want to totally re-run everything, just delete (or rename, to be safe) the cache/ subdirectory and knit again.

Figures

Knitr will include any figures that are produced by a chunk in the output. They will be saved to a subdirectory (figure/ by default, controlled by the fig.path option), named after the chunk, and numbered if the chunk produces multiple images. So:

<<some-figs>>=
x <- rnorm(1000)
hist(x)
qqnorm(x)
@

will create two files, the first containing the histogram and the second the QQ plot. With the default settings they are named figure/some-figs-1.pdf and figure/some-figs-2.pdf (older versions of knitr omit the hyphen: some-figs1.pdf and some-figs2.pdf). The size of the images produced, and how they are inserted into the document, are controlled by chunk options:

(There are more, of course).
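As an illustration, here is a sketch that sets a few of the figure-related chunk options documented on the knitr page as document-wide defaults. These are not necessarily the options originally listed here, and the values are arbitrary.

```r
library(knitr)

# Illustrative defaults for every figure in the document.
opts_chunk$set(
  fig.path   = "figures/",        # save under figures/ instead of the default figure/
  fig.width  = 5,                 # width of the graphics device, in inches
  fig.height = 4,                 # height of the graphics device, in inches
  out.width  = "0.6\\textwidth",  # width of the figure in the LaTeX output
  fig.align  = "center"           # horizontal alignment in the output
)

# The same options can be given per chunk in its header, e.g.
#   <<some-figs, fig.width=5, fig.height=4, out.width='0.6\\textwidth'>>=
```

fig.width and fig.height control the size of the plot as R draws it, while out.width only rescales it when it is included in the document, so the two can be tuned separately.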

A note for Windows users

It seems knitr requires some packages from l3kernel and l3packages (and perhaps also l3experimental), so you need to install those. MiKTeX usually does that for you, but these are experimental packages at the moment, so you need to create the relevant .sty files yourself after downloading the source code.

A couple of use cases

Lab notebook

Besides the obvious case of doing an entire camera-ready publication in LaTeX+R using knitr, it can be useful for keeping track of analyses for a particular data set. I have only started to do this, but here's a quick (and pretty ugly/not very literate) example:

fs_notebook.Rnw fs_notebook.pdf

Stats homework

Here's a statistics problem set that is run using knitr. It has examples of

kleinschmidt-dave-hw5.Rnw kleinschmidt-dave-hw5.pdf

Poster/talk content

If you use LaTeX and Beamer to produce slides, it's very easy to include working R code and show the actual output in your slides. Here's an example, from a tutorial on using plyr for data analysis etc. at the LSA workshop in summer 2013:

lsa13-plyr-reshape.Rnw lsa13-plyr-reshape.pdf

I've also used knitr as a convenient way to sketch ideas for poster content and keep track of figures in a single document. The finished poster is here.

poster-content.Rnw poster-content.pdf

Here's the knitr document that I used to generate all the schematics for my lunch talk:

plots.Rnw plots.pdf

LabmeetingSP13w13 (last edited 2013-10-16 03:31:19 by cpe-74-74-158-116)
