Revision 46 as of 2008-11-09 04:14:14
I'm going to be running this section of the course. All questions and comments should either be posted here or sent directly to me. -- AustinFrank, 2008-05-20 18:19:39 UTC
Make sure to read the main page for this tutorial before reading this one, in particular the "How to read" section. In a crash course like this one, you will inevitably encounter unfamiliar terminology. Please collect those terms and feel free to post them on this wiki. Each session page has a section with Notes on the readings (by Austin and me) and a section for Additional terminology, where you can add terms you want clarified (just edit the page). -- Florian Jaeger, 2008-05-21 17:21:00 UTC
Session 0: Basics (with optional R primer)
11:00, May 27 2008
This optional meeting of the course will be an R primer. We'll be focusing on obtaining a basic level of familiarity required to participate in the course. We won't cover everything you'll need to know for the course, but hopefully we'll cover enough that you will be able to learn new material on your own.
Whether you attend the R primer or not, you are responsible for understanding the content of these readings before the first session.
If there are things you would like to cover, please make note of them below. NB: As of now there are no plans to teach graphing functions during the primer. It's possible that we could have a special add-on session to discuss different graphics packages in R if there's sufficient interest. Let AustinFrank know if you want to participate in such an event.
This page has three sections: Readings and notes on them, Assignments, and a list of topics that will be covered in class (to be edited by you).
Materials
- attachment:session_0.R
- attachment:sleepstudy.csv
- attachment:cake.tab
Reading
Understanding of this material will be assumed throughout the course. Please read these introductory materials and make sure you understand them before beginning the readings for the first session. Here and for the other sessions, we sometimes assign parts of chapters for reading, so please check the page numbers for the suggested readings (or risk being confused).
Baa08 | Chapter 1 (pp. 1-20) | Intro to R
Wiki | intros to [http://en.wikipedia.org/wiki/Probability_theory probability theory] and [http://en.wikipedia.org/wiki/Probability_distribution distributions] ([http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli] and [http://en.wikipedia.org/wiki/Normal_distribution normal] distribution)
Dal04 | Chapter 2 (pp. 45-55) | Probability distributions in R
G&H07 | Chapter 2 (pp. 13-26) | Terminological convention and intro to probability theory
For absolute beginners, this is also very useful:
Dal04 | Chapter 1.1 - 1.2 | Basics of R
      | Chapter 1.5 - 1.5 |
Additionally, feel free to download and print out this [attachment:R-Refcard.pdf reference card] for R. While it's a few years old, the basics it covers have not changed.
Notes on the readings
Absolute novices to R should start by reading at least the parts of Dalgaard's Ch1 indicated above (but really Baayen is pretty easy even without that). Everyone should read Baayen's R intro chapter. Then read the top part of the wiki entries given above (they get pretty technical after some time, so just read the intro; for a nice condensed intro to probability theory, I recommend [http://nlp.stanford.edu/fsnlp/ Manning and Schuetze 1999:Ch1.2]). Then do a quick walk through Dalgaard's Ch2 on probability distributions in R to play around with some distributions and to get familiar with R. Plan for 1-2 hours of reading and typing (in R) just for the Baayen and Dalgaard chapters.
The G&H chapter is not an introduction to probability theory, but rather a summary of the notational conventions used in that book, with brief explanations of the concepts. It is not the strongest part of the book: it is not very insightful, and its simplifications are sometimes close to wrong. You should still read through it, and accept that it will probably confuse you somewhat. You can return to this chapter later to review the conventions used in G&H.
Additional terminology
Feel free to add terms you want clarified in class:
file.choose (where do you put this & what exactly happens?)
file.choose() uses the graphical interface to choose a file. Try read.csv(file=file.choose())
ls() v objects()
They are the same thing. Try ?ls() and notice that they have the same arguments with the same defaults. Also, ?objects takes you to the same page.
tapply() v aggregate()
As long as the data that you want to do computations on is in a vector, tapply() can probably be made to do what you want. One difference is that aggregate() is a generic function which can be specialized to deal with different data structures. This means that there could be different versions of aggregate() for lme4 models and Design models. In practice, this really just means that whatever type of data structure you apply aggregate() to, you'll probably get the same kind of data structure back. tapply() returns an array: a plain one when the function returns a single value per group, and one of mode list otherwise. NB: use unlist() on list results of tapply() to get them back into a useful form.
I don't use either very much. Instead, I use summarize(), which comes from the Hmisc package and is available once Design is loaded. The syntax is similar to aggregate(), but it has one crucial ability that aggregate() does not. When you want to apply a user-made function to each group in a data set, aggregate() (as of R 2.7) can only return a single scalar result for each group. summarize() is smart enough to handle functions that return a list of results, putting each element of the list into its own column in the results. tapply() can return lists as well, but generally only works on vectors, not on data frames.
> with(sleepstudy, aggregate(Reaction, by=list(Subject), FUN = function (x) {c(is.character(x), mean(x))}))
Error in aggregate.data.frame(as.data.frame(x), ...) :
  'FUN' must always return a scalar

> with(sleepstudy, summarize(Reaction, by=llist(Subject), FUN= function (x) { llist(mean (x), is.character(x)) }))
   Subject Reaction is.character.x.
1      308 342.1338               0
2      309 215.2330               0
3      310 231.0013               0
4      330 303.2214               0
5      331 309.4361               0
6      332 307.3021               0
7      333 316.1583               0
8      334 295.3021               0
9      335 250.0700               0
10     337 375.7210               0
11     349 275.8345               0
12     350 313.6027               0
13     351 290.0978               0
14     352 337.4215               0
15     369 306.0346               0
16     370 291.7018               0
17     371 294.9840               0
18     372 317.8861               0
Also try with(sleepstudy, tapply(X=Reaction, INDEX=c(Subject), FUN = function (x) { list( mean(x), is.character(x)) } )) to see the same computation carried out using tapply().
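To see the difference in return types without loading sleepstudy or any extra package, here is a minimal base-R sketch. The data frame toy and its values are made up for illustration; only tapply() and aggregate() themselves come from the discussion above.

```r
# Toy data standing in for sleepstudy (values are invented for illustration)
toy <- data.frame(
  Subject  = factor(c("s1", "s1", "s2", "s2", "s3", "s3")),
  Reaction = c(250, 270, 300, 320, 210, 230)
)

# tapply(): FUN returns one number per group, so the result is a named array
by_tapply <- with(toy, tapply(Reaction, Subject, mean))

# tapply(): FUN returns a list per group, so the result is an array of mode list
by_tapply_list <- with(toy, tapply(Reaction, Subject,
                                   function(x) list(mean(x), is.character(x))))

# aggregate(): takes a list of grouping vectors and returns a data frame
by_aggregate <- with(toy, aggregate(Reaction, by = list(Subject = Subject), FUN = mean))

print(by_tapply)
print(by_aggregate)
str(by_tapply_list)
```

Running str() on each result is probably the quickest way to check which kind of object you got back.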
Questions
Q: A clarification of the stance on multiple comparisons in G&H (p. 22). Why do they think it's nothing to worry about?
A: [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/03/why_i_dont_usua_1.html Here] is Gelman's elaboration on the matter. The paper is readable and not too long.
Assignments
Make sure you have the latest version of R (version 2.7) installed on a laptop that you can use during class. AndrewWatts wrote some very useful [http://linginst07.stanford.edu/florianR/software/ instructions for installing R] last summer. You should still be able to follow those steps, but make sure you download and install version 2.7.
R Primer
Suggested topics
If you have any material that you would like to cover that isn't included in the list below, please make note of it here.
Topics
Interacting with R and R files
- Using a command line
- Installing packages
install.packages(), update.packages()
- Using the R workspace
ls(), rm(), setwd(), getwd(), library()
- Using an R script file
- Saving R objects
save(), save.image()
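As a rough preview of the workspace and saving functions listed above, the following sketch saves two objects, removes them, and loads them back. The object names and the file name demo.RData are hypothetical, and everything happens in a temporary directory so that none of your own files are touched.

```r
# Work in a temporary directory so none of your own files are touched
old_dir <- setwd(tempdir())
getwd()                                   # shows the directory R is now using

demo_numbers <- c(1, 2, 3)                # hypothetical objects for this demo
demo_label <- "hello"
ls()                                      # lists workspace objects, including the two above

save(demo_numbers, demo_label, file = "demo.RData")   # write selected objects to a file
rm(demo_numbers, demo_label)              # remove them from the workspace
gone <- !exists("demo_numbers")           # TRUE: the object is no longer defined

load("demo.RData")                        # restore the saved objects
back <- exists("demo_numbers")            # TRUE again

# save.image() would save every object in the workspace in one go
invisible(file.remove("demo.RData"))      # clean up
setwd(old_dir)                            # return to the original directory
```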
Getting help
- Function-specific help
?(), help()
- Searching the help
apropos(), help.search(), RSiteSearch()
Loading data
- general purpose functions
scan()
- specific formats
read.csv(), read.delim(), library(foreign)
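A small sketch of the loading functions named above. So that it runs anywhere, it first writes tiny files to temporary locations; the column names and values are invented, and in class you would point these functions at the real course files instead.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Subject,Days,Reaction",
             "s1,0,250.5",
             "s1,1,260.0",
             "s2,0,300.25"), csv_path)

# read.csv() expects comma-separated values with a header row by default
d <- read.csv(csv_path)
str(d)

# read.delim() is the same idea for tab-separated files
tab_path <- tempfile(fileext = ".tab")
writeLines(c("Subject\tDays", "s1\t0", "s2\t1"), tab_path)
d_tab <- read.delim(tab_path)

# scan() reads raw values into a vector; here, numbers from a text string
v <- scan(text = "1 2 3 4", quiet = TRUE)

# Interactively you could pick the file with a dialog instead:
# d <- read.csv(file = file.choose())
```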
General data structures
- vectors / arrays
c()
- matrices
cbind(), rbind(), table()
- lists
list(), unlist()
- data frames
data.frame()
- general structure manipulation and interaction
[], [[]], $, subset(), str(), cut(), repr()
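The data structures above can be tried out with a few lines of base R. This is only an illustrative sketch with invented values, meant to show how the constructors and the indexing operators differ.

```r
v  <- c(10, 20, 30)                  # vector
m  <- rbind(1:3, 4:6)                # 2 x 3 matrix built row by row
m2 <- cbind(1:3, 4:6)                # 3 x 2 matrix built column by column
l  <- list(name = "s1", scores = v)  # list with named elements
df <- data.frame(id = c("a", "b", "c"), score = v)

v[2]                                 # single brackets index a vector: 20
sub_list <- l["scores"]              # single brackets on a list return a shorter list
element  <- l[["scores"]]            # double brackets return the element itself
df$score                             # $ extracts a column by name
high <- subset(df, score > 15)       # rows meeting a condition
bins <- cut(v, breaks = c(0, 15, 35))   # bin numeric values into a factor
counts <- table(c("x", "y", "x"))    # counts per distinct value
flat <- unlist(list(1, 2, 3))        # flatten a list into a vector
str(df)                              # compact overview of any object
```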
Basic descriptive statistics
- summary stats
mean(), sd(), var(), quantile()
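For a quick feel of the summary functions, here is a sketch on a made-up vector of reaction times (the numbers are invented, not taken from the course data).

```r
rt <- c(250, 270, 300, 320, 210, 230)   # made-up reaction times

m  <- mean(rt)                          # arithmetic mean
v  <- var(rt)                           # sample variance (divides by n - 1)
s  <- sd(rt)                            # sample standard deviation, sqrt(var(rt))
q  <- quantile(rt, c(0.25, 0.5, 0.75))  # quartiles

print(c(mean = m, var = v, sd = s))
print(q)
```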
Probability distributions
- random sampling
runif(), rnorm(), rbinom()
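The sampling functions can be explored along the following lines. The seed, sample sizes, and parameter values are arbitrary choices for illustration; the d/p/q functions at the end are the standard companions of rnorm() and are included only as a pointer to what Dalgaard's Ch2 covers.

```r
set.seed(1)   # fix the random number generator so results are repeatable

u <- runif(5, min = 0, max = 1)          # 5 draws from Uniform(0, 1)
z <- rnorm(1000, mean = 0, sd = 1)       # 1000 draws from Normal(0, 1)
b <- rbinom(10, size = 1, prob = 0.3)    # 10 Bernoulli trials (binomial with size 1)

mean(z); sd(z)   # should be near 0 and 1, but not exactly
table(b)         # how many 0s and 1s were drawn

# The same distribution names take other prefixes for densities and probabilities
d0 <- dnorm(0)        # density of the standard normal at 0
p  <- pnorm(1.96)     # P(Z <= 1.96), about 0.975
qz <- qnorm(0.975)    # quantile cutting off the lower 97.5%, about 1.96
```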