Session 1: Linear regression
May 29 2008
This session covers the basics of linear regression. See below for a [#Topics list of topics]. Please do the readings, and post below any terminology you'd like clarified or any other questions you have. You can also suggest further topics, but keep in mind that Session 2 also covers aspects of linear regression modeling, specifically typical issues that come up during modeling. The goal of this first session is to go through the basic steps of building a linear regression model and understanding its output. Session 2 is on validating how good the model is.
We've also posted some [#assignments assignments] below that you should hand in by Friday, so that we can post them on this wiki page. There is only one way to learn how to use the methods we will talk about, and that is to apply them yourself to a data set that you understand. The tutorial is intended to get you to the level where you can do that.
Reading
| G&H07 | Chapter 3 (pp. 29-49) | Linear regression: the basics | 
| Baa08 | Section 4.3.2 (pp. 91 - 105) | Functional relations: linear regression | 
|  | Sections 6 - 6.2.1 (pp. 181-198) | Regression Modeling (Introduction and Ordinary Least Squares Regression) | 
|  | Section 6.6 (pp. 258-259) | General considerations | 
Notes on the readings
If you'd like to follow along, the dataset used in the G&H07 reading can be found here: [http://www.stat.columbia.edu/~gelman/arm/examples/child.iq/]. To use the file, you will need to load the "foreign" package and then call the read.dta() function. For example:
- library("foreign")
- kidiq <- read.dta(file="kidiq.dta")
Additional terminology
Feel free to add terms you want clarified in class:
Questions
- Q:
Assignments
Send your solutions to Andrew Watts, who will upload them here. Please send them by Friday 3:30pm.
| G&H07 | Section 3.9 (pp. 50-51) | Exercises 3 and 5 | 
| Baa08 | Section 4.7 (p. 126) | Exercises 3 and 7* | 
* For Exercise 7, note that Baayen treats linear regression using lm or ols as the same as analysis of covariance (see Section 4.4.1, pp. 117-119).
Suggested topics
If you have any material that you would like to cover that isn't included in the list below, please make note of it here.
Topics
Interacting with R and R files
- Quick recap: Formulating your research questions; Hypothesis testing; a "model"
- Understanding your data set, predictors, and outcome, available information - str(), summary(), names()
 
- Understanding the distributions of your variables
- Understanding dependencies between your variables - pairs(), cor(), abline(), loess()
 
- The Linear Model (LM)
- Building a linear model (for data analysis) - lm(), ols()
- Structure and class of these objects - coef(), display(), summary(), resid()
 
 
- Interpreting the output of a linear model
- Using a model to predict unseen data - predict()
 
- Understanding the influence of individual cases, identifying outliers
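
The steps in the topic list can be sketched roughly as follows. This is only an illustration, not the session's actual script: it uses R's built-in cars data set instead of the kidiq data from the readings (an assumption, so that it runs without downloading anything), and it leaves out ols() and display(), which come from add-on packages. The variable names fit, new_speeds, predicted, and influence are made up for this example.

```r
# Illustrative data: built-in 'cars' (stopping distance vs. speed),
# standing in for the kidiq data used in the readings.
data(cars)

# Understand the data set: structure, summaries, variable names
str(cars)
summary(cars)
names(cars)

# Understand dependencies between variables
pairs(cars)                      # scatterplot matrix
cor(cars$speed, cars$dist)       # Pearson correlation

# Build a linear model: outcome ~ predictor
fit <- lm(dist ~ speed, data = cars)

# Structure and class of the fitted object
class(fit)
coef(fit)                        # intercept and slope
summary(fit)                     # coefficient table, R-squared, etc.
head(resid(fit))                 # first few residuals

# Plot the data with the fitted line and a loess smoother for comparison
plot(dist ~ speed, data = cars)
abline(fit)
lines(cars$speed, fitted(loess(dist ~ speed, data = cars)), lty = 2)

# Use the model to predict unseen data (hypothetical new speeds)
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)
predicted

# Influence of individual cases: Cook's distance, one common measure
influence <- cooks.distance(fit)
head(sort(influence, decreasing = TRUE))
```

To follow the readings instead, the same calls should work on the kidiq data frame once it is loaded with read.dta(), with the formula changed to the variables in that file.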
