Session 2: Issues in linear regression

June 5 2008

Materials

  • attachment:lexdecRT.R
  • attachment:BaayenETAL06.pdf

Reading

G&H07

Chapter 4 (pp. 53-74)

Linear regression: before and after fitting the model

Baa08

Sections 6.2.2-6.2.4 (pp. 198-212)

Collinearity, Model criticism, and Validation

Section 6.4 (pp. 234-239)

Regression with breakpoints

Notes on the readings

Additional terminology

Feel free to add terms you want clarified in class:

Questions

  • Q: Determining the significance of a coefficient: one-tailed or two-tailed t test?
  • A: It's a two-tailed test because we cannot assume a priori which direction the coefficient will go. I guess if one had a really strong theoretical reason to assume one direction, one could do a one-tailed test (which is less conservative).
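
To make the one-tailed versus two-tailed distinction concrete, here is a minimal sketch in R. The data are simulated purely for illustration (the variables x and y and the effect size are assumptions, not course data). It recomputes the two-tailed p-value that summary() reports for a coefficient, and the one-tailed p-value for the directional hypothesis that the coefficient is positive.

```r
# Simulated data: an assumption for illustration only.
set.seed(1)
x <- rnorm(50)
y <- 0.3 * x + rnorm(50)
fit <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value, two-tailed p-value.
coefs <- summary(fit)$coefficients
t_val <- coefs["x", "t value"]
df    <- df.residual(fit)

# Two-tailed: probability of a t value at least this extreme in either direction.
p_two <- 2 * pt(abs(t_val), df, lower.tail = FALSE)

# One-tailed, H1: coefficient > 0. Only the upper tail counts.
p_one <- pt(t_val, df, lower.tail = FALSE)

print(c(two_tailed = p_two, one_tailed = p_one))
```

When the estimate lies in the predicted direction, the one-tailed p-value is half the two-tailed one, which is why the one-tailed test is less conservative.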

  • Q: What is really happening behind the scenes when we call the data() command?

  • A: Even after reading ?data, it's not entirely clear what's happening. It turns out that most packages distributed with R store their data in a database consisting of three files (foo.rdb, foo.rds, foo.rdx). The internal mechanism for dealing with such a database is the function lazyLoad(). Calling lazyLoad() on a database puts all of the objects in the database into your workspace, but the objects aren't actually filled with their values until you use them. This means that you can load all of the datasets from a large package like languageR without using up all of your memory. The underlying action taken by a call to data() is something like

lazyLoad(file.path(system.file(package="languageR"), "data", "Rdata"))
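
The same mechanism can be tried without installing languageR. The sketch below assumes that the base datasets package ships its data as a lazy-load database under data/Rdata, which may differ between R installations, so it checks for the file first. It loads into a separate environment (the name e is just for illustration) rather than into the workspace.

```r
# Path to the lazy-load database, without the .rdb/.rds/.rdx extension.
# Assumption: the datasets package stores its data this way on your system.
filebase <- file.path(system.file(package = "datasets"), "data", "Rdata")

# Load into a fresh environment so the workspace is left untouched.
e <- new.env()

if (file.exists(paste0(filebase, ".rdb"))) {
  lazyLoad(filebase, envir = e)

  # The names are available immediately ...
  print(head(ls(e)))

  # ... but a dataset's values are only read from disk when it is first used.
  print(nrow(e$cars))
} else {
  message("No lazy-load database found at: ", filebase)
}
```

Loading into a separate environment is a design choice for experimenting: it shows what the database contains without cluttering the global workspace.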

Suggested topics

If you have any material that you would like to cover that isn't included in the list below, please make note of it here.


Assignments

Upload your solutions to this page by 10pm.

G&H07

Section 4.9 (p. 76)

Exercise 4

Baa08

Section 6.7 (p. 260)

Exercises 1, 8

In addition to the book problems, we will distribute a data set from the ongoing ngrams project.



Topics

  • More on outliers
    • detect outliers
      • boxplot(), scatterplots with plot(), identify()

    • dealing with outliers
      • exclusion: subset()
      • robust regression (based on the t-distribution): tlm() (in package hett)

  • Overly influential cases (can be, but don't have to be outliers)
    • lm.influence(), also library(Rcmdr)

  • Collinearity:
    • tests: vif(), kappa(), summary of correlations between fixed effects in lmer()

    • countermeasures:
      • centering and/or standardizing: scale()

      • use of residuals: resid(lm(x1 ~ x2, data))

      • principal component analysis (PCA): princomp()

  • Model evaluation: Where is the model off?
    • case-by-case: residuals(), predict()

    • based on predictor: residuals() against predictors, calibrate()

    • overall: validate()

  • Corrections:
    • correcting for clusters (violation of assumption of independence): bootcov()
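
Several of the topics above can be tried out with base R alone. The following is a rough sketch, not the course's own analysis: it uses the built-in mtcars data and the predictors wt and disp as stand-ins (an assumption), and the 4/n cutoff for Cook's distance is only a common rule of thumb.

```r
# Stand-in data: built-in mtcars, not the course data set.
d   <- mtcars
fit <- lm(mpg ~ wt + disp, data = d)

# Outliers: the values a boxplot would draw as individual points.
out_vals <- boxplot.stats(d$mpg)$out

# Overly influential cases: leverage (hat values) and Cook's distance.
lev       <- lm.influence(fit)$hat
cook      <- cooks.distance(fit)
high_cook <- names(cook)[cook > 4 / nrow(d)]  # rule-of-thumb cutoff (assumption)

# Collinearity tests: correlation between predictors and the
# condition number of the model matrix.
r_raw     <- cor(d$wt, d$disp)
kappa_raw <- kappa(model.matrix(fit), exact = TRUE)

# Countermeasure 1: center and standardize the predictors.
d$wt_z   <- as.numeric(scale(d$wt))
d$disp_z <- as.numeric(scale(d$disp))
fit_z    <- lm(mpg ~ wt_z + disp_z, data = d)
kappa_z  <- kappa(model.matrix(fit_z), exact = TRUE)

# Countermeasure 2: replace disp by the part of disp not predicted by wt.
d$disp_resid <- resid(lm(disp ~ wt, data = d))
r_resid      <- cor(d$wt, d$disp_resid)  # essentially zero by construction
fit_resid    <- lm(mpg ~ wt + disp_resid, data = d)

print(high_cook)
print(c(r_raw = r_raw, r_resid = r_resid))
print(c(kappa_raw = kappa_raw, kappa_z = kappa_z))
```

Note that residualizing changes the interpretation of the coefficients: the coefficient of disp_resid equals that of disp in the original model, while the coefficient of wt now also absorbs the variance it shares with disp.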

HLPMiniCourseSession2 (last edited 2008-11-09 02:03:35 by cpe-67-240-134-21)
