Differences between revisions 9 and 10
Revision 9 as of 2009-01-28 03:53:26
Size: 3794
Editor: cpe-67-240-134-21
Comment:
Revision 10 as of 2009-01-28 04:49:38
Size: 7551
Editor: cpe-67-240-134-21
Comment:
Deletions are marked like this. Additions are marked like this.
Line 12: Line 12:
== Common issues in regression modeling and some solutions ==
    * common issues in regression modeling
       * collinearity
       * overfitting
       * overly influential cases
       * overdispersion?
       * model quality (e.g. residuals for linear models)
       * building a model: adding/removing variables (also: interactions)
    * some solutions to these problems for common model types
      * outlier handling
      * centering
      * removing collinearity (e.g. PCA, residualization)
      * stratification (using subsets of data)
    * interpreting the model, making sure the model answers the question of interest:
      * testing significance (SE-based tests vs. model comparison)
      * interpretation of model output, e.g. of coefficients
      * (also: coding of variables)
      * follow-up tests
== To do ==
 1. We need examples for many of the topics below, ideally good examples that we can carry through all sections.
 2. Maybe we can distribute visualization ideas across the different sections rather than having them separately at the end.
 3. We may need to cut substantially, but I threw in every idea that came to mind for now.
Line 31: Line 17:
== Some suggestions on how to present model results ==
 * What do readers ''need'' to know?
 * What do reviewers ''need'' to know?
 * How to talk about effect sizes?
  * absolute coefficient size (related to range of predictor) -- talking about effect ranges.
  * relative coefficient size (related to its standard error)
  * model improvement, partial R-square, etc.
  * accuracy?
 * How to back-translate common transformations of outcomes and predictors?
== Interpreting a simple model (1 minute) ==
Quick example of very naive model interpretation. We will use R throughout to give examples.
 * What is the overall output (fitted values) of the model, and what does it mean?
 * Usually we're not interested in the fitted values per se but in the structure of the model that they stem from. Describe how to read model summary:
   * Simple example from linear mixed model (build on Roger's data?)
     * What do coefficients in model summary mean?
     * What does intercept mean?
       * depends on coding!
         * compare contrast vs. dummy coding in balanced data
         * generalize to centering
        * for interpretation, effects may need to be translated back to the uncentered scale
       * ok, now we have interpretable effects
   * Discuss issues of scale and interpretation. Give a simple example of a back-translation into the original space. Show the formula (e.g. for effects on log duration). Describe both predictor and outcome transforms.
  * Be clear about what's nice about having coefficients: directionality and shape of effect can be tested and immediately interpreted. Usually, psycholinguistic theories make predictions about directionality of effect (not just that it matters).
Line 41: Line 32:
== Evaluating a model I - coefficients (XX minutes) ==
But can we trust these coefficients?
Line 42: Line 35:
differences between different models (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit; tests of significance; etc.
 * Explain effects of collinearity on SE(beta) and on the interpretation of beta.
 * How to detect collinearity
   * {{{vif()}}}
   * {{{cor}}}, {{{cor(, method="spearman") }}}, {{{pairs}}}
   * fixed effect correlations in mixed models
   * doesn't always matter: e.g. for controls whose effects we don't care about!
 * How to remove collinearity:
   * centering: removes collinearity with intercept and higher order terms, including interactions
   * residualization and how to interpret the results
   * PCA or the like (disadvantage: interpretability, though sometimes interpretability is still ok; great for controls)
   * stratification (using subsets of data)
 * Use of model comparison tests
   * are robust against collinearity
   * great to assess significance of interactions, non-linear components, etc.; especially useful for controls where the directionality does not matter that much.
   * do not provide directionality of effect
   * not completely ok for all types of models --> refer to conceptual background section and to Harald's section
Line 44: Line 52:
=== Example of written description of the model ===
== Evaluating a model II - overall quality (XX minutes) ==
Collinearity aside, can we trust the model overall? Models are fit under assumptions. Are those met? If not, do we have to worry about the violations? The world is never perfect, but when should I really be cautious?
Line 46: Line 55:
 * Overly influential cases:
   * outlier handling
   * tests like dfbeta, dfbetas, Cook's distance, etc. -- how much does it matter? Give an example of where a few points can drive a result.
 * overfitting and overdispersion:
   * explain concepts (explain for which models they apply)
   * how can you test?
     * a priori considerations about DFs in model (linear vs. logit model rule of thumb)
     * overfitting is rarely a problem for laboratory experiment data
     * residual plots
     * predicted vs. observed (bootstrapped/cross-validated!)
   * what is overdispersion, and what do multilevel models do to overcome it?
 * are the model's assumptions met?
   * if not, does it matter?
   * how can I assess this? e.g. residuals for linear models, overdispersion parameter
Line 47: Line 70:
=== Visualization (3 minutes) ===
== Comparing effect sizes (12 minutes) ==
Now that we know whether we can trust a model, how can we assess and compare effect sizes? Discuss different ways to talk about "effect size": What do the different measures assess? What are their trade-offs? When do different measures lead to different conclusions (if one is not careful enough)? Also mention differences between different types of models (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit; tests of significance; etc.

 * coefficient-based:
   * absolute coefficient size (related to range of predictor) -- talking about effect ranges.
   * relative coefficient size (related to its standard error)
   * p-values based on coefficient's standard error:
     * different tests (t, z) --> refer to section on problems with the t-test for mixed models; necessity of MCMC sampling
 * model improvement
   * partial R-square
   * deviance
   * BIC, AIC
   * p-values based on model comparison (-2 * likelihood ratio):
     * not all models are fit with ML (mention mixed linear and mixed logit models) --> refer to Harald's section
     * different fits needed for variance and point estimates (because ML-estimates of variance are biased) --> refer to Harald's section?
     * robustness against collinearity
 * accuracy: discuss problems (dependency on marginal distribution of outcome)

== Visualizing effects (3 minutes) ==
Line 51: Line 92:
   * issues with scales and standardization
   * issues with scales, back-transformation, and standardization
Line 69: Line 110:
== Publishing model (2 minutes) ==
 * Summary of what readers (and reviewers) ''need'' to know

Standards in fitting, evaluating, and interpreting regression models

As part of the Workshop on common issues and standards in ordinary and multilevel regression modeling, March 25, 2009, UC Davis.

To do

  1. We need examples for many of the topics below, ideally good examples that we can carry through all sections.
  2. Maybe we can distribute visualization ideas across the different sections rather than having them separately at the end.
  3. We may need to cut substantially, but I threw in every idea that came to mind for now.

Interpreting a simple model (1 minute)

Quick example of very naive model interpretation. We will use R throughout to give examples.

  • What is the overall output (fitted values) of the model, and what does it mean?
  • Usually we're not interested in the fitted values per se but in the structure of the model that they stem from. Describe how to read model summary:
    • Simple example from linear mixed model (build on Roger's data?)
      • What do coefficients in model summary mean?
      • What does intercept mean?
        • depends on coding!
          • compare contrast vs. dummy coding in balanced data
          • generalize to centering
        • for interpretation, effects may need to be translated back to the uncentered scale
        • ok, now we have interpretable effects
    • Discuss issues of scale and interpretation. Give a simple example of a back-translation into the original space (see the sketch after this list). Show the formula (e.g. for effects on log duration). Describe both predictor and outcome transforms.
    • Be clear about what's nice about having coefficients: directionality and shape of effect can be tested and immediately interpreted. Usually, psycholinguistic theories make predictions about directionality of effect (not just that it matters).
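
A minimal sketch of these points in R, on simulated data (the data set and variable names are hypothetical, not from the workshop materials):

{{{
# Hypothetical data: durations for two conditions, clustered by subject
library(lme4)
set.seed(1)
d <- data.frame(subject = factor(rep(1:20, each = 10)))
d$cond <- factor(sample(c("a", "b"), nrow(d), replace = TRUE))
d$duration <- exp(5 + 0.1 * (d$cond == "b") + rnorm(nrow(d), sd = 0.2))

# Dummy (treatment) coding: the intercept is the mean of condition "a"
m1 <- lmer(log(duration) ~ cond + (1 | subject), data = d)

# Contrast (sum) coding: in balanced data the intercept is the grand mean
contrasts(d$cond) <- contr.sum(2)
m2 <- lmer(log(duration) ~ cond + (1 | subject), data = d)

# Back-translation for a log-transformed outcome: a coefficient b maps to
# a multiplicative change of exp(b) in raw duration
exp(fixef(m1)["condb"])
}}}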

Evaluating a model I - coefficients (XX minutes)

But can we trust these coefficients?

  • Explain effects of collinearity on SE(beta) and on the interpretation of beta (a sketch follows this list).
  • How to detect collinearity
    • vif()
    • cor, cor(, method = "spearman"), pairs
    • fixed effect correlations in mixed models
    • doesn't always matter: e.g. for controls whose effects we don't care about!
  • How to remove collinearity:
    • centering: removes collinearity with intercept and higher order terms, including interactions
    • residualization and how to interpret the results
    • PCA or the like (disadvantage: interpretability, though sometimes interpretability is still ok; great for controls)
    • stratification (using subsets of data)
  • Use of model comparison tests
    • are robust against collinearity
    • great to assess significance of interactions, non-linear components, etc.; especially useful for controls where the directionality does not matter that much.
    • do not provide directionality of effect
    • not completely ok for all types of models --> refer to conceptual background section and to Harald's section
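
A minimal sketch of detection and removal, on simulated data, assuming vif() from the car package (rms provides one as well):

{{{
# Hypothetical data with two nearly collinear predictors
library(car)
set.seed(1)
d <- data.frame(x1 = rnorm(100))
d$x2 <- d$x1 + rnorm(100, sd = 0.3)      # x2 nearly collinear with x1
d$y  <- d$x1 + d$x2 + rnorm(100)

m <- lm(y ~ x1 + x2, data = d)
vif(m)                                   # large values flag collinearity
cor(d$x1, d$x2, method = "spearman")

# Residualization: keep only the part of x2 not predicted by x1; its
# coefficient is then the effect of x2 over and above x1
d$x2.res <- resid(lm(x2 ~ x1, data = d))
m2 <- lm(y ~ x1 + x2.res, data = d)

# Model comparison: does adding x2 improve the model at all?
anova(lm(y ~ x1, data = d), m)
}}}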

Evaluating a model II - overall quality (XX minutes)

Collinearity aside, can we trust the model overall? Models are fit under assumptions. Are those met? If not, do we have to worry about the violations? The world is never perfect, but when should I really be cautious?

  • Overly influential cases:
    • outlier handling
    • tests like dfbeta, dfbetas, Cook's distance, etc. -- how much does it matter? Give an example of where a few points can drive a result (see the sketch after this list).
  • overfitting and overdispersion:
    • explain concepts (explain for which models they apply)
    • how can you test?
      • a priori considerations about DFs in model (linear vs. logit model rule of thumb)
      • overfitting is rarely a problem for laboratory experiment data
      • residual plots
      • predicted vs. observed (bootstrapped/cross-validated!)
    • what is overdispersion, and what do multilevel models do to overcome it?
  • are the model's assumptions met?
    • if not, does it matter?
    • how can I assess this? e.g. residuals for linear models, overdispersion parameter
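
A minimal sketch of these checks in base R, using the built-in cars data in place of workshop data:

{{{
# Influence diagnostics and assumption checks on a toy linear model
m <- lm(dist ~ speed, data = cars)

dfbetas(m)                        # per-coefficient influence of each case
cooks.distance(m)                 # overall influence of each case
summary(influence.measures(m))    # flags potentially influential cases

plot(m)                           # residuals vs. fitted, Q-Q plot, etc.

# Rough overdispersion check for a Poisson or binomial GLM: residual
# deviance far above the residual df suggests overdispersion, e.g.
# deviance(g) / df.residual(g) for a fitted glm() model g
}}}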

Comparing effect sizes (12 minutes)

Now that we know whether we can trust a model, how can we assess and compare effect sizes? Discuss different ways to talk about "effect size": What do the different measures assess? What are their trade-offs? When do different measures lead to different conclusions (if one is not careful enough)? Also mention differences between different types of models (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit; tests of significance; etc. A sketch of the model-improvement measures follows the list below.

  • coefficient-based:
    • absolute coefficient size (related to range of predictor) -- talking about effect ranges.
    • relative coefficient size (related to its standard error)
    • p-values based on coefficient's standard error:
      • different tests (t, z) --> refer to section on problems with the t-test for mixed models; necessity of MCMC sampling
  • model improvement
    • partial R-square
    • deviance
    • BIC, AIC
    • p-values based on model comparison (-2 * likelihood ratio):
      • not all models are fit with ML (mention mixed linear and mixed logit models) --> refer to Harald's section
      • different fits needed for variance and point estimates (because ML-estimates of variance are biased) --> refer to Harald's section?
      • robustness against collinearity
  • accuracy: discuss problems (dependency on marginal distribution of outcome)
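
A minimal sketch of the model-improvement measures, again on the built-in cars data:

{{{
# Comparing a model with and without the predictor of interest
m0 <- lm(dist ~ 1, data = cars)          # null model
m1 <- lm(dist ~ speed, data = cars)      # adds the predictor

anova(m0, m1)                            # significance of the improvement
AIC(m0, m1)
BIC(m0, m1)

# Likelihood ratio: -2 * (logLik of smaller - logLik of larger model),
# compared against a chi-square distribution (1 df here)
as.numeric(-2 * (logLik(m0) - logLik(m1)))
}}}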

Visualizing effects (3 minutes)

Discussion of some issues (see below) and some examples.

  • Plotting coefficient summaries
  • Plotting individual effect shapes with confidence intervals:
    • back-transforming the predictor to its original scale (inverting: rcs, pol, centering, scaling, log, etc.)
    • choosing scale for outcome (inverting log-transformed scales or not; probabilities vs. log-odds for logit models)
    • Examples (just some plots, along with function names; see the sketch after this list):
  • Plotting model fit:
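
A minimal sketch of such plots in base R, using the built-in cars data:

{{{
# Effect shape with confidence bands for a toy linear model
m  <- lm(dist ~ speed, data = cars)
nd <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 50))
p  <- predict(m, newdata = nd, interval = "confidence")

plot(dist ~ speed, data = cars)
lines(nd$speed, p[, "fit"])              # fitted effect
lines(nd$speed, p[, "lwr"], lty = 2)     # lower confidence band
lines(nd$speed, p[, "upr"], lty = 2)     # upper confidence band

# Plotting model fit: observed vs. fitted values
plot(fitted(m), cars$dist)
abline(0, 1)
}}}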

Publishing model (2 minutes)

  • Summary of what readers (and reviewers) need to know

Create downloadable cheat sheet?

Preparatory readings?

  • Victor's slides?
