#acl HlpLabGroup:read,write,delete,revert,admin All:read
#format wiki
#language en
#pragma section-numbers 4
## page was renamed from HlpLab/StatsCourses/HLPCourse
## page was renamed from HlpLab/StatsMiniCourse

= Standards in fitting, evaluating, and interpreting regression models =

As part of the '''Workshop on common issues and standards in ordinary and multilevel regression modeling''', March 25, 2009, UC Davis.

'''The workshop page is now online:''' [http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/ http://hlplab.wordpress.com/2009-pre-cuny-workshop-on-ordinary-and-multilevel-models-womm/] --- check it out for the final presentations (a 4-hour tutorial, including talks by Roger Levy, Harald Baayen, Victor Kuperman, Florian Jaeger, Dale Barr, and Austin Frank).

== To do ==

 1. We need examples for many of the topics below -- ideally examples that we can carry through the whole tutorial.
 2. Maybe we can distribute the visualization ideas across the different sections rather than having them separately at the end.
 3. We may need to cut substantially, but for now I threw in all ideas that came to mind.

== Interpreting a simple model (5 minutes) ==

Quick example of very naive model interpretation. We will use R throughout to give examples.

 * What's the overall output (the fitted values) of the model and what does it mean?
  * Usually we're not interested in the fitted values per se, but in the structure of the model that they stem from.

Describe how to read the model summary:

 * Simple example from a linear mixed model (build on Roger's data?)
 * What do the coefficients in the model summary mean?
 * What does the intercept mean?
  * It depends on the coding!
  * Compare contrast vs. dummy coding in balanced data.
  * Generalize to centering.
  * For interpretation, effects need to be translated back to the uncentered scale.
  * OK, now we have interpretable effects (see the R sketch at the end of this section).
 * Discuss issues of scale and interpretation. Give a simple example of a back-translation into the original space. Show the formula (e.g. for effects on log duration). Describe both predictor and outcome transforms.
 * Be clear about what's nice about having coefficients: the directionality and shape of an effect can be tested and immediately interpreted. Usually, psycholinguistic theories make predictions about the directionality of an effect (not just that it matters).
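A minimal R sketch of the coding, centering, and back-translation points above. Everything here is made up for illustration: the data frame {{{d}}}, the two-level factor {{{cond}}}, the continuous predictor {{{freq}}}, and the log-duration outcome {{{logdur}}} are all hypothetical.

{{{
## Hypothetical data: log duration as a function of a 2-level condition
## and a continuous (e.g. log frequency) predictor.
set.seed(1)
d <- data.frame(
  cond = factor(rep(c("easy", "hard"), each = 100)),
  freq = rnorm(200, mean = 8, sd = 2)
)
d$logdur <- 5.5 - 0.1 * (d$cond == "hard") - 0.05 * d$freq + rnorm(200, 0, 0.3)

## Dummy (treatment) coding: the intercept is the prediction for the
## reference level of cond, at freq = 0.
contrasts(d$cond) <- contr.treatment(2)
summary(lm(logdur ~ cond + freq, data = d))

## Sum (contrast) coding plus centering: the intercept is now the prediction
## for the "average" case, and effects are evaluated at the mean of freq.
contrasts(d$cond) <- contr.sum(2)
d$cfreq <- d$freq - mean(d$freq)
m <- lm(logdur ~ cond + cfreq, data = d)
summary(m)

## Back-translation into the original space: an effect of size b on log
## duration corresponds to a multiplicative change of exp(b) in raw duration.
b <- coef(m)["cfreq"]
exp(b)  # proportional change in duration per unit increase in (centered) freq
}}}

Note that with sum coding of a balanced two-level factor, the reported coefficient is half the difference between the two condition means, whereas with dummy coding it is the full difference from the reference level -- one reason the coding has to be made explicit when interpreting coefficients.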
== Evaluating a model I - coefficients (XX minutes) ==

But can we trust these coefficients?

 * Explain the effects of collinearity on SE(beta) and on the interpretation of beta.
 * How to detect collinearity (see the R sketch at the end of this section):
  * {{{vif()}}}
  * {{{cor()}}}, {{{cor(, method="spearman")}}}, {{{pairs()}}}
  * fixed-effect correlations in mixed models
  * it doesn't always matter: e.g. if it only affects controls that we don't care about!
 * How to remove collinearity:
  * centering: removes collinearity with the intercept and with higher-order terms, including interactions
  * residualization, and how to interpret the results
  * PCA or similar methods (disadvantage: interpretability, though sometimes interpretability is still ok; great for controls)
  * stratification (using subsets of the data)
 * Use of model comparison tests:
  * they are robust against collinearity
  * great for assessing the significance of interactions, non-linear components, etc.; especially useful for controls where the directionality does not matter that much
  * they do not provide the directionality of an effect
  * not completely ok for all types of models --> refer to the conceptual background section and to Harald's section
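A minimal R sketch of detecting and removing collinearity along the lines above. The data frame {{{d}}} and the predictors {{{freq}}} and {{{wlen}}} are again made up; {{{vif()}}} is assumed to be available from a package such as Design/rms or car.

{{{
## Hypothetical data with two deliberately correlated predictors.
set.seed(1)
freq <- rnorm(200, mean = 8, sd = 2)
wlen <- 0.7 * freq + rnorm(200, 0, 1)        # e.g. word length, correlated with freq
d <- data.frame(freq = freq, wlen = wlen,
                logdur = 5.5 - 0.05 * freq - 0.02 * wlen + rnorm(200, 0, 0.3))

## Detecting collinearity
round(cor(d[, c("freq", "wlen")]), 2)        # Pearson correlations
cor(d$freq, d$wlen, method = "spearman")     # rank correlation
pairs(d[, c("freq", "wlen", "logdur")])      # pairwise scatterplots
m1 <- lm(logdur ~ freq + wlen, data = d)
## library(car); vif(m1)                     # variance inflation factors

## Removing collinearity
## (a) centering: removes collinearity with the intercept and with
##     higher-order terms built from the same predictors
d$cfreq <- d$freq - mean(d$freq)
d$cwlen <- d$wlen - mean(d$wlen)

## (b) residualization: keep only the part of wlen not predictable from freq
d$rwlen <- residuals(lm(wlen ~ freq, data = d))
m2 <- lm(logdur ~ cfreq + rwlen, data = d)

## (c) model comparison (robust against collinearity): does wlen improve
##     the model beyond freq, regardless of the sign of its effect?
anova(lm(logdur ~ cfreq, data = d), m2)
}}}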
== Evaluating a model II - overall quality (XX minutes) ==

Collinearity aside, can we trust the model overall? Models are fit under assumptions. Are those met? If not, do we have to worry about the violations? The world is never perfect, but when should I really be cautious?

 * Overly influential cases:
  * outlier handling
  * tests like dfbeta, dfbetas, Cook's distance, etc. -- how much does it matter? Give an example of where a few points can drive a result.
 * Overfitting and overdispersion:
  * explain the concepts (and for which models they apply)
  * how can you test for them?
  * a priori considerations about the DFs in the model (rule of thumb for linear vs. logit models)
  * overfitting is rarely a problem for laboratory experiment data
  * residual plots
  * predicted vs. observed (bootstrapped/cross-validated!)
  * what is it and what do multilevel models do to overcome it?
 * Are the assumptions of the model met?
  * if not, does it matter?
  * how can I assess this? (e.g. residuals for linear models, the overdispersion parameter)

== Comparing effect sizes (12 minutes) ==

Now that we know whether we can trust a model, how can we assess and compare effect sizes? Discuss different ways of talking about "effect size": What do the different measures assess? What are their trade-offs? When do different measures lead to different conclusions (if one is not careful enough)? Also mention differences between different types of models (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit, tests of significance, etc.

 * coefficient-based:
  * absolute coefficient size (related to the range of the predictor) -- talking about effect ranges
  * relative coefficient size (related to its standard error)
  * p-values based on the coefficient's standard error:
   * different tests (t, z) --> refer to the section on problems with the t-test for mixed models; necessity of MCMC sampling
 * model improvement:
  * partial R-square
  * deviance
  * BIC, AIC
  * p-values based on model comparison (-2 * log likelihood ratio):
   * not all models are fit with ML (mention mixed linear and mixed logit models) --> refer to Harald's section
   * different fits are needed for variance and point estimates (because ML estimates of variance are biased) --> refer to Harald's section?
  * robustness against collinearity
 * accuracy: discuss problems (dependency on the marginal distribution of the outcome)

== Visualizing effects (3 minutes) ==

Discussion of some issues (see below) and some examples.

 * Plotting coefficient summaries:
  * issues with scales, back-transformation, and standardization
  * Examples (just some plots, along with function names):
   * [http://idiom.ucsd.edu/~rlevy/papers/doyle-levy-2008-bls.pdf p.8 for fixed effect summary, p.9 for random effect summary]
   * I have some functions, too.
 * Plotting individual effect shapes with confidence intervals:
  * back-transforming the predictor to the original scale (inverting: rcs, pol, centering, scaling, log, etc.)
  * choosing the scale for the outcome (inverting log-transformed scales or not; probabilities vs. log-odds for logit models)
  * Examples (just some plots, along with function names):
   * For ordinary regression models: {{{plot.Design()}}}
   * For mixed models: {{{plotLMER.fnc()}}}, {{{my.plot.glmer()}}}; see [http://hlplab.wordpress.com/2009/01/19/plotting-effects-for-glmer-familybimomial-models/ example]
 * Plotting model fit:
  * Calibration plots:
   * issues with calibration plots for logit models
  * Examples (just some plots, along with function names):
   * For ordinary models: {{{plot.calibration.Design()}}}
   * For mixed models: {{{my.plot.glmerfit()}}}; see [http://hlplab.wordpress.com/2009/01/19/visualizing-the-quality-of-an-glmerfamilybinomial-model/ example]
 * Visualization of predictors' contribution to the model (model comparison): {{{plot.anova.Design()}}}

== Publishing a model (2 minutes) ==

 * Summary of what readers (and reviewers) ''need'' to know

== Create downloadable cheat sheet? ==

 * With step-by-step guidelines of some things one should make sure to do when developing a model? Does Harrell have something like this? (A rough sketch of such a workflow is appended at the bottom of this page.)
 * See also: [http://idiom.ucsd.edu/~rlevy/teaching/fall2008/lign251/one_page_of_main_concepts.pdf Roger's summary]

== Preparatory readings? ==

 * Victor's slides?
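Finally, as raw material for the cheat-sheet idea above, a rough sketch of the overall workflow discussed in this outline. Everything here is only meant as a skeleton: it assumes the {{{lme4}}} package and a hypothetical data frame {{{d}}} with an outcome {{{logdur}}}, predictors {{{freq}}} and {{{cond}}}, and grouping factors {{{subject}}} and {{{item}}}.

{{{
library(lme4)
## (assumes a hypothetical data frame d with columns logdur, freq, cond,
##  subject, and item)

## 1. Prepare predictors: choose a coding scheme, center continuous predictors.
d$cfreq <- d$freq - mean(d$freq)

## 2. Fit the model: here a linear mixed model with crossed random intercepts.
m1 <- lmer(logdur ~ cond + cfreq + (1 | subject) + (1 | item), data = d)
summary(m1)

## 3. Check for collinearity (predictor correlations, fixed-effect
##    correlations in the summary; see the collinearity sketch above).

## 4. Check residuals and influential cases.
plot(fitted(m1), resid(m1))
qqnorm(resid(m1))

## 5. Assess an effect by model comparison (likelihood-ratio test).
m0 <- lmer(logdur ~ cond + (1 | subject) + (1 | item), data = d)
anova(m0, m1)

## 6. Plot the effects (e.g. plotLMER.fnc() in the languageR package) and
##    report coefficients, standard errors, and the comparison tests.
}}}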