Standards in fitting, evaluating, and interpreting regression models

Part of the Workshop on Common Issues and Standards in Ordinary and Multilevel Regression Modeling, March 25, 2009, UC Davis

Common issues in regression modeling and some solutions

Common issues in regression modeling:
- collinearity
- overfitting
- overly influential cases
- overdispersion?
- model quality (e.g. residuals for linear models)
- building a model: adding/removing variables (also: interactions)
Some solutions to these problems for common model types:
- outlier handling
- centering
- removing collinearity (e.g. PCA, residualization)
- stratification (using subsets of data)
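
A minimal sketch of two of the solutions listed above, centering and residualization, written in plain Python rather than with the R functions named elsewhere on this page. The predictor values and the names `pearson`, `x`, and `x2` are made up for illustration. Centering here removes the correlation between a predictor and its square only because the values are evenly spaced; in general it reduces that correlation without necessarily eliminating it.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor values (not workshop data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Centering: x and x^2 are strongly correlated until x is centered.
r_raw = pearson(x, [v ** 2 for v in x])
x_centered = [v - mean(x) for v in x]
r_centered = pearson(x_centered, [v ** 2 for v in x_centered])

# Residualization: regress a second (hypothetical) predictor x2 on x
# and keep only the part of x2 that x does not explain.
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
mx, mx2 = mean(x), mean(x2)
slope = (sum((a - mx) * (b - mx2) for a, b in zip(x, x2))
         / sum((a - mx) ** 2 for a in x))
intercept = mx2 - slope * mx
x2_resid = [b - (intercept + slope * a) for a, b in zip(x, x2)]

r_before = pearson(x, x2)        # close to 1: x and x2 are collinear
r_after = pearson(x, x2_resid)   # close to 0: residuals are orthogonal to x

print(round(r_raw, 3), round(r_before, 3))
```

Note that residualizing changes what the coefficient of `x2_resid` means: it describes the effect of the part of `x2` not shared with `x`.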
Interpreting the model, making sure the model answers the question of interest:
- testing significance (SE-based tests vs. model comparison)
- interpretation of model output, e.g. of coefficients
- (also: coding of variables)
- follow-up tests
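
The two kinds of significance test mentioned above can be contrasted in a short sketch. All numbers (coefficient, standard error, log-likelihoods) are hypothetical, and the function names are made up for illustration. The sketch assumes a single added parameter, so the likelihood-ratio statistic is compared against a chi-square distribution with one degree of freedom, whose tail probability can be written with `math.erfc`.

```python
from math import erfc, sqrt


def wald_p_value(coef, se):
    """Two-sided p-value for an SE-based (Wald) z-test."""
    z = coef / se
    return erfc(abs(z) / sqrt(2.0))


def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by ONE parameter."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p = erfc(sqrt(stat / 2.0))
    return stat, p


# Hypothetical model output.
p_wald = wald_p_value(coef=0.6, se=0.2)
lr_stat, p_lr = lr_test_1df(loglik_reduced=-520.0, loglik_full=-515.0)

print(p_wald, lr_stat, p_lr)
```

The two tests need not agree, which is one reason to report which one was used.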

Some suggestions on how to present model results

What do readers need to know?
What do reviewers need to know?
How to talk about effect sizes?
- absolute coefficient size (related to range of predictor) -- talking about effect ranges.
- relative coefficient size (related to its standard error)
- model improvement, partial R-square, etc.
- accuracy?
How to back-transform common transformations of outcomes and predictors?
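
A small sketch of the effect-range and back-transformation ideas above. It assumes a model with a log-transformed outcome and a predictor that was divided by its standard deviation before fitting; every number and variable name is hypothetical.

```python
from math import exp

# Hypothetical coefficient from a model of log(outcome).
coef = 0.02                 # change in log(outcome) per original predictor unit
x_min, x_max = 10.0, 60.0   # observed range of the predictor

# Absolute effect size expressed over the predictor's range.
effect_range_log = coef * (x_max - x_min)

# Back-transform: a difference on the log scale is a ratio on the original scale.
ratio_over_range = exp(effect_range_log)
percent_change_per_unit = (exp(coef) - 1.0) * 100.0

# Hypothetical coefficient for a predictor that was scaled by its SD:
# dividing by the SD recovers the per-unit coefficient.
coef_per_sd = 0.3
sd_x = 15.0
coef_per_unit = coef_per_sd / sd_x

print(effect_range_log, ratio_over_range, percent_change_per_unit, coef_per_unit)
```

Reporting the ratio over the observed range tells readers how much the outcome changes across the data actually seen, which a bare coefficient does not.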

Differences between model types (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit, tests of significance, etc.

Example of a written description of the model

Visualization (3 minutes)

Discussion of some issues (see below) and some examples.

Plotting coefficient summaries
- issues with scales and standardization
- Examples (just some plots, along with function names):
  - [http://idiom.ucsd.edu/~rlevy/papers/doyle-levy-2008-bls.pdf p.8 for fixed effect summary, p.9 for random effect summary]
  - I have some functions, too.
Plotting individual effect shapes with confidence intervals:
- back-transforming predictor to original scale (inverting: rcs, pol, centering, scaling, log, etc.)
- choosing scale for outcome (inverting log-transformed scales or not; probabilities vs. log-odds for logit models)
- Examples (just some plots, along with function names):
  - For ordinary regression models: plot.Design()
  - For mixed models: plotLMER.fnc(), my.plot.glmer(); see [http://hlplab.wordpress.com/2009/01/19/plotting-effects-for-glmer-familybimomial-models/ example]
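
For logit models, the choice of outcome scale matters for confidence intervals. The sketch below is not the code behind `plotLMER.fnc()` or `my.plot.glmer()`. It only illustrates one common approach: build the interval on the log-odds scale and then transform its endpoints to probabilities. The fitted value, standard error, and function names are hypothetical.

```python
from math import exp


def inv_logit(eta):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + exp(-eta))


def interval_on_probability_scale(eta, se, z=1.96):
    """Approximate 95% interval: computed in log-odds, then back-transformed."""
    lower = inv_logit(eta - z * se)
    upper = inv_logit(eta + z * se)
    return lower, inv_logit(eta), upper


# Hypothetical fitted log-odds and standard error at one predictor value.
lower, p, upper = interval_on_probability_scale(eta=2.0, se=0.5)

print(round(lower, 3), round(p, 3), round(upper, 3))
```

The interval is symmetric in log-odds but asymmetric in probability space, and it always stays inside (0, 1), which an interval built directly on the probability scale would not guarantee.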
Plotting model fit:
- Calibration plot:
  - issues with calibration plots for logit models
  - Examples (just some plots, along with function names):
    - For ordinary models: plot.calibration.Design()
    - For mixed models: my.plot.glmerfit(); see [http://hlplab.wordpress.com/2009/01/19/visualizing-the-quality-of-an-glmerfamilybinomial-model/ example]
- Visualization of predictors' contribution to the model (model comparison): plot.anova.Design()
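
The numbers behind a calibration plot for a logit model can be sketched as follows. This is not the implementation of `plot.calibration.Design()` or `my.plot.glmerfit()`. It assumes equal-sized bins of sorted predicted probabilities, which is only one of several possible binning choices, and the predictions and outcomes are invented.

```python
def calibration_table(predicted, observed, n_bins=4):
    """Return (mean predicted probability, observed proportion, n) per bin.

    Cases are sorted by predicted probability and cut into n_bins
    groups of (nearly) equal size.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can happen when there are fewer cases than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_prop = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, obs_prop, len(chunk)))
    return table


# Hypothetical predicted probabilities and binary outcomes.
predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55,
             0.60, 0.75, 0.85, 0.95, 0.15, 0.90]
observed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

table = calibration_table(predicted, observed, n_bins=4)
for mean_pred, obs_prop, n in table:
    print(round(mean_pred, 2), round(obs_prop, 2), n)
```

Plotting observed proportion against mean predicted probability gives the calibration curve; points near the diagonal indicate good calibration. With binary outcomes the result depends on the binning, which is one of the issues with calibration plots for logit models.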

Create downloadable cheat sheet?

With step-by-step guidelines on what one should make sure to do when developing a model? Does Harrell have something like this?
See also: [http://idiom.ucsd.edu/~rlevy/teaching/fall2008/lign251/one_page_of_main_concepts.pdf Roger's summary]

Preparatory readings?

Victor's slides?