= Standards in fitting, evaluating, and interpreting regression models =
As part of the Workshop on common issues and standards in ordinary and multilevel regression modeling, March 25, 2009, UC Davis
== To do ==
- We need examples for many of the topics below, ideally ones that we can carry through the whole workshop.
- Maybe we can distribute the visualization ideas across the different sections rather than having them separately at the end.
- We may need to cut substantially, but for now I threw in every idea that came to mind.
== Interpreting a simple model (1 minute) ==
Quick example of very naive model interpretation. We will use R throughout to give examples.
- What's the overall output (the fitted values) of the model, and what does it mean?
- Usually we're not interested in the fitted values per se but in the structure of the model they stem from. Describe how to read the model summary:
- Simple example from linear mixed model (build on Roger's data?)
- What do the coefficients in the model summary mean?
- What does the intercept mean?
- depends on coding!
- compare contrast vs. dummy coding in balanced data
- generalize to centering
- in interpreting the effects, centered predictors need to be translated back (uncentered)
- ok, now we have interpretable effects
- discuss issues of scale and interpretation. Give a simple example of a back-translation into the original space. Show the formula (e.g. for effects on log duration). Describe both predictor and outcome transforms (see the sketch after this list).
- Be clear about what's nice about having coefficients: directionality and shape of effect can be tested and immediately interpreted. Usually, psycholinguistic theories make predictions about directionality of effect (not just that it matters).
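A minimal sketch in R of the kind of walk-through this section could use (the data frame dat with columns logDuration, freq, subject, and item is a hypothetical placeholder, not Roger's actual data):
{{{
library(lme4)

## Center the frequency predictor so the intercept is the predicted log
## duration at the mean frequency rather than at frequency = 0.
dat$cFreq <- dat$freq - mean(dat$freq)

## Linear mixed model: log word duration as a function of centered frequency,
## with random intercepts for subjects and items.
m <- lmer(logDuration ~ cFreq + (1 | subject) + (1 | item), data = dat)
summary(m)   # fixed-effect coefficients, standard errors, random-effect variances

## Back-translation into the original duration space: a one-unit increase in
## cFreq multiplies the predicted duration by exp(beta).
beta <- fixef(m)["cFreq"]
exp(beta)
}}}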
== Evaluating a model I - coefficients (XX minutes) ==
But can we trust these coefficients?
- Explain the effects of collinearity on SE(beta) and on the interpretation of beta.
- How to detect collinearity (see the sketch after this list):
- vif()
- cor(), cor(..., method = "spearman"), pairs()
- fixed effect correlations in mixed models
- doesn't always matter: e.g., collinearity among controls that we don't care about!
- How to remove collinearity:
- centering: removes collinearity with intercept and higher order terms, including interactions
- residualization and how to interpret the results
- PCA or the like (disadvantage: interpretability, though sometimes interpretability is still ok; great for controls)
- stratification (using subsets of data)
- Use of model comparison tests
- are robust against collinearity
- great to assess significance of interactions, non-linear components, etc.; especially useful for controls where the directionality does not matter that much.
- do not provide directionality of effect
- not completely ok for all types of models --> refer to conceptual background section and to Harald's section
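A minimal sketch of the detection and removal steps above (freq and length are hypothetical, correlated predictors; vif() comes from the car package):
{{{
library(car)   # for vif()

## Detect: pairwise correlations and a scatterplot matrix of the predictors
## (Spearman is robust to monotone nonlinear relations).
cor(dat$freq, dat$length, method = "spearman")
pairs(dat[, c("freq", "length")])

## Variance inflation factors from an ordinary regression fit.
m <- lm(logDuration ~ freq + length, data = dat)
vif(m)

## Remove: residualize length against frequency and use the residuals
## ("length beyond what frequency predicts") as the predictor instead.
dat$rLength <- resid(lm(length ~ freq, data = dat))
m2 <- lm(logDuration ~ freq + rLength, data = dat)
summary(m2)
}}}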
== Evaluating a model II - overall quality (XX minutes) ==
Collinearity aside, can we trust the model overall? Models are fit under assumptions. Are those met? If not, do we have to worry about the violations? The world is never perfect, but when should I really be cautious?
- Overly influential cases:
- outlier handling
- diagnostics like dfbeta, dfbetas, Cook's distance, etc. -- how much does it matter? Give an example of where a few points can drive a result (see the sketch after this list).
- overfitting and overdispersion:
- explain concepts (explain for which models they apply)
- how can you test?
- a priori considerations about DFs in model (linear vs. logit model rule of thumb)
- overfitting is rarely a problem for laboratory experiment data
- residual plots
- predicted vs. observed (bootstrapped/cross-validated!)
- what is overdispersion, and what do multilevel models do to overcome it?
- are the assumptions of the model met?
- if not, does it matter?
- how can I assess this? e.g. residuals for linear models, overdispersion parameter
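A minimal sketch of some of these checks for an ordinary linear model (m is the hypothetical lm() fit from the collinearity sketch above):
{{{
## Overly influential cases: how much does each observation move the fit?
summary(influence.measures(m))   # flags cases with large dfbetas, Cook's distance, etc.
head(dfbetas(m))                 # per-coefficient change when a case is dropped
head(cooks.distance(m))          # overall influence of each observation

## Residual diagnostics: are the linear-model assumptions roughly met?
plot(fitted(m), resid(m),
     xlab = "fitted values", ylab = "residuals")   # look for structure or fanning
qqnorm(resid(m))
qqline(resid(m))                                   # approximate normality of residuals
}}}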
== Comparing effect sizes (12 minutes) ==
Now that we know whether we can trust a model, how can we assess and compare effect sizes? Discuss different ways of talking about "effect size": What do the different measures assess? What are their trade-offs? When do different measures lead to different conclusions (if one is not careful enough)? Also mention differences between different types of models (e.g. ordinary vs. multilevel; linear vs. logit) in terms of available measures of fit, tests of significance, etc.
- coefficient-based:
- absolute coefficient size (related to range of predictor) -- talking about effect ranges.
- relative coefficient size (related to its standard error)
- p-values based on coefficient's standard error:
- different tests (t, z) --> refer to section on problems with the t-test for mixed models; necessity of MCMC sampling
- model improvement
- partial R-square
- deviance
- BIC, AIC
- p-values based on model comparison (-2 * log likelihood ratio; see the sketch after this list):
- not all models are fit with ML (mention mixed linear and mixed logit models) --> refer to Harald's section
- different fits needed for variance and point estimates (because ML estimates of variance are biased) --> refer to Harald's section?
- robustness against collinearity
- accuracy: discuss problems (dependency on marginal distribution of outcome)
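A minimal sketch of a model-comparison test (hypothetical lmer() fits; for the likelihood ratio test both models are fit with ML, i.e. REML = FALSE):
{{{
library(lme4)

## Nested models without and with the predictor of interest.
m0 <- lmer(logDuration ~ length + (1 | subject) + (1 | item),
           data = dat, REML = FALSE)
m1 <- lmer(logDuration ~ length + cFreq + (1 | subject) + (1 | item),
           data = dat, REML = FALSE)

## Likelihood ratio test (-2 * log likelihood ratio, compared against a
## chi-squared distribution); the table also reports AIC and BIC.
anova(m0, m1)
}}}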
== Visualizing effects (3 minutes) ==
Discussion of some issues (see below) and some examples.
- Plotting coefficient summaries
- issues with scales, back-transformation, and standardization
- Examples (just some plots, along with function names):
- [http://idiom.ucsd.edu/~rlevy/papers/doyle-levy-2008-bls.pdf p.8 for fixed effect summary, p.9 for random effect summary]
- I have some functions, too.
- Plotting individual effect shapes with confidence intervals:
- back-transforming predictor to original scale (inverting: rcs, pol, centering, scaling, log, etc.)
- choosing scale for outcome (inverting log-transformed scales or not; probabilities vs. log-odds for logit models)
- Examples (just some plots, along with function names):
- For ordinary regression models: plot.Design()
- For mixed models: plotLMER.fnc(), my.plot.glmer(); see [http://hlplab.wordpress.com/2009/01/19/plotting-effects-for-glmer-familybimomial-models/ example] (see also the sketch at the end of this section)
- Plotting model fit:
- Calibration plot:
- issues with calibration plots for logit models
- Examples (just some plots, along with function names):
- For ordinary models: plot.calibration.Design()
- For mixed models: my.plot.glmerfit(); see [http://hlplab.wordpress.com/2009/01/19/visualizing-the-quality-of-an-glmerfamilybinomial-model/ example]
- Visualization of predictors' contribution to the model (model comparison): plot.anova.Design()
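A minimal sketch of plotting one fixed-effect shape for a mixed model (using the hypothetical fit m1 from the model-comparison sketch above; plotLMER.fnc() is from the languageR package, and the pred, fun, and ylabel arguments are used here to pick the predictor, back-transform the fitted log durations with exp(), and label the axis):
{{{
library(languageR)   # for plotLMER.fnc()

## Partial effect of centered frequency, back-transformed from log durations
## onto the original duration scale.
plotLMER.fnc(m1, pred = "cFreq", fun = exp, ylabel = "duration")
}}}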
== Publishing the model (2 minutes) ==
- Summary of what readers (and reviewers) need to know
== Create a downloadable cheat sheet? ==
- With step-by-step guidelines for some things one should make sure to do when developing a model? Does Harrell have something like this?
- see also: [http://idiom.ucsd.edu/~rlevy/teaching/fall2008/lign251/one_page_of_main_concepts.pdf Roger's summary]
== Preparatory readings? ==
- Victor's slides?