Experimental Design Basics
Variables
Variables in statistics are a lot like variables in algebra or calculus, except that they are typically thought of as having a distribution of values. For instance, a variable might describe someone's reaction time to a stimulus, but this variable will take a different value each time it is measured. So a variable like reaction time can be understood as a distribution---a collection of all possible reaction times and how likely each is---rather than a single value.
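As a minimal sketch in Python (the lognormal shape and its parameters here are assumptions chosen for illustration, not a claim about real reaction-time data), repeated measurements of the same variable can be simulated as draws from a distribution:

import numpy as np

rng = np.random.default_rng(0)

# Treat "reaction time" as a random variable: each measurement is one draw
# from an underlying distribution. The lognormal shape and its parameters are
# assumed purely for illustration (reaction times are positive and right-skewed).
reaction_times = rng.lognormal(mean=-0.9, sigma=0.3, size=5)

print(reaction_times)  # five different values (in seconds) for the "same" variable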
Continuous vs. Discrete
Variables in experiments can be continuous or discrete. In psychology experiments, continuous variables are almost always real numbers (e.g., people's height, reaction time). Discrete variables are ones that do not vary continuously. Typical examples in psychology experiments take on only a finite number of values, for instance measures like level of education, number of children, or correct/incorrect. However, discrete variables can also take on a potentially infinite number of values, as in counts of how many times a neuron spikes in a given amount of time.
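A small sketch (the normal and Poisson distributions below are illustrative assumptions) contrasting a continuous measurement with a discrete count:

import numpy as np

rng = np.random.default_rng(1)

# Continuous variable: adult height in cm (normal distribution assumed for illustration).
heights_cm = rng.normal(loc=170, scale=10, size=5)

# Discrete variable: number of neuron spikes in a one-second window
# (Poisson distribution assumed for illustration); whole numbers with no fixed upper limit.
spike_counts = rng.poisson(lam=8, size=5)

print(heights_cm)
print(spike_counts)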
Dependent vs. Independent
The terms dependent variable and independent variable are used to distinguish between two types of quantifiable factors being considered in an experiment. In simple terms, the independent variable is the variable you manipulate or change, and the dependent variable is the outcome you observe in response to that manipulation.
In simple experiments, you manipulate one variable and measure another (hopefully while holding everything else constant). For instance, you might measure people's performance on a test (i) when classical music is playing versus (ii) when no music is playing. Here, the independent variable (the thing you manipulate) is whether or not music is playing (i vs. ii), and the dependent variable (the thing you care about) is performance on the test.
Within vs. Across Subjects
It's important to realize that independent variables can be either within subjects or across subjects. When an independent variable is within subjects, it means that you have measured each subject at each level of the variable---for instance, if each participant took the test both with and without music. An across subjects design would give each participant only one test, either with or without music.
In general, within subjects designs are more likely to find effects because they control for additional noise---in this case, each subject's typical ability to answer questions on the test. Here is another example of why within designs are more powerful: suppose you were trying to determine whether 4th graders are taller than 3rd graders. If you took a sample of a typical 4th grade class and a typical 3rd grade class, you would see highly overlapping distributions of heights. That is, each class's heights would be very variable and there would not be much difference in the mean heights, so it would take a lot of samples to find a difference. But suppose you performed the within subjects experiment instead. You could take 3rd graders and measure their heights, wait a year until they are 4th graders, and then measure their heights again. Everyone would have grown, and you could very easily find a significant result by comparing each child's height in 3rd and 4th grade. This is a within subjects design because each subject is measured twice, once in each condition (3rd vs. 4th grade), and it is clearly much more likely to find an effect that is real.
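Here is a rough simulation of that height example (all of the numbers, such as class size, mean heights, variability, and growth per year, are made-up assumptions) showing how a paired, within subjects comparison picks up the effect more easily than an unpaired, across subjects comparison:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n = 20
# Simulated heights in cm. Children vary a lot between individuals (sd = 8 cm)
# but each child grows only about 5 cm in a year.
third_grade = rng.normal(loc=130, scale=8, size=n)
fourth_grade_same_kids = third_grade + rng.normal(loc=5, scale=1, size=n)

# Across subjects comparison: a *different* sample of 4th graders.
fourth_grade_other_kids = rng.normal(loc=135, scale=8, size=n)

# Unpaired test: between-person variability in height adds noise.
t_between, p_between = stats.ttest_ind(fourth_grade_other_kids, third_grade)

# Paired test: each child is compared to themselves, removing that noise.
t_within, p_within = stats.ttest_rel(fourth_grade_same_kids, third_grade)

print(f"across subjects: t = {t_between:.2f}, p = {p_between:.3g}")
print(f"within subjects: t = {t_within:.2f}, p = {p_within:.3g}")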
Samples and Summary Statistics
Typically in experiments, one will measure a variable some number of times to collect a sample. For instance, if we were interested in the heights of human adults, we might measure the heights of a million people from New York. This would give us a sample of different human adult heights. Raw samples are hard to understand on their own---what can a million individual values tell us?
Summary statistics provide a way to capture the basic trends in a sample of data in a concise, informative way. Probably the most common summary statistic is a mean: the mean of a set of numbers gives the average (intuitively, the typical value) of the sample. Another common measure is the variance, which computes the variability in the sample. So the mean of the heights of everyone in New York state would capture the typical value of people's height and the variance of heights would tell you how much variability there is between people.
[To come: a word on populations.]
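As a quick sketch (the heights below are made-up numbers rather than real data), the mean and variance of a sample can be computed directly, for example with numpy:

import numpy as np

# A small made-up sample of adult heights in cm.
heights = np.array([162.0, 175.5, 181.2, 168.9, 159.4, 172.3])

mean_height = np.mean(heights)          # typical value of the sample
var_height = np.var(heights, ddof=1)    # variability (sample variance, N-1 denominator)

print(f"mean = {mean_height:.1f} cm, variance = {var_height:.1f} cm^2")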
Estimators
We can view measures like the mean and variance of a sample as estimators of some true, unknown property of the population you care about. So the mean computed on the sample estimates (or approximates) the true mean of the population. Since we usually want to make statements about the true state of the world (men are taller than women) rather than our sample (our sample of men is taller than our sample of women), it's useful to think about using our sample to estimate or approximately measure some true property of the world.
Estimates can be biased or unbiased. Biased estimators are ones which, intuitively, are expected to give a (perhaps slightly) wrong answer. Unbiased estimators are expected to give the correct answer. As an example, suppose you collected a sample of heights and for some reason threw out the shortest 10 people before computing the mean. The mean you compute will be a biased estimator of the true mean since it will tend to overestimate people's typical height. But as you collect more and more people, the shortest 10 will matter less and less, and so the amount of bias will decrease.
For computing variance (or standard deviation), you should remember to use the unbiased estimator of variance, which uses N-1 instead of N in the denominator.
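A short simulation sketch (the true variance of 1, the sample size of 5, and the number of repetitions are arbitrary choices) illustrates the point: dividing by N systematically underestimates the true variance, while dividing by N-1 does not.

import numpy as np

rng = np.random.default_rng(3)

true_var = 1.0      # variance of the population we draw from
n = 5               # small samples make the bias easy to see
reps = 100_000

biased = np.empty(reps)
unbiased = np.empty(reps)
for i in range(reps):
    sample = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=n)
    biased[i] = np.var(sample, ddof=0)    # divides by N
    unbiased[i] = np.var(sample, ddof=1)  # divides by N-1

# The N estimator averages noticeably below 1.0; the N-1 estimator averages close to 1.0.
print(f"average of N   estimator: {biased.mean():.3f}")
print(f"average of N-1 estimator: {unbiased.mean():.3f}")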
Choose the Right Statistical Test
The following table may be helpful in ensuring you choose the correct statistical test for your experimental data.
Goal | Measurement (from Gaussian Population) | Rank, Score, or Measurement (from Non-Gaussian Population) | Binomial/Binary (Two Possible Outcomes)
Describe One Group | Mean, SD | Median, interquartile range | Proportion
Compare One Group to Hypothetical Value | One-sample t-test | Wilcoxon test | Chi-square or Binomial test
Compare Two Unpaired Groups | Unpaired t-test | Mann-Whitney test | Fisher's test (Chi-square for large samples)
Compare Two Paired Groups | Paired t-test | Wilcoxon test | McNemar's test
Compare Three or More Unmatched Groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare Three or More Matched Groups | Repeated-measures ANOVA | Friedman test | Cochran's Q
Quantify Association Between Two Variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict Value from Another Measured Variable | Linear Regression or Nonlinear Regression | Nonparametric Regression | Logistic Regression
Predict Value from Several Measured or Binomial Variables | Multiple Linear Regression or Multiple Nonlinear Regression | | Multiple Logistic Regression
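As a rough sketch of how a few of the tests in the table are run in practice (the data are simulated and scipy is assumed to be available), here are the unpaired t-test, the Mann-Whitney test, the paired t-test, and the one-sample t-test:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulated test scores for two unpaired groups (music vs. no music).
music = rng.normal(loc=75, scale=10, size=30)
no_music = rng.normal(loc=70, scale=10, size=30)

# Compare two unpaired groups, Gaussian column: unpaired t-test.
print(stats.ttest_ind(music, no_music))

# Same comparison, non-Gaussian column: Mann-Whitney test.
print(stats.mannwhitneyu(music, no_music))

# Compare two paired groups (the same people measured twice): paired t-test.
before = rng.normal(loc=70, scale=10, size=30)
after = before + rng.normal(loc=3, scale=4, size=30)
print(stats.ttest_rel(after, before))

# Compare one group to a hypothetical value: one-sample t-test against 70.
print(stats.ttest_1samp(music, popmean=70))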
Interpret a Significant Result
Statistical significance means something very specific in experimental work: an effect is significant if the test statistic you find is very unlikely to have occurred under the null hypothesis. For instance, if you run a t-test and find a t-value of 5.8, this is extremely unlikely to occur when the null hypothesis (no difference in means) is true. The interpretation of this is that the null hypothesis is unlikely to be correct. The p-value measures what proportion of the time the null hypothesis will generate a test statistic at least as large as the one you see. So if the p-value is 0.05, it means that 5% of the time---1 in 20 times---the null hypothesis will generate a test statistic at least as large as the one you see. So in that sense, the p-value provides an intuitive measure for how unlikely the null hypothesis is to have generated data like the data you observe. But be careful---it is possible that the null hypothesis is right; it is just statistically unlikely.
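To make the "proportion of the time under the null" idea concrete, here is a simulation sketch (the sample size, the pretend observed t-value, and the number of repetitions are all arbitrary assumptions): it generates many datasets in which the null hypothesis really is true and checks how often they produce a t-statistic at least as extreme as the observed one.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

n = 20              # participants per group
observed_t = 2.1    # pretend this is the t-value from your experiment
reps = 20_000

# Generate data under the null: both groups come from the same distribution,
# so any difference in means is pure chance.
null_t = np.empty(reps)
for i in range(reps):
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    null_t[i] = stats.ttest_ind(a, b).statistic

# Two-sided p-value: how often the null produces a t at least this extreme.
p_sim = np.mean(np.abs(null_t) >= abs(observed_t))
print(f"simulated p-value: {p_sim:.3f}")

# Compare with the analytic p-value from the t distribution (df = 2n - 2).
print(f"analytic p-value:  {2 * stats.t.sf(abs(observed_t), df=2 * n - 2):.3f}")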
The term statistically significant does not mean that the result is significant in the sense of being important. A result can be statistically significant (the test statistic is unlikely under the null) but not actually be very important. For instance, people's attractiveness might have a statistically significant effect on income, but this effect might not be that important if income is primarily determined by other factors like type of job and education level.