Experimental Design Basics

Random Variables

Random variables in statistics are a lot like variables in algebra or calculus, except that they represent things that take on random values when sampled from the world. For instance, a variable might describe someone's reaction time to a stimulus, but the value of this variable will be different each time it is measured. A random variable like reaction time can be understood as a distribution---a collection of all possible reaction times and how likely each is---rather than a single value.

Continuous vs. discrete

Random variables in experiments can be continuous or discrete. In psychology experiments, continuous variables are almost always real numbers (e.g. people's height, reaction time). A continuous variable can take an infinite number of possible values between any two given values; thus, one cannot enumerate its values in order. Discrete variables are variables whose values can be enumerated. Two important types of discrete variable are ordinal variables, often represented by integer values (like number of children in a family), and categorical variables, whose values represent category membership (like type of car or employment status). Most discrete variables used in psychology experiments take on only a finite number of values, for instance in measures like level of education, number of children, or correct/incorrect. However, a discrete variable can have a potentially infinite number of values, as in the number of quanta of light emitted by a light source in a given period of time.

Representing Probabilities

Probability distributions

A discrete random variable X is characterized by a probability distribution, P(X=x). For each possible value x, P(X=x) specifies the probability that the value x will occur or be observed. A simple way to think of this is that observations of X are equivalent to random draws from a hat containing balls labeled with the different possible values of X (the population of X). For any given value x, P(X=x) represents the proportion of balls that have the label x. The sum of P(X=x) over all possible values x is 1. To compute the probability that an observation of X will lie in any particular range (e.g. P(X > 1) or P(-10 < X < 0)), one need only sum the values of P(X=x) for all values x within the specified range.
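
To make this concrete, here is a minimal Python sketch (using a made-up distribution, the number of heads in two fair coin flips, rather than an example from the text) that stores P(X=x) as a table and sums it over a range:

{{{#!python
# Hypothetical discrete distribution: X = number of heads in two fair coin flips.
P = {0: 0.25, 1: 0.50, 2: 0.25}

# The probabilities over all possible values must sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-12

# P(X >= 1): sum P(X=x) over every value x in the specified range.
p_at_least_one_head = sum(prob for x, prob in P.items() if x >= 1)
print(p_at_least_one_head)  # 0.75
}}}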

Probability density functions

A continuous random variable X is characterized by a probability density function, p(x). Technically, p(x) does not represent the probability of X being equal to a particular value x (hence it is not written as p(X=x)). This is because the probability of a continuous random variable having any specific value x is actually 0. For a continuous variable, one can only specify the probability that it will fall within a specified range of values; for example, the probability that X > 1 or that -1 < X < 0. p(x) is a function that allows us to calculate these probabilities, specifically by calculating the area under p(x) for the range of values specified. For example, the probability that any particular observation of X is greater than 1, P(X > 1), is the area under p(x) between 1 and infinity. The probability that X is between -1 and 0, P(-1 < X < 0), is the area under p(x) between -1 and 0. A more intuitive way to think of p(x) is that, for a small interval around x, the probability that X falls in that interval is approximately p(x) times the width of the interval.
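
As a minimal sketch, assuming for illustration that X follows a standard normal distribution (an assumption not made in the text), the areas described above can be computed from the cumulative distribution function:

{{{#!python
from scipy.stats import norm  # standard normal: an assumed example density

# P(X > 1): the area under p(x) from 1 to infinity.
p_greater_than_1 = 1 - norm.cdf(1)        # about 0.159

# P(-1 < X < 0): the area under p(x) between -1 and 0.
p_between = norm.cdf(0) - norm.cdf(-1)    # about 0.341

# p(x) is not itself a probability: P(X = x) is 0 for any single x, but
# p(x) * dx approximates the probability of falling in a small interval
# of width dx around x.
dx = 1e-3
approx = norm.pdf(0.5) * dx
exact = norm.cdf(0.5 + dx / 2) - norm.cdf(0.5 - dx / 2)
print(p_greater_than_1, p_between, approx, exact)
}}}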

Populations and Samples

A population contains all possible instances of a random variable, with the number of occurrences of each instance of the random variable proportional to its probability of occurring. If a random variable X represents the weights of American males, the population of X contains the weights of all American males. One can think of the population of X as the hat from which random observations of X are drawn. A sample is a collection of particular observations of X. Experimental samples are always sets containing a finite number of observations of X. In an experiment designed to estimate the average reading score of 2nd graders in American public schools, one might sample the scores of 16 randomly chosen 2nd grade students from American public schools. If we let X represent reading scores of 2nd grade students in American public schools, the 16 measured scores would be a sample of X. The population of X would be the reading scores of all 2nd grade students in American public schools.
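
The hat metaphor is easy to simulate. The sketch below uses synthetic numbers (not real reading scores) to stand in for a population and draws a sample of 16 observations from it:

{{{#!python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population" of scores, purely for illustration.
population = rng.normal(loc=100, scale=15, size=1_000_000)

# A sample: 16 observations drawn at random from the population.
sample = rng.choice(population, size=16, replace=False)

print(population.mean())  # population mean (close to 100 by construction)
print(sample.mean())      # sample mean: an estimate of the population mean
}}}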

Summary statistics

Summary statistics provide a way to capture the basic trends in a population or in a sample of data in a concise, informative way. Probably the most common summary statistic is the mean: the mean of a set of numbers gives the average (intuitively, the typical value) of the sample. Another common measure is the variance, which measures the variability in the sample. So the mean of the heights of everyone in New York would capture the typical height of New Yorkers, and the variance of heights would tell you how much variability there is between New Yorkers. Population statistics represent the trends in an entire population, while sample statistics represent the trends in a sample.

Estimators

We can view sample statistics like the mean and variance of a sample as estimators of the true, unknown population statistics you care about. So the mean computed on the sample estimates (or approximates) the true mean of the population. Since we usually want to make statements about the true state of the world (men are taller than women) rather than our sample (our sample of men is taller than our sample of women), it's useful to think about using our sample to estimate some true property of the world.

Estimators can be biased or unbiased. Biased estimators are ones which, intuitively, are expected to give a (perhaps slightly) wrong answer. Unbiased estimators are expected to give the correct answer. As an example, suppose you collected a sample of heights and for some reason threw out the shortest 10 people before computing the mean. The mean you compute will be a biased estimator of the true mean since it will tend to overestimate people's typical height. But, as you measure more and more people, the shortest 10 will matter less and less, and so the amount of bias will decrease as your sample size increases.
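
The height example is easy to check by simulation. The sketch below uses made-up, normally distributed heights (an assumption for illustration only) and shows that the "drop the shortest 10" estimator overestimates the mean, with the bias shrinking as the sample grows:

{{{#!python
import numpy as np

rng = np.random.default_rng(1)
TRUE_MEAN = 170.0  # made-up population mean height (cm)

def mean_dropping_shortest_10(n):
    """Draw n synthetic heights, discard the shortest 10, average the rest."""
    heights = rng.normal(loc=TRUE_MEAN, scale=10.0, size=n)
    return np.sort(heights)[10:].mean()

for n in (20, 100, 10_000):
    estimates = [mean_dropping_shortest_10(n) for _ in range(1_000)]
    print(n, np.mean(estimates) - TRUE_MEAN)  # the bias shrinks as n grows
}}}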

For computing variance (or standard deviation), you should remember to use the unbiased estimator of variance, which uses N-1 instead of N in the denominator.
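
In code, the difference is just the denominator. Here is a sketch with synthetic data (numpy's ddof argument controls whether N or N-1 is used):

{{{#!python
import numpy as np

rng = np.random.default_rng(2)
TRUE_VAR = 4.0  # variance of the synthetic population used for illustration

biased, unbiased = [], []
for _ in range(10_000):
    sample = rng.normal(loc=0.0, scale=TRUE_VAR ** 0.5, size=5)
    biased.append(np.var(sample, ddof=0))    # divides by N
    unbiased.append(np.var(sample, ddof=1))  # divides by N - 1

print(np.mean(biased))    # systematically below 4 (about 3.2 for N = 5)
print(np.mean(unbiased))  # close to 4
}}}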

Dependent vs. Independent vs. Explanatory Variables

The terms dependent variable and independent variable are used to distinguish between two types of quantifiable factors being considered in an experiment. In simple terms, the independent variable is typically the variable being manipulated (hopefully while holding everything else constant), and the dependent variable is the observed result of that manipulation.

For example, you might measure people's performance on a test (i) when classical music is playing versus (ii) when no music is playing. Here, the independent variable (the thing you manipulate) is whether or not music is played (i/ii) and the dependent variable (the thing you measure) is performance on the test.

Of course, some studies do not involve manipulating a variable, but still measure how well one variable predicts another. These studies typically use some form of regression that estimates what function best predicts a variable Y from a variable X (simple linear regression finds a function of the form Y = aX + b). In such studies, the variable being predicted (Y) is still referred to as the dependent variable, but the variable you are using to predict it is referred to as the explanatory variable. For example, consider a study that explores how natural variations in sleep relate to scores on the quantitative section of the SAT. Since the question is whether hours of sleep (X) predict SAT score (Y), the explanatory variable is sleep and the dependent variable is the SAT score.
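
Here is a sketch of such a regression on synthetic sleep and SAT numbers (invented purely for illustration; scipy's linregress fits Y = aX + b):

{{{#!python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)

# Synthetic data: hours of sleep (explanatory variable X) and SAT
# quantitative score (dependent variable Y).
sleep = rng.uniform(4, 9, size=50)
sat = 450 + 25 * sleep + rng.normal(0, 40, size=50)

fit = linregress(sleep, sat)  # slope a, intercept b, and related statistics
print(fit.slope, fit.intercept, fit.rvalue ** 2, fit.pvalue)
}}}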

Within- vs. Between-Subjects Experimental Designs

Controlled experiments typically manipulate the value of an independent variable to measure the effect of the independent variable on a dependent variable. This can be done in one of two broadly different ways. In a within-subjects design, measurements of the dependent variable are made for each subject at all levels of the independent variable. For example, an experiment looking at the effect of sleep on cognitive performance might measure each of 10 subjects' scores on a cognitive test after a night without sleep and then again after a night with sleep. This gives a score for each subject for each of two values of the independent variable (sleep / no sleep). In a between-subjects design, measurements of the dependent variable are made for separate groups of subjects, with each group being assigned a single value of the independent variable. A between-subjects design for our sleep study would assign subjects randomly to one of two groups. The first group would not sleep the night before taking the cognitive test and the second group would sleep the night before the test. The appropriate statistical tests for an effect of sleep on test scores would be different for the two designs. For the within-subjects study, one would compute the average change in test scores within each subject across the two test conditions and test whether it was significantly different from 0. For the between-subjects study, one would compute the average test score within each group and test whether the two averages were significantly different.
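
A sketch of the two analyses on made-up test scores (the subject counts, means, and variability are invented for illustration):

{{{#!python
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

rng = np.random.default_rng(4)

# Within-subjects: the same 10 synthetic subjects tested in both conditions.
baseline = rng.normal(100, 15, size=10)         # large between-subject variability
no_sleep = baseline - 5 + rng.normal(0, 2, 10)  # scores after a night without sleep
sleep = baseline + rng.normal(0, 2, 10)         # scores after a night with sleep
print(ttest_rel(sleep, no_sleep))               # paired (repeated-measures) t-test

# Between-subjects: two separate synthetic groups, one per condition.
group_no_sleep = rng.normal(95, 15, size=10)
group_sleep = rng.normal(100, 15, size=10)
print(ttest_ind(group_sleep, group_no_sleep))   # unpaired (independent groups) t-test

# With this much between-subject variability, the paired test will typically
# detect the 5-point effect more easily than the unpaired test.
}}}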

One might well ask when one should use one or the other experimental design. This can be determined by many factors, but the most important to consider is which design is most likely to uncover an effect that is there. This in turn depends on how variable your data are likely to be: the more variance, the less sensitive the experiment. A disadvantage of between-subjects designs is that one often finds large inter-subject variability that can mask small effects of the independent variable. The within-subjects design solves this by measuring the effect within each subject. In the sleep study, for example, test scores will depend on subject IQ, education level, motivation, etc., so one expects high variability between scores within each group of subjects in the between-subjects design. In the within-subjects design, however, one might expect to see more consistent differences between scores across sleep conditions within subjects; that is, one might expect that while overall scores will vary quite a bit between subjects, how they differ across sleep conditions will be much more consistent.

While this would seem to suggest always using a within-subjects design, one has to take care. Within-subjects designs almost always have at least one confounding variable: the order in which subjects are tested on different levels of the independent variable. This has to be controlled for by counter-balancing the order in which subjects are tested on each level of the independent variable; that is, ensuring that all possible orders are tested equally often (half the subjects run in the sleep condition first and half in the no-sleep condition first). This manipulation can itself create a lot of variability in the data; for example, motivation may vary greatly between the first and second day of the experiment, adding to the variability in the difference scores for subjects in a counterbalanced study.

Choose the Right Statistical Test

In statistical inference, the "true" populations that you care about are usually assumed to be distributed in a particular way. The most common assumption is that data items in the populations are distributed according to a normal distribution (i.e. the bell curve, also referred to as a Gaussian distribution). Nearly all statistical tests taught in an undergraduate statistics class require this assumption to hold. When this assumption does not hold (for example, when your data are not interval variables), you need to think twice about which test to use: a different set of tests, referred to as non-parametric tests, should be used instead. Although in most BCS lab courses it is rare to encounter a situation where the normality assumption does not hold, keep in mind that the t-test, for instance, does not apply everywhere. The following table may be helpful in ensuring you choose the correct statistical test for your experimental data.

|| ||||||Data Type||
||Goal||Measurement (from Gaussian Population)||Rank, Score, or Measurement (from Non-Gaussian Population)||Binomial/Binary (Two Possible Outcomes)||
||Describe One Group||Mean, SD||Median, interquartile range||Proportion||
||Compare One Group to Hypothetical Value||Z-test, Single-sample t-test||Wilcoxon test||Chi-square test or Binomial test||
||Compare Two Independent Groups||Unpaired (independent group) t-test||Mann-Whitney test||Fisher's exact test (Chi-square for large samples)||
||Compare Two (Paired) Measurements from Same Group||Paired (repeated-measures) t-test||Wilcoxon test||McNemar's test||
||Compare Three or More Independent Groups||One-way ANOVA||Kruskal-Wallis test||Chi-square test||
||Compare Three or More Measurements from Same Group||Repeated-measures ANOVA||Friedman test||Cochran's Q||
||Quantify Association Between Two Variables||Pearson correlation||Spearman correlation||Contingency coefficients||
||Predict Value from Another Measured Variable||Linear regression or Nonlinear regression||Nonparametric regression||Logistic regression||
||Compare Three or More Independent Groups with Two Variables||Two-way ANOVA|| || ||
||Predict Value from Several Measured or Binomial Variables||Multiple regression|| ||Multiple logistic regression||
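
As an illustration of reading the table, the sketch below runs the two tests from the "Compare Two Independent Groups" row on made-up data (synthetic numbers, for illustration only):

{{{#!python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(5)

# Two made-up independent groups of measurements.
a = rng.normal(10, 2, size=30)
b = rng.normal(11, 2, size=30)

# Roughly Gaussian measurements: unpaired (independent group) t-test.
print(ttest_ind(a, b))

# Ranks, scores, or clearly non-Gaussian measurements: Mann-Whitney test.
print(mannwhitneyu(a, b))
}}}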

Interpreting the Statistical Significance of a Result

Statistical significance means something very specific in experimental work: an effect is significant if the test statistic you find is very unlikely to have occurred under the null hypothesis. For instance, if you run a t-test and find a t-value of 5.8, this is extremely unlikely to occur when the null hypothesis (no difference in means) is true. The interpretation of this is that the null hypothesis is unlikely to be correct. The p-value measures what proportion of the time the null hypothesis will generate a test statistic at least as large as the one you see. So if the p-value is 0.05, it means that 5% of the time---1 in 20 times---the null hypothesis will generate a test statistic at least as large as the one you see. So in that sense, the p-value provides an intuitive measure for how unlikely the null hypothesis is to have generated data like the data you observe. But be careful--it is possible that the null hypothesis is actually true; it is just statistically unlikely.
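
The tail-area idea can be made concrete with a short sketch. The t-value of 5.8 comes from the paragraph above; the degrees of freedom (18) are an assumed value chosen only for illustration:

{{{#!python
from scipy.stats import t

t_value = 5.8  # t-value from the example above
df = 18        # degrees of freedom: an assumed value for illustration

# Two-sided p-value: the proportion of the time the null hypothesis would
# generate a t statistic at least this extreme (in either direction).
p = 2 * t.sf(abs(t_value), df)
print(p)  # a very small number, so the null is unlikely to have produced it
}}}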

When a statistical test for an effect is found to be not significant, this means that the probability of getting data like yours, given that the null hypothesis is true, is higher than the significance criterion used for the test (usually .05). It does not, in fact, mean that the null hypothesis is likely to be true. The p-value tells you little about the probability of the null hypothesis being true; it just tells you the probability of getting data like yours if the null hypothesis were true. When the p-value is lower than some criterion value like .05, one rejects the null hypothesis on the grounds that you would have been unlikely to obtain your data if it were true. There are many reasons for getting a p-value greater than .05. It is quite possible that your data were simply too variable to detect an effect that is really there; thus, you should NEVER infer that the null hypothesis is true if your statistical test fails to find significance. In that case, you are forced to remain agnostic about the null hypothesis.

The term statistically significant also does not mean that the result is significant in the sense of being important. A result can be statistically significant (the test statistic is unlikely under the null) but really not be that important. For instance, you might find a statistically significant difference in weight loss between two diet drugs, but the difference is only, on average, half a pound. This may or may not be worth the cost of the better drug, or it may be outweighed by that drug's side effects.
