Statistical Tests for Experiments with Two Samples
A common experimental design takes measurements from both a test group and a control group. For example, we may want to test the effectiveness of a new drug. A scientifically sound way of conducting the experiment is to give the drug to the test group and a placebo to the control group. If the two groups differ significantly, we can conclude that the drug has a statistically detectable effect on patients' well-being.
An important concept in experiments with two groups is the "repeated-measures design", also called a "paired design". A paired design refers to studies in which two or more sets of measurements are taken from a single group of subjects under different conditions. Suppose we want to study the effect of sleep on memory. Comparing a group of subjects who sleep normally with a separate group who are deprived of sleep is less informative, because any observed difference may partly reflect inherent differences in memory performance between the two groups. A repeated-measures design is recommended here, so that memory performance is measured before and after sleep deprivation in the same group of subjects.
In the following, we illustrate how to conduct statistical tests in the unpaired design (i.e. test group vs. control group) and in the paired design.
Unpaired (Independent Group) t-test
The unpaired t-test is applicable when the experimental group and the control group consist of different pools of subjects. Consider the following example:
Example Problem
A physiologist has conducted an experiment to evaluate the effect of hormone X on male sexual behavior. Ten rats were injected with hormone X and ten other rats received a placebo injection. Each rat was then housed with a female rat for 20 minutes, and the number of times each male mounted the female was counted. The test group had a sample mean number of mounts = 8.4 and a sample standard deviation = 6.197. The placebo group had a sample mean number of mounts = 5.6 and a sample standard deviation = 5.139.
Let's first identify the null hypothesis: the hormone has no effect on the number of times male rats mount a female during a 20-minute period. Note that we have 20 rats here, divided into two groups. Such a design is an independent group design, and an unpaired t-test is appropriate.
Solve the Problem by Hand
Now we test for a significant change in the mating rate of rats under the treatment of the hormone using an $\alpha = 0.05$ significance level.
We define D0 as the difference in population means between the two conditions. The null hypothesis states that this value is 0:
H0: The hormone has no effect on the mating rate of rats (D0 = 0).
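Then, calculate the difference in means of the two samples:
$$\bar{X}_1 - \bar{X}_2 = 8.4 - 5.6 = 2.8$$
The standard error of the difference in means can be computed from the variances of the two samples, $s_1^2$ and $s_2^2$, with $N = 10$ rats in each group:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{N} + \frac{s_2^2}{N}} = \sqrt{\frac{6.197^2}{10} + \frac{5.139^2}{10}} \approx 2.546$$
Finally, we calculate the obtained t-statistic, $t_{obt}$. The formula is as follows:
$$t_{obt} = \frac{(\bar{X}_1 - \bar{X}_2) - D_0}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{2.8 - 0}{2.546} \approx 1.10$$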
Looking this up in the t distribution table, we find that the two-tailed critical value of t at df = 2N − 2 = 18 (with N = 10 rats per group) is 2.101. Since our obtained t-statistic (about 1.10) is smaller than the critical value, we fail to reject the null hypothesis. The data do not support the hypothesis that hormone X has an effect on the frequency of male rat mounting behavior during a 20-minute period.
Note: the degrees of freedom for an independent group design equal the total number of subjects minus 2 (one for each group).
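The hand calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `unpaired_t_from_summary` is made up for illustration. It uses the pooled-variance form of the standard error, which reduces to the formula used above when both groups have the same size, as they do here. The critical value is copied from the t table rather than computed.

```python
import math


def unpaired_t_from_summary(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Independent-groups t-test from summary statistics (pooled variance).

    Returns (t, df, standard error of the difference in means).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - d0) / se_diff
    return t, df, se_diff


# Summary statistics from the hormone X example.
t_obt, df, se_diff = unpaired_t_from_summary(8.4, 6.197, 10, 5.6, 5.139, 10)

# Two-tailed critical value for df = 18 at alpha = 0.05, taken from the t table.
T_CRIT = 2.101
reject_null = abs(t_obt) > T_CRIT

print(f"SE = {se_diff:.3f}, t = {t_obt:.2f}, df = {df}, reject H0: {reject_null}")
```

Working from summary statistics keeps the script in line with the example, which reports only the means and standard deviations and not the raw counts.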
Paired (Repeated-Measures) t-test
The paired group t-test is used when the two sets of measurements are taken on the same group of subjects. Consider the following example.
Example Problem
You are interested in determining whether an experimental birth control pill has the side effect of changing blood pressure. You randomly sample ten women from the city in which you live. You give five of them a placebo for a month and then measure their blood pressure. Then you switch them to the birth control pill for a month and again measure their blood pressure. The other five women receive the same treatment except that they are given the birth control pill first for a month, followed by the placebo for a month. The blood pressure readings are shown here.
Subject No. | Placebo pill | Birth Control pill
1 | 102 | 108
2 | 76 | 76
3 | 66 | 69
4 | 71 | 78
5 | 68 | 74
6 | 85 | 85
7 | 82 | 79
8 | 78 | 78
9 | 79 | 80
10 | 80 | 81
The first step, again, is to identify the null hypothesis: the birth control pill does not affect the blood pressure of women who are taking it (i.e. the population mean difference $\mu_D = 0$). A two-tailed test should be used, since an effect in either direction (i.e. whether the birth control pill increases or decreases blood pressure) would be of interest.
Solve the Problem by Hand
Now we test for a significant change in the blood pressure of women on birth control pills using an $\alpha = 0.05$ significance level.
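First, calculate the mean blood pressure change of the sample, where each difference score $D$ is the birth control pill reading minus the placebo reading:
$$\bar{D} = \frac{\sum D}{N} = \frac{21}{10} = 2.1$$
Next, calculate the standard deviation of the sample difference scores (take the differences first, then compute their standard deviation):
$$s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}} = \sqrt{\frac{96.9}{9}} \approx 3.281$$
Now, the standard error of the mean difference, $s_{\bar{D}}$, is:
$$s_{\bar{D}} = \frac{s_D}{\sqrt{N}} = \frac{3.281}{\sqrt{10}} \approx 1.038$$
Note that in the unpaired t-test, we calculated the standard error of the difference in means, while here we calculated the standard error of the mean difference. This is the key difference between the independent group design and the paired group design.
And, finally, our $t_{obt}$:
$$t_{obt} = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{2.1 - 0}{1.038} \approx 2.02$$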
What are the degrees of freedom for a paired t-test? Since we have only one group of 10 subjects, the df is the same as in the one-sample t-test: N − 1. Now, we can look up the critical value of t for a two-tailed test using $\alpha = 0.05$ and the appropriate degrees of freedom (df = 10 − 1 = 9). The critical value is 2.262, which is larger than our obtained t-statistic. Thus, we retain the null hypothesis and conclude that there is insufficient evidence for the claim that the birth control pill affects the blood pressure of women who take it.
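The paired calculation can be reproduced directly from the blood pressure table. The following is a minimal sketch using only the Python standard library; the variable names are chosen for illustration, and the critical value is again copied from the t table rather than computed.

```python
import math
import statistics

# Blood pressure readings from the table, in subject order 1..10.
placebo = [102, 76, 66, 71, 68, 85, 82, 78, 79, 80]
pill = [108, 76, 69, 78, 74, 85, 79, 78, 80, 81]

# Difference scores: birth control pill minus placebo, one per subject.
diffs = [b - a for a, b in zip(placebo, pill)]

n = len(diffs)
mean_diff = statistics.mean(diffs)      # mean change in blood pressure
sd_diff = statistics.stdev(diffs)       # sample standard deviation (divides by n - 1)
se_mean_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_obt = mean_diff / se_mean_diff        # null hypothesis: mean difference is 0
df = n - 1

# Two-tailed critical value for df = 9 at alpha = 0.05, taken from the t table.
T_CRIT = 2.262
reject_null = abs(t_obt) > T_CRIT

print(f"mean = {mean_diff:.1f}, s = {sd_diff:.3f}, t = {t_obt:.2f}, df = {df}")
print(f"reject H0: {reject_null}")
```

Note that the test is computed on the ten difference scores alone, which is why it has the same form as a one-sample t-test.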
Note: If you use a statistics program to calculate the result of this t-test, you will find that the p-value is 0.074. When a p-value is greater than 0.05 and less than 0.1, the result is often referred to as "marginally significant". Such a result does not meet the 0.05 criterion, but it is commonly taken to suggest a weak effect that is worth mentioning. Reporting a result as "marginally significant" is a common practice in the experimental sciences.
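For readers without a statistics program at hand, the p-value can be approximated by numerically integrating the density of the t distribution. The sketch below is one way to do this with the Python standard library, using Simpson's rule; the helper names `t_pdf` and `two_tailed_p` are hypothetical, and a dedicated statistics package would normally be preferred over hand-rolled integration.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, integrating the density over [0, |t|] with Simpson's rule.

    steps must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the central area is twice the area on [0, |t|].
    return 1 - 2 * area_zero_to_t


# Obtained t from the paired example: mean difference 2.1, s = 3.281, N = 10.
t_obt = 2.1 / (3.281 / math.sqrt(10))
p_value = two_tailed_p(t_obt, df=9)

print(f"t = {t_obt:.3f}, two-tailed p = {p_value:.3f}")
```

As a sanity check, passing the tabled critical value 2.262 with df = 9 to the same function should return a value very close to 0.05.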