The Gaussian/Normality Assumption Revisited
In the "Choose the Right Statistical Test" section, we briefly mentioned the normality assumption underlying many statistical tests. In fact, the statistical tests described in this wiki all require that the populations of interest are normally distributed. Thus, when your data are collected from populations that violate this assumption, these tests are not applicable.
Here we illustrate two kinds of situations in which the populations of interest are not normally distributed:
- Suppose that we are interested in estimating the mean number of accidents on the highway during a specific time period, say, 24 hours. Intuition tells us that accidents on the highway are rare. If we collect data from traffic reports for many 24-hour periods, chances are that periods with few reported accidents are very frequent, while periods with many reported accidents are very infrequent.
- This is a case where the population distribution is not normal. The distribution of accidents on the highway is highly skewed to the right (meaning most of the probability mass is on the left of the distribution, with a long tail extending to the right). A normal distribution is symmetric about its mean, so if accident counts were normally distributed, periods with far more accidents than average would be about as common as periods with far fewer, and the highway would be much more dangerous.
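The right skew described above can be illustrated with a small simulation. This is only a sketch: it assumes that accident counts follow a Poisson distribution with a hypothetical rate of 0.5 accidents per 24-hour period. Neither the Poisson model nor the rate comes from real traffic data, and the helper names `sample_poisson` and `sample_skewness` are made up for this example.

```python
import math
import random
from collections import Counter


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_skewness(values):
    """Return the (biased) sample skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / (n * variance ** 1.5)


# Assumption for illustration only: 0.5 accidents per 24-hour period.
ASSUMED_RATE = 0.5
rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulate 10,000 independent 24-hour periods.
counts = [sample_poisson(ASSUMED_RATE, rng) for _ in range(10_000)]
histogram = Counter(counts)
skewness = sample_skewness(counts)

# Print how many periods had 0, 1, 2, ... accidents.
for accidents in sorted(histogram):
    print(f"{accidents} accidents: {histogram[accidents]} periods")
print(f"sample skewness: {skewness:.2f}")
```

Under these assumptions, most simulated periods have zero accidents and the counts fall off quickly as the number of accidents grows. A normal distribution has a skewness of 0, whereas a Poisson distribution has a skewness of 1/sqrt(rate), which is always positive and becomes large when the rate is small.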
Another situation where the underlying population may not be normally distributed involves ordinal data, such as ratings on a five-point scale. Ordinal data take only discrete values, so the underlying population cannot be normal, because a normal distribution is continuous. For tests that are applicable in these cases, please refer to the table in "Choose the Right Statistical Test".
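One way to see the mismatch between ordinal data and a normal distribution is to fit a normal curve to ordinal responses and check how much probability it assigns to values that cannot occur. The sketch below uses made-up responses on a five-point scale (the counts are hypothetical, not survey results) and Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical responses on a 1-5 scale (made-up counts for illustration).
responses = [1] * 5 + [2] * 10 + [3] * 20 + [4] * 40 + [5] * 75

# Fit a normal distribution using the sample mean and standard deviation.
fitted = NormalDist.from_samples(responses)

# Treat each category k as the interval [k - 0.5, k + 0.5]. Anything below 0.5
# or above 5.5 then corresponds to no category at all.
mass_below_scale = fitted.cdf(0.5)
mass_above_scale = 1.0 - fitted.cdf(5.5)
impossible_mass = mass_below_scale + mass_above_scale

print(f"fitted mean: {fitted.mean:.2f}, fitted stdev: {fitted.stdev:.2f}")
print(f"probability assigned to impossible values: {impossible_mass:.3f}")
```

With these made-up responses, the fitted normal places a noticeable share of its probability beyond the top of the scale, where no response can fall. The exact share depends entirely on the assumed counts.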
In summary, the statistical tests covered in this wiki require that the populations of interest be normally distributed. When that assumption is violated, as commonly happens with skewed count data and ordinal data like the two examples above, take care to select a test that is appropriate for the data.