
The Gaussian/Normality Assumption Revisited

In The Basics we briefly mentioned the normality assumption underlying many statistical tests. In fact, the statistical tests described in this wiki all require that the populations of interest are normally distributed. Thus, when your data are collected from populations that violate this assumption, these tests are not applicable.

Here we illustrate two kinds of situations where the population of interest is not normally distributed:

  • Suppose that we are interested in estimating the mean number of accidents on the highway during a specific time period, say, 24 hours. Intuition tells us that accidents on the highway are extremely rare. If we collect data from traffic reports for many 24-hour periods, chances are that most of those periods have no accidents, some have one accident report, and very few have two or more. In other words, the frequencies of low accident counts are very high, while the frequencies of high accident counts are very low.
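  • This is a case where the population distribution is not normal. The distribution of accidents on the highway is highly skewed to the right (meaning most of the probability mass is on the left of the distribution, with a long tail extending to the right). A normal distribution is symmetric around its mean, so if accident counts were normally distributed, counts well above the mean would be as common as counts well below it, and the highway would be much more dangerous.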
  • Another situation where the underlying population may not be normally distributed involves the use of ordinal data. Ordinal data are discrete variables: by definition, the underlying population cannot be normal because a normal distribution is continuous. For tests that are applicable in these cases, please refer to the table in Choose the right statistical test.
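
The highway example can be made concrete with a small simulation. The sketch below is only an illustration: it assumes that daily accident counts follow a Poisson distribution, which is a common model for rare events but is not something this page specifies, and the rate of 0.3 accidents per 24-hour period is a made-up value. The helper names sample_poisson and sample_skewness are likewise hypothetical.

```python
import math
import random
from collections import Counter


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls below e^(-rate).
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_skewness(values):
    """Third standardized moment: positive means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)   # fixed seed so the run is repeatable
ASSUMED_RATE = 0.3       # hypothetical mean accidents per 24-hour period

# Simulate 10,000 independent 24-hour periods.
counts = [sample_poisson(ASSUMED_RATE, rng) for _ in range(10_000)]
frequencies = Counter(counts)

# Tabulate how many periods had 0, 1, 2, ... accidents.
for accidents in sorted(frequencies):
    print(accidents, frequencies[accidents])
print("skewness:", round(sample_skewness(counts), 2))
```

Under these assumptions, most simulated periods should show zero accidents, with rapidly falling frequencies for higher counts and a clearly positive skewness. A normally distributed sample, by contrast, would have skewness close to zero.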

In summary, the statistical tests covered in this wiki require that the populations of interest are normally distributed. If that assumption is violated, which usually happens in one of the two ways illustrated above, one should take care to select an appropriate test.

NonParametricWarning (last edited 2012-01-07 21:17:23 by CelesteKidd)
