What is Dispersion in Statistics | Measures, Types and Formulas
By gobrain
Jun 9th, 2024
Dispersion in statistics refers to how spread out or scattered a set of data is in relation to its central tendency. In simpler terms, it tells you how much the data points deviate from an average value (mean, median, or mode).
A data set with low dispersion has its values clustered closely around the central tendency, while a data set with high dispersion has its values spread out over a larger range.
Imagine you measure the heights of 10 people. In one scenario, all 10 people are very close in height, with values ranging from 5'6" to 5'8". In another scenario, the heights range from 4'10" to 6'4". The second scenario has higher dispersion because the data points are more spread out.
Measures of Dispersion
In statistics, several measures are used to quantify the dispersion of a data set. Each one produces a single number that summarizes how scattered the data points are, and each has its own advantages and limitations.
These measures fall into two categories:
- Absolute Measures of Dispersion
- Relative Measures of Dispersion
Now, let's see them in depth:
Absolute Measures of Dispersion
Absolute measures of dispersion quantify the spread of data in the same units as the original data set. These measures are helpful for understanding the dispersion within a single data set or comparing data sets with the same units.
Here's a breakdown of some common absolute measures of dispersion:
Range
The range is the simplest measure of dispersion. It's calculated by subtracting the smallest value (minimum) from the largest value (maximum) in the data set. While easy to compute, the range can be misleading if the data has outliers (extreme values).
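As a rough illustration, here is a small Python sketch of the range. The height values are hypothetical, converted to inches to match the two scenarios described earlier (66 to 68 inches versus 58 to 76 inches), and the helper name `data_range` is our own.

```python
def data_range(values):
    """Return the range (maximum minus minimum) of a non-empty sequence."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical heights in inches: 5'6" = 66, 5'8" = 68, 4'10" = 58, 6'4" = 76
tight = [66, 67, 66.5, 68, 67.5, 66, 67, 68, 66.5, 67]
spread = [58, 64, 70, 76, 61, 67, 72, 59, 74, 66]

print(data_range(tight))   # 2
print(data_range(spread))  # 18

# A single extreme value changes the range dramatically
print(data_range(tight + [90]))  # 24
```

The last line shows why the range can mislead: adding one unusually tall person to the tightly clustered group multiplies its range by twelve, even though the other ten values have not changed.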
Variance
The variance is the average of the squared deviations of the data points from the mean. Squaring makes every deviation positive and gives larger deviations more weight. However, because the variance is expressed in squared units of the data (for example, square inches for heights measured in inches), it can be difficult to interpret in the context of the original data.
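The sketch below computes the variance directly from that definition, dividing by n (the population variance). It also shows the sample variance, which divides by n - 1 instead; that convention is standard when the data is a sample drawn from a larger population. The data set is made up for illustration, and Python's built-in `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics


def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations divided by n - 1 (Bessel's correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; the mean is 5
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.571

# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```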
Standard Deviation
The standard deviation is the square root of the variance. It's expressed in the same units as the original data, making it easier to interpret than the variance. Generally, a higher standard deviation indicates greater spread in the data.
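A minimal sketch of the standard deviation simply takes the square root of the population variance. The data is hypothetical; `statistics.pstdev` (population) and `statistics.stdev` (sample, based on n - 1) are the standard-library equivalents.

```python
import math
import statistics


def population_std(values):
    """Square root of the population variance; same units as the data."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; the variance is 4
print(population_std(data))      # 2.0
print(statistics.pstdev(data))   # standard-library population version
print(statistics.stdev(data))    # sample version, based on n - 1
```

If these values were heights in inches, the result of 2.0 would also be in inches, which is what makes the standard deviation easier to read than the variance.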
Mean Deviation
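The mean deviation (also called the mean absolute deviation) is the average of the absolute deviations of the data points from the mean (or median). It is less sensitive to outliers than the range, and because the deviations are not squared, it is also less sensitive to outliers than the variance or standard deviation. The trade-off is that absolute values are less convenient to work with mathematically than squared deviations.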
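Here is a small sketch of the mean deviation about either the mean or the median. The function name and the `center` parameter are our own choices, and the data (with 10 as a deliberately large value) is hypothetical.

```python
import statistics


def mean_deviation(values, center="mean"):
    """Average absolute deviation from the mean or the median."""
    if center == "mean":
        c = statistics.mean(values)
    elif center == "median":
        c = statistics.median(values)
    else:
        raise ValueError("center must be 'mean' or 'median'")
    return sum(abs(x - c) for x in values) / len(values)


data = [1, 2, 3, 4, 10]  # hypothetical; 10 is a large value
print(mean_deviation(data))            # 2.4 (about the mean, 4)
print(mean_deviation(data, "median"))  # 2.2 (about the median, 3)
```

The deviation about the median comes out smaller here. This is not a coincidence: the median minimizes the sum of absolute deviations, so the mean deviation about the median is never larger than the one about the mean.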
Quartile Deviation
The quartile deviation, also called the semi-interquartile range, is half the interquartile range (IQR): (Q3 - Q1) / 2, where the IQR is the difference between the upper quartile (Q3) and the lower quartile (Q1) of the data. Because it depends only on the middle 50% of the data points, it is largely unaffected by extreme values.
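A minimal sketch using Python's `statistics.quantiles`, which returns the three quartile cut points. Be aware that there are several conventions for computing quartiles; the default "exclusive" method used here may give slightly different values from a spreadsheet or another textbook. The data is hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Half the interquartile range: (Q3 - Q1) / 2."""
    # quantiles(..., n=4) returns [Q1, Q2, Q3] using the "exclusive" method
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical values
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q3)                    # 6.0 16.0
print(q3 - q1)                   # IQR: 10.0
print(quartile_deviation(data))  # 5.0
```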
Relative Measures of Dispersion
While the measures of dispersion we discussed earlier (range, variance, standard deviation, etc.) are useful, they have a limitation: they are expressed in the units of the original data (or, for the variance, in squared units). This makes it difficult to compare the spread of data sets that use different units.
For instance, imagine you have data on income in two countries: Country A measures income in dollars and Country B uses euros. You can't directly compare the standard deviations of these two datasets to determine which country has a higher income spread because the units (dollars and euros) are different.
This is where relative measures of dispersion come in. These measures express the spread of data as a proportion of the central tendency (usually the mean). This allows for comparisons between data sets with different units.
Here are some common types of relative measures of dispersion:
Coefficient of Variation (CV)
This is the most widely used relative measure. It is calculated by dividing the standard deviation by the mean and multiplying by 100% to express the result as a percentage. A higher CV indicates greater spread relative to the mean.
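The sketch below computes the CV and illustrates the dollars-versus-euros point made earlier. The incomes and the 0.9 conversion rate are hypothetical, and we assume the population standard deviation (`statistics.pstdev`); use `statistics.stdev` instead if your data is a sample. Since the CV is only meaningful for a positive mean, the function rejects a mean that is zero or negative (a design choice of this sketch).

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population SD)."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("the CV is only meaningful for a positive mean")
    return statistics.pstdev(values) / mean * 100


# Hypothetical incomes in dollars, and the same incomes converted to euros
# at an assumed rate of 0.9 euros per dollar
incomes_usd = [20000, 40000, 40000, 40000, 50000, 50000, 70000, 90000]
incomes_eur = [x * 0.9 for x in incomes_usd]

# The standard deviations differ because the units differ
print(statistics.pstdev(incomes_usd), statistics.pstdev(incomes_eur))
print(round(coefficient_of_variation(incomes_usd), 2))  # 40.0
print(round(coefficient_of_variation(incomes_eur), 2))  # 40.0
```

The CV is identical for both versions of the data because multiplying every value by the same positive factor scales the standard deviation and the mean equally, so the factor cancels out in the ratio.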
Coefficient of Range
This is calculated by dividing the range by the sum of the maximum and minimum values in the data set. It's less commonly used but can be helpful for quick comparisons.
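A minimal sketch of the coefficient of range, using hypothetical heights in inches like the two scenarios from the start of the article. Because it is a ratio, the result has no units, so it would come out the same if the heights were recorded in centimeters instead.

```python
def coefficient_of_range(values):
    """(max - min) / (max + min) for a non-empty sequence."""
    largest, smallest = max(values), min(values)
    if largest + smallest == 0:
        raise ValueError("undefined when max + min is zero")
    return (largest - smallest) / (largest + smallest)


# Hypothetical heights in inches for the two scenarios
tight = [66, 67, 66.5, 68, 67.5]
spread = [58, 64, 70, 76, 61]

print(round(coefficient_of_range(tight), 4))   # 0.0149
print(round(coefficient_of_range(spread), 4))  # 0.1343
```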
Coefficient of Quartile Deviation
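This coefficient is usually calculated by dividing the difference between the quartiles by their sum, (Q3 - Q1) / (Q3 + Q1), and multiplying by 100%. Some texts instead divide the quartile deviation or the interquartile range (IQR) by the median. Like the CV, it expresses spread relative to a central value, but because it is based on the middle 50% of the data, it is much less affected by extreme values.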
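Definitions of this coefficient vary between texts. The sketch below computes the widely used form (Q3 - Q1) / (Q3 + Q1), expressed as a percentage, and, for comparison, a median-based version that divides the IQR by the median. The function names and data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions may shift the results slightly.

```python
import statistics


def coefficient_of_quartile_deviation(values):
    """(Q3 - Q1) / (Q3 + Q1) * 100, using 'exclusive' quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1) * 100


def iqr_over_median(values):
    """Median-based alternative: IQR / median * 100."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / statistics.median(values) * 100


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical; Q1 = 6, Q3 = 16
print(round(coefficient_of_quartile_deviation(data), 2))  # 45.45
print(round(iqr_over_median(data), 2))                    # 83.33
```

The two versions give quite different numbers for the same data, so when comparing coefficients across sources it is worth checking which definition each one uses.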
Coefficient of Mean Deviation
This is the mean deviation divided by the average it was measured from (the mean or the median), expressed as a percentage. It is less common than the CV but can be useful for datasets with outliers.
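As a sketch, the coefficient of mean deviation can be computed about either the mean or the median, dividing by the same center the deviations were measured from. The function name, the `center` parameter, and the data are hypothetical.

```python
import statistics


def coefficient_of_mean_deviation(values, center="mean"):
    """Mean deviation about the mean or median, as a % of that center."""
    if center == "mean":
        c = statistics.mean(values)
    elif center == "median":
        c = statistics.median(values)
    else:
        raise ValueError("center must be 'mean' or 'median'")
    if c == 0:
        raise ValueError("undefined when the center is zero")
    mean_dev = sum(abs(x - c) for x in values) / len(values)
    return mean_dev / c * 100


data = [1, 2, 3, 4, 10]  # hypothetical; mean 4, median 3
print(round(coefficient_of_mean_deviation(data), 2))            # 60.0
print(round(coefficient_of_mean_deviation(data, "median"), 2))  # 73.33
```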
Conclusion
Dispersion is an important concept in statistics for understanding how data is spread out around its central tendency (mean, median, or mode). In this article, we have discussed the main absolute and relative measures of dispersion.
Thank you for reading.