DESCRIPTIVE STATISTICS AND GRAPHICAL ILLUSTRATIONS
MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a number which indicates the middle of the distribution of data values. The three main measures are the median, the mode and the mean.

Median

The median is a number which splits the sorted data in half: half the data values lie at or below it and the other half lie at or above it. If there is an odd number of values, the median is the middle one when they are sorted in order of magnitude. If there is an even number of values, the median is the average of the two middle values.

E.g. 6, 6.7, 3.8, 7, 5.8
Arranged in order of magnitude these are 3.8, 5.8, 6, 6.7, 7
The median is the middle value, 6.
E.g. 6, 6.7, 3.8, 7, 5.8, 9.9
Arranged in order of magnitude these are 3.8, 5.8, 6, 6.7, 7, 9.9
The two middle values are 6 and 6.7, so median = (6 + 6.7) / 2 = 6.35
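The odd and even cases can be sketched in a few lines of Python. This is only an illustration of the rule described here; the function name `median_of` is made up for this example, and the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Median by the sort-and-take-the-middle rule (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value.
        return ordered[mid]
    # Even number of values: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_of([6, 6.7, 3.8, 7, 5.8])
even_median = median_of([6, 6.7, 3.8, 7, 5.8, 9.9])
print(odd_median, even_median)
```

Sorting is the expensive step, which is why the median is awkward to find by hand for large data sets.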
Mode

The mode is the value or category which occurs most frequently. If several data values occur with the same maximal frequency, they are all modes.

E.g. 3.8, 5.8, 6, 6, 6.7, 7, 9.9
mode = 6

E.g. 3.8, 5.8, 6, 6, 6.7, 7, 9.9, 9.9
modes = 6 and 9.9

E.g. 3.8, 5.8, 6, 6.7, 7, 9.9
Every value occurs exactly once, so all six values are modes: 3.8, 5.8, 6, 6.7, 7, 9.9
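A minimal Python sketch of this definition counts how often each value occurs and keeps every value that reaches the highest count. The helper name `modes_of` is an assumption for illustration only.

```python
from collections import Counter


def modes_of(values):
    """Return every value that occurs with the maximal frequency (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Keep all values tied for the highest count, in increasing order.
    return sorted(value for value, count in counts.items() if count == top)


one_mode = modes_of([3.8, 5.8, 6, 6, 6.7, 7, 9.9])
two_modes = modes_of([3.8, 5.8, 6, 6, 6.7, 7, 9.9, 9.9])
all_modes = modes_of([3.8, 5.8, 6, 6.7, 7, 9.9])
print(one_mode, two_modes, all_modes)
```

Returning a list rather than a single number keeps the tied cases consistent with the definition given here.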
Mean

This is denoted by $\bar{x}$ (read as 'x bar') and defined as the arithmetic mean of all the data values:

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

E.g.
$\bar{x} = \frac{3.8 + 5.8 + 6 + 6.7 + 7 + 9.9}{6} = \frac{39.2}{6} \approx 6.5$

It is necessary to sort the data in order of magnitude before you can find the median. For large data sets this may be time consuming, and this is the reason why medians were not used much until computers became readily available.

The median is not affected by extreme values, but the mean is changed (compare results for data sets A and B above). In many situations the median is a better description of central tendency (e.g. for incomes, where many more people earn less than the mean income than earn more).
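The effect of an extreme value on the two measures can be checked with a short Python sketch. The second data set, in which the largest value is replaced by 99, is a hypothetical example and does not come from the text; `mean_of` is likewise an illustrative name.

```python
from statistics import median


def mean_of(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [3.8, 5.8, 6, 6.7, 7, 9.9]
x_bar = mean_of(data)

# Hypothetical variant: replace the largest value with an extreme one.
shifted = data[:-1] + [99]

print("original:", round(x_bar, 2), median(data))
print("with extreme value:", round(mean_of(shifted), 2), median(shifted))
```

The median is identical for both lists because the two middle values are unchanged, while the mean is pulled strongly towards the extreme value.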
MEASURES OF VARIABILITY

These are statistics which summarize how spread out the data values are. They are also called measures of dispersion.

Range

The range is the difference between the highest value and the lowest value: the maximum minus the minimum. For a data set whose maximum is 9.975 and whose minimum is 3.8:

Range = Maximum - Minimum = 9.975 - 3.8 = 6.175

The range depends only on the extreme values in the data set.
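To illustrate that only the two extremes matter, the sketch below compares the six-value example used for the median with a second, made-up data set that shares the same minimum and maximum but is distributed very differently. The name `range_of` and the second data set are assumptions for this example.

```python
def range_of(values):
    """Range: the maximum minus the minimum."""
    return max(values) - min(values)


data = [3.8, 5.8, 6, 6.7, 7, 9.9]
# Hypothetical data set: same extremes, values bunched at both ends.
bunched = [3.8, 3.9, 4.0, 9.7, 9.8, 9.9]

print(range_of(data), range_of(bunched))
```

Both calls give the same result, even though the values in between are spread quite differently.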
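Quartiles

When the data are arranged in order of magnitude (i.e. they are ranked), the quartiles are 3 numbers which divide the data into four groups, each having approximately the same number of values.

  25%  |  25%  |  25%  |  25%
       Q1      Q2      Q3

Procedure: rank the data and find the median, which is Q2. Q1 is then the median of the values below Q2, and Q3 is the median of the values above Q2.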
Interquartile Range

The interquartile range is defined as IQR = Q3 - Q1, i.e. the difference between the 75th and 25th percentiles, or equivalently the spread of the middle 50% of the data.

E.g. Consider 6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 20.0
Arranged in order of magnitude these are 3.8, 5.8, 5.9, 6.0, 6.7, 7.0, 9.9, 10.5, 20.0
The median is Q2 = 6.7 (there are 4 values on either side)
Q1 = (5.8 + 5.9) / 2 = 5.85 (median of the 4 smallest values)
Q3 = (9.9 + 10.5) / 2 = 10.2 (median of the 4 largest values)
IQR = 10.2 - 5.85 = 4.35

Just as the median is not affected much by extreme values, neither is the IQR.
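The worked example can be reproduced with a short Python sketch of the median-of-halves approach used here. The function name `quartiles_of` is an assumption, and other textbooks and software packages use slightly different quartile conventions, so their results may differ a little. Computed without rounding, Q1 comes out as 5.85.

```python
from statistics import median


def quartiles_of(values):
    """Return (Q1, Q2, Q3) by the median-of-halves rule (hypothetical helper).

    Assumes at least two values. With an odd number of values the
    median itself is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles_of([6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 20.0])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Changing the largest value from 20.0 to any larger number leaves all three quartiles, and hence the IQR, unchanged.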
Percentiles

Percentiles are values below which a given percentage of the data values fall. Quartiles divide the ordered data into quarters, but we can consider any fractions we please. The most common are "percentiles", where we take hundredths. The first quartile is thus the 25th percentile, the median is the 50th percentile and the upper quartile is the 75th percentile.

The percentiles most commonly used, after the 50th, are those close to 100. Thus the 90th percentile is the value that is exceeded by only 10% of the sample or the population, and the 99th percentile is exceeded by only 1 in 100.

You will occasionally also see "deciles", which are found by dividing the data into tenths, and "quintiles", which divide the data into fifths. The first quintile is identical to the 20th percentile, the median is the fifth decile, and so on.
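These relationships can be checked with Python's `statistics.quantiles` (available from Python 3.8). This is a sketch under one assumption worth stating: the function's default `'exclusive'` interpolation method is only one of several percentile conventions, so other tools may return slightly different cut points for small data sets.

```python
import math
from statistics import median, quantiles

data = [3.8, 5.8, 5.9, 6.0, 6.7, 7.0, 9.9, 10.5, 20.0]

# quantiles(data, n=k) returns the k - 1 cut points that split the data
# into k groups, so percentiles[p - 1] is the p-th percentile.
percentiles = quantiles(data, n=100)
deciles = quantiles(data, n=10)
quintiles = quantiles(data, n=5)
quartile_cuts = quantiles(data, n=4)

print("first quintile == 20th percentile:",
      math.isclose(quintiles[0], percentiles[19]))
print("fifth decile == median:",
      math.isclose(deciles[4], median(data)))
print("first quartile == 25th percentile:",
      math.isclose(quartile_cuts[0], percentiles[24]))
```

`math.isclose` is used instead of `==` because the cut points are computed with floating-point interpolation.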
Standard Deviation and Variance

The standard deviation describes the "average distance" of data values from their mean, i.e. the square root of the average squared deviation from the mean. It measures how much the data values differ from the mean. A small standard deviation implies most values are near the average. A large standard deviation indicates that values are widely spread above and below the average.

The deviation of each value $x_i$ from the mean $\bar{x}$ is $d_i = x_i - \bar{x}$. The mean of those deviations, $\bar{d}$, is always zero, because the positive and negative deviations cancel. So we square the deviations, average the squares, and take the square root, namely

$\sqrt{\frac{\sum d_i^2}{n}}$

Usually $n-1$ instead of $n$ is used in the denominator because this gives an estimate with slightly better mathematical properties, i.e. use

$s = \sqrt{\frac{\sum d_i^2}{n-1}}$

This is called the sample standard deviation. The value obtained before taking the square root is called the sample variance. It is denoted by $s^2$:

$s^2 = \frac{\sum d_i^2}{n-1}$

E.g. Data set A: $\{x_i\}$ = 2, 3, 3, 4, 5, 7, 8
There are $n = 7$ observations and $\bar{x} = 4.57$. The deviations from the mean, $d_i = x_i - \bar{x}$, are: -2.57, -1.57, -1.57, -0.57, 0.43, 2.43, 3.43. So

$s^2 = \frac{1}{6}\left((-2.57)^2 + (-1.57)^2 + \dots + (3.43)^2\right) \approx 4.95$

$s = \sqrt{4.95} \approx 2.22$
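The calculation for data set A can be followed step by step in Python. This is a minimal sketch: the helper name `sample_variance` is illustrative, and the result is compared with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]
    return sum(d * d for d in deviations) / (n - 1)


data_a = [2, 3, 3, 4, 5, 7, 8]
x_bar = sum(data_a) / len(data_a)

# The deviations themselves always sum to zero (up to rounding error).
deviation_sum = sum(x - x_bar for x in data_a)

s2 = sample_variance(data_a)
s = math.sqrt(s2)
print(round(x_bar, 2), round(s2, 2), s, statistics.stdev(data_a))
```

Working with unrounded deviations gives a variance of about 4.952, so the printed standard deviation can differ in the last digit from a hand calculation that rounds intermediate results.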
GRAPHICAL ILLUSTRATIONS

Boxplot

This is a graphical summary based on the median, quartiles and extreme values. It is often called the Box and Whiskers Plot. The box represents the interquartile range, which contains the middle 50% of the cases. A line across the box indicates the median. The whiskers are lines that extend from the box to the highest and lowest values that are not extreme. Extreme values are cases more than 1.5 box lengths from the upper or lower end of the box. The extreme cases are listed on the plot.
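The numbers behind a boxplot can be computed without drawing anything. The sketch below applies the 1.5 box length rule to the nine-value data set from the interquartile range example. The function name `boxplot_summary`, the dictionary keys, and the choice to end each whisker at the most distant value that is not extreme are assumptions made for illustration; plotting packages differ in these details.

```python
from statistics import median


def boxplot_summary(values, k=1.5):
    """Box, whisker ends and extreme values for a boxplot (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    # Quartiles by the median-of-halves rule.
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 == 1 else ordered[half:])
    box_length = q3 - q1
    low_limit = q1 - k * box_length
    high_limit = q3 + k * box_length
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    extremes = [x for x in ordered if x < low_limit or x > high_limit]
    return {
        "box": (q1, median(ordered), q3),
        "whiskers": (inside[0], inside[-1]),
        "extremes": extremes,
    }


summary = boxplot_summary([6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 20.0])
print(summary)
```

For this data set the box length is 4.35, so any value above 10.2 + 1.5 × 4.35 = 16.725 counts as extreme, and 20.0 is the only such case.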
Stem and Leaf

This is a depiction of the shape of the data based on the actual numbers observed. The stem usually depicts the 10s and the leaves depict the units. E.g. the value 47 has stem 4 and leaf 7.
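A stem and leaf display is easy to build as plain text. The sketch below assumes non-negative whole numbers, with the tens as stems and the units as leaves. The function name `stem_and_leaf` and the sample scores are invented for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem and leaf display (hypothetical helper).

    Assumes a non-empty list of non-negative integers.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens digit(s) and units digit
        rows[stem].append(leaf)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


scores = [12, 15, 21, 23, 23, 30, 47]  # made-up data
for line in stem_and_leaf(scores):
    print(line)
```

Each row reads like a bar of a histogram turned on its side, but the original values can still be recovered from the digits.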