Descriptive Statistics and Graphical Illustrations


 

DESCRIPTIVE STATISTICS AND GRAPHICAL ILLUSTRATIONS


 

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a number which indicates the middle of the distribution of data values. The three main measures are the median, the mode and the mean.

Median

The median is the number that splits the sorted data in half: half the data values lie at or below it and the other half lie at or above it. If there is an odd number of values, the median is the middle one when they are sorted in order of magnitude. If there is an even number of values, the median is the average of the two middle values.

E.g.

6,   6.7,   3.8,   7,   5.8

Arranged in order of magnitude these are

3.8,   5.8,  6,   6.7,   7

median = 6 (the middle value)

 

E.g.

6,   6.7,   3.8,   7,   5.8,   9.9

Arranged in order of magnitude these are:

3.8,   5.8,   6,    6.7,   7,   9.9

The two middle values are 6 and 6.7.

median = ( 6 + 6.7 ) / 2 = 6.35 
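The two cases above (odd and even number of values) can be sketched in a few lines of Python. This is only an illustration of the rule as stated here; the function name `median` is our own choice, not part of the original page.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)              # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of values: the single middle value
        return ordered[mid]
    # even number of values: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([6, 6.7, 3.8, 7, 5.8]))         # five values, as in the first example
print(median([6, 6.7, 3.8, 7, 5.8, 9.9]))    # six values, as in the second example
```

Sorting is the only costly step, which is why the median is harder to find by hand than the mean.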

 


Mode

The mode is the value or category which occurs most frequently. If several data values occur with the same maximal frequency, they are all modes.

E.g.

3.8,   5.8,   6,    6,   6.7,   7,   9.9

mode = 6

 

E.g.

3.8,   5.8,   6,    6,   6.7,   7,   9.9,   9.9

mode = 6,   9.9

 

E.g.

3.8,   5.8,   6,    6.7,   7,   9.9

mode = 3.8,   5.8,   6,   6.7,   7,   9.9
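The three examples above follow directly from counting how often each value occurs. A minimal Python sketch, assuming the definition used here (every value with the maximal frequency is a mode); the function name `modes` is hypothetical:

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the maximal frequency, sorted."""
    counts = Counter(values)              # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())            # the maximal frequency
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3.8, 5.8, 6, 6, 6.7, 7, 9.9]))        # one mode
print(modes([3.8, 5.8, 6, 6, 6.7, 7, 9.9, 9.9]))   # two modes
print(modes([3.8, 5.8, 6, 6.7, 7, 9.9]))           # every value occurs once
```

Because it only counts occurrences, the same function also works for categories such as strings.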

 


Mean

The mean is denoted by $\bar{x}$ (read as 'x bar') and defined as the arithmetic mean of all the data values:

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

E.g.

$\bar{x} = \frac{3.8 + 5.8 + 6 + 6.7 + 7 + 9.9}{6} = \frac{39.2}{6} \approx 6.53$
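The same calculation can be checked in Python. This is a small sketch; the function name `mean` is our own, and `math.fsum` is used only to keep rounding error in the sum small.

```python
import math


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return math.fsum(values) / len(values)


data = [3.8, 5.8, 6, 6.7, 7, 9.9]
print(round(mean(data), 2))
```

Unlike the median, no sorting is needed: each value is visited once.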

It is necessary to sort the data in order of magnitude before you can find the median. For large data sets this can be time-consuming, which is why medians were not used much until computers became readily available.

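The median is not affected much by extreme values, but the mean is. For example, if the largest of the six values above, 9.9, is replaced by 99, the median stays at 6.35 while the mean rises from about 6.5 to about 21.4.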

In many situations the median is a better description of central tendency (e.g. many more people have less than the average income than have more).
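The remark about incomes can be illustrated with a small made-up data set. The figures below are hypothetical (they are not from the original page); the sketch uses `statistics.mean` and `statistics.median` from the Python standard library.

```python
import statistics

# Hypothetical annual incomes in thousands; one very high earner skews the data.
incomes = [18, 22, 25, 27, 30, 35, 250]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Count how many people earn less than, and more than, the mean income.
below_mean = sum(1 for x in incomes if x < mean_income)
above_mean = sum(1 for x in incomes if x > mean_income)

print(f"mean = {mean_income:.1f}, median = {median_income}")
print(f"{below_mean} incomes below the mean, {above_mean} above it")
```

In data like these the single large value pulls the mean well above what most people earn, while the median stays with the bulk of the values.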


MEASURES OF VARIABILITY

These are statistics which summarize how spread out the data values are. They are also called measures of dispersion.

Range

The range is the difference between the highest value and the lowest value: the maximum minus the minimum. For example, for a data set whose maximum is 9.975 and whose minimum is 3.8:

Range = (Maximum - Minimum) = (9.975 - 3.8) = 6.175

The range depends only on the extreme values in the data set.

Mistakes in data, such as reversing digits (e.g. 52 for 25) or omitting digits (e.g. 12 for 132), may produce extreme values. A measure of spread that is less affected by extreme values than the range is the difference between the values 5% in from either end, or 1/4 in from either end, of the sorted data.
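A short Python sketch shows how a single data-entry mistake changes the range. The data values are made up for illustration, and the function name `data_range` is our own.

```python
def data_range(values):
    """Return the maximum minus the minimum of a non-empty data set."""
    if len(values) == 0:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


correct = [21, 23, 25, 26, 28]     # hypothetical measurements
mistyped = [21, 23, 52, 26, 28]    # the same data with 25 entered as 52

print(data_range(correct))
print(data_range(mistyped))
```

Only one value differs between the two lists, yet the range more than quadruples, because the mistyped value became the new maximum.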


Quartiles

When the data are arranged in order of magnitude (i.e. they are ranked) the quartiles are 3 numbers which divide the data into four groups each having approximately the same number of values.

25% | 25% | 25% | 25%

      Q1     Q2     Q3

Procedure

  1. Order the n data values from smallest to largest.
  2. The 2nd quartile, Q2 is the median of the whole data set.
  3. If n is even, the first quartile, Q1, is the median of the smallest n/2 observations and the third quartile, Q3, is the median of the largest n/2 observations.
  4. If n is odd, Q1 is the median of the smallest (n-1)/2 observations, and Q3 is the median of the largest (n-1)/2 observations.
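The procedure above translates almost line for line into Python. This is a sketch of this particular procedure only (other textbooks and software use slightly different quartile rules); the function names are our own.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) following the four-step procedure."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    q2 = median(ordered)              # step 2: Q2 is the overall median
    half = n // 2                     # n/2 if n is even, (n-1)/2 if n is odd
    lower = ordered[:half]            # smallest half
    upper = ordered[n - half:]        # largest half (skips the middle value if n is odd)
    return median(lower), q2, median(upper)   # steps 3 and 4


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))       # n even
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))    # n odd
```

Integer division `n // 2` gives n/2 for even n and (n-1)/2 for odd n, so steps 3 and 4 share the same code.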


Interquartile Range

The interquartile range is defined as   IQR = Q3 - Q1,   i.e. the difference between the 75th and 25th percentiles, or equivalently the width of the interval containing the middle 50% of the data.

E.g.

Consider  6.0,   6.7,   3.8,   7.0,    5.8,   9.9,   10.5,   5.9,   20.0

Arrange these in order of magnitude

3.8,   5.8,   5.9,   6.0,   6.7,   7.0,   9.9,   10.5,   20.0

The median is Q2 = 6.7 (there are 4 values on either side)

Q1 = 5.85 (median of the 4 smallest values, i.e. the average of 5.8 and 5.9)

Q3 = 10.2 (median of the 4 largest values, i.e. the average of 9.9 and 10.5)

IQR = 10.2 - 5.85 = 4.35.

Just as the median is not affected much by extreme values, neither is the IQR.
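To see this robustness in numbers, the sketch below compares the range and the IQR of the nine values in the example with a copy in which 20.0 is replaced by 200.0. The altered data set is our own addition for illustration, and the quartile rule assumed is the one described under Quartiles.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr(values):
    """Interquartile range Q3 - Q1, with Q1 and Q3 as medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    return q3 - q1


data = [6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 20.0]
altered = [6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 200.0]   # hypothetical extreme value

for name, values in (("original", data), ("altered", altered)):
    spread = max(values) - min(values)
    print(f"{name}: range = {spread:.2f}, IQR = {iqr(values):.2f}")
```

The range grows with the extreme value, while the IQR is the same for both data sets because neither quartile involves the largest observation.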

 


Percentiles

Percentiles are values below which given percentages of the data fall. Quartiles divide the ordered data into quarters, but we can consider any fractions we please. The most common are "percentiles", where we take hundredths. The first quartile is thus the 25th percentile, the median is the 50th percentile and the upper quartile is the 75th percentile.

The percentiles most commonly used, after the 50th, are those close to 100. Thus the 90th percentile is the value that is exceeded by only 10% of the sample or the population, and the 99th percentile is exceeded by only 1 in 100.

You will occasionally also see "deciles", which are found by dividing the data into tenths, and "quintiles", which divide the data into fifths. The first quintile is identical to the 20th percentile, the median is the fifth decile, and so on.
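There are several conventions for computing percentiles, and this page does not say which one it intends. The sketch below uses the simple "nearest rank" convention as an assumption: the p-th percentile is the smallest data value with at least p% of the data at or below it. For small data sets it can give results that differ slightly from the quartile procedure described earlier.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    if len(values) == 0:
        raise ValueError("percentiles of an empty data set are undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based position in the sorted data
    return ordered[rank - 1]


scores = list(range(1, 101))     # hypothetical scores 1, 2, ..., 100

print(percentile(scores, 90))    # exceeded by only 10% of the scores
print(percentile(scores, 20))    # first quintile = 20th percentile
print(percentile(scores, 50))    # fifth decile = 50th percentile
```

Deciles and quintiles need no separate code: the k-th decile is the (10k)-th percentile and the k-th quintile is the (20k)-th percentile.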

 


Standard Deviation and Variance

The standard deviation describes the "average distance" of data values from their mean; more precisely, it is the square root of the average squared deviation from the mean. It measures how much the data values differ from the mean. A small standard deviation implies that most values are near the average. A large standard deviation indicates that values are widely spread above and below the average.

The distance of each value $x_i$ from the mean $\bar{x}$ is $d_i = x_i - \bar{x}$.

The mean of those distances, $\bar{d}$, is always zero, because the positive and negative distances cancel, so it cannot measure spread.
Instead we could use the squares of the distances, $d_i^2$ (because the square of a negative number is positive).

But there is still a problem: the squared distances $d_i^2$ have squared units (e.g. if the $x_i$ are lengths, the $d_i^2$ are lengths squared, or areas).

So we average the squared distances and then take the square root. Usually $n-1$ instead of $n$ is used in the denominator because this gives an estimate with slightly better mathematical properties (the resulting sample variance is an unbiased estimate of the population variance), i.e. use

$s = \sqrt{\frac{\sum d_i^2}{n-1}}$

This is called the sample standard deviation.

The value obtained before taking the square root is called the sample variance. It is denoted by $s^2$.

$s^2 = \frac{\sum d_i^2}{n-1}$

E.g.

Data set A:   $\{x_i\}$ = 2, 3, 3, 4, 5, 7, 8

There are $n = 7$ observations and $\bar{x} = 32/7 \approx 4.57$.

The deviations from the mean, $d_i = x_i - \bar{x}$, are:

-2.57, -1.57, -1.57, -0.57, 0.43, 2.43, 3.43. So

$s^2 = \frac{1}{6}\left((-2.57)^2 + (-1.57)^2 + \dots + (3.43)^2\right) \approx 4.95$

$s = \sqrt{4.95} \approx 2.22$
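The worked example for data set A can be reproduced step by step in Python. This is a sketch with function names of our own choosing; the last lines cross-check the result against `statistics.variance` and `statistics.stdev` from the standard library, which also divide by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(values) / n
    deviations = [x - mean for x in values]         # d_i = x_i - mean
    return sum(d * d for d in deviations) / (n - 1)


def sample_std(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data_a = [2, 3, 3, 4, 5, 7, 8]

print(round(sample_variance(data_a), 2))
print(round(sample_std(data_a), 3))

# Cross-check against the standard library.
print(math.isclose(sample_variance(data_a), statistics.variance(data_a)))
print(math.isclose(sample_std(data_a), statistics.stdev(data_a)))
```

Because the code keeps full precision throughout, its standard deviation can differ in the last digit from a hand calculation that takes the square root of the rounded variance.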

 


GRAPHICAL ILLUSTRATIONS

Boxplot

This is a graphical summary based on the median, quartiles and extreme values.

The boxplot is often called the box-and-whiskers plot. The box represents the interquartile range, which contains the middle 50% of the cases. The whiskers are lines that extend from the box to the highest and lowest values that are not extreme.

A line across the box indicates the median. Extreme values are cases more than 1.5 box lengths (1.5 times the IQR) from the upper or lower end of the box. The extreme cases are marked individually on the plot.
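The numbers needed to draw a boxplot can be computed without any plotting software. The sketch below assumes the quartile rule described under Quartiles and the 1.5 box-length rule stated here; the function name `boxplot_summary` and the dictionary keys are hypothetical, and the whiskers are taken to stop at the most extreme values that are not flagged.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_summary(values):
    """Return the box, median line, whisker ends and extreme values of a boxplot."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    box_length = q3 - q1                        # the interquartile range
    low_limit = q1 - 1.5 * box_length
    high_limit = q3 + 1.5 * box_length
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    extremes = [x for x in ordered if x < low_limit or x > high_limit]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whisker_low": inside[0],               # lowest value that is not extreme
        "whisker_high": inside[-1],             # highest value that is not extreme
        "extremes": extremes,
    }


data = [6.0, 6.7, 3.8, 7.0, 5.8, 9.9, 10.5, 5.9, 20.0]
for key, value in boxplot_summary(data).items():
    print(key, value)
```

For these nine values only 20.0 lies more than 1.5 box lengths beyond the box, so the upper whisker stops at 10.5.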


Stem and Leaf

This is a depiction of the shape of the data based on the actual numbers observed.

The stem usually depicts the 10s and the leaves depict the units.
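For example, the value 47 has stem 4 and leaf 7. A minimal Python sketch of the display for non-negative whole numbers follows; the data are made up, and the function name `stem_and_leaf` is our own.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative whole numbers."""
    if len(values) == 0:
        return []
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)        # stem = tens, leaf = units
        leaves_by_stem[stem].append(leaf)
    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [23, 12, 47, 21, 15, 23]           # hypothetical data
for line in stem_and_leaf(ages):
    print(line)
```

Each row lists the units digits of the values sharing the same tens digit, so the lengths of the rows show the shape of the data in the same way a histogram would.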


 

For problems or questions regarding this web contact [ProjectEmail].
Last updated: October 03, 2001.