Topic 3: Summarize Data Numerically

Fei Ye

January 2024

1. Learning Goals


2. Quartiles, Interquartile Range and Outliers


3. Example: Median, IQR and Outliers

Find the median, quartiles, IQR, and outliers (if any) of the sample heights of 15 trees.

70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79, 76, 76, 69, 75

Solution:
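One way to check this example numerically is the short Python sketch below. It assumes the common 1.5 × IQR rule for flagging outliers and uses the default ("exclusive") quartile method of Python's `statistics.quantiles`; both are assumptions, and other quartile conventions (for instance Excel's QUARTILE.INC) can give slightly different quartiles.

```python
from statistics import median, quantiles

heights = [70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79, 76, 76, 69, 75]

# quantiles() sorts the data itself. The default "exclusive" method is an
# assumption here and may not match the convention used in class.
q1, q2, q3 = quantiles(heights, n=4)
iqr = q3 - q1

# Assumed outlier rule: values more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in heights if x < lower_fence or x > upper_fence]

print("median:", median(heights))
print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Under these assumptions the median is 75, the quartiles are 69 and 79, the IQR is 10, and no height falls outside the fences 54 and 94.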


Practice: Five-number Summary, Range and IQR


4. Box Plot


5. Example: Box Plot - Best Oscar Winners (1 of 2)

Create a boxplot for the ages of 32 Best Actor Oscar winners (1970–2001).

31, 32, 32, 33, 35, 36, 37, 37, 38, 38, 39, 40, 40, 40, 42, 42, 43, 43, 45, 45, 46, 47, 48, 48, 51, 55, 55, 56, 60, 60, 61, 76

Solution: You may use Excel to find the five-number summary.
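As an alternative to Excel, the five-number summary can be sketched in Python. The helper name `five_number_summary` is hypothetical, and the choice `method="inclusive"` is an assumption intended to mirror the interpolation used by Excel's QUARTILE.INC; a different quartile convention would change Q1 and Q3 slightly.

```python
from statistics import quantiles

ages = [31, 32, 32, 33, 35, 36, 37, 37, 38, 38, 39, 40, 40, 40, 42, 42,
        43, 43, 45, 45, 46, 47, 48, 48, 51, 55, 55, 56, 60, 60, 61, 76]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    # method="inclusive" treats the data as including the extreme values;
    # this is assumed to correspond to Excel's QUARTILE.INC.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

summary = five_number_summary(ages)
print(summary)
```

With this convention the summary is (31, 37.75, 42.5, 48.75, 76). The five numbers are exactly the values needed to draw the box and the whisker endpoints (before any outlier adjustment).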


6. Example: Box Plot - Best Oscar Winners (2 of 2)


Practice: Five-Number Summary from the Boxplot


7. Notations and Calculations about Mean


8. Example: Mean City mpg

Find the mean city mpg for a sample of 10 cars.

18, 21, 20, 21, 16, 18, 18, 18, 16, 20

Solution: The mean is

$$\bar{x}=\frac{18+21+20+21+16+18+18+18+16+20}{10}=18.6.$$

The mean mpg of the 10 cars is 18.6 mpg.

In Excel, if the data are in the cells A1:A10, you may use the function =AVERAGE(A1:A10) to find the mean.
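The same calculation can be sketched in Python, once directly from the definition and once with the standard library's `statistics.mean`, which plays a role similar to AVERAGE. The variable names are illustrative only.

```python
from statistics import mean

city_mpg = [18, 21, 20, 21, 16, 18, 18, 18, 16, 20]

# Definition: sum of the values divided by the number of values.
mean_by_definition = sum(city_mpg) / len(city_mpg)

# Library call, analogous to Excel's AVERAGE.
mean_by_library = mean(city_mpg)

print(mean_by_definition, mean_by_library)
```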


9. Weighted Mean


10. Example: Course Overall Grade

In a course, the overall grade is determined as follows: the homework average counts for 10%, the quiz average counts for 10%, the test average counts for 50%, and the final exam counts for 30%. What is the overall grade of a student who earned 92 on homework, 95 on quizzes, 90 on tests, and 93 on the final exam?

Solution: The overall grade is the weighted mean

$$\frac{\sum w_ix_i}{\sum w_i}=\frac{0.1\cdot 92+0.1\cdot 95+0.5\cdot 90+0.3\cdot 93}{0.1+0.1+0.5+0.3}=91.6.$$
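The weighted mean formula translates directly into a few lines of Python. This is a minimal sketch; the function name `weighted_mean` is hypothetical, and the weights do not need to sum to 1 because the formula divides by their total.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

scores = [92, 95, 90, 93]           # homework, quizzes, tests, final exam
weights = [0.10, 0.10, 0.50, 0.30]  # course weights from the example

overall = weighted_mean(scores, weights)
print(round(overall, 1))
```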


Practice: Mean Petal Width

Find the average petal width for a sample of 10 iris flowers.

0.2, 2.1, 0.2, 1.7, 2.3, 0.3, 1.2, 0.2, 1.8, 2.3


Practice: Mean from a Dotplot

Find the mean from the dot plot of sepal length for a sample of 10 iris flowers.


Practice: Course Overall Grade


11. Population Standard Deviation


12. Sample Standard Deviation

Note: To measure the spread, one may also use the mean absolute deviation \(MAD=\dfrac{\sum |x-\bar{x}|}{n}\). However, the standard deviation is preferred in most applications because it has more convenient mathematical properties.
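To compare the two measures of spread, the sketch below computes both for the city mpg sample from the earlier example. It assumes `statistics.stdev`, which divides by \(n-1\), as the sample standard deviation; the variable names are illustrative.

```python
from statistics import mean, stdev

city_mpg = [18, 21, 20, 21, 16, 18, 18, 18, 16, 20]

x_bar = mean(city_mpg)

# Mean absolute deviation: average distance from the mean.
mad = sum(abs(x - x_bar) for x in city_mpg) / len(city_mpg)

# Sample standard deviation (denominator n - 1).
s = stdev(city_mpg)

print(round(mad, 2), round(s, 2))
```

For this sample the MAD is 1.52, and the sample standard deviation is somewhat larger, because squaring gives more weight to values far from the mean.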


13. Example: Mean and SD - Best Oscar Winners

Find the mean and standard deviation of the ages of a sample of 32 Best Actor Oscar winners (1970–2001).

31, 32, 32, 33, 35, 36, 37, 37, 38, 38, 39, 40, 40, 40, 42, 42, 43, 43, 45, 45, 46, 47, 48, 48, 51, 55, 55, 56, 60, 60, 61, 76

Solution:

We use the Excel functions AVERAGE() and STDEV.S() to find the mean and the sample standard deviation, respectively. The mean is about 44.7 and the sample standard deviation is about 10.3.
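The Excel results can be cross-checked with a short Python sketch. It computes the sample standard deviation from the formula with denominator \(n-1\) and compares it with `statistics.stdev`, which is assumed here to play the role of STDEV.S.

```python
from math import sqrt
from statistics import stdev

ages = [31, 32, 32, 33, 35, 36, 37, 37, 38, 38, 39, 40, 40, 40, 42, 42,
        43, 43, 45, 45, 46, 47, 48, 48, 51, 55, 55, 56, 60, 60, 61, 76]

n = len(ages)
x_bar = sum(ages) / n

# Sample standard deviation: square root of the sum of squared
# deviations divided by n - 1.
s_by_formula = sqrt(sum((x - x_bar) ** 2 for x in ages) / (n - 1))
s_by_library = stdev(ages)

print(round(x_bar, 1), round(s_by_formula, 1), round(s_by_library, 1))
```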


14. Example: Mean and SD - Calculation By Hand

Source: https://www.geogebra.org/m/DS6PUaXy


Practice: Standard Deviation - GPA

A sample of GPAs from ten students randomly chosen from a college is recorded as follows.

1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33

Find the standard deviation of this sample.


Practice: Mean, Standard Deviation and Variance


15. Mean and SD under Linear Transformation

Source: https://www.geogebra.org/m/r25rDxYZ
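A quick numerical experiment can illustrate the standard facts that, for \(y = ax + b\), the mean becomes \(a\bar{x}+b\) while the standard deviation becomes \(|a|s\). The sketch below reuses the city mpg sample; the constants `a` and `b` are hypothetical values chosen only for illustration.

```python
from statistics import mean, stdev

x = [18, 21, 20, 21, 16, 18, 18, 18, 16, 20]

a, b = -2.0, 5.0  # hypothetical scale and shift
y = [a * v + b for v in x]

# The mean follows the transformation exactly.
print(mean(y), a * mean(x) + b)

# The standard deviation ignores the shift b and scales by |a|.
print(stdev(y), abs(a) * stdev(x))
```

A negative `a` is used on purpose: it shows that the standard deviation is multiplied by the absolute value of the scale factor, so it never becomes negative.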


16. Effect of Changes of Data

Source: https://www.geogebra.org/m/fenbj3qZ


Practice: Changes Under Transformations

A sample of the daily high temperatures over 10 days has a standard deviation of \(5^\circ\mathrm{C}\).

  1. If we want to know the standard deviation in Fahrenheit, do we need to recalculate it from the sample?

  2. What is the standard deviation in Fahrenheit?


17. Standardization

When comparing variables that are measured on different scales, it is better to use standardized values. Converting a value to a standard value is called standardization; it does not change the shape of the distribution of the data set.

Let \(x\) be a data value, \(\bar{x}\) the mean, and \(s\) the standard deviation of a data set. The standard value (or \(z\)-score) of \(x\) is defined as $$z = \frac{x-\bar{x}}{s}, \quad\text{equivalently}\quad x=zs+\bar{x}.$$

In Excel, the \(z\)-score can be obtained by =STANDARDIZE(x, mean, sd).
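The two forms of the formula can be sketched in Python as a pair of small functions. The names `z_score` and `from_z_score` are hypothetical, and the numbers in the usage lines (a score of 85 in a data set with mean 70 and standard deviation 15) are chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Standardize x: z = (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

def from_z_score(z, mean, sd):
    """Recover the original value: x = z * sd + mean."""
    return z * sd + mean

z = z_score(85, 70, 15)          # 85 is one standard deviation above 70
x_back = from_z_score(z, 70, 15)
print(z, x_back)
```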


18. Example: Standardized Test Scores

Consider a data set of scores on a standardized test with a mean of 70 and a standard deviation of 15.

Solution:


Practice: Understanding \(Z\)-score


19. The Empirical Rule

If a data set has an approximately bell-shaped distribution, then

  1. approximately 68% of the data lie within one standard deviation of the mean.
  2. approximately 95% of the data lie within two standard deviations of the mean.
  3. approximately 99.7% of the data lie within three standard deviations of the mean.

Empirical Rule

Image source: Figure 2.16 “The Empirical Rule” in Introductory Statistics
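The three percentages can be compared with the values for an exact normal distribution. The sketch below assumes the normal curve as the model for a "bell-shaped" distribution and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The computed proportions round to roughly 68%, 95%, and 99.7%; real data that are only approximately bell-shaped will match these figures only approximately.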


20. Chebyshev’s Theorem

For any numerical data set, at least \(1-1/k^2\) of the data lie within \(k\) standard deviations of the mean, where \(k\) is any positive whole number that is at least 2.

Chebyshev’s Theorem

Image source: Figure 2.19 “Chebyshev’s Theorem” in Introductory Statistics
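The bound \(1-1/k^2\) is easy to tabulate. The following is a minimal sketch with a hypothetical function name; it rejects \(k \le 1\), where the bound carries no information.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, round(chebyshev_lower_bound(k), 4))
```

For \(k=2\) the bound is 0.75, noticeably weaker than the 95% of the Empirical Rule, because Chebyshev’s theorem makes no assumption about the shape of the distribution.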


21. Example: Applications of the Empirical Rule

A population data set with a bell-shaped distribution has mean \(\mu = 6\) and standard deviation \(\sigma = 2\). Find the approximate proportion of observations in the data set that lie:

  1. between 4 and 8;
  2. below 4.

Solution:

By the Empirical Rule, about 68% of the data lie between 6-2=4 and 6+2=8. Since the distribution is symmetric, about 34% of the data lie between 4 and 6, and about 34% lie between 6 and 8. Half of the data lie below the mean 6, so about 50%-34%=16% of the data lie below 4.


22. Example: Applications of Chebyshev’s Theorem

A sample data set has mean \(\bar{x}=6\) and standard deviation \(s = 2\). Find the minimum proportion of observations in the data set that must lie between 2 and 10.

Solution:

By Chebyshev’s theorem with \(k=2\), at least \(1-1/2^2=75\%\) of the data lie between \(\bar{x}-2s=2\) and \(\bar{x}+2s=10\).


Practice: The Empirical Rule


Practice: Chebyshev’s Theorem

A sample data set has mean \(\bar{x}=10\) and standard deviation \(s = 3\). Find the minimum proportion of observations in the data set that must lie between 1 and 19.

Source: 2.5 The Empirical Rule and Chebyshev’s Theorem in Introductory Statistics.


Practice: Change of Measures on Transformation of Data

A teacher decides to curve the final exam by adding 10 points to each student's score. Which of the following statistics will NOT change:

A. median, B. mean, C. interquartile range, D. standard deviation?

Please explain your conclusion.


Practice: Understand Standard Deviation From Graphs

Which distribution of data has the SMALLEST standard deviation? Please explain your conclusion.

Distributions with different standard deviation


Lab Instruction in Excel


23. Mean, Median, Quartiles and Standard Deviation


24. How to Create a Boxplot in Excel

For more information, see Create a box and whisker chart in Excel 365


Lab Practice: Speeds of Cars

Consider the following sample that consists of speeds of 25 cars.

19, 4, 17, 22, 23, 8, 20, 19, 10, 10, 13, 13, 15, 12, 20, 14, 9, 20, 12, 11, 11, 17, 14, 7, 19

  1. Use Excel to find the mean, median, quartiles and standard deviation of the sample.
  2. Create a boxplot for the sample.