class: center, middle, inverse, title-slide

.title[
# Lesson 7: Measures of Variation
]
.author[
### Fei Ye
]
.date[
### May, 2024
]

---
class: center middle

## Unit 6B: Measures of Variation

*Primary Source:* PPT for the book "Using & Understanding Mathematics".

---
## Why Variation Matters

Consider the following waiting times for 11 customers at 2 banks.

Big Bank (three lines): 4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0

Best Bank (one line): 6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8

Which bank is likely to have more unhappy customers?

![](img/image-20200818144444961.png)

**Answer:** Big Bank. Its waiting times vary much more, so more of its customers face unexpectedly long waits.

---
## Range

The range of a data set is the difference between its highest and lowest data values.

`$$\text{range} = \text{highest value (max)} - \text{lowest value (min)}$$`

---
## Example: Misleading Range (1 of 2)

Consider the following two sets of quiz scores for nine students. Which set has the greater range? Would you also say that the scores in that set are more varied?

Quiz 1: 1 10 10 10 10 10 10 10 10

Quiz 2: 2 3 4 5 6 7 8 9 10

---
## Example: Misleading Range (2 of 2)

**Solution:** The range for Quiz 1 is 10 - 1 = 9 points, which is greater than the Quiz 2 range of 10 - 2 = 8 points.

However, aside from a single low score (an outlier), Quiz 1 has no variation at all because every other student got a 10. In contrast, no two students got the same score on Quiz 2, and the scores are spread evenly across the possible scores from 2 to 10.

The scores on Quiz 2 are more varied even though Quiz 1 has the greater range.

---
## Quartiles

The **lower quartile** (or **first quartile**) divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set.

The **middle quartile** (or **second quartile**) is the median of the entire data set.

The **upper quartile** (or **third quartile**) divides the lower three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set.

When a data set has an odd number of values, the median itself is left out of both halves.
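
---
## Sketch: Range and Quartiles in Python

The range and quartile definitions can be checked with a short Python sketch. It assumes that, for an odd number of values, the median is left out of both halves (the convention that reproduces the quartiles in the bank example); other textbooks and software packages use slightly different rules, so their quartiles may differ. The function names `data_range` and `quartiles` are hypothetical, chosen only for illustration.

```python
from statistics import median


def data_range(values):
    """Range = highest value (max) - lowest value (min)."""
    return max(values) - min(values)


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: with an odd number of values, the middle value is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd.
    upper_half = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower_half), median(data), median(upper_half)


quiz1 = [1, 10, 10, 10, 10, 10, 10, 10, 10]
quiz2 = [2, 3, 4, 5, 6, 7, 8, 9, 10]

print(data_range(quiz1), quartiles(quiz1))  # 9 (10.0, 10, 10.0)
print(data_range(quiz2), quartiles(quiz2))  # 8 (3.5, 6, 8.5)
```

Quiz 1 has the larger range, yet its lower quartile, median, and upper quartile are all 10, which reflects the lack of variation described in the misleading-range example. Applied to the Big Bank waiting times, `quartiles` returns (5.6, 7.2, 8.5).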

---
## The Five-Number Summary

The five-number summary for a data set consists of the following five numbers:

.center[low value `\(\quad\)` lower quartile `\(\quad\)` median `\(\quad\)` upper quartile `\(\quad\)` high value]

A boxplot shows the five-number summary visually: a rectangular box extends from the lower quartile to the upper quartile, a line inside the box marks the median, and whiskers extend to the low and high values that are not outliers.

---
## Example: Best bank (1 of 2)

Consider the following waiting times for 11 customers at 2 banks. Find the five-number summary of the waiting times at each bank.

Big Bank (three lines):

.center[4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

Best Bank (one line):

.center[6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

**Solution:** Since both data sets are already sorted, we can read the five-number summary for each bank directly from the definitions.

---
## Example: Best bank (2 of 2)

.pull-left[
**Big Bank**

- low value (min) = 4.1
- lower quartile = 5.6
- median = 7.2
- upper quartile = 8.5
- high value (max) = 11.0
]

.pull-right[
**Best Bank**

- low value (min) = 6.6
- lower quartile = 6.7
- median = 7.2
- upper quartile = 7.7
- high value (max) = 7.8
]

The corresponding boxplots:

.center[
![:resize , 80%](img/image-20200818144736953.png)
]

---
## Standard Deviation

The standard deviation is the single number most commonly used to describe variation. Roughly speaking, it measures how far the data values typically lie from the mean.

The difference `\(x-\bar{x}\)` between a data value `\(x\)` and the mean `\(\bar{x}\)` is called the **deviation**. The **standard deviation** is defined as

$$
\text{Standard deviation}=\sqrt{\dfrac{\text{sum of squares of deviations}}{\text{total number of values}-1}}.
$$

**Remark:** The standard deviation defined above is the sample standard deviation. The population standard deviation is defined in the same way, except that the sum of squares of deviations is divided by the total number of values (without subtracting 1).
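
---
## Sketch: Sample vs. Population Standard Deviation in Python

A minimal Python sketch of the definition above, with a switch between the sample version (divide by n - 1) and the population version (divide by n). The function name `standard_deviation` and its `sample` flag are hypothetical; Python's standard `statistics` module already provides `stdev` and `pstdev` for these two quantities. Applied to the bank data, it puts a number on what the boxplots show: both banks have a mean wait of 7.2, but Big Bank's waiting times are far more spread out.

```python
from math import sqrt


def standard_deviation(values, sample=True):
    """Standard deviation of a list of numbers.

    sample=True  -> divide the sum of squared deviations by n - 1
    sample=False -> divide by n (population standard deviation)
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return sqrt(sum_sq_dev / divisor)


big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

print(round(standard_deviation(big_bank), 2))                # 1.96
print(round(standard_deviation(best_bank), 2))               # 0.44
print(round(standard_deviation(big_bank, sample=False), 2))  # 1.87
```

Dividing by n instead of n - 1 always gives a slightly smaller value, and the difference shrinks as the data set grows.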

---
## Calculation of (Sample) Standard Deviation

The sample standard deviation is calculated by completing the following steps:

1. Compute the mean of the data set.
2. Find the deviation from the mean for every data value.
   `$$\text{deviation from the mean} = \text{data value} - \text{mean}$$`
3. Find the squares of all the deviations from the mean.
4. Add all the squares of the deviations from the mean.
5. Divide this sum by the total number of data values minus 1.
6. The standard deviation is the square root of this quotient.

---
## Example: Standard deviation of GPA (1 of 2)

Calculate the standard deviation of the GPAs of a sample of 5 students.

.center[2.6, 2.7, 3.72, 3, 3.44]

**Solution:** First, calculate the mean. The mean GPA is

$$
\text{mean}=\dfrac{2.6+2.7+3.72+3+3.44}{5}=\dfrac{15.46}{5}=3.092
$$

Now we may use a table to calculate the standard deviation.

---
## Example: Standard deviation of GPA (2 of 2)

| GPA `\(x\)` | Deviation `\((x-\text{mean})\)` | Square of Deviation `\((x-\text{mean})^2\)` |
| :---------: | :------------------------------------------------------------: | :------------------------------------------: |
| 2.60 | -0.492 | 0.242 |
| 2.70 | -0.392 | 0.154 |
| 3.72 | 0.628 | 0.394 |
| 3.00 | -0.092 | 0.008 |
| 3.44 | 0.348 | 0.121 |
| | Sum of Sq. of Dev. `\(\sum(x-\text{mean})^2\)` | 0.920 |
| | Sum divided by `\(n-1\)`: `\(\frac{\sum(x-\text{mean})^2}{n-1}\)` | 0.230 |
| | Standard Deviation `\(\sqrt{\frac{\sum(x-\text{mean})^2}{n-1}}\)` | 0.479 |

The standard deviation of the GPA sample is about 0.479. (The squared deviations are shown rounded to three decimal places; the sum 0.920 uses the unrounded values.)
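
---
## Sketch: The Six Steps in Python

The six steps can be traced in a short Python sketch that prints one row of the GPA table for each data value, then returns the sample standard deviation. The function name `sample_sd_steps` is hypothetical; values are rounded only when printed, while the computation keeps full precision.

```python
from math import sqrt


def sample_sd_steps(values):
    """Sample standard deviation, following steps 1-6 and printing each row."""
    n = len(values)
    mean = sum(values) / n              # Step 1: mean
    sum_sq = 0.0
    for x in values:
        dev = x - mean                  # Step 2: deviation from the mean
        sq = dev ** 2                   # Step 3: square of the deviation
        sum_sq += sq                    # Step 4: running sum of the squares
        print(f"{x:5.2f} {dev:7.3f} {sq:6.3f}")
    quotient = sum_sq / (n - 1)         # Step 5: divide by n - 1
    return sqrt(quotient)               # Step 6: square root


gpa = [2.6, 2.7, 3.72, 3, 3.44]
print(f"{sample_sd_steps(gpa):.3f}")  # 0.479
```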

---
## The Range Rule of Thumb

The standard deviation is approximately related to the range of a distribution by the range rule of thumb:

$$
\text{standard deviation}\approx\dfrac{\text{range}}{4}
$$

Conversely, if we know the mean and standard deviation of a data set, we can estimate its low and high values as follows:

$$
`\begin{aligned}
\text{low value}\approx&\ \text{mean} - 2(\text{standard deviation})\\
\text{high value}\approx&\ \text{mean} + 2(\text{standard deviation})
\end{aligned}`
$$

---
## Example: Estimating a Range

Studies of the gas mileage of a Prius under varying driving conditions show that it gets a mean of 45 miles per gallon with a standard deviation of 4 miles per gallon. Estimate the minimum and maximum gas mileage.

**Solution:** The mean is 45 mpg and the standard deviation is 4 mpg. By the range rule of thumb, we can estimate the low and high values.

.pull-left[
$$
`\begin{aligned}
&\text{low value}\\
\approx&\ \text{mean} - 2(\text{standard deviation})\\
=&\ 45 - 2\cdot4\\
=&\ 37
\end{aligned}`
$$
]

.pull-right[
$$
`\begin{aligned}
&\text{high value}\\
\approx&\ \text{mean} + 2(\text{standard deviation})\\
=&\ 45 + 2\cdot4\\
=&\ 53
\end{aligned}`
$$
]

The gas mileage for the car therefore ranges roughly from a minimum of 37 miles per gallon to a maximum of 53 miles per gallon.

---
## Practice: 5-number summary and boxplot

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=600312&seed=2020&showansafter" width="100%" height="400px" data-external="1"></iframe>
]

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=600312&seed=2020&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---
## Practice: Standard deviation

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=677009&seed=2020&showansafter" width="100%" height="400px" data-external="1"></iframe>
]

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=677009&seed=2020&showansafter" target="_blank">Click here to open the practice in a new window</a>
]
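
---
## Sketch: The Range Rule of Thumb in Python

The range rule of thumb is simple arithmetic. The Python sketch below (with hypothetical function names `sd_from_range` and `low_high_from_sd`) estimates a standard deviation from a range, and low and high values from a mean and standard deviation. It reproduces the Prius estimate from the range-rule example; as the rule's name suggests, the results are rough estimates, not exact values.

```python
def sd_from_range(low, high):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (high - low) / 4


def low_high_from_sd(mean, sd):
    """Range rule of thumb: low is roughly mean - 2*sd, high is roughly mean + 2*sd."""
    return mean - 2 * sd, mean + 2 * sd


# Prius example: mean 45 mpg, standard deviation 4 mpg.
print(low_high_from_sd(45, 4))  # (37, 53)

# Big Bank waiting times: low value 4.1, high value 11.0.
print(f"{sd_from_range(4.1, 11.0):.3f}")  # 1.725
```

For comparison, the sample standard deviation of the Big Bank waiting times is about 1.96, so the rule of thumb lands in the right neighborhood without matching exactly.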