class: center, middle, inverse, title-slide .title[ # Lesson 3: Statistical Tables and Graphs ] .author[ ### Fei Ye ] .date[ ### May 2024 ] --- class: center middle
## Unit 5C: Statistical Tables and Graphs *Primary Source:* PPT for the book "Using & Understanding Mathematics". --- ## Frequency Tables - A **basic frequency table** has two columns: - The first column lists all the categories of data. - The second column lists the frequency of each category, which is the number of data values in the category. - The **relative frequency** of any category is the fraction (or percentage) of the data values that fall in that category: `$$\text{relative frequency} = \dfrac{\text{frequency in category}}{\text{total frequency}}$$` - The **cumulative frequency** of any category is the number of data values in that category and all preceding categories. --- ## Example: Frequency Table The following frequency table describes the distribution of hair color of 592 statistics students. <table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Frequency </th> <th style="text-align:center;"> Relative Frequency </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Black </td> <td style="text-align:center;"> 108 </td> <td style="text-align:center;"> 0.182 </td> </tr> <tr> <td style="text-align:left;"> Brown </td> <td style="text-align:center;"> 286 </td> <td style="text-align:center;"> 0.483 </td> </tr> <tr> <td style="text-align:left;"> Red </td> <td style="text-align:center;"> 71 </td> <td style="text-align:center;"> 0.120 </td> </tr> <tr> <td style="text-align:left;"> Blond </td> <td style="text-align:center;"> 127 </td> <td style="text-align:center;"> 0.215 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:center;"> 592 </td> <td style="text-align:center;"> 1.000 </td> </tr> </tbody> </table> --- ## Practice: Complete the Frequency Table <iframe src="https://www.myopenmath.com/embedq2.php?id=60325&seed=2021&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe> --- ## 
Data Types and Binning - Qualitative (categorical) data consist of values that can be placed into nonnumerical categories. - Quantitative data represent counts or measurements. When dealing with quantitative data categories, it is often useful to group, or bin, the data into categories that cover a range of possible values. --- ## Example: Data Types Classify each of the following types of data as either qualitative or quantitative. 1. Brand names of shoes in a consumer survey 2. Heights of students 3. Audience ratings of a film on a scale of 1 to 5, where 5 means excellent. -- **Solution:** 1. Brand names are nonnumerical. So these data are qualitative. 2. Heights are measurements. So these data are quantitative. 3. The numbers represent subjective opinions about a film, not counts or measurements. These data are therefore qualitative. --- ## Practice: Qualitative or Quantitative <iframe src="https://www.myopenmath.com/embedq2.php?id=101194&seed=2021&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe> --- ## Summarizing Raw Data When summarizing a large quantitative data set, instead of counting the frequency of each individual value, it is better to first group the data into bins of equal width (more precisely, intervals) and then count how many values fall into each bin. Here is one way to create bins for a data set. 1. Choose an appropriate bin width. To summarize the data well, avoid a bin width that produces too few or too many bins. 2. Start from a number less than or equal to the smallest value in the data, and repeatedly add the bin width to create the left bounds of the bins. The right bound of each bin can be chosen to be slightly less than the left bound of the next bin (for whole-number data such as exam scores, one less).
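---

## Example: Binning Data with Python (Sketch)

The two binning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the source: it assumes whole-number data, so each right bound is exactly one less than the next left bound, and the function names `make_bins` and `frequency_table` and the scores below are hypothetical. Rows are listed from the highest bin down, as in the exam-score table on the next slide, so the cumulative frequency accumulates from the top.

```python
def make_bins(start, width, largest):
    """Create (left, right) bounds for whole-number data.

    Left bounds are start, start + width, start + 2*width, ... until the
    largest value is covered. Each right bound is one less than the next
    left bound (an assumption that suits integer data such as exam scores).
    """
    bins = []
    left = start
    while left <= largest:
        bins.append((left, left + width - 1))
        left += width
    return bins


def frequency_table(data, bins):
    """Return rows of (bin label, frequency, relative frequency, cumulative frequency).

    Rows run from the highest bin down; the cumulative frequency of a row
    counts that bin plus every bin listed before it.
    """
    if not data:
        raise ValueError("data must not be empty")
    total = len(data)
    rows = []
    running = 0
    for left, right in reversed(bins):
        freq = sum(1 for x in data if left <= x <= right)
        running += freq
        rows.append((f"{left} to {right}", freq, freq / total, running))
    if running != total:
        # Some value fell outside every bin, so relative frequencies would not sum to 1.
        raise ValueError("some data values fall outside the bins")
    return rows


# Hypothetical scores for illustration only; these are not the exam data in the slides.
scores = [71, 73, 76, 78, 79, 82, 85, 88, 91, 97]
bins = make_bins(start=70, width=5, largest=max(scores))  # 70-74, 75-79, ..., 95-99
for label, freq, rel, cum in frequency_table(scores, bins):
    print(f"{label:>9} {freq:>3} {rel:>6.0%} {cum:>3}")
```

The `ValueError` check guards against a poor choice of starting point or bin width: if any value falls outside every bin, the relative frequencies would no longer add up to 1.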
--- ## Example: Frequency Table Using Appropriate Bins Consider the following 20 scores from a 100-point exam: .center[ 80, 78, 76, 94, 75, 98, 77, 84, 88, 81, 72, 91, 72, 74, 86, 79, 88, 72, 75 ] Determine appropriate bins and make a frequency table including columns for relative and cumulative frequency. <table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Bins </th> <th style="text-align:center;"> Frequency </th> <th style="text-align:center;"> Relative Frequency </th> <th style="text-align:center;"> Cumulative Frequency </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 95 to 99 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 0.05 = 5% </td> <td style="text-align:center;"> 1 </td> </tr> <tr> <td style="text-align:center;"> 90 to 94 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 0.10 = 10% </td> <td style="text-align:center;"> 3 </td> </tr> <tr> <td style="text-align:center;"> 85 to 89 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.15 = 15% </td> <td style="text-align:center;"> 6 </td> </tr> <tr> <td style="text-align:center;"> 80 to 84 </td> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 0.15 = 15% </td> <td style="text-align:center;"> 9 </td> </tr> <tr> <td style="text-align:center;"> 75 to 79 </td> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 0.35 = 35% </td> <td style="text-align:center;"> 16 </td> </tr> <tr> <td style="text-align:center;"> 70 to 74 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 0.20 = 20% </td> <td style="text-align:center;"> 20 </td> </tr> <tr> <td style="text-align:center;"> Total </td> <td style="text-align:center;"> 20 </td> <td style="text-align:center;"> 1.00 = 100% </td> <td style="text-align:center;"> 20 </td> </tr> </tbody> </table> --- ## Practice: What's wrong with 
this binned frequency table <iframe src="https://www.myopenmath.com/embedq2.php?id=60352&seed=2021&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe> --- ## Bar and Pie Graphs - A **bar chart** shows each category with a bar whose length corresponds to its frequency or relative frequency. - **Pie charts** are used primarily for relative frequencies, because the total pie must always represent the total relative frequency of 100%. The size of each wedge is proportional to the relative frequency of the category it represents. --- ## Example: Bar and Pie Graphs The bar chart and pie chart below both show the grade data from the table. .row[ .onethird-left[ <table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Grade </th> <th style="text-align:center;"> Frequency </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> A </td> <td style="text-align:center;"> 4 </td> </tr> <tr> <td style="text-align:center;"> B </td> <td style="text-align:center;"> 7 </td> </tr> <tr> <td style="text-align:center;"> C </td> <td style="text-align:center;"> 9 </td> </tr> <tr> <td style="text-align:center;"> D </td> <td style="text-align:center;"> 3 </td> </tr> <tr> <td style="text-align:center;"> F </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> Total </td> <td style="text-align:center;"> 25 </td> </tr> </tbody> </table> ] .onethird-center[ ![](data:image/png;base64,#img/image-20200818140917058.png) ] .onethird-right[ ![](data:image/png;base64,#img/image-20200818141022354.png) ] ] --- ## Important Labels for Graphs - **Title/caption:** The graph should have a title or caption (or both) that explains what is being shown and, if applicable, lists the source of the data. - **Vertical scale and title:** Numbers along the vertical axis should clearly indicate the scale. The numbers should line up with the tick marks. 
Include a label that describes the variable. - **Horizontal scale and title:** The categories should be clearly indicated along the horizontal axis; tick marks are not necessary for qualitative data, but should be used with quantitative data. Include a label that describes the variable that the categories represent. - **Legend:** If multiple data sets are displayed on a single graph, include a legend or key to identify the individual data sets. --- ## Histograms and Line Charts - A histogram is a bar graph for quantitative data categories. The bars have a natural order, and the width of each bar represents the range of values in its bin. - A line chart shows the data value for each category as a dot, and the dots are connected with lines. For each dot, the horizontal position is the center of the bin it represents and the vertical position is the data value for the bin. - A time-series graph is a line chart or histogram in which the horizontal axis represents time. --- ## Example: Histogram and Line Chart The histogram and line chart below both show the same data. ![](data:image/png;base64,#img/image-20200818141050868.png) --- ## Example: Oscar-Winning Female Actors (1 of 2) The table shows the ages (at the time they won the award) of all Academy Award-winning actresses through 2017. Make a histogram and a line chart to display these data. Discuss the results.
<table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Number of Actresses </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 20-29 </td> <td style="text-align:center;"> 32 </td> </tr> <tr> <td style="text-align:center;"> 30-39 </td> <td style="text-align:center;"> 34 </td> </tr> <tr> <td style="text-align:center;"> 40-49 </td> <td style="text-align:center;"> 14 </td> </tr> <tr> <td style="text-align:center;"> 50-59 </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> 60-69 </td> <td style="text-align:center;"> 6 </td> </tr> <tr> <td style="text-align:center;"> 70-79 </td> <td style="text-align:center;"> 1 </td> </tr> <tr> <td style="text-align:center;"> 80-89 </td> <td style="text-align:center;"> 1 </td> </tr> <tr> <td style="text-align:center;"> Total </td> <td style="text-align:center;"> 90 </td> </tr> </tbody> </table> --- ## Example: Oscar-Winning Female Actors (2 of 2) - The data are quantitative and organized in 10-year bins. The figure below shows the data as both a histogram and a line chart. .center[ ![:resize , 45%](data:image/png;base64,#img/image-20200818141120013.png) ] - The data show that most winners were fairly young: 32 + 34 = 66 of the 90 actresses, or about 73%, were under 40 when they won. **Note:** Histogram bars touch one another because there are no gaps between the intervals. --- ## Practice: Histogram of Test Scores The following frequency table describes the distribution of 20 scores from a 100-point exam. Create a histogram and discuss the result.
<table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Bins </th> <th style="text-align:center;"> Frequency </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 95 to 99 </td> <td style="text-align:center;"> 1 </td> </tr> <tr> <td style="text-align:center;"> 90 to 94 </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> 85 to 89 </td> <td style="text-align:center;"> 3 </td> </tr> <tr> <td style="text-align:center;"> 80 to 84 </td> <td style="text-align:center;"> 3 </td> </tr> <tr> <td style="text-align:center;"> 75 to 79 </td> <td style="text-align:center;"> 7 </td> </tr> <tr> <td style="text-align:center;"> 70 to 74 </td> <td style="text-align:center;"> 4 </td> </tr> <tr> <td style="text-align:center;"> Total </td> <td style="text-align:center;"> 20 </td> </tr> </tbody> </table> --- ## Example: Reading Time-Series Diagrams (1 of 2) The figure shows a time-series graph of homicide rates in the United States. Briefly summarize what it shows. ![](data:image/png;base64,#img/image-20200818141136560.png) --- ## Example: Reading Time-Series Diagrams (2 of 2) - The graph shows how the *homicide rate* per 100,000 people has *changed since 1960*. - We see that the homicide rate *rose dramatically, more than doubling from a minimum around 1962 to a first peak around 1974*. - It then *remained high*, with some variations, *through about 1993*. - *After 1993, it fell dramatically until about 2000*, then *stayed nearly constant* until a *slight drop from 2008 through 2012*. --- ## Practice: Read Data from a Time Series <iframe src="https://www.myopenmath.com/embedq2.php?id=67860&seed=2024&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe> --- ## Practice: Time Series of the Population The population of a small village is shown below.
![:resize , 50%](data:image/png;base64,#img/population_time_series.png) Briefly summarize what it shows.