Topic 2: Describing Data Graphically

Fei Ye

November 2024

Learning Goals


1 Distribution of Numerical Data

Concepts used in the description of a distribution

The center of a distribution may refer to the mean (the weight-balancing point), the median (the 50/50 breaking point), or a mode (a peak). For a geometric explanation, see Mean, Median and Mode in Distributions: Geometric Aspects.
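As a rough illustration of these three notions of center, the following Python sketch computes the mean, median, and mode of the 15 petal lengths used in the dot plot example of this topic. Python and its standard `statistics` module are assumptions of this sketch; the slides themselves work in Excel.

```python
from statistics import mean, median, multimode

# The 15 iris petal lengths from the dot plot example in this topic.
petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5,
                 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2]

center_mean = mean(petal_lengths)        # balancing point of the data
center_median = median(petal_lengths)    # half of the values lie on each side
center_modes = multimode(petal_lengths)  # most frequent value(s), i.e. the peak(s)

print(f"mean   = {center_mean:.2f}")
print(f"median = {center_median}")
print(f"modes  = {center_modes}")
```

`multimode` returns a list because a data set can have more than one peak.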


2 Dot Plots

A dot plot includes all values from the data set, with one dot for each occurrence of an observed value from the set.

How to Construct

  1. Draw a horizontal line and mark it with an appropriate measurement scale.
  2. Locate each value in the data set along the measurement scale, and represent it by a dot. Stack the dots vertically if the value appears multiple times.

3 Example: Petal Lengths of Iris Flower

The data set contains the petal lengths of 15 iris flowers. Create a dot plot to describe the distribution of petal lengths.

1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2

Solution: For each number in the data set, we draw a dot. We stack dots of the same value from the bottom up.
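The construction can also be sketched in code. The following Python snippet is a minimal text-only version that prints one `*` per observation, stacked above its value. The function name `text_dot_plot` is hypothetical, and the sketch assumes the data are recorded to one decimal place, as they are here.

```python
from collections import Counter

petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5,
                 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2]

def text_dot_plot(values):
    """Return a text dot plot for values recorded to one decimal place."""
    # Count occurrences in whole tenths to avoid floating-point mismatches.
    tenths = Counter(round(v * 10) for v in values)
    # The measurement scale covers every tenth from the minimum to the maximum.
    ticks = range(min(tenths), max(tenths) + 1)
    height = max(tenths.values())
    rows = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(height, 0, -1):
        rows.append("  ".join(" * " if tenths[t] >= level else "   " for t in ticks))
    # The last row is the measurement scale.
    rows.append("  ".join(f"{t / 10:.1f}" for t in ticks))
    return "\n".join(rows)

plot = text_dot_plot(petal_lengths)
print(plot)
```

The tallest stack sits above 1.4, which appears six times in the data.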


Practice: Heights of Cherry Trees

The data set contains the heights of 20 black cherry trees. Create a dot plot to describe the distribution of the heights.

64, 65, 66, 71, 72, 74, 74, 75, 75, 76, 76, 77, 79, 80, 80, 80, 81, 81, 86, 87


4 Histograms (1 of 2)


5 Histograms (2 of 2)


6 Conventions on Constructing Bins

  1. Histograms are usually used for continuous data. For discrete numerical variables, the choice between a bar chart and a histogram depends on context.
  2. The choice of bin width (or the number of bins) has a more significant impact on the histogram than the method used to determine bin limits.
  3. It’s often helpful to visualize the data using different binning strategies.

7 Example: Histogram of mpg (1 of 3)

The following data set shows the mpg (miles per gallon) of \(30\) cars. Construct a frequency table and a frequency histogram for the data set using \(7\) bins. What can be concluded from the histogram?

21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7

Solution:


8 Example: Histogram of mpg (2 of 3)

  • Construct a frequency distribution table
    Bin              Frequency
    [10.4, 13.8)     3
    [13.8, 17.2)     7
    [17.2, 20.6)     7
    [20.6, 24)       6
    [24, 27.4)       3
    [27.4, 30.8)     2
    [30.8, 34.2)     2
  • Graph the histogram using the frequency distribution table.

The blue dot-dash curve is called the density curve. The brown dashed line marks the mean, and the red dotted line marks the median.
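The frequency table can be checked with a short Python sketch. It assumes left-closed bins that start at the minimum, with a width equal to the range divided by the number of bins and rounded up to one decimal place. That rule is an assumption of this sketch, but it reproduces the bins in the table. Values are converted to whole tenths so that the bin limits are exact.

```python
mpg = [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2,
       17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9,
       21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7]
num_bins = 7

# Work in whole tenths so that comparisons against bin limits are exact.
tenths = [round(x * 10) for x in mpg]
low = min(tenths)

# Smallest whole number of tenths strictly greater than range / num_bins,
# so the maximum always falls inside the last left-closed bin.
width = (max(tenths) - low) // num_bins + 1   # 235 // 7 + 1 = 34 tenths, i.e. 3.4

counts = [0] * num_bins
for t in tenths:
    counts[(t - low) // width] += 1

for i, count in enumerate(counts):
    left = (low + i * width) / 10
    right = (low + (i + 1) * width) / 10
    print(f"[{left:g}, {right:g})  {count}")
```

Changing `num_bins` is a quick way to see how strongly the choice of bin width affects the resulting table.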


9 Example: Histogram of mpg (3 of 3)

The following information can be obtained from the histogram.


10 Some Remarks on Histogram (1 of 2)


11 Some Remarks on Histogram (2 of 2)


Practice: Petal Lengths of Irises

The following data set shows the petal lengths of 20 irises. Construct a frequency table and a frequency histogram for the data set using 6 bins. What can you conclude from the histogram?

5.6, 5.1, 5, 6.7, 1.4, 5.9, 1.6, 1.5, 1.5, 3.9, 5.1, 1.2, 4.7, 4.3, 1.4, 4.7, 6.1, 4.2, 4.8, 6


Practice: Frequency Table and Histogram


12 Common Descriptions of Shape Distribution


Practice: Shapes of Distributions

Statistics are used to compare and sometimes identify authors. The following lists show simple random samples that compare the letter counts for three authors.

Terry: 7, 9, 3, 3, 3, 4, 1, 3, 2, 2

Davis: 3, 4, 4, 4, 1, 4, 5, 2, 3, 1

Maris: 2, 3, 4, 4, 4, 6, 6, 6, 8, 3

Create a dot plot for each sample and describe the shape of the distribution of each sample.

Source: Example 2.7.1, OpenStax Introductory Statistics


Practice: Shapes of Distributions


13 Centers of a Data Set


14 Mean and Median for Distributions in Different Shapes

Source: https://istats.shinyapps.io/MeanvsMedian/


Practice: Appropriate Measure of Center

A student survey was conducted at a major university. The following histogram shows the distribution of the number of alcoholic beverages consumed in a typical week.

  1. What is the typical number of drinks a student has during a week?
  2. Do the data suggest that drinking is a problem in this university?

The red line is over the median and the blue line is over the mean.


15 Frequency Distribution for Categorical Data


16 Visualization of Categorical Data


17 Example: Distribution of Majors (1 of 2)

The counts of majors of 100 students in a sample are shown in the table below. Visualize the data using a bar chart, a pie chart, and a stacked bar chart.

Major          Frequency (Counts)
Art            30
Engineering    50
Science        20

Solution: The relative frequency table is shown below.

Major          Frequency    Relative Frequency
Art            30           30%
Engineering    50           50%
Science        20           20%
Total          100          100%
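Each relative frequency is the count divided by the total. A minimal Python sketch of that arithmetic follows; Python is an assumption here, since the slides build the table and charts in Excel.

```python
# Counts of majors from the frequency table in this example.
majors = {"Art": 30, "Engineering": 50, "Science": 20}

total = sum(majors.values())
# Relative frequency = count / total.
relative = {major: count / total for major, count in majors.items()}

for major, count in majors.items():
    print(f"{major:<12}{count:>5}{relative[major]:>7.0%}")
print(f"{'Total':<12}{total:>5}{sum(relative.values()):>7.0%}")
```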

18 Example: Distribution of Majors (2 of 2)

The following are the charts created in Excel.

Bar chart

Pie chart

Stacked bar chart


Practice: Passengers on Titanic

The following table summarizes the people on board the Titanic. Use a chart to describe the data.

Class    Passengers
1st      325
2nd      285
3rd      706
Crew     885

Practice: Pie Chart


Lab Instructions in Excel


19 Frequency Tables

In Excel, creating a frequency table for a data array requires a bin array. For \(k\) bins, the bin array holds the first \(k-1\) upper bin limits. For example, if the bin array consists of 30, 40, and 50, then the bins are \([\text{min},30]\), \((30,40]\), \((40, 50]\), and \((50, \infty)\).

With a data array and a bin array, the Excel function FREQUENCY(data_array, bins_array) can be used to create a frequency table.

Suppose the data set is in column A and the bin array is in column B.

  1. In column C, select a column array of \(k\) cells, then enter =FREQUENCY(
  2. Select the data values.
  3. In the formula bar, type a comma (,).
  4. Select the bin array.
  5. In the formula bar, type ).

Press Enter (Ctrl + Shift + Enter in older versions) to get the frequency table.
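To see how the bin array is interpreted, the following Python sketch mimics the counting rule described above: each bin includes its upper limit, and one extra bin collects everything above the last limit. It uses the cherry tree heights from the earlier practice. The bin array 70, 80 is chosen only for illustration, and the code is an emulation of the rule, not Excel's own implementation.

```python
from bisect import bisect_left

# Heights of the 20 black cherry trees from the earlier practice.
heights = [64, 65, 66, 71, 72, 74, 74, 75, 75, 76,
           76, 77, 79, 80, 80, 80, 81, 81, 86, 87]

# Illustrative bin array: the first k - 1 upper bin limits for k = 3 bins.
bin_array = [70, 80]

# One count per bin: [min, 70], (70, 80], (80, infinity).
counts = [0] * (len(bin_array) + 1)
for h in heights:
    # bisect_left returns how many limits are strictly below h,
    # so a value equal to a limit stays in the bin that ends at that limit.
    counts[bisect_left(bin_array, h)] += 1

print(counts)
```

The three trees of height exactly 80 are counted in the bin \((70, 80]\), which matches the right-closed convention described for the bin array.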


20 Charts in Excel

Excel has many built-in chart functions. To create a chart:

  1. Select the data array/table
  2. Under the Insert tab, click on an appropriate chart in the Charts command set.

The appearance of a chart can be changed after it is created.


21 Histogram (1 of 2)

  1. Select the data

  2. On the Insert tab, in the Charts group, from the Insert Statistic Chart dropdown list, select Histogram:

    Note: The histogram contains a special first bin which always contains the smallest number. This is different from many textbooks.

Formatting a histogram chart is similar to formatting a pie chart. For example, you can change the bin width from Format Axis.

  1. Right-click on the horizontal axis and choose Format Axis in the popup menu:

  2. In the Format Axis pane, on the Axis Options tab, you may try different options for bins.


22 Histogram (2 of 2)


23 The Analysis ToolPak

Suppose your data set is in Column A in Excel.


24 Dotplot


Lab Practice: Home State Attending Rates

Describe the distribution of the percentages of college students attending college in their home states. (To be demonstrated in class.)

93, 92, 91, 91, 90, 90, 90, 90, 89, 89, 89, 89, 89, 89, 89, 88, 87, 87, 85, 85, 85, 85, 84, 84, 83, 81, 81, 81, 80, 78, 77, 77, 76, 76, 76, 76, 72, 72, 70, 68, 67, 65, 65, 64, 62, 60, 58, 57, 57, 50

Data is taken from Example 3.15 in Introduction to Statistics and Data Analysis.


25 Lab Practice: Sleep Deficit and School Start Time

Consider the relative frequency table below.

  1. Draw histograms for the distribution of sleep deficit for morning start schools and afternoon start schools.
  2. What conclusion can you draw from the histograms?

Sleep Deficit    Morning Start    Afternoon Start
(in hours)       Rel. Freq.       Rel. Freq.
−6 to < −4       0.007            0.02
−4 to < −2       0.028            0.05
−2 to < 0        0.065            0.19
0 to < 2         0.442            0.57
2 to < 4         0.364            0.12
4 to < 6         0.078            0.04
6 to < 8         0.015            0.01

Source: Example 3.16 in Introduction to Statistics and Data Analysis, 6th Edition.


Lab Practice: Distribution of Random Numbers

Use Excel to complete the following tasks:

  1. Create a random sample of 30 two-digit integers.

  2. Create a histogram with 6 bins for the sample.

  3. Describe the shape of the distribution of the sample of 30 two-digit integers.
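The lab is meant to be completed in Excel. For comparison only, the same three tasks can be sketched in Python. The seed value and the choice of six equal bins of width 15 starting at 10 are assumptions of this sketch, not requirements of the lab.

```python
import random

random.seed(2024)  # arbitrary seed so the run is repeatable

# Task 1: a random sample of 30 two-digit integers (10 through 99).
sample = [random.randint(10, 99) for _ in range(30)]

# Task 2: six bins of width 15: 10-24, 25-39, 40-54, 55-69, 70-84, 85-99.
num_bins, low, width = 6, 10, 15
counts = [0] * num_bins
for x in sample:
    counts[(x - low) // width] += 1

# Text histogram: one '#' per observation in each bin.
for i, count in enumerate(counts):
    left = low + i * width
    print(f"{left:2d}-{left + width - 1:2d} | {'#' * count}")
```

Because every two-digit integer is equally likely, the bars tend to be of roughly similar height, although a sample of only 30 values can look quite uneven.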