Sampling Distributions

1 Learning Goals

Demonstrate understanding of the sampling distribution of a statistic.
Explain how the central limit theorem applies in inference.
Determine whether a sampling distribution is approximately a normal distribution.
Calculate key characteristics (mean, standard error) of the sampling distribution of a statistic.
Estimate the probability of an event using the sampling distribution.

2 Sampling Distribution

When a sample statistic is used to estimate a population parameter, the estimate carries a chance error: $$\text{Population Parameter}=\text{Sample Statistic}+\text{Chance Error}.$$
To understand the chance error, we need to know how sample statistics are distributed. Consider samples of the same size $n$ randomly chosen from the population with replacement.
The probability distribution of a sample statistic is called a sampling distribution.

3 Sampling Distribution of a Discrete Variable

Source: https://istats.shinyapps.io/SampDist_discrete/

4 Sampling Distribution of a Continuous Variable

Source: https://istats.shinyapps.io/sampdist_cont/

5 Sample Size Affects Standard Error

The sampling distribution varies as the sample size changes. In general, a larger sample size results in a smaller standard deviation of the sampling distribution.
The standard deviation of a sampling distribution is also called the standard error.
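The effect of sample size can be checked with a small simulation. The sketch below is only an illustration: the population (the faces of a fair die), the number of repetitions, and the function name `simulated_standard_error` are assumptions introduced here, not part of the original slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the faces of a fair six-sided die.
population = [1, 2, 3, 4, 5, 6]

def simulated_standard_error(n, repetitions=5000):
    """Standard deviation of the means of many random samples of size n."""
    means = [
        # random.choices draws with replacement, as in the slides
        statistics.fmean(random.choices(population, k=n))
        for _ in range(repetitions)
    ]
    return statistics.pstdev(means)

# The simulated standard error shrinks as the sample size grows.
for n in (4, 16, 64):
    print(n, round(simulated_standard_error(n), 3))
```

Each fourfold increase in the sample size should roughly halve the simulated standard error.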

6 Central Limit Theorem for Mean

As the sample size $n$ increases, the sampling distribution of the sample mean, from a population with mean $\mu$ and standard deviation $\sigma$, approaches a normal distribution with mean $\mu_{\bar{X}}=\mu$ and standard deviation $\sigma_{\bar{X}}=\dfrac{\sigma}{\sqrt{n}}$.

In terms of standardization, the central limit theorem says that the random variable $Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has an approximately standard normal distribution.
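The mean and standard deviation of the sample mean follow from standard properties of expectation and variance. The derivation below is a sketch that assumes the observations $X_1,\dots,X_n$ (notation introduced here for illustration) are drawn independently from the population, as is the case when sampling with replacement:

$$E(\bar{X})=\frac{1}{n}\sum_{i=1}^{n}E(X_i)=\mu,\qquad \operatorname{Var}(\bar{X})=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)=\frac{\sigma^2}{n},\qquad \sigma_{\bar{X}}=\sqrt{\operatorname{Var}(\bar{X})}=\frac{\sigma}{\sqrt{n}}.$$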

7 But What is the Central Limit Theorem?

8 Required Sample Size for Mean

For most distributions (not highly skewed), when the sample size satisfies $n>30$, the sampling distribution of the sample mean $\bar{X}$ can be approximated reasonably well by a normal distribution. The larger the sample size, the better the approximation will be.
When the population is normally distributed, the sampling distribution of the sample mean is normally distributed for any sample size.
If the population distribution is highly skewed, relying on the central limit theorem with a small or moderate sample size can be risky.

See https://stats.stackexchange.com/questions/3734 for a discussion of intuitive explanations of the central limit theorem.

9 Example: Sampling Distribution of Small Data (1 of 2)

Randomly draw samples of size 2 with replacement from the numbers 1, 3, 4.

List all possible samples and calculate the mean of each sample.
Find the mean and standard deviation of the sample means.
Find the mean and standard deviation of the population.

Solution: Using the Excel function AVERAGE(), we may find the mean of each sample and the mean of the sample means.

Using the Excel function STDEV.P(), we may find the standard deviation of the population and the standard deviation of sample means.

$\color{red}{\mu}$	$\color{red}{\sigma}$	$\color{blue}{\mu_{\bar{X}}}$	$\color{blue}{\sigma_{\bar{X}}}$
2.7	1.25	2.7	0.88

sample	(1,1)	(1,3)	(1,4)	(3,1)	(3,3)	(3,4)	(4,1)	(4,3)	(4,4)
$\bar{X}$	1	2	2.5	2	3	3.5	2.5	3.5	4

It can be verified that $\mu_{\bar{X}}=\mu$ and $\sigma/\sqrt{n}=1.25/\sqrt{2}\approx 0.88=\sigma_{\bar{X}}$.
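The same enumeration can be reproduced outside Excel. The following is a minimal sketch in Python (a language not used in the original slides), where `statistics.pstdev` plays the role of STDEV.P():

```python
from itertools import product
from statistics import fmean, pstdev

population = [1, 3, 4]
n = 2

# All ordered samples of size 2 drawn with replacement (9 in total).
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)             # population mean
sigma = pstdev(population)         # population standard deviation
mu_xbar = fmean(sample_means)      # mean of the sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the sample means

print(round(mu, 2), round(sigma, 2))            # 2.67 1.25
print(round(mu_xbar, 2), round(sigma_xbar, 2))  # 2.67 0.88
print(round(sigma / n ** 0.5, 2))               # 0.88
```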

10 Example: Sampling Distribution of Small Data (2 of 2)

The following are the distribution of the population and the distribution of sample means.

11 Example: Mean Length of Time on Hold

Suppose the mean length of time that a caller is placed on hold when telephoning a customer service center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of time on hold in a random sample of 1,000 calls will be within 0.5 second of the population mean.

Solution: Since the sample size $n=1000>30$ is large enough, by the Central Limit Theorem, the sample mean length of time on hold is approximately normally distributed.

The mean of the sampling distribution is $\mu_{\bar{X}}=\mu=23.8$.
The standard deviation of the sampling distribution is $\sigma_{\bar{X}}=\dfrac{\sigma}{\sqrt{n}}=\dfrac{4.6}{\sqrt{1000}}\approx 0.1455$.
Then the probability is $P(23.8-0.5<\bar{X}<23.8+0.5) = P(\bar{X}<24.3)-P(\bar{X}<23.3) \approx 0.9994$ which can be obtained by the following Excel formula: NORM.DIST(24.3, 23.8, 4.6/SQRT(1000),TRUE)-NORM.DIST(23.3, 23.8, 4.6/SQRT(1000),TRUE)
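As a cross-check of the Excel calculation, the same probability can be computed with Python's standard library. This is a sketch, assuming `statistics.NormalDist(...).cdf(x)` as the counterpart of NORM.DIST(x, mean, sd, TRUE):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.8, 4.6, 1000

# Sampling distribution of the sample mean under the central limit theorem.
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# P(23.3 < sample mean < 24.3)
prob = xbar_dist.cdf(mu + 0.5) - xbar_dist.cdf(mu - 0.5)
print(round(prob, 4))  # 0.9994
```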

12 Example: Normal vs Sampling Distribution

Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6 mph and standard deviation 1.7 mph.

Find the probability that the speed $X$ of a randomly selected vehicle is between 35 and 40 mph.
Find the probability that the mean speed $\bar{X}$ of 10 randomly selected vehicles is between 35 and 40 mph.

Solution: Since the population is normally distributed with $\mu=36.6$ and $\sigma=1.7$, the sampling distribution of the sample mean is also normally distributed, but with $\mu_{\bar{X}}=\mu=36.6$ and $\sigma_{\bar{X}}=\sigma/\sqrt{n}=1.7/\sqrt{10}$.

The probability that the speed of a vehicle is between 35 and 40 is $P(35< X< 40)=P(X< 40)-P(X<35)\approx 0.8039$ which can be obtained by NORM.DIST(40, 36.6, 1.7, TRUE)-NORM.DIST(35, 36.6, 1.7, TRUE).
The probability of getting a sample of size 10 with the mean between 35 and 40 is $P(35<\bar{X}< 40)=P(\bar{X}< 40)-P(\bar{X}<35)\approx 0.9985$ which can be obtained by NORM.DIST(40, 36.6, 1.7/SQRT(10), TRUE)-NORM.DIST(35, 36.6, 1.7/SQRT(10), TRUE).
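The contrast between a single observation and a sample mean can be made explicit in a few lines. The sketch below uses Python's `statistics.NormalDist` in place of NORM.DIST; the variable names are chosen here for illustration:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 36.6, 1.7, 10

single = NormalDist(mu, sigma)             # speed X of one vehicle
mean_of_10 = NormalDist(mu, sigma / sqrt(n))  # mean speed of 10 vehicles

p_single = single.cdf(40) - single.cdf(35)
p_mean = mean_of_10.cdf(40) - mean_of_10.cdf(35)

print(round(p_single, 4))  # 0.8039
print(round(p_mean, 4))    # 0.9985
```

The interval is the same in both cases; the probability is larger for the sample mean because its distribution is more tightly concentrated around 36.6.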

13 Sampling Distribution of a Sample Proportion

The proportion of a specific characteristic in a data set can be viewed as the mean of the data set by identifying the specific characteristic with 1 and others with $0$.

Example: Consider the following data set

1, 0, 1, 1, 0, 0, 1, 0, 1, 1

The proportion of 1s is $\frac{6}{10}=0.6$ which is the same as the mean of the data set: $\frac{6\cdot 1 + 4\cdot 0}{10}=0.6$.
Consider a population consisting of 1s and 0s. Let $p$ be the proportion of 1s. Then standard deviation is $$\sigma=\sqrt{(1-p)^2p+(0-p)^2(1-p)}=\sqrt{p(1-p)}.$$
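Both claims can be checked numerically on the data set above. The following sketch treats the ten values as the whole population, so it uses the population standard deviation `pstdev`:

```python
from math import sqrt
from statistics import fmean, pstdev

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

p = fmean(data)              # proportion of 1s equals the mean
sd = pstdev(data)            # population standard deviation of the 0/1 data
formula = sqrt(p * (1 - p))  # sqrt(p(1-p))

print(p)                                # 0.6
print(round(sd, 4), round(formula, 4))  # 0.4899 0.4899
```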

14 Central Limit Theorem for Proportion

For a sampling distribution of sample proportion, we write $\hat{P}$ for the random variable of sample proportions.

For large samples, the distribution of sample proportions $\hat{P}$ is approximately normal, with the mean $\mu_{\hat{P}}=p$ and standard deviation $\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}$, where $p$ is the population proportion.

15 Required Sample Size for Proportion

As a sample proportion is always between 0 and 1, and about 99.7% of sample proportions lie within 3 standard deviations of the population proportion, the central limit theorem for proportion requires a sample size $n$ large enough that the interval $\left[p-3\sqrt{\frac{p(1-p)}{n}}, p+3\sqrt{\frac{p(1-p)}{n}}\right]$ lies wholly in the interval $[0, 1]$.
In practice, if $n$ satisfies the two inequalities $np\ge 10$ and $n(1-p)\ge 10$, then we consider $n$ large enough to assume that the sampling distribution of the sample proportion is approximately normal.
When the population proportion $p$ is unknown, we apply the same conditions with $p$ replaced by the sample proportion $\hat{p}$. That is, the sample size $n$ should satisfy $n\hat{p}\ge 10$ and $n(1-\hat{p})\ge 10$.
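The two checks above are easy to automate. The sketch below is illustrative only: the helper names `large_enough` and `three_sd_interval`, and the second test case ($n=30$, $p=0.1$), are assumptions introduced here.

```python
from math import sqrt

def large_enough(n, p, threshold=10):
    """Rule of thumb: both np and n(1-p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

def three_sd_interval(n, p):
    """Interval p +/- 3 standard deviations of the sample proportion."""
    se = sqrt(p * (1 - p) / n)
    return p - 3 * se, p + 3 * se

print(large_enough(900, 0.53))      # True
print(three_sd_interval(900, 0.53))  # lies inside [0, 1]

print(large_enough(30, 0.1))        # False, since np = 3
print(three_sd_interval(30, 0.1))    # lower endpoint is negative
```

When the rule of thumb fails, as in the second case, the 3-standard-deviation interval spills outside $[0, 1]$, which signals that the normal approximation is unreliable.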

16 Example: Sampling Voters

Suppose that in a population of voters in a certain region 53% are in favor of a particular law. Nine hundred randomly selected voters are asked if they favor the law.

Find the probability that the sample proportion computed from a random sample of size 900 will be at least 2 percentage points above the true population proportion.

Solution: We first verify that the sampling distribution is approximately normal.

Since $p=0.53$ and $n=900$, $np=900\cdot 0.53>10$ and $n(1-p)=900(1-0.53)>10$. By the central limit theorem, the sampling distribution is approximately normal.

The standard deviation of the sampling distribution is $\sigma_{\hat{P}}=\sqrt{\frac{0.53(1-0.53)}{900}}\approx 0.017$.

Then the probability that the random sample has a proportion at least 2 percentage points above 53% is $P(\hat{P}>0.55)=1-P(\hat{P}\le 0.55)\approx 0.1146$ which can be obtained by 1-NORM.DIST(0.55, 0.53, SQRT(0.53*(1-0.53)/900),TRUE). (If the standard deviation is first rounded to 0.017, the result is about 0.1197.)
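The result is sensitive to how the standard deviation is rounded. The sketch below, using Python's `statistics.NormalDist` rather than Excel, evaluates the probability with both the unrounded standard deviation and the rounded value 0.017:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.53, 900
se = sqrt(p * (1 - p) / n)  # unrounded standard deviation, about 0.0166

# P(sample proportion > 0.55) with the unrounded and the rounded value
exact = 1 - NormalDist(p, se).cdf(0.55)
rounded = 1 - NormalDist(p, 0.017).cdf(0.55)

print(f"{exact:.4f}")    # 0.1146
print(f"{rounded:.4f}")  # 0.1197
```

Carrying the unrounded standard deviation through the calculation, as the Excel formula with SQRT does, avoids this discrepancy.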

17 Example: Traffic Accidents

Suppose that 36% of all car accidents involve injury. Find the probability that the injury rate in a random sample of 250 car accidents is between 30% and 45%.

Solution: The injury rate of all car accidents is $p=36\%=0.36$ and the sample size is $250$. Because $np=250\cdot 0.36=90>10$ and $n(1-p)=250-90=160>10$, the sample size is considered large enough. By the Central Limit Theorem, the sample proportion $\hat{P}$ is approximately normally distributed with the mean $\mu_{\hat{P}}=p=0.36$ and standard deviation $\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\approx 0.03$

Then the probability that a random sample of 250 car accidents has an injury rate between 30% and 45% is $P(0.30<\hat{P}<0.45)=P(\hat{P}<0.45)-P(\hat{P}<0.30)\approx 0.976$ which can be obtained by NORM.DIST(45%, 36%, 0.03, TRUE)-NORM.DIST(30%, 36%, 0.03, TRUE).
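The calculation can be sketched in Python as follows. Note the order of subtraction: the cumulative probability at the upper bound (0.45) comes first. The code evaluates the probability with the rounded standard deviation 0.03 and, for comparison, with the unrounded value:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.36, 250
se = sqrt(p * (1 - p) / n)  # about 0.0304 before rounding

def prob_between(dist, low, high):
    """P(low < X < high): CDF at the upper bound minus CDF at the lower bound."""
    return dist.cdf(high) - dist.cdf(low)

prob_rounded = prob_between(NormalDist(p, 0.03), 0.30, 0.45)
prob_exact = prob_between(NormalDist(p, se), 0.30, 0.45)

print(f"{prob_rounded:.3f}")  # 0.976
print(f"{prob_exact:.3f}")    # 0.974
```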

Practice: Sample Mean of GPA

The numerical population of grade point averages at a college has mean 2.61 and standard deviation 0.5. If a random sample of size 100 is taken from the population, what is the probability that the sample mean will be between 2.51 and 2.71?

Source: Example 4 in Section 6.2 in Introductory Statistics

Practice: Proportion of Red Candy

Practice: Minimal Mean Weight of a Particular Fruit

More Practice

Practice: Sampling Unknown Population

A population has mean 73.5 and standard deviation 2.5.

Find the mean and standard deviation of $\bar{X}$ for samples of size 30.
Find the probability that the mean of a sample of size 30 will be less than 72.

Source: Exercise 3 in Section 6.2 in Introductory Statistics.

Practice: Sampling Normal Population

A normally distributed population has mean 57.7 and standard deviation 12.1.

Find the probability that a single randomly selected element X of the population is less than 45.
Find the mean and standard deviation of $\bar{X}$ for samples of size 16.
Find the probability that the mean of a sample of size 16 drawn from this population is less than 45.

Source: Exercise 6 in Section 6.2 in Introductory Statistics.

Practice: Cholesterol Level in Large Eggs

Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard deviation 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2 milligrams of the population mean.

Source: Exercise 15 in Section 6.2 in Introductory Statistics.

Practice: Color Blindness Rate

Suppose that 8% of all males suffer some form of color blindness. Find the probability that in a random sample of 250 men at least 10% will suffer some form of color blindness.

Source: Exercise 13 in Section 6.3 in Introductory Statistics.

Practice: Proportion of Voting

In a mayoral election, based on a poll, a newspaper reported that the current mayor received 45% of the vote. If this is true, what is the probability that a random sample of 100 voters had less than 35% voting for the current mayor?

Lab Instructions in Excel

18 The `NORM.DIST()` Function

Let $X$ be a normal random variable with mean $\mu$ and standard deviation $\sigma$, that is $X\sim \mathcal{N}(\mu, \sigma^2)$. In Excel, $P(X<x)$ is given by NORM.DIST(x, mean, sd, TRUE).
Recall that the mean of a data set can be obtained by the Excel function AVERAGE().
Given the population mean $\mu$ and standard deviation $\sigma$, suppose the sample size $n$ is greater than 30 and the sample mean is $\bar{x}$. The probability of getting another sample of the same size with a smaller mean can be obtained by the following Excel formula: NORM.DIST( $\bar{x},\mu,\sigma$ /SQRT(n),TRUE).
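For readers working outside Excel, the same quantities can be computed in Python. The sketch below is an assumption-laden illustration: the wrapper name `norm_dist` and the numbers ($\mu=50$, $\sigma=10$, $n=36$, $\bar{x}=48$) are hypothetical and chosen only to demonstrate the call.

```python
from math import sqrt
from statistics import NormalDist

def norm_dist(x, mean, sd):
    """Cumulative probability P(X < x), like NORM.DIST(x, mean, sd, TRUE)."""
    return NormalDist(mean, sd).cdf(x)

# Hypothetical values for illustration only.
mu, sigma, n, xbar = 50, 10, 36, 48

# Probability that another sample of size n has a mean smaller than xbar.
prob = norm_dist(xbar, mu, sigma / sqrt(n))
print(round(prob, 4))  # 0.1151
```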

Lab Practice: Testing an Airline’s Claim

An airline claims that 72% of all its flights to a certain region arrive on time. In a random sample of 30 recent arrivals, 19 were on time. You may assume that the normal distribution applies.

Compute the sample proportion.
Assuming the airline’s claim is true, find the probability that a sample of size 30 produces a sample proportion as low as the one observed in this sample.

Source: Exercise 17 in Section 6.3 in Introductory Statistics.

Sampling Distributions

Fei Ye

November 2024
