Sampling Distributions

Fei Ye

November 2024

1 Learning Goals


2 Sampling Distribution


3 Sampling Distribution of a Discrete Variable

Source: https://istats.shinyapps.io/SampDist_discrete/


4 Sampling Distribution of a Continuous Variable

Source: https://istats.shinyapps.io/sampdist_cont/


5 Sample Size Affects Standard Error


6 Central Limit Theorem for Mean

As the sample size \(n\) increases, the sampling distribution of the sample mean, from a population with mean \(\mu\) and standard deviation \(\sigma\), approaches a normal distribution with mean \(\mu_{\bar{X}}=\mu\) and standard deviation \(\sigma_{\bar{X}}=\dfrac{\sigma}{\sqrt{n}}\).

In terms of standardization, the central limit theorem says that, for large \(n\), the random variable \(Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}\) has an approximately standard normal distribution.
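
The statement above can be checked numerically with a small simulation. The following is a minimal sketch, not part of the original slides: the exponential population, the sample size 50, and the 5,000 repetitions are all illustrative assumptions. It compares the simulated mean and standard deviation of the sample means with \(\mu\) and \(\sigma/\sqrt{n}\), and checks how often the standardized value falls between -1.96 and 1.96.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 50          # sample size (illustrative choice)
reps = 5_000    # number of simulated samples (illustrative choice)

# Draw `reps` samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = sigma / n ** 0.5

# Standardize each sample mean; for a standard normal variable,
# about 95% of values fall between -1.96 and 1.96.
z_values = [(m - mu) / theoretical_se for m in sample_means]
share_within = sum(abs(z) < 1.96 for z in z_values) / reps

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"share of |Z| < 1.96:  {share_within:.3f}")
```

Even though the assumed population is strongly skewed, the simulated values should land close to the theoretical ones; changing `n` shows how the approximation improves with sample size.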


7 But What is the Central Limit Theorem?


8 Required Sample Size for Mean

See https://stats.stackexchange.com/questions/3734 for a discussion of intuitive explanations.


9 Example: Sampling Distribution of Small Data (1 of 2)

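Randomly draw samples of size 2 with replacement from the numbers 1, 3, 4. Find the mean and standard deviation of the population, and the mean and standard deviation of the sample means.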

Solution: Using the Excel function AVERAGE(), we can find the mean of each sample and the mean of the sample means.

Using the Excel function STDEV.P(), we may find the standard deviation of the population and the standard deviation of sample means.

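| \(\color{red}{\mu}\) | \(\color{red}{\sigma}\) | \(\color{blue}{\mu_{\bar{X}}}\) | \(\color{blue}{\sigma_{\bar{X}}}\) |
|---|---|---|---|
| 2.7 | 1.25 | 2.7 | 0.88 |

| sample | (1,1) | (1,3) | (1,4) | (3,1) | (3,3) | (3,4) | (4,1) | (4,3) | (4,4) |
|---|---|---|---|---|---|---|---|---|---|
| \(\bar{X}\) | 1 | 2 | 2.5 | 2 | 3 | 3.5 | 2.5 | 3.5 | 4 |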

It can be verified that \(\mu_{\bar{X}}=\mu\) and \(\sigma/\sqrt{n}=1.25/\sqrt{2}\approx 0.88=\sigma_{\bar{X}}\).
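
The same verification can be reproduced outside Excel. The sketch below is an illustration rather than the slides' own method: it uses Python's `statistics` module, where `fmean` plays the role of AVERAGE() and `pstdev` the role of STDEV.P(), and it enumerates all nine ordered samples with `itertools.product`.

```python
from itertools import product
from statistics import fmean, pstdev

population = [1, 3, 4]
n = 2  # sample size

# All ordered samples of size 2 drawn with replacement: (1,1), (1,3), ..., (4,4).
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)              # population mean
sigma = pstdev(population)          # population standard deviation
mu_xbar = fmean(sample_means)       # mean of the sample means
sigma_xbar = pstdev(sample_means)   # standard deviation of the sample means

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"mu_xbar = {mu_xbar:.2f}, sigma_xbar = {sigma_xbar:.2f}")
print(f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

Because every possible sample is listed, the agreement between `sigma_xbar` and `sigma / sqrt(n)` here is exact up to floating-point error, not an approximation.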


10 Example: Sampling Distribution of Small Data (2 of 2)

The following are the distribution of the population and the distribution of sample means.


11 Example: Mean Length of Time on Hold

Suppose the mean length of time that a caller is placed on hold when telephoning a customer service center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of time on hold in a random sample of 1,000 calls will be within 0.5 second of the population mean.

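Solution: Since the sample size \(n=1000>30\) is large enough, by the Central Limit Theorem, the sample mean length of time on hold is approximately normally distributed with mean \(\mu_{\bar{X}}=23.8\) and standard deviation \(\sigma_{\bar{X}}=4.6/\sqrt{1000}\approx 0.145\).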
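
The requested probability, \(P(23.3<\bar{X}<24.3)\), follows from the normal approximation. The following is a minimal sketch of that last step, assuming Python's `statistics.NormalDist` as a stand-in for Excel's NORM.DIST(); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 23.8, 4.6, 1000

# Standard deviation of the sample mean (standard error).
se = sigma / n ** 0.5

# Approximate sampling distribution of the sample mean.
xbar = NormalDist(mu, se)

# P(mu - 0.5 < sample mean < mu + 0.5)
p_within = xbar.cdf(mu + 0.5) - xbar.cdf(mu - 0.5)

print(f"standard error = {se:.4f}")
print(f"P(23.3 < sample mean < 24.3) = {p_within:.4f}")
```

With a sample this large the standard error is small compared with the 0.5-second margin, so the probability comes out very close to 1.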


12 Example: Normal vs Sampling Distribution

Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6 mph and standard deviation 1.7 mph.

Solution: Since the population is normally distributed with \(\mu=36.6\) and \(\sigma=1.7\), the sampling distribution of the sample mean is also normally distributed, with \(\mu_{\bar{X}}=\mu=36.6\) and \(\sigma_{\bar{X}}=\sigma/\sqrt{n}=1.7/\sqrt{10}\approx 0.54\).
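
To see how much narrower the sampling distribution is than the population distribution, the sketch below compares the two. The sample size \(n=10\) is read off the formula in the solution. The event "within 1 mph of the mean" is a hypothetical choice made only for illustration and is not taken from the example.

```python
from statistics import NormalDist

mu, sigma, n = 36.6, 1.7, 10  # n = 10 as used in the solution's formula

single = NormalDist(mu, sigma)                # speed X of one vehicle
mean_of_n = NormalDist(mu, sigma / n ** 0.5)  # mean speed of n vehicles

# Hypothetical event (an assumption, not from the source): within 1 mph of mu.
lower, upper = mu - 1, mu + 1
p_single = single.cdf(upper) - single.cdf(lower)
p_mean = mean_of_n.cdf(upper) - mean_of_n.cdf(lower)

print(f"sd of X = {single.stdev:.3f}, sd of sample mean = {mean_of_n.stdev:.3f}")
print(f"P(one speed within 1 mph of mu)   = {p_single:.3f}")
print(f"P(sample mean within 1 mph of mu) = {p_mean:.3f}")
```

Both distributions share the same center, but the sample mean is far more likely to fall near it because its standard deviation is smaller by a factor of \(\sqrt{10}\).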


13 Sampling Distribution of a Sample Proportion


14 Central Limit Theorem for Proportion

For the sampling distribution of the sample proportion, we write \(\hat{P}\) for the random variable of sample proportions.

For large samples, the distribution of sample proportions \(\hat{P}\) is approximately normal, with the mean \(\mu_{\hat{P}}=p\) and standard deviation \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\), where \(p\) is the population proportion.
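
A short simulation can make this concrete. The sketch below is illustrative only: the population proportion 0.3, the sample size 200, and the 5,000 repetitions are assumed values, not taken from the slides. It compares the simulated mean and standard deviation of \(\hat{P}\) with \(p\) and \(\sqrt{p(1-p)/n}\).

```python
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the sketch is reproducible

p = 0.3       # assumed population proportion (illustrative)
n = 200       # sample size (illustrative)
reps = 5_000  # number of simulated samples (illustrative)

# Each sample proportion is the share of "successes" among n Bernoulli(p) draws.
p_hats = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(reps)
]

mean_p_hat = fmean(p_hats)
sd_p_hat = pstdev(p_hats)
theoretical_sd = (p * (1 - p) / n) ** 0.5

print(f"mean of sample proportions: {mean_p_hat:.4f} (p = {p})")
print(f"sd of sample proportions:   {sd_p_hat:.4f} (formula = {theoretical_sd:.4f})")
```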


15 Required Sample Size for Proportion


16 Example: Sampling Voters

Suppose that in a population of voters in a certain region 53% are in favor of a particular law. Nine hundred randomly selected voters are asked if they favor the law.

Find the probability that the sample proportion computed from a random sample of size 900 will be at least 2 percentage points above the true population proportion.

Solution: We first verify that the sampling distribution is approximately normal.

Since \(p=0.53\) and \(n=900\), \(np=900\cdot 0.53>10\) and \(n(1-p)=900(1-0.53)>10\). By the central limit theorem, the sampling distribution is approximately normal.

The standard deviation of the sampling distribution is \(\sigma_{\hat{P}}=\sqrt{\frac{0.53(1-0.53)}{900}}\approx 0.017\).

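Then the probability that the random sample has a proportion at least 2 percentage points above 53% is \(P(\hat{P}\ge 0.55)=1-P(\hat{P}<0.55)\approx 0.115\), which can be obtained by 1-NORM.DIST(0.55, 0.53, SQRT(0.53*(1-0.53)/900), TRUE). Using the rounded standard deviation 0.017 in place of the SQRT expression gives about 0.1197 instead.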
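
The calculation can be cross-checked in Python. This is a minimal sketch assuming `statistics.NormalDist` as the counterpart of NORM.DIST(). It also shows that the answer depends slightly on whether the standard deviation is rounded to 0.017 before it is used.

```python
from statistics import NormalDist

p, n = 0.53, 900

# Large-sample check used in the solution: np > 10 and n(1 - p) > 10.
large_enough = n * p > 10 and n * (1 - p) > 10

sd = (p * (1 - p) / n) ** 0.5  # unrounded standard deviation of the sample proportion

# Upper-tail probability P(sample proportion > 0.55), computed two ways.
prob_unrounded = 1 - NormalDist(p, sd).cdf(0.55)
prob_rounded = 1 - NormalDist(p, round(sd, 3)).cdf(0.55)  # sd rounded to 0.017

print(f"large enough: {large_enough}")
print(f"sd = {sd:.5f}")
print(f"P(p_hat > 0.55), unrounded sd: {prob_unrounded:.4f}")
print(f"P(p_hat > 0.55), sd = 0.017:   {prob_rounded:.4f}")
```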


17 Example: Traffic Accidents

Suppose that 36% of all car accidents involve injury. Find the probability that the injury rate in a random sample of 250 car accidents is between 30% and 45%.

Solution: The injury rate of all car accidents is \(p=36\%=0.36\) and the sample size is \(n=250\). Because \(np=250\cdot 0.36=90>10\) and \(n(1-p)=250-90=160>10\), the sample size is considered large enough. By the Central Limit Theorem, the sample proportion \(\hat{P}\) is approximately normally distributed with mean \(\mu_{\hat{P}}=p=0.36\) and standard deviation \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\approx 0.03\).

Then the probability that a random sample of 250 car accidents has an injury rate between 30% and 45% is \(P(0.30<\hat{P}<0.45)=P(\hat{P}<0.45)-P(\hat{P}<0.30)\approx 0.976\), which can be obtained by NORM.DIST(45%, 36%, 0.03, TRUE)-NORM.DIST(30%, 36%, 0.03, TRUE).
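
As a cross-check, the same interval probability can be computed in Python. This is a sketch assuming `statistics.NormalDist` in place of NORM.DIST(). The upper bound's cumulative probability comes first and the lower bound's is subtracted from it. The result is shown both with the standard deviation rounded to 0.03 and with the unrounded value.

```python
from statistics import NormalDist

p, n = 0.36, 250

sd = (p * (1 - p) / n) ** 0.5  # unrounded standard deviation of the sample proportion
sd_rounded = round(sd, 2)      # 0.03, the rounded value used in the solution

def interval_probability(mean, stdev, lower, upper):
    """P(lower < X < upper) for X ~ Normal(mean, stdev)."""
    dist = NormalDist(mean, stdev)
    return dist.cdf(upper) - dist.cdf(lower)

prob_rounded = interval_probability(p, sd_rounded, 0.30, 0.45)
prob_unrounded = interval_probability(p, sd, 0.30, 0.45)

print(f"sd = {sd:.5f} (rounded: {sd_rounded})")
print(f"P(0.30 < p_hat < 0.45), sd = 0.03:     {prob_rounded:.4f}")
print(f"P(0.30 < p_hat < 0.45), unrounded sd: {prob_unrounded:.4f}")
```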


Practice: Sample Mean of GPA

The numerical population of grade point averages at a college has mean 2.61 and standard deviation 0.5. If a random sample of size 100 is taken from the population, what is the probability that the sample mean will be between 2.51 and 2.71?

Source: Example 4 in Section 6.2 in Introductory Statistics


Practice: Proportion of Red Candy


Practice: Minimal Mean Weight of a Particular Fruit


More Practice


Practice: Sampling Unknown Population

A population has mean 73.5 and standard deviation 2.5.

  1. Find the mean and standard deviation of \(\bar{X}\) for samples of size 30.
  2. Find the probability that the mean of a sample of size 30 will be less than 72.

Source: Exercise 3 in Section 6.2 in Introductory Statistics.


Practice: Sampling Normal Population

A normally distributed population has mean 57.7 and standard deviation 12.1.

  1. Find the probability that a single randomly selected element X of the population is less than 45.
  2. Find the mean and standard deviation of \(\bar{X}\) for samples of size 16.
  3. Find the probability that the mean of a sample of size 16 drawn from this population is less than 45.

Source: Exercise 6 in Section 6.2 in Introductory Statistics.


Practice: Cholesterol Level in Large Eggs

Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard deviation 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2 milligrams of the population mean.

Source: Exercise 15 in Section 6.2 in Introductory Statistics.


Practice: Color Blindness Rate

Suppose that 8% of all males suffer some form of color blindness. Find the probability that in a random sample of 250 men at least 10% will suffer some form of color blindness.

Source: Exercise 13 in Section 6.3 in Introductory Statistics.


Practice: Proportion of Voting

In a mayoral election, based on a poll, a newspaper reported that the current mayor received 45% of the vote. If this is true, what is the probability that a random sample of 100 voters had less than 35% voting for the current mayor?


Lab Instructions in Excel


18 The NORM.DIST() Function


Lab Practice: Testing an Airline’s Claim

An airline claims that 72% of all its flights to a certain region arrive on time. In a random sample of 30 recent arrivals, 19 were on time. You may assume that the normal distribution applies.

  1. Compute the sample proportion.
  2. Assuming the airline’s claim is true, find the probability that a sample of size 30 produces a sample proportion as low as the one observed in this sample.

Source: Exercise 17 in Section 6.3 in Introductory Statistics.