Confidence Intervals for a Proportion

1 Learning Goals for Confidence Intervals

Construct and interpret a confidence interval for one population proportion.
Describe how the following will affect the width of the confidence interval:
- increasing the sample size;
- increasing the confidence level.

2 Confidence Interval for a Proportion (1 of 2)

Recall that the standard error of the sample proportion is $\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}$, where $n$ is the sample size and $p$ is the population proportion. When estimating $p$, however, the population proportion is unknown, and we only have the point estimate $\hat{p}$ (phat) to use. We therefore estimate the standard error by $\textstyle \sigma_{\hat{P}}\approx\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, which in Excel is =SQRT(phat*(1-phat)/n).
Because the estimated standard error depends only on $\hat{p}$ and $n$, the margin of error can be computed from the sample alone. If the sample proportion is approximately normally distributed, then we can use the standard normal distribution to find the critical value $z_{\alpha/2}$, which in Excel is =NORM.S.INV(0.5+Confidence Level/2).
The margin of error for $p$ is $$\textstyle E=\text{critical value}\cdot\text{standard error}=z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
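The margin-of-error formula can be sketched in Python as follows. This is a minimal illustration, not part of the original lab instructions: the function name `margin_of_error` and the sample numbers are assumptions, and `statistics.NormalDist().inv_cdf` is used as the counterpart of NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Return E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    # Critical value: the same quantity as NORM.S.INV(0.5 + confidence / 2).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Estimated standard error of the sample proportion.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error


# Illustrative numbers (not from the text): p_hat = 0.4, n = 200, 95% confidence.
print(round(margin_of_error(0.4, 200, 0.95), 4))
```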

3 Confidence Interval for a Proportion (2 of 2)

The confidence interval for $p$ is defined by $$[\hat{p}-E,\hat{p}+E]=\left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$
In practice, the sample proportion is approximately normal if the sample size $n$ satisfies $n\hat{p}\ge 10$ and $n(1-\hat{p})=n-n\hat{p}\ge 10$.
The above defined confidence interval is known as the normal approximation (or Wald’s) confidence interval. It is popular in introductory statistics books. However, it is unreliable when the sample size is small or the sample proportion is close to 0 or 1. Indeed, if the sample proportion is 0 or 1, the confidence interval defined here will have zero length.
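The following sketch builds the normal approximation interval and reports whether the rule of thumb $n\hat{p}\ge 10$ and $n(1-\hat{p})\ge 10$ holds. The function name `wald_interval` and the counts are hypothetical, chosen only to show the zero-length interval that appears when $\hat{p}=0$.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float):
    """Return (lower, upper, large_enough) for the normal approximation interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    # Rule of thumb from the text: n*p_hat >= 10 and n*(1 - p_hat) >= 10.
    large_enough = successes >= 10 and (n - successes) >= 10
    return p_hat - e, p_hat + e, large_enough


# Hypothetical samples of size 25.
print(wald_interval(0, 25, 0.95))   # p_hat = 0: the interval collapses to one point
print(wald_interval(12, 25, 0.95))  # both counts are at least 10
```

The third returned value is only a flag; the function still computes the interval when the condition fails, so the caller decides whether to trust it.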

4 Distribution of Confidence Intervals

By the central limit theorem, the sample proportion $\hat{p}$ is approximately normally distributed when the sample size is large. Consequently, the chance that $$p\in \left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$ is approximately the same as the chance that $$\hat{p}\in \left[p-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, p+z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right].$$

It follows that $z_{\alpha/2}$ satisfies the following equation $$P(-z_{\alpha/2}<\dfrac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}<z_{\alpha/2})=1-\alpha.$$
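One way to see what the probability statement means is to simulate many samples from a population with a known $p$ and count how often the interval captures it. The sketch below is an illustration under assumed values ($p=0.5$, $n=200$, 2000 repetitions, fixed seed); the function name `estimate_coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def estimate_coverage(p: float, n: int, confidence: float,
                      trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        e = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - e <= p <= p_hat + e:
            hits += 1
    return hits / trials


print(estimate_coverage(p=0.5, n=200, confidence=0.95, trials=2000))
```

The printed fraction should be close to the confidence level, though not exactly equal to it, because the interval relies on the normal approximation.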

5 Example: Proportion of Students Taking Buses (1 of 2)

In a random sample of 100 students in college, 65 said that they come to college by bus.

Give a point estimate of the proportion of all students who come to college by bus.
Construct a 99% confidence interval for that proportion.

Solution: A good point estimate is the sample proportion. Here the sample proportion is $\hat{p}=65/100=0.65$.

Since $n\hat{p}=100\cdot 0.65=65\ge 10$ and $n(1-\hat{p})=100\cdot 0.35=35\ge 10$, the sample is large enough for the normal approximation. The estimated standard error is $$\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.65(1-0.65)}{100}}\approx0.048.$$

6 Example: Proportion of Students Taking Buses (2 of 2)

At the 99% level of confidence, $\alpha/2=(1-\text{confidence level})/2=0.005$. The critical value $z_{\alpha/2}$ is determined by the equation $P(Z<z_{\alpha/2})=1-\alpha/2=0.5+\text{confidence level}/2=0.995$. In Excel, the critical value is $z_{\alpha/2}$ =NORM.S.INV(0.5+99%/2) $\approx 2.576$.

The margin of error is $E=z_{\alpha/2}\cdot \hat{\sigma}_{\hat{P}}\approx 2.576\cdot 0.0477\approx 0.123$ (using the unrounded standard error), and the confidence interval at the 99% level is $$[\hat{p}-E, \hat{p}+E]\approx [0.65-0.123, 0.65+0.123]=[0.527, 0.773].$$

Conclusion: We are 99% confident that the proportion of all students at the college who take the bus is in the interval $[0.527, 0.773]$.
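The arithmetic of this example can be reproduced with a short Python script. It is a sketch that follows the same steps as the solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, riders = 100, 65   # sample size and number who come by bus
confidence = 0.99

p_hat = riders / n                               # point estimate
se = sqrt(p_hat * (1 - p_hat) / n)               # estimated standard error
z = NormalDist().inv_cdf(0.5 + confidence / 2)   # critical value
e = z * se                                       # margin of error
lower, upper = p_hat - e, p_hat + e

print(f"p_hat={p_hat:.2f}, SE={se:.3f}, z={z:.3f}, E={e:.3f}")
print(f"99% CI: [{lower:.3f}, {upper:.3f}]")
```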

7 Factors Affecting the Width of Confidence Intervals

The width of a confidence interval, which equals twice the margin of error, measures the precision of the estimate.
Recall that, for a population proportion or mean, $$\text{Margin of Error} = \text{Critical Value}\cdot \frac{\text{(estimated) Population SD}}{\sqrt{\text{Sample Size}}}$$
The formula shows that the precision of a confidence interval is affected by the confidence level, the variability, and the sample size.
- Larger confidence levels give larger critical values and hence larger margins of error.
- Populations (and samples) with more variability give larger margins of error.
- Larger sample sizes give smaller margins of error.
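These effects can be tabulated numerically. The sketch below computes the width of the interval for a proportion at several sample sizes and confidence levels; the helper `ci_width` and the value $\hat{p}=0.65$ are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(p_hat: float, n: int, confidence: float) -> float:
    """Width of the interval = 2 * margin of error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


# Fixed 95% level, growing sample size: the width shrinks.
for n in (100, 400, 1600):
    print(n, round(ci_width(0.65, n, 0.95), 4))

# Fixed n = 100, growing confidence level: the width grows.
for level in (0.90, 0.95, 0.99):
    print(level, round(ci_width(0.65, 100, level), 4))
```

Because the sample size enters through $\sqrt{n}$, quadrupling the sample size halves the width.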

8 Sample Size Determination

In practice, we may want a margin of error no larger than a given $E$. With a fixed confidence level $100(1-\alpha)\%$, the larger the sample size, the smaller the margin of error.
When estimating a population proportion, if we can produce a reasonable guess $\hat{p}$ for the population proportion, then an appropriate minimum sample size for the study is determined by $$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p}).$$
When estimating a population mean, if we can produce a reasonable guess $\sigma$ for the population standard deviation, then an appropriate minimum sample size is given by $$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2.$$ In both cases, round $n$ up to the next integer.
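Both sample-size formulas can be sketched as small Python functions that round up to the next integer. The function names and the input numbers below are hypothetical and are not taken from the examples in these notes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess: float, margin: float, confidence: float) -> int:
    """n = (z / E)^2 * p_guess * (1 - p_guess), rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))


def sample_size_for_mean(sigma_guess: float, margin: float, confidence: float) -> int:
    """n = (z * sigma_guess / E)^2, rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma_guess / margin) ** 2)


# Hypothetical inputs for illustration.
print(sample_size_for_proportion(p_guess=0.3, margin=0.05, confidence=0.90))
print(sample_size_for_mean(sigma_guess=10, margin=2, confidence=0.95))
```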

9 Example: Minimum Sample Size - Error in Proportion

Suppose you want to estimate the proportion of students at QCC who live in Queens. By surveying your classmates, you find that around 70% live in Queens. Use this as a guess to determine how many students would need to be included in a random sample if you wanted the margin of error for a 95% confidence interval to be less than or equal to 2%.

Solution: We may use $\hat{p}=0.7$ as a reasonable guess for the population proportion.

At the 95% level, the critical value is $z_{\alpha/2}=$ NORM.S.INV(0.5+0.95/2) $\approx 1.96$.

Since the desired margin of error is $E=0.02$, the appropriate minimum sample size is determined by $$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p})=(1.96/0.02)^2\cdot 0.7\cdot(1-0.7)=2016.84.$$

Since the sample size has to be an integer, to get a margin of error of no more than 2% at the 95% level, the sample size should be at least 2017.
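As a check on the rounding step, the sketch below computes the margin of error just below and at the sample size found in this example. It uses the unrounded critical value rather than 1.96, so intermediate figures differ slightly from the hand calculation; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_guess, confidence, target = 0.7, 0.95, 0.02
z = NormalDist().inv_cdf(0.5 + confidence / 2)  # unrounded critical value


def margin(n: int) -> float:
    """Margin of error for a sample of size n with the guessed proportion."""
    return z * sqrt(p_guess * (1 - p_guess) / n)


# Compare the margin of error just below and at the rounded-up sample size.
for n in (2016, 2017):
    print(n, round(margin(n), 5), margin(n) <= target)
```

With 2016 students the margin of error is still slightly above 2%, which is why the sample size is rounded up rather than to the nearest integer.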

10 Example: Minimum Sample Size - Error in Mean

Find the minimum sample size necessary to construct a 99% confidence interval for the population mean with a margin of error $E =0.2$. Assume that the estimated population standard deviation is $\sigma=1.3$.

Solution: At the 99% level, the critical value is $z_{\alpha/2}=$ NORM.S.INV(0.5+0.99/2) $\approx 2.576$.

The desired margin of error is ${E}=0.2$.

The estimated population standard deviation is $\sigma=1.3$.

Then the minimal sample size is approximately $$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2\approx (2.576\cdot 1.3/0.2)^2 \approx 280.4.$$

To get a margin of error of no more than 0.2 at the 99% level, the sample size should be at least 281.
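The same answer can be reached without the closed-form formula by searching for the smallest sample size whose margin of error meets the target. This brute-force sketch is an alternative check, not the method used in the solution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma_guess, confidence, target = 1.3, 0.99, 0.2
z = NormalDist().inv_cdf(0.5 + confidence / 2)

# Increase n until the margin of error z * sigma / sqrt(n) is at most the target.
n = 1
while z * sigma_guess / sqrt(n) > target:
    n += 1

print(n)
```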

Practice: Confidence Interval of Proportion of Kids

Practice: Confidence Intervals for Product Quality

To understand the reason for returned goods, the manager of a store examines the records on 40 products that were returned during the last year. Reasons were coded by 1 for “defective,” 2 for “unsatisfactory,” and 0 for all other reasons, with the results shown in the table.

0	0	0	0	2	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0
0	0	0	1	0	0	0	0	2	0	2	0	0	0	0	0	0	2	0	0

Give a point estimate of the proportion of all returns that are because of something wrong with the product, that is, either defective or performed unsatisfactorily.
Construct an 80% confidence interval for the proportion of all returns that are because of something wrong with the product.

Practice: Sample Size for Mean with Given Error

Practice: Sample Size for Proportion with Given Error

Practice: Confidence Interval of Proportion Given Table

Lab Instructions in Excel

11 Normal Distributions and Margins of Error

When a cumulative probability $p=P(Z<z)$ of a standard normal random variable $Z$ is given, we can find $z$ using NORM.S.INV(p).
If a sample of size $n$ has the proportion $\hat{p}$ and the sampling distribution is approximately normal, the margin of error for the proportion can be obtained with the Excel function CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n).
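For readers working outside Excel, the arithmetic of CONFIDENCE.NORM(alpha, standard_dev, size) can be mirrored in Python. The function `confidence_norm` below is a hypothetical stand-in written for this note, not an existing library function; it is applied to the bus example ($\hat{p}=0.65$, $n=100$, 99% level).

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z_{alpha/2} * standard_dev / sqrt(size)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


p_hat, n = 0.65, 100
e = confidence_norm(1 - 0.99, sqrt(p_hat * (1 - p_hat)), n)
print(round(e, 3))
```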

Lab Practice: Confidence Interval for Proportion

Foothill College’s athletic department wants to calculate the proportion of students who have attended a women’s basketball game at the college. They use student email addresses, randomly choose 220 students, and email them. Of the 145 who responded, 22 had attended a women’s basketball game.

Calculate and interpret the approximate 90% confidence interval for the proportion of all Foothill College students who have attended a women’s basketball game.