Construct and interpret a confidence interval for one population proportion.
Describe how the following will affect the width of the confidence interval:
increasing the sample size;
increasing the confidence level.
Recall that the standard error of sample proportions is \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\), where \(n\) is the sample size and \(p\) is the population proportion. When estimating the population proportion \(p\), however, \(p\) itself is unknown and we only have a point estimate \(\hat{p}\) (phat) to use. For the standard error, we therefore use the estimate $$\textstyle \sigma_{\hat{P}}\approx\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Based on the central limit theorem, when \(n\) is large enough, at the \(100(1-\alpha)\%\) level, the margin of error for \(p\) is defined as
$$\textstyle E=z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
In Excel, \(z_{\alpha/2}\)=NORM.S.INV((1 + confidence level)/2). The margin of error can also be obtained by CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n).
The confidence interval for \(p\) is defined by $$[\hat{p}-E,\hat{p}+E]=\left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right],$$ where the critical value \(z_{\alpha/2}\) satisfies that \(P(Z< z_{\alpha/2})=1-\alpha/2\) for the standard normal variable \(Z\).
In practice, the sample size \(n\) is considered large enough if \(n\hat{p}\ge 10\) and \(n(1-\hat{p})\ge 10\).
The above defined confidence interval is known as the normal approximation (or Wald’s) confidence interval. It is popular in introductory statistics books. However, it is unreliable when the sample size is small or the sample proportion is close to 0 or 1. Indeed, if the sample proportion is 0 or 1, the confidence interval defined here will have zero length.
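The interval and the rule of thumb above can be sketched in Python using only the standard library. The function names `wald_interval` and `large_enough`, and the counts 40 out of 200 and 0 out of 20, are illustrative assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Return (low, high) for the normal-approximation (Wald) interval for p."""
    p_hat = successes / n
    # Critical value z_{alpha/2}, defined by P(Z < z) = (1 + confidence) / 2.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def large_enough(successes, n):
    """Rule of thumb: n * p_hat >= 10 and n * (1 - p_hat) >= 10."""
    return successes >= 10 and (n - successes) >= 10


# Made-up sample: 40 successes out of 200 observations.
print(large_enough(40, 200), wald_interval(40, 200, 0.95))

# Degenerate case: with p_hat = 0 the estimated standard error is 0,
# so both endpoints coincide and the interval has zero length.
print(large_enough(0, 20), wald_interval(0, 20, 0.95))
```

The second call illustrates why the rule of thumb matters: the formula still returns an answer when the sample proportion is 0, but that answer carries no information about the uncertainty in \(p\).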
By the central limit theorem, when \(n\) is large enough the random variable \(\hat{p}\) is approximately normally distributed. The chance that $$p\in \left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$ is approximately the same as the chance that $$\hat{p}\in \left[p-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, p+z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right].$$
It follows that \(z_{\alpha/2}\) satisfies the following equation $$P(-z_{\alpha/2}<\dfrac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}<z_{\alpha/2})=1-\alpha.$$
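One way to make the probability statement concrete is a small simulation: draw many samples from a population with a known \(p\), build the interval from each sample, and count how often the interval contains \(p\). The sketch below is only an illustration. The function name `estimate_coverage`, the values \(p=0.65\) and \(n=100\), the number of trials, and the seed are all assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist


def estimate_coverage(p, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated samples whose Wald interval contains the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    hits = 0
    for _ in range(trials):
        # One sample of n Bernoulli(p) observations; count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


coverage = estimate_coverage(p=0.65, n=100, confidence=0.95)
print(f"estimated coverage: {coverage:.3f}")
```

The printed fraction should land near the nominal level \(1-\alpha=0.95\), though typically not exactly on it, because the normal distribution is only an approximation to the distribution of \(\hat{p}\).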
In a random sample of 100 students in college, 65 said that they come to college by bus.
Give a point estimate of the proportion of all students who come to college by bus.
Construct a 99% confidence interval for that proportion.
Solution: A good point estimate is the sample proportion. Here the sample proportion is \(\hat{p}=65/100=0.65\).
Since \(n\hat{p}=100\cdot 0.65=65>10\) and \(n(1-\hat{p})=100\cdot 0.35=35>10\), the sample is large enough. The estimated standard error is $$\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.65(1-0.65)}{100}}\approx0.048.$$
At the 99% level of confidence, the value \(1-\alpha/2=(1+\text{confidence level})/2=(1+0.99)/2=0.995\). The critical value \(z_{\alpha/2}\) is determined by the equation \(P(Z<z_{\alpha/2})=0.995\). Using the Excel function NORM.S.INV(0.995), we find the critical value \(z_{\alpha/2}\approx 2.576\).
Thus the margin of error is \(E=z_{\alpha/2}\cdot \hat{\sigma}_{\hat{P}}\approx 2.576\cdot 0.048\approx 0.123,\) and the confidence interval at the 99% level is $$[\hat{p}-E, \hat{p}+E]\approx [0.65-0.123, 0.65+0.123]=[0.527, 0.773].$$
Conclusion: we are 99% confident that the proportion of all students at the college who take the bus is in the interval \([0.527, 0.773]\).
Note: The margin of error can also be obtained by the Excel function CONFIDENCE.NORM(1-0.99, SQRT(65/100*(1-65/100)), 100).
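As a cross-check of the worked example, here is a minimal Python sketch of what Excel's CONFIDENCE.NORM(alpha, standard_dev, size) computes, namely \(z_{\alpha/2}\cdot\text{standard\_dev}/\sqrt{\text{size}}\). Because the function divides by \(\sqrt{\text{size}}\) itself, the standard deviation passed in for a proportion is \(\sqrt{\hat{p}(1-\hat{p})}\), with no further division by \(n\). The Python name `confidence_norm` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z_{alpha/2} * standard_dev / sqrt(size)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Bus example: 65 of 100 students, 99% confidence level.
p_hat, n = 65 / 100, 100
margin = confidence_norm(1 - 0.99, sqrt(p_hat * (1 - p_hat)), n)
low, high = p_hat - margin, p_hat + margin
print(round(margin, 3), round(low, 3), round(high, 3))
```

The rounded values agree with the hand computation: a margin of error of about 0.123 and the interval \([0.527, 0.773]\).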
The width of a confidence interval, which equals twice the margin of error, measures the precision of the estimate.
Recall, for population proportion and mean, $$\text{Margin of Error} = \text{Critical Value}\cdot \frac{\text{(estimated) Population SD}}{\sqrt{\text{Sample Size}}}$$
The formula tells us the precision of a confidence interval is affected by the confidence level, the variability, and the sample size.
Larger confidence levels give larger critical values and larger errors.
Populations (and samples) with more variability give larger errors.
Larger sample sizes give smaller errors.
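These three effects can be seen numerically. The sketch below computes the width of the interval for a proportion while varying one factor at a time. The helper name `margin_of_error` and the grids of sample sizes and confidence levels are illustrative assumptions. The value \(\hat{p}=0.65\) is borrowed from the bus example.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.65

# Larger samples: quadrupling n halves the width.
for n in (100, 400, 1600):
    width = 2 * margin_of_error(p_hat, n, 0.95)
    print(f"n = {n:4d}, level 95%: width = {width:.3f}")

# Higher confidence levels: wider intervals for the same data.
for level in (0.90, 0.95, 0.99):
    width = 2 * margin_of_error(p_hat, 100, level)
    print(f"n =  100, level {level:.0%}: width = {width:.3f}")

# More variability: p_hat * (1 - p_hat) is largest at p_hat = 0.5.
for p in (0.1, 0.3, 0.5):
    width = 2 * margin_of_error(p, 100, 0.95)
    print(f"p_hat = {p}, n = 100, level 95%: width = {width:.3f}")
```

Because the sample size enters through \(\sqrt{n}\), cutting the width in half requires four times as many observations, not twice as many.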
In practice, we may desire a margin of error of \(E\). With a fixed confidence level \(100(1-\alpha)\%\), the larger the sample size, the smaller the margin of error.
When estimating population proportion, if we can produce a reasonable guess \(\hat{p}\) for population proportion, then an appropriate minimum sample size for the study is determined by $$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p}).$$
When estimating population mean, if we can produce a reasonable guess \(\sigma\) for the population standard deviation, then an appropriate minimum sample size is given by $$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2.$$
Suppose you want to estimate the proportion of students at QCC who live in Queens. By surveying your classmates, you find around 70% live in Queens. Use this as a guess to determine how many students would need to be included in a random sample if you wanted the margin of error for a 95% confidence interval to be less than or equal to 2%.
Solution: We may use \(\hat{p}=0.7\) as a reasonable guess for the population proportion.
At the 95% level, the critical value is \(z_{\alpha/2}=\) NORM.S.INV((1+0.95)/2) \(\approx 1.96\).
Since the desired margin of error is \(E=0.02\), the appropriate minimum sample size is determined by $$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p})=(1.96/0.02)^2\cdot 0.7\cdot(1-0.7)=2016.84.$$
Since the sample size has to be an integer, we round up: to get an error of no more than 2% at the 95% level, the sample size should be at least 2017.
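The same computation can be sketched in Python. The function name `sample_size_for_proportion` is an assumption made for illustration, and the code uses the unrounded critical value instead of 1.96, so the intermediate value differs slightly from 2016.84 before rounding up.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, confidence):
    """Smallest integer n with (z / E)^2 * p(1 - p) <= n."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Always round up: rounding down would give a margin larger than E.
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))


print(sample_size_for_proportion(0.7, 0.02, 0.95))  # 2017
```

If no reasonable guess is available, a common conservative choice is `p_guess = 0.5`, since \(\hat{p}(1-\hat{p})\) is largest there. For this example that choice would raise the required sample size to 2401.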
Find the minimum sample size necessary to construct a 99% confidence interval for the population mean with a margin of error \(E =0.2\). Assume that the estimated population standard deviation is \(\sigma=1.3\).
Solution: At the 99% level, the critical value \(z_{\alpha/2}=\) NORM.S.INV((1+0.99)/2) \(\approx 2.576\).
The desired margin of error is \({E}=0.2\).
The estimated population standard deviation is \(\sigma=1.3\).
Then the minimum sample size is approximately $$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2\approx (2.576\cdot 1.3/0.2)^2 \approx 280.4.$$
To get an error of no more than 0.2 at the 99% level, the sample size should be at least 281.
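For the mean, the analogous sketch is below. The function name `sample_size_for_mean` is an assumption, and \(\sigma\) is treated as a known planning value, as in the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest integer n with (z * sigma / E)^2 <= n."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


print(sample_size_for_mean(1.3, 0.2, 0.99))  # 281
```

Halving the margin of error to 0.1 with the same \(\sigma\) and confidence level roughly quadruples the requirement, since \(E\) enters the formula squared.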
To understand the reason for returned goods, the manager of a store examines the records on 40 products that were returned in the last year. Reasons were coded by 1 for “defective,” 2 for “unsatisfactory,” and 0 for all other reasons, with the results shown in the table.
0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
Give a point estimate of the proportion of all returns that are because of something wrong with the product, that is, either defective or performed unsatisfactorily.
Construct an 80% confidence interval for the proportion of all returns that are because of something wrong with the product.
Lab Instructions in Excel
When a cumulative probability \(p=P(Z<z)\) of a standard normal random variable \(Z\) is given, we can find \(z\) using NORM.S.INV(p).
If a sample of size \(n\) has the proportion \(\hat{p}\) and the sampling distribution is approximately normal, the margin of error for the proportion can be obtained by the Excel function
CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n)
Foothill College’s athletic department wants to calculate the proportion of students who have attended a women’s basketball game at the college. They use student email addresses, randomly choose 220 students, and email them. Of the 145 who responded, 22 had attended a women’s basketball game.
Calculate and interpret the approximate 90% confidence interval for the proportion of all Foothill College students who have attended a women’s basketball game.