Determine whether the study meets the conditions under which inferences on a population parameter may be performed.
Demonstrate understanding of the confidence level \(1-\alpha\).
Explain when and why to use the normal distribution or the t-distribution for a given study.
Determine the appropriate degrees of freedom associated with the t-distribution.
Determine the critical values using tables or Excel functions.
Describe how the following will affect the width of the confidence interval:
Construct and interpret a confidence interval for one population mean.
When estimating a population parameter, we may consider the statistic of a random sample as an estimate of the population parameter. But we expect some chance error.
Estimating an unknown parameter by a single number calculated from a sample is called point estimation. The single number (statistic) from the sample is called a point estimate.
A point estimate gives no indication of how reliable the estimate is or how large the error is.
From a box of 20 pencils of two colors, black and blue, 10 pencils were randomly drawn. 6 out of the 10 pencils are black. What proportion of the pencils in the box are black?
Solution: Since the sample proportion is 0.6, one may make a point estimate that 60% of the pencils in the box, or 12 pencils, are black. However, we don’t know how close the sample proportion is to the population proportion.
To increase the chance of capturing the true value, we estimate an unknown parameter using an interval obtained by adding and subtracting an allowance for chance error around a point estimate.
Estimating an unknown parameter using an interval of values which likely contains the true value of the parameter is called interval estimation. The interval is called an interval estimate.
The reliability of an interval estimate is measured by the probability \(1-\alpha\) that the interval estimate will capture the true value of the parameter. This probability \(1-\alpha\) is called the confidence level.
The 90%, 95% and 99% levels of confidence are frequently used in statistical studies. The 95% level of confidence is usually the standard choice of confidence level for scientific polls published in the media and online.
Recall that the standard error of a statistic, denoted by SE, is the standard deviation of the sampling distribution.
A randomly selected sample of 100 students at a college has an average GPA of 3.0. How likely is it that the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\) contains the average GPA \(\mu\) of all students at that college?
Solution: The probability that the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\) contains the population mean \(\mu\) equals the probability that the sample mean lies in the interval \([\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]\). Since the sample size is large, the sampling distribution of the sample mean is approximately normal, so the interval \([\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]\) contains about 95.5% of all possible sample means.
That means we can be about 95.5% confident that the average GPA \(\mu\) of that college is in the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\).
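The 95.5% figure in this example can be checked numerically. The following is a minimal Python sketch that uses the standard library's `statistics.NormalDist` (Python is an assumption here; the text itself works in Excel) to compute the area under the standard normal curve between \(-2\) and \(2\).

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
Z = NormalDist()

# Probability that a standardized sample mean falls within 2 standard errors of mu
coverage = Z.cdf(2) - Z.cdf(-2)
print(round(coverage, 4))  # 0.9545
```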
When the sampling distribution of a statistic is approximately symmetric, we take interval estimates in the following form \([\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}],\) where the value \(\text{E}\) is called the marginal error or margin of error.
Given a confidence level \(100(1-\alpha)\%\), the marginal error \(\text{E}\) is the value such that \(100(1-\alpha)\%\) of the intervals \([\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}]\) contain the true parameter \(\mu_\text{par}\). Equivalently, the marginal error \(\text{E}\) is the value such that \(100(1-\alpha)\%\) of statistics are in the interval \([\mu_\text{par}- \text{E}, \mu_\text{par}+ \text{E}]\).
Denote by \(X\) the random variable for the sample statistic. Then \(\text{E}\) is determined by the following probability equation $$P(\mu_\text{par}-\text{E}< X < \mu_\text{par}+\text{E})=1-\alpha.$$
If the distribution of \(X\) is symmetric, then \(E\) satisfies \(P(X-\mu_\text{par}<\text{E})=1-\alpha/2.\)
Because the parameter \(\mu_\text{par}\) is unknown, we standardize the random variable \(X\) by \(Z=\frac{X-\mu_\text{par}}{\text{SE}}\) and get $$\textstyle P\left(-\frac{\text{E}}{\text{SE}}<Z<\frac{\text{E}}{\text{SE}}\right)=1-\alpha,$$ where the random variable \(Z\) has mean \(0\) and standard deviation \(1\).
The above probability equation suggests the following formula $$\textstyle \text{Marginal Error}=\text{Critical value}\cdot \text{Standard Error},$$ where the critical value is the value \(z_{\alpha/2}\) so that \(P(-z_{\alpha/2}<Z<z_{\alpha/2})=1-\alpha\).
If \(X\) is a point estimate, we call the interval \([X-z_{\alpha/2}\text{SE}, X+z_{\alpha/2}\text{SE}]\) a confidence interval at the \(100(1-\alpha)\%\) level of confidence.
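One way to see what the confidence level means is to simulate many samples and count how often the interval captures the true parameter. The sketch below is only an illustration under assumptions that are not in the text: a normal population with a hypothetical mean of 3.0 and standard deviation of 0.5, samples of size 25, and a fixed random seed. The function name `coverage_rate` is also hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, conf, trials, seed=0):
    """Fraction of simulated intervals [xbar - E, xbar + E] that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # critical value z_{alpha/2}
    margin = z * sigma / n ** 0.5              # E = critical value * standard error
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population values, chosen only for illustration
rate = coverage_rate(mu=3.0, sigma=0.5, n=25, conf=0.95, trials=2000)
print(rate)  # should be close to 0.95; the exact value depends on the seed
```

With enough trials, the observed fraction should settle near the chosen confidence level, which is the sense in which \(100(1-\alpha)\%\) of the intervals contain the true parameter.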
Check if the central limit theorem applies: need \(n>30\) or that the population distribution is approximately normal.
Find the critical value \(z_{\alpha/2}\) for the given confidence level: \(z_{\alpha/2}=\) =NORM.S.INV(0.5+confidence level/2).
Find the standard error \(\text{SE}=\sigma/\sqrt{n}=\) =(Population SD)/SQRT(Sample Size).
Calculate the marginal error E (or ME): \(E=\text{critical value}*\text{standard error}\).
Find the left bound and right bound of the confidence interval: \(\text{Left bound}=\bar{x}-\text{E}\) and \(\text{Right bound}=\bar{x}+\text{E}\).
Draw a conclusion: With \(100(1-\alpha)\%\) confidence, we can conclude that the population mean is in the interval \([\bar{x}-\text{E}, \bar{x}+\text{E}]\).
The critical value \(z_{\alpha/2}\) satisfies \(P(Z<z_{\alpha/2})=1-\alpha/2=(1+\text{confidence level})/2=0.5 + \text{confidence level}/2\) for the standard normal variable \(Z\).
A sample of size 15 is drawn from a normally distributed population with standard deviation 6. Find the critical value \(z_{\alpha/2}\) needed to construct a confidence interval at the 90% and 98% levels of confidence.
Solution: To find the critical value \(z_{\alpha/2}\) with a given confidence level, the Excel function =NORM.S.INV(0.5+confidence level/2) can be used.
At the 90% level of confidence the critical value is \(z_{\alpha/2}=\) =NORM.S.INV(0.5+0.9/2) \(=1.6449\).
At the 98% level of confidence the critical value is \(z_{\alpha/2}=\) =NORM.S.INV(0.5+0.98/2) \(=2.3263\).
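The same critical values can be reproduced outside Excel. As a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for NORM.S.INV (the helper name `z_critical` is hypothetical):

```python
from statistics import NormalDist

def z_critical(conf_level):
    """Return z_{alpha/2}, the value whose left-tail area is 0.5 + conf_level/2."""
    return NormalDist().inv_cdf(0.5 + conf_level / 2)

print(round(z_critical(0.90), 4))  # 1.6449
print(round(z_critical(0.98), 4))  # 2.3263
```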
A random sample of 50 students from a college gives a mean GPA of 2.51. Suppose the standard deviation of the GPA of all students at the college is 0.43. Construct a 99% confidence interval for the mean GPA of all students at the college.
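Solution: We first gather information from the question: the sample size is \(n=50\), the sample mean is \(\bar{x}=2.51\), the population standard deviation is \(\sigma=0.43\), and the confidence level is \(1-\alpha=0.99\).
Now let’s find the critical value, the standard error, the margin of error, and the bounds of the confidence interval.
The critical value is \(z_{\alpha/2}=\) =NORM.S.INV(0.5+0.99/2) \(\approx 2.576\).
The standard error is \(\text{SE}=\sigma/\sqrt{n}=0.43/\sqrt{50}\approx 0.0608\).
The margin of error is \(\text{E}=z_{\alpha/2}\cdot\text{SE}\approx 2.576\cdot 0.0608\approx 0.16\).
The bounds are \(\bar{x}-\text{E}\approx 2.51-0.16=2.35\) and \(\bar{x}+\text{E}\approx 2.51+0.16=2.67\).
Conclusion: With 99% confidence, we may conclude that the mean GPA of all students at the college is between 2.35 and 2.67.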
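The arithmetic behind this interval can be checked step by step. The following Python sketch uses only the numbers given in the example (\(n=50\), \(\bar{x}=2.51\), \(\sigma=0.43\), 99% confidence); the choice of Python and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma, conf = 50, 2.51, 0.43, 0.99

z = NormalDist().inv_cdf(0.5 + conf / 2)   # critical value z_{alpha/2}
se = sigma / sqrt(n)                       # standard error sigma / sqrt(n)
margin = z * se                            # margin of error E
left, right = xbar - margin, xbar + margin

print(round(z, 3), round(se, 4), round(margin, 2))  # 2.576 0.0608 0.16
print(round(left, 2), round(right, 2))              # 2.35 2.67
```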
When the population standard deviation is unknown, we may replace \(\sigma\) by the sample standard deviation \(s\) and use \(s/\sqrt{n}\) as an estimate of the standard error for the sampling distribution of the sample mean.
When we use the estimated standard error \(s / \sqrt{n}\) to build a confidence interval, the normal distribution may NOT be accurate for calculating the critical value.
If the random variable \(\bar{x}\) is approximately normal, then the random variable \(t=\dfrac{\bar{x}-\mu}{s / \sqrt{n}}\) has a Student’s \(t\)-distribution with \(n-1\) degrees of freedom.
Unlike in the case of a sample proportion, the sample standard deviation \(s\) is not determined by the sample mean \(\bar{x}\).
This result was discovered by William Gosset, an employee of the Guinness brewing company, who published his result using the name Student.
The \(t\)-distributions form a family of curves, called \(t\)-curves, parameterized by the degrees of freedom.
The \(t\)-distribution has the following important properties.
In practice, when the sample size is large enough (\(n>30\)), some textbooks use the normal distribution as an approximation for the Student \(t\)-distribution.
Check if the central limit theorem applies: need \(n>30\) or that the population distribution is approximately normal.
Find the critical value \(t_{\alpha/2}\) for the given confidence level using the \(t\)-distribution: \(t_{\alpha/2}=\) =T.INV(0.5+confidence level/2, n-1).
Find the estimated standard error \(\text{SE}=s/\sqrt{n}=\) =(Sample SD)/SQRT(Sample Size).
Calculate the marginal error E (or ME): \(E=\text{critical value}*\text{standard error}\).
Find the left bound and right bound of the confidence interval: \(\text{Left bound}=\bar{x}-\text{E}\) and \(\text{Right bound}=\bar{x}+\text{E}\).
Draw a conclusion: With \(100(1-\alpha)\%\) confidence, we can conclude that the population mean is in the interval \([\bar{x}-\text{E}, \bar{x}+\text{E}]\).
A sample of size 15 is drawn from a normally distributed population. Find the critical value \(t_{\alpha/2}\) needed to construct a confidence interval at the 99% and 95% levels of confidence.
Solution: To find the critical value \(t_{\alpha/2}\), we may use the Excel function T.INV(left tail area, df) or T.INV.2T(tail areas, df).
Since the confidence level is \(1-\alpha=0.99\), the critical value is \(t_{\alpha/2}=\) =T.INV(0.5+0.99/2, 15-1) \(=2.9768\).
Since the confidence level is \(1-\alpha=0.95\), the critical value is \(t_{\alpha/2}=\) =T.INV(0.5+0.95/2, 15-1) \(=2.1448\).
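For readers without Excel, these critical values can be approximated numerically. Python's standard library has no built-in inverse for the \(t\)-distribution, so the sketch below is a swapped-in numerical approach rather than how T.INV works internally: it integrates the \(t\) density with Simpson's rule and then solves for the critical value by bisection. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """Left-tail area P(T < x): Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf_level, df):
    """Approximate t_{alpha/2}: bisection on left-tail area 0.5 + conf_level/2."""
    target = 0.5 + conf_level / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.99, 14), 4))  # 2.9768
print(round(t_critical(0.95, 14), 4))  # 2.1448
```

If SciPy is available, `scipy.stats.t.ppf(0.5 + conf_level / 2, df)` is a more direct way to obtain the same value.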
A sample of size 16 is randomly drawn from a normally distributed population. The sample has a mean of 79 and a standard deviation of 7. Construct a confidence interval for the population mean at the 90% level of confidence.
Solution: Since the population is normally distributed, and the population standard deviation is unknown, we apply the formula \(\text{E}=t_{\alpha/2}\cdot\dfrac{s}{\sqrt{n}}\) for marginal error.
At the 90% confidence level, the critical value is \(t_{\alpha/2}=\) =T.INV(0.5+0.9/2, 16-1) \(\approx 1.753\).
Then the marginal error is \(\text{E}=1.753\cdot 7/\sqrt{16}\approx 3\). Thus \(\bar{x}-\text{E}=79-3=76\) and \(\bar{x}+\text{E}=79+3=82\).
With 90% confidence, we may conclude that the population mean is in the interval \([76, 82]\).
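The solution rounds the marginal error to the nearest whole number. As a quick check, the Python sketch below repeats the arithmetic with more decimal places, taking the critical value 1.753 from the solution as given rather than recomputing it.

```python
from math import sqrt

n, xbar, s = 16, 79, 7
t_crit = 1.753  # critical value quoted in the solution for df = 15

margin = t_crit * s / sqrt(n)  # E = t_{alpha/2} * s / sqrt(n)
print(round(margin, 2))                                  # 3.07
print(round(xbar - margin, 1), round(xbar + margin, 1))  # 75.9 82.1
```

Keeping one decimal place gives roughly \([75.9, 82.1]\), slightly wider than the rounded interval \([76, 82]\).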
The data below show the numbers of hours worked by 40 randomly selected employees from several grocery stores in the county.
30 | 26 | 33 | 26 | 26 | 33 | 31 | 31 | 21 | 37 | 27 | 20 | 34 | 35 | 30 | 24 | 38 | 34 | 39 | 31 |
22 | 30 | 23 | 23 | 31 | 44 | 31 | 33 | 33 | 26 | 27 | 28 | 25 | 35 | 23 | 32 | 29 | 31 | 25 | 27 |
Construct a 99% confidence interval for the mean number of hours worked.
Solution: Since the sample size is 40 (>30), by the central limit theorem, the sample mean is approximately normally distributed.
Applying the Excel functions AVERAGE() and STDEV.S() to the data, we find that the sample mean \(\bar{x}\approx 29.6\) and the sample standard deviation \(s\approx 5.3\).
Since the population standard deviation is unknown, we use the \(t\)-distribution to find the critical value \(t_{\alpha/2}=\) =T.INV(0.5+0.99/2, 40-1) \(\approx 2.7\). The marginal error is \(\text{E}=t_{\alpha/2}\cdot\dfrac{s}{\sqrt{n}}=\) =T.INV(0.5+0.99/2, 40-1)*5.3/SQRT(40) \(\approx 2.3\). Thus, \(\bar{x}-\text{E}=29.6-2.3=27.3\) and \(\bar{x}+\text{E}=29.6+2.3=31.9\).
With 99% confidence, one may conclude that the mean number of hours worked by employees of grocery stores in the county is between 27.3 and 31.9 hours.
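The summary statistics for this data set can be reproduced with Python's standard `statistics` module, where `fmean` and `stdev` play the roles of AVERAGE() and STDEV.S(). In this sketch the critical value 2.708 is an assumed rounded value of T.INV(0.5+0.99/2, 40-1), entered by hand rather than computed.

```python
from math import sqrt
from statistics import fmean, stdev

hours = [
    30, 26, 33, 26, 26, 33, 31, 31, 21, 37, 27, 20, 34, 35, 30, 24, 38, 34, 39, 31,
    22, 30, 23, 23, 31, 44, 31, 33, 33, 26, 27, 28, 25, 35, 23, 32, 29, 31, 25, 27,
]

n = len(hours)
xbar = fmean(hours)   # sample mean
s = stdev(hours)      # sample standard deviation (divides by n - 1)
t_crit = 2.708        # assumed rounded critical value for 99% confidence, df = 39

margin = t_crit * s / sqrt(n)
print(n, round(xbar, 1), round(s, 1), round(margin, 1))  # 40 29.6 5.3 2.3
print(round(xbar - margin, 1), round(xbar + margin, 1))  # 27.3 31.9
```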
Population is approximately normally distributed.
Population distribution unknown, but sample size is large enough, i.e. \(n>30\).
Warning: when the population distribution is unknown and the sample size is small, neither the \(t\)-distribution nor the normal distribution is reliable.
Decide whether the following statements are true or false. Explain your reasoning.
Lab Instructions in Excel
Suppose a Student’s \(t\)-distribution has \(\text{df}=n-1\) degrees of freedom.
Find a probability for a given \(t\)-value.
The area of the left tail of the \(t\)-value may be calculated by the function T.DIST(t,df,true).
The area of the right tail of the \(t\)-value may be calculated by the function T.DIST.RT(t,df).
The area of two tails of the \(t\)-value (here \(t>0\)) may be calculated by the function T.DIST.2T(t,df).
Find the critical value for a given probability \(p\).
When the area of the left tail is given, the function T.INV(p,df) may be used.
When the area of both tails is given, the function T.INV.2T(p,df) may be used. This function is convenient for constructing confidence intervals.
If the population standard deviation \(\sigma\) is given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function
CONFIDENCE.NORM(1-confidence level, population SD, sample size)
If the population standard deviation \(\sigma\) is NOT given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function
CONFIDENCE.T(1-confidence level, sample SD, sample size)
Four hundred randomly selected working adults in a certain state, including those who worked at home, were asked the distance from their home to their workplace. The average distance was 8.84 miles with standard deviation 2.70 miles.
Construct a 98% confidence interval for the mean distance from home to work for all residents of this state.