Confidence Intervals for Mean

1 Learning Goals

Determine whether the study meets the conditions under which inferences on a population parameter may be performed.
Demonstrate understanding of the confidence level $1-\alpha$.
Explain when and why to use the normal distribution or the t-distribution for a given study.
Determine the appropriate degrees of freedom associated with the t-distribution.
Determine the critical values using tables or Excel functions.
Describe how the following will affect the width of the confidence interval:
- increasing the sample size;
- increasing the confidence level.
Construct and interpret a confidence interval for one population mean.

2 Point Estimation

When estimating a population parameter, we may consider the statistic of a random sample as an estimate of the population parameter. But we expect some chance error.
Estimating an unknown parameter by a single number calculated from a sample is called point estimation. The single number (statistic) from the sample is called a point estimate.
A point estimate gives no indication of how reliable the estimate is or how large the error is.

3 Example: Estimating Population Proportion

From a box of 20 pencils of two colors, black and blue, 10 pencils were randomly drawn. Six of the 10 pencils are black. What proportion of the pencils in the box are black?

Solution: Since the sample proportion is 0.6, one may make the point estimate that 60% of the pencils in the box, or 12 pencils, are black. However, we don’t know how close the sample proportion is to the population proportion.

4 Interval Estimation

To increase the chance of capturing the true value, we estimate an unknown parameter using intervals that are obtained by adding and subtracting an allowance for chance error around a point estimate.
Estimating an unknown parameter using an interval of values which likely contains the true value of the parameter is called interval estimation. The interval is called an interval estimate.
The reliability of an interval estimate is measured by the probability $1-\alpha$ that the interval estimate will capture the true value of the parameter. This probability $1-\alpha$ is called the confidence level.
The 90%, 95%, and 99% levels of confidence are frequently used in statistical studies. The 95% level is the usual choice for scientific polls published in the media and online.

5 Example: Interval Estimate of Average GPA

Recall that the standard error of a statistic, denoted by SE, is the standard deviation of the sampling distribution.

A random sample of 100 students at a college has an average GPA of 3.0. How likely is it that the interval $[3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]$ contains the average GPA $\mu$ of that college?

Solution: The probability that the interval $[3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]$ contains the population mean $\mu$ equals the probability that the sample mean lies in the interval $[\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]$. Since the sampling distribution of the sample mean is approximately normal for a sample of size 100, about 95.5% of all sample means lie in $[\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]$.

That means we can be about 95.5% confident that the average GPA $\mu$ of that college is in the interval $[3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]$.
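The 95.5% figure can be checked numerically. The following Python sketch assumes that the sampling distribution of the sample mean is approximately normal; it uses `statistics.NormalDist` from the standard library, and the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal variable Z = (sample mean - mu) / SE
Z = NormalDist(mu=0, sigma=1)

# Probability that the sample mean falls within 2 standard errors of mu
prob_within_2_se = Z.cdf(2) - Z.cdf(-2)

print(f"P(-2 < Z < 2) = {prob_within_2_se:.4f}")
```

The printed probability is about 0.9545, which rounds to the 95.5% used in the solution.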

6 Confidence Interval (1 of 2)

When the sampling distribution of a statistic is approximately symmetric, we take interval estimates in the following form $[\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}],$ where the value $\text{E}$ is called the marginal error or margin of error.
Given a confidence level $100(1-\alpha)\%$, the marginal error $\text{E}$ is the value such that $100(1-\alpha)\%$ of the intervals $[\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}]$ contain the true parameter $\mu_\text{par}$. Equivalently, the marginal error $\text{E}$ is the value such that $100(1-\alpha)\%$ of statistics are in the interval $[\mu_\text{par}- \text{E}, \mu_\text{par}+ \text{E}]$.
Denote by $X$ the random variable for the sample statistic. Then $\text{E}$ is determined by the following probability equation $$P(\mu_\text{par}-\text{E}< X < \mu_\text{par}+\text{E})=1-\alpha.$$

If the distribution of $X$ is symmetric, then $E$ satisfies $P(X-\mu_\text{par}<\text{E})=1-\alpha/2.$
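The step from the two-sided equation to the one-sided one can be written out as follows, assuming the distribution of $X$ is symmetric about $\mu_\text{par}$, so that the two tails outside the interval have equal area:

$$
\begin{aligned}
P(|X-\mu_\text{par}|<\text{E}) &= 1-\alpha, \\
P(X-\mu_\text{par}\ge \text{E}) &= P(X-\mu_\text{par}\le -\text{E}) = \alpha/2, \\
P(X-\mu_\text{par}<\text{E}) &= 1-\alpha/2.
\end{aligned}
$$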

7 Confidence interval (2 of 2)

Because the parameter $\mu_\text{par}$ is unknown, we standardize the random variable $X$ by $Z=\frac{X-\mu_\text{par}}{\text{SE}}$ and get $$\textstyle P\left(-\frac{\text{E}}{\text{SE}}<Z<\frac{\text{E}}{\text{SE}}\right)=1-\alpha,$$ where the random variable $Z$ has mean $0$ and standard deviation $1$.
The above probability equation suggests the following formula $$\textstyle \text{Marginal Error}=\text{Critical value}\cdot \text{Standard Error},$$ where the critical value is the value $z_{\alpha/2}$ so that $P(-z_{\alpha/2}<Z<z_{\alpha/2})=1-\alpha$.
Let $X$ be a point estimate. We call the interval $[X-z_{\alpha/2}\text{SE}, X+z_{\alpha/2}\text{SE}]$ a confidence interval at the $100(1-\alpha)\%$ level of confidence.
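One way to see what the confidence level means is to simulate many samples and count how often the resulting interval contains the true mean. The sketch below is only an illustration: the population values `mu`, `sigma`, the sample size `n`, and the number of trials are hypothetical choices, and the population is assumed to be normal with known standard deviation.

```python
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population and design (assumptions for illustration only)
mu, sigma, n = 3.0, 0.5, 100
confidence_level = 0.95
trials = 5000

z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # critical value z_{alpha/2}
se = sigma / n ** 0.5                                  # standard error of the mean

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Does the interval [x_bar - z*SE, x_bar + z*SE] contain the true mean?
    if x_bar - z * se <= mu <= x_bar + z * se:
        covered += 1

coverage = covered / trials
print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

The fraction should come out close to the chosen confidence level, although it varies slightly from run to run when the seed is changed.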

8 Distribution of Confidence Intervals

9 Confidence Interval with Known Population SD

Check if the central limit theorem applies: need $n>30$ or that the population distribution is approximately normal.
Find the critical value $z_{\alpha/2}$ for the given confidence level: $z_{\alpha/2}=$ =NORM.S.INV(0.5+confidence level/2).
Find the standard error $\text{SE}=\sigma/\sqrt{n}=$ =(Population SD)/SQRT(Sample Size).
Calculate the marginal error E (or ME): $E=\text{critical value}*\text{standard error}$.
Find the left bound and right bound of the confidence interval: $\text{Left bound}=\bar{x}-\text{E}$ and $\text{Right bound}=\bar{x}+\text{E}$.
Draw a conclusion: With $100(1-\alpha)\%$ confidence, we can conclude that the population mean is in the interval $[\bar{x}-\text{E}, \bar{x}+\text{E}]$.

The critical value $z_{\alpha/2}$ satisfies that $P(Z<z_{\alpha/2})=1-\alpha/2=(1+\text{confidence level})/2=0.5 + \text{confidence level}/2$ for the standard normal variable $Z$.
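The steps above can also be sketched in Python. The function name `z_confidence_interval` and the sample inputs are hypothetical; `NormalDist().inv_cdf` from the standard library is the inverse of the standard normal cumulative distribution, which is the role NORM.S.INV plays in Excel.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence_level):
    """Confidence interval for a mean when the population SD sigma is known."""
    # Critical value z_{alpha/2}: left-tail area is 0.5 + confidence level / 2
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    se = sigma / sqrt(n)   # standard error
    margin = z * se        # marginal error E
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 100, sigma 15, n = 36, 95% confidence
left, right = z_confidence_interval(100, 15, 36, 0.95)
print(f"[{left:.2f}, {right:.2f}]")
```

The function does not check the conditions in the first step; whether the central limit theorem applies still has to be judged from the study design.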

10 Example: Find Critical Values

A sample of size 15 is drawn from a normally distributed population with standard deviation 6. Find the critical value $z_{\alpha/2}$ needed to construct a confidence interval:

when the level of confidence is 90%;
when the level of confidence is 98%.

Solution: To find the critical value $z_{\alpha/2}$ with a given confidence level, the Excel function =NORM.S.INV(0.5+confidence level/2) can be used.

At the 90% level of confidence the critical value is

$z_{\alpha/2}=$ =NORM.S.INV(0.5+0.9/2) $=1.6449$.
At the 98% level of confidence the critical value is

$z_{\alpha/2}=$ =NORM.S.INV(0.5+0.98/2) $=2.3263$.
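For readers working outside Excel, the same critical values can be reproduced with Python's standard library. This is a minimal sketch; the dictionary name `critical_values` is illustrative, and the 95% and 99% levels are added only for comparison.

```python
from statistics import NormalDist

critical_values = {}
for confidence_level in (0.90, 0.95, 0.98, 0.99):
    # Left-tail area for z_{alpha/2} is 0.5 + confidence level / 2
    critical_values[confidence_level] = NormalDist().inv_cdf(0.5 + confidence_level / 2)

for level, z in critical_values.items():
    print(f"{level:.0%} confidence: z = {z:.4f}")
```

Note that the sample size and the population standard deviation given in the problem are not needed for the critical value; they enter only through the standard error.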

Check out this normal distribution interactive app.

11 Example: Mean GPA (1 of 2)

A random sample of 50 students from a college gives a mean GPA of 2.51. Suppose the standard deviation of the GPA of all students at the college is 0.43. Construct a 99% confidence interval for the mean GPA of all students at the college.

Solution: We first gather information from the question:

The sample size is $n=50$,
The sample mean is $\bar{x}=2.51$,
The population standard deviation is $\sigma=0.43$, and
The confidence level is $1-\alpha=0.99$.

12 Example: Mean GPA (2 of 2)

Now let’s find the critical value, the standard error, the margin of error, and bounds of the confidence interval.

The critical value is $z_{\alpha/2}=$ =NORM.S.INV(0.5+0.99/2) $\approx 2.576$.
The standard error is $\sigma_{\bar{x}}=\sigma/\sqrt{n}=0.43/\sqrt{50}\approx 0.06.$
The marginal error is $\text{E}=z_{\alpha/2}\cdot\sigma_{\bar{x}}=2.576\cdot 0.06\approx 0.16.$
The left bound of the confidence interval is $\bar{x}-\text{E}=2.51-0.16=2.35$ and the right bound is $\bar{x}+\text{E}=2.51+0.16=2.67$.

Conclusion: With 99% confidence, we may conclude that the mean GPA of all students at the college is between 2.35 and 2.67.
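The learning goals ask how the sample size and the confidence level affect the width of the interval. The sketch below reuses $\sigma=0.43$ from this example; the other sample sizes and confidence levels are hypothetical values chosen for comparison, and the helper name `margin_of_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence_level):
    """Marginal error E = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return z * sigma / sqrt(n)

sigma = 0.43  # population SD from the example

# Larger samples give narrower intervals (confidence level fixed at 99%)
margins_by_n = {n: margin_of_error(sigma, n, 0.99) for n in (50, 200, 800)}

# Higher confidence levels give wider intervals (sample size fixed at 50)
margins_by_level = {c: margin_of_error(sigma, 50, c) for c in (0.90, 0.95, 0.99)}

for n, e in margins_by_n.items():
    print(f"n = {n:3d}, 99% confidence: E = {e:.3f}")
for c, e in margins_by_level.items():
    print(f"n =  50, {c:.0%} confidence: E = {e:.3f}")
```

Because the standard error is $\sigma/\sqrt{n}$, quadrupling the sample size halves the marginal error.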

13 Student’s $t$-Distribution

When the population standard deviation is unknown, we may replace $\sigma$ by the sample standard deviation $s$ and use $s/\sqrt{n}$ as an estimate of the standard error for the sampling distribution of the sample mean.
When we use the estimated standard error $s / \sqrt{n}$ to build a confidence interval, the normal distribution may NOT be accurate for calculating the critical value.
If the sample mean $\bar{x}$ is approximately normally distributed, then the random variable $t=\dfrac{\bar{x}-\mu}{s / \sqrt{n}}$ has (approximately) a Student’s $t$-distribution with $n-1$ degrees of freedom.
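A small simulation can illustrate why the normal critical value is not accurate here. The sketch below assumes a hypothetical standard normal population and a sample size of 5; it counts how often the statistic $t$ falls outside $\pm 1.96$, the 95% critical value of the normal distribution.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is reproducible

mu, sigma, n = 0.0, 1.0, 5   # hypothetical normal population, small sample
trials = 10000

beyond = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    # t statistic with the estimated standard error s / sqrt(n)
    t = (mean(sample) - mu) / (stdev(sample) / sqrt(n))
    if abs(t) > 1.96:
        beyond += 1

tail_fraction = beyond / trials
print(f"Fraction of |t| > 1.96: {tail_fraction:.3f}")
```

If $t$ were standard normal, about 5% of the values would fall outside $\pm 1.96$. With 4 degrees of freedom the fraction is roughly 12%, so an interval built with 1.96 would be too narrow.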

t-curves

14 Remarks

Unlike in the case of a sample proportion, the sample standard deviation $s$ is not determined by the sample mean $\bar{x}$.
This result was discovered by William Gosset, an employee of the Guinness brewing company, who published his result using the name Student.

15 Properties of Student’s $t$-Distribution

The $t$-distributions are a family of curves, called $t$-curves, parameterized by the degrees of freedom.
The $t$-distribution has the following important properties.
1. Similar to the standard normal curve, it is symmetric about 0 and the total area under a $t$-curve is 1.
2. The $t$-distribution has slightly more variation (i.e. $t$-curves are slightly “fatter”) than the standard normal distribution.
3. When the degree of freedom increases, the $t$-distribution becomes closer to the standard normal distribution.
In practice, when the sample size is large enough ($n>30$), some textbooks use the normal distribution as an approximation for the Student’s $t$-distribution.
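Properties 2 and 3 can be checked by comparing the height of the curves at a point in the tail. The sketch below writes out the standard density formula of the $t$-distribution using only the standard library; the function name `t_pdf`, the point $x=3$, and the degrees of freedom shown are illustrative choices.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

x = 3.0  # a point in the right tail
normal_height = NormalDist().pdf(x)
heights = {df: t_pdf(x, df) for df in (2, 5, 30, 1000)}

for df, h in heights.items():
    print(f"df = {df:4d}: t density at {x} = {h:.5f}")
print(f"standard normal density at {x} = {normal_height:.5f}")
```

The tail height decreases toward the normal value as the degrees of freedom grow, which is the sense in which the $t$-curves are "fatter" for small samples.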

16 Visualization: $t$-distributions

Source: https://rpsychologist.com/d3/tdist/

17 Confidence Intervals with Unknown Population SD

Check if the central limit theorem applies: need $n>30$ or that the population distribution is approximately normal.
Find the critical value $t_{\alpha/2}$ for the given confidence level using $t$-distribution: $t_{\alpha/2}=$ =T.INV(0.5+confidence level/2, n-1).
Find the estimated standard error $\text{SE}=s/\sqrt{n}=$ =(Sample SD)/SQRT(Sample Size).
Calculate the marginal error E (or ME): $E=\text{critical value}*\text{standard error}$.
Find the left bound and right bound of the confidence interval: $\text{Left bound}=\bar{x}-\text{E}$ and $\text{Right bound}=\bar{x}+\text{E}$.
Draw a conclusion: With $100(1-\alpha)\%$ confidence, we can conclude that the population mean is in the interval $[\bar{x}-\text{E}, \bar{x}+\text{E}]$.
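The steps above can be sketched in Python as well. Python's standard library has no inverse $t$-distribution function, so this sketch approximates it numerically (Simpson's rule for the area, bisection for the inverse); the helper names `t_pdf`, `t_cdf`, `t_inv`, and `t_confidence_interval` and the sample inputs are hypothetical. In practice a statistics package or Excel's T.INV would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_inv(p, df):
    """Value t with P(T < t) = p, found by bisection (the role of T.INV)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(x_bar, s, n, confidence_level):
    """Confidence interval for a mean when the population SD is unknown."""
    t_crit = t_inv(0.5 + confidence_level / 2, n - 1)  # df = n - 1
    margin = t_crit * s / sqrt(n)                       # marginal error E
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 50, sample SD 10, n = 25, 95% confidence
left, right = t_confidence_interval(50, 10, 25, 0.95)
print(f"[{left:.2f}, {right:.2f}]")
```

As with the known-SD case, the function does not check the conditions in the first step.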

18 Example: Critical Values for $t$-Distributions

A sample of size 15 is drawn from a normally distributed population. Find the critical value $t_{\alpha/2}$ needed to construct a confidence interval:

when the level of confidence is 99%;
when the level of confidence is 95%.

Solution: To find the critical value $t_{\alpha/2}$, we may use the Excel function T.INV(left tail area, df) or T.INV.2T(tail areas, df).

Since the confidence level is $1-\alpha=0.99$, the critical value is $t_{\alpha/2}=$ =T.INV(0.5+0.99/2, 15-1) $=2.9768$.
Since the confidence level is $1-\alpha=0.95$, the critical value is $t_{\alpha/2}=$ =T.INV(0.5+0.95/2, 15-1) $=2.1448$.

Check out this t-distribution interactive app.

19 Example: Confidence Interval with Unknown $\sigma$

A sample of size 16 is randomly drawn from a normally distributed population. The sample has a mean of 79 and a standard deviation of 7. Construct a confidence interval for the population mean at the 90% level of confidence.

Solution: Since the population is normally distributed, and the population standard deviation is unknown, we apply the formula $\text{E}=t_{\alpha/2}\cdot\dfrac{s}{\sqrt{n}}$ for marginal error.

At 90% confidence level, the critical value is $t_{\alpha/2}=$ T.INV(0.5+0.9/2, 16-1) $\approx 1.753$.

Then the marginal error is $\text{E}=1.753\cdot 7/\sqrt{16}\approx 3$. Thus $\bar{x}-\text{E}=79-3=76$ and $\bar{x}+\text{E}=79+3=82$.

With 90% confidence, we may conclude that the population mean is in the interval $[76, 82]$.
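For reference, the same computation carried to two decimal places, using the rounded critical value 1.753, gives a slightly wider interval:

$$
\text{E}=t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}\approx 1.753\cdot\frac{7}{\sqrt{16}}=1.753\cdot 1.75\approx 3.07,
\qquad
[\bar{x}-\text{E},\ \bar{x}+\text{E}]\approx[79-3.07,\ 79+3.07]=[75.93,\ 82.07].
$$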

20 Example: Average Working Hours

The data below show the numbers of hours worked by 40 randomly selected employees from several grocery stores in the county.

30	26	33	26	26	33	31	31	21	37	27	20	34	35	30	24	38	34	39	31
22	30	23	23	31	44	31	33	33	26	27	28	25	35	23	32	29	31	25	27

Construct a 99% confidence interval for the mean number of hours worked.

Solution: Since the sample size is 40 (>30), by the central limit theorem, the sample mean is approximately normally distributed. Applying the Excel functions AVERAGE() and STDEV.S() to the data, we find that the sample mean $\bar{x}\approx 29.6$ and the sample standard deviation $s\approx 5.3$.

Since the population standard deviation is unknown, we use the $t$-distribution to find the critical value $t_{\alpha/2}=$ =T.INV(0.5+0.99/2, 40-1) $\approx 2.7$. The marginal error is $\text{E}=t_{\alpha/2}\cdot\dfrac{s}{\sqrt{n}}=$ =T.INV(0.5+0.99/2, 40-1)*5.3/SQRT(40) $\approx 2.3$. Thus, $\bar{x}-\text{E}=29.6-2.3=27.3$ and $\bar{x}+\text{E}=29.6+2.3=31.9$.

With 99% confidence, one may conclude that the mean number of hours worked by employees in all grocery stores in the county is between 27.3 and 31.9 hours.
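The same computation can be sketched in Python from the raw data. `statistics.mean` and `statistics.stdev` play the roles of AVERAGE() and STDEV.S(). The critical value 2.708 is an assumption taken from a $t$ table for 39 degrees of freedom at 99% confidence, since the standard library has no inverse $t$ function.

```python
from math import sqrt
from statistics import mean, stdev

hours = [
    30, 26, 33, 26, 26, 33, 31, 31, 21, 37, 27, 20, 34, 35, 30, 24, 38, 34, 39, 31,
    22, 30, 23, 23, 31, 44, 31, 33, 33, 26, 27, 28, 25, 35, 23, 32, 29, 31, 25, 27,
]

n = len(hours)
x_bar = mean(hours)   # sample mean
s = stdev(hours)      # sample standard deviation (divides by n - 1)

# Assumed critical value t_{alpha/2} for 99% confidence and df = 39 (from a t table)
t_crit = 2.708

margin = t_crit * s / sqrt(n)
left, right = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.1f}, SD = {s:.1f}")
print(f"E = {margin:.1f}, interval = [{left:.1f}, {right:.1f}]")
```

Working from unrounded values of $\bar{x}$ and $s$ gives the same interval as above after rounding to one decimal place.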

21 Choose Normal Distribution or $t$-Distribution

Population is approximately normally distributed.
- the population standard deviation $\sigma$ is known: use the normal distribution.
- the population standard deviation $\sigma$ is unknown: use the $t$-distribution.
Population distribution unknown, but sample size is large enough, i.e. $n>30$.
- the population standard deviation $\sigma$ is known: use normal distribution.
- the population standard deviation $\sigma$ is unknown: either one can be used but the $t$-distribution is more accurate.
Warning: when the population distribution is unknown and the sample size is small, neither the $t$-distribution nor the normal distribution is reliable.

For small samples, there is a method called the Shapiro–Wilk test, which can be used to assess whether the data are consistent with an approximately normal population.
Even when $n>30$, a visual inspection of the data for normality is still necessary.
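The rules in this section can be summarized as a small decision helper. This is only a sketch of the rules as listed here: the function name `choose_distribution` and its return strings are hypothetical, and $n>30$ is used as the cutoff for a large sample.

```python
def choose_distribution(population_normal, sigma_known, n):
    """Suggest a distribution for the critical value, following the rules above."""
    large_sample = n > 30
    if not population_normal and not large_sample:
        # Small sample from an unknown population: no reliable choice
        return "neither is reliable"
    if sigma_known:
        return "normal"
    # Sigma unknown: the t-distribution is the more accurate choice
    return "t"

print(choose_distribution(population_normal=True, sigma_known=True, n=15))
print(choose_distribution(population_normal=True, sigma_known=False, n=16))
print(choose_distribution(population_normal=False, sigma_known=False, n=40))
print(choose_distribution(population_normal=False, sigma_known=False, n=12))
```

The helper encodes only the table of cases; it does not replace checking the data for strong skewness or outliers.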

Practice: Conceptual Questions on Confidence Intervals

Decide whether the following statements are true or false. Explain your reasoning.

The statement, “the 95% confidence interval for the population mean is (350, 400)” means that 95% of the population values are between 350 and 400.
For a given standard error, lower confidence levels produce wider confidence intervals.
If you increase sample size, the width of confidence intervals will increase.
If you take large random samples over and over again from the same population, and make 95% confidence intervals for the population average, about 95% of the intervals should contain the population average.

Source: Conceptual Questions on Confidence Intervals

Practice: Find the Critical $z$-Value

Practice: Find the Marginal Error with known $\sigma$

Practice: Confidence Interval for SAT Scores with Known $\sigma$

Practice: Find the Critical $t$-Value

Practice: Find the Marginal Error with Unknown $\sigma$

Practice: Confidence Interval from a Data Set

Lab Instructions in Excel

22 Excel Functions for $t$-Distributions

Suppose a Student’s $t$-distribution has $\text{df}=n-1$ degrees of freedom.

Find a probability for a given $t$-value.
- The area of the left tail of the $t$-value may be calculated by the function T.DIST(t,df,true).
- The area of the right tail of the $t$-value may be calculated by the function T.DIST.RT(t,df).
- The area of two tails of the $t$-value (here $t$>0) may be calculated by function T.DIST.2T(t,df).
Find the critical value for a given probability $p$.
- When the area of the left tail is given, the function T.INV(p,df) may be used.
- When the area of both tails is given, the function T.INV.2T(p,df) may be used. This function is convenient for constructing confidence intervals, since T.INV.2T(1-confidence level, df) returns the critical value $t_{\alpha/2}$ directly.

23 Excel Functions for Marginal Errors

If the population standard deviation $\sigma$ is given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function CONFIDENCE.NORM(1-confidence level, population SD, sample size).
If the population standard deviation $\sigma$ is NOT given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function CONFIDENCE.T(1-confidence level, sample SD, sample size).

Lab Practice: Confidence Interval Without $\sigma$

Four hundred randomly selected working adults in a certain state, including those who worked at home, were asked the distance from their home to their workplace. The average distance was 8.84 miles with standard deviation 2.70 miles.

Construct a 98% confidence interval for the mean distance from home to work for all residents of this state.

Source: Exercise 8 in Section 7.1 in Introductory Statistics

Fei Ye

November 2024
