Hypothesis Testing for Mean and Proportion

Fei Ye

September 2023

1. Learning Goals for Hypothesis Tests


2. Hypothesis Testing Procedure

  1. Check that the sample size is large enough and determine whether a \(Z\)-test or a \(t\)-test can be performed. For a proportion, the \(Z\)-test may be used. For a mean, the \(Z\)-test may be used if \(\sigma\) is known, and the \(t\)-test may be used if \(\sigma\) is unknown.

  2. State the null and alternative hypotheses. The null hypothesis always contains the equal sign (possibly together with a less-than or greater-than symbol, depending on \(H_a\)).

  3. Set a significance level \(\alpha\). Commonly used levels are \(\alpha=0.01\), \(\alpha=0.05\) and \(\alpha=0.1\).

  4. Calculate the standardized test statistic: the \(Z\)-test statistic or the \(t\)-test statistic.

  5. Calculate the \(P\)-value.

| Sign in \(H_a\) | \(\ne\) | \(<\) | \(>\) |
|---|---|---|---|
| Test | Two-tailed | Left-tailed | Right-tailed |

  6. Make a test decision about the null hypothesis \(H_0\). We reject \(H_0\) if the \(P\)-value is less than the significance level \(\alpha\); otherwise, we fail to reject \(H_0\).

  7. State an overall conclusion.
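
The tail rule and the decision rule in the procedure above can be sketched in Python. This is a minimal illustration for the \(Z\)-test only; the function names `p_value_from_z` and `decide` are hypothetical and are not part of the course materials.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """Return the P-value for a standardized Z statistic.

    tail is "left" (H_a uses <), "right" (H_a uses >), or "two" (H_a uses !=).
    """
    cdf = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def decide(p_value, alpha):
    """Reject H_0 exactly when the P-value is less than alpha."""
    return "reject H_0" if p_value < alpha else "fail to reject H_0"


# Hypothetical input: z = 2.5 in a right-tailed test at alpha = 0.05.
print(decide(p_value_from_z(2.5, "right"), 0.05))  # prints "reject H_0"
```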


3. Example: Test a Mean with Known SD (1 of 2)

Residents on a certain street claim that the mean speed of automobiles driving along the street is greater than the speed limit of 25 miles per hour. A random sample of 100 automobiles has a mean speed of 26 miles per hour. Assume the population standard deviation is 4 miles per hour. Is there enough evidence to support the residents' claim at the significance level \(\alpha = 0.05\)?

Solution: The sample size is \(n=100>30\). So the sampling distribution of sample means is approximately normal by the central limit theorem.

To test the residents' claim, we set \(H_0:\mu=25\) and \(H_a: \mu >25\).

Because \(H_a\) contains the \(>\) sign and \(\sigma\) is known, we use a right-tailed \(Z\)-test.

Since the population standard deviation \(\sigma=4\) is known, we use the standard normal distribution to find the \(P\)-value.


4. Example: Test a Mean with Known SD (2 of 2)

The \(Z\)-test statistic is \(z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}=\frac{26-25}{4/\sqrt{100}}=2.5\).

Since \(H_a:\mu>25\), the \(P\)-value is the right-tailed area calculated as \(P(Z>2.5)=\)1-NORM.S.DIST(2.5,TRUE)\(\approx 0.006\).

Because the \(P\)-value is less than the significance level \(\alpha=0.05\), we reject \(H_0\).

At the 5% level of significance, there is enough evidence to support the residents' claim that the average speed of automobiles is above the speed limit.

Note: Without calculating the \(Z\)-test statistic, the \(P\)-value can also be calculated by the Excel function 1-NORM.DIST(26, 25, 4/SQRT(100), TRUE).
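
For readers without Excel, the same calculation can be reproduced in Python. This is a sketch using only the standard library; `NormalDist().cdf` plays the role of NORM.S.DIST, and `NormalDist(mu, sigma).cdf` plays the role of NORM.DIST. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n = 26, 25, 4, 100

se = sigma / sqrt(n)        # standard error of the sample mean
z = (x_bar - mu_0) / se     # standardized test statistic

# Right-tailed P-value from the standard normal curve,
# analogous to 1-NORM.S.DIST(z, TRUE).
p_standard = 1 - NormalDist().cdf(z)

# Same area without standardizing first,
# analogous to 1-NORM.DIST(26, 25, 4/SQRT(100), TRUE).
p_direct = 1 - NormalDist(mu_0, se).cdf(x_bar)

print(round(z, 2), round(p_standard, 4), round(p_direct, 4))
```

Both routes give the same right-tail area, about 0.006, which is why the Note's shortcut works.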


5. Example: Test a Mean with Unknown SD (1 of 2)

Example: A certain manufacturer claims that the average number of candies in a certain sized bag that they produce is 20. To test the claim, you collect a random sample of 10 bags and find that the mean is 18 and the standard deviation is 2.7. Assume the numbers of candies are normally distributed. At the significance level \(\alpha=0.05\), does your analysis support the manufacturer’s claim?

Solution: Since the population is normally distributed, the sampling distribution of the sample mean is also normal, even for a sample as small as \(n=10\).

Set \(H_0: \mu=20\) and \(H_a: \mu\neq 20\).

Since \(H_a\) has the \(\ne\) sign and the population standard deviation is unknown, we use a two-tailed \(t\)-test. We will find the \(P\)-value.


6. Example: Test a Mean with Unknown SD (2 of 2)

The \(t\)-test statistic is \(t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{18-20}{2.7/\sqrt{10}}\approx-2.342\). Using Excel, we find that the \(P\)-value is \(p\approx\)2*T.DIST(-2.342,9,TRUE)=0.0439.

Since the \(P\)-value is smaller than the 5% significance level, we reject \(H_0\) and conclude that there is enough evidence to reject the manufacturer’s claim.
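
The two-tailed \(t\)-test above can also be checked without Excel. Python's standard library has no \(t\)-distribution, so this sketch approximates the cumulative probability by integrating the \(t\) density with Simpson's rule. The helpers `t_pdf` and `t_cdf` are hypothetical names written for this illustration, not library functions.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


x_bar, mu_0, s, n = 18, 20, 2.7, 10
t = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed P-value, analogous to 2*T.DIST(-|t|, 9, TRUE).
p_value = 2 * t_cdf(-abs(t), n - 1)

print(round(t, 3), round(p_value, 4))
```

Where SciPy is available, `scipy.stats.t.cdf` would be the usual replacement for the hand-written integration.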


7. Example: Test a Mean from a Data Set (1 of 2)

An instructor would like to know if the students enrolled in a math course in the current semester performed better than the students in the last semester. The mean final exam score from last semester is 75.5. The final exam scores of 40 randomly selected students from the current semester were obtained:

93 88 69 74 76 81 78 77 74 63 67 81 80 82 68 88 76 69 75 78
75 77 94 87 74 88 63 75 94 88 91 77 76 68 80 88 68 83 72 72

Do the data provide evidence that the students in this semester performed significantly better on the final than last semester?

Solution: The sample size \(n=40\) is large enough that the sampling distribution of the sample mean is approximately normal. We will take the \(P\)-value approach.

Set \(H_0: \mu=75.5\) and \(H_a: \mu>75.5\).

Using Excel functions AVERAGE() and STDEV.S(), we find the sample mean is \(\bar{x}\approx 78.17\) and sample standard deviation is \(s\approx 8.39\).


8. Example: Test a Mean from a Data Set (2 of 2)

The \(t\)-test statistic is calculated by $$t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{(78.17-75.5)}{8.39/\sqrt{40}}\approx 2.013.$$

Because \(H_a\) contains the \(>\) sign and \(\sigma\) is unknown, we use the right-tailed \(t\)-test.

The degrees of freedom are \(\text{df}=40-1=39\). The \(P\)-value is the right-tail area under the \(t\)-curve, that is, 1-T.DIST(2.013, 39, TRUE)=0.0255, which can also be obtained by T.DIST.RT(2.013, 39).

Since the \(P\)-value \(0.0255\) is less than \(0.05\), we reject \(H_0\) at the 5% level of significance. So there is enough evidence to support the claim that the students in this semester performed significantly better on the final than last semester.

However, if the 2% level of significance is used, with the given data, we fail to reject \(H_0\). Then, at the 2% level of significance, there is not enough evidence to support the claim.
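
The whole data-set example can be reproduced from the raw scores. In this sketch, `statistics.mean` and `statistics.stdev` stand in for AVERAGE() and STDEV.S(), and the right-tail area is approximated by Simpson's rule because the standard library has no \(t\)-distribution. The helper `t_right_tail` is a hypothetical name written for this illustration. Because the code uses the unrounded mean and standard deviation, its \(t\) value differs slightly from the rounded hand calculation.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

scores = [
    93, 88, 69, 74, 76, 81, 78, 77, 74, 63, 67, 81, 80, 82, 68, 88, 76, 69, 75, 78,
    75, 77, 94, 87, 74, 88, 63, 75, 94, 88, 91, 77, 76, 68, 80, 88, 68, 83, 72, 72,
]


def t_right_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 by Simpson's rule on the t density."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # 0.5 minus the area between 0 and t


n = len(scores)
x_bar = mean(scores)   # sample mean
s = stdev(scores)      # sample standard deviation (divides by n - 1)
t = (x_bar - 75.5) / (s / sqrt(n))
p_value = t_right_tail(t, n - 1)

for alpha in (0.05, 0.02):
    print(alpha, "reject H_0" if p_value < alpha else "fail to reject H_0")
```

The loop makes the point of the last two paragraphs explicit: the same \(P\)-value leads to rejection at the 5% level but not at the 2% level.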


9. Example: Fairness of a Coin

Suppose you want to determine if a coin is fair. You toss the coin 50 times and observe 16 heads and 34 tails. At the significance level 0.01, do you think that the coin is fair? If not, does the coin favor heads or tails?

Solution: Since \(n\hat{p}=16\) and \(n(1-\hat{p})=34\), a \(Z\)-test is valid.

To test the coin, we set the null hypothesis as \(H_0\): \(p=0.5\). The experiment suggests that we should set the alternative hypothesis as \(H_a\): \(p<0.5\).

The sample proportion is \(\hat{p}=\frac{16}{50}=0.32\) and the standardized test statistic is $$\textstyle z=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}=\frac{0.32-0.5}{\sqrt{0.5(1-0.5)/50}}\approx -2.55.$$

Because \(H_a\) contains the \(<\) sign, the test is left-tailed and the \(P\)-value is then NORM.S.DIST(-2.55,TRUE)\(\approx 0.005\).

Because \(0.005<0.01=\alpha\), we reject the null hypothesis \(H_0\). Therefore, at the significance level 0.01, there is enough evidence to claim that the coin favors tails.
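
The left-tailed proportion test for the coin can be sketched in Python with the standard library. The variable names are chosen here for illustration, and the code keeps the unrounded \(z\) rather than \(-2.55\).

```python
from math import sqrt
from statistics import NormalDist

heads, n, p_0, alpha = 16, 50, 0.5, 0.01

p_hat = heads / n                               # sample proportion
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)   # standardized test statistic

# Left-tailed P-value, analogous to NORM.S.DIST(z, TRUE).
p_value = NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4), p_value < alpha)
```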


10. Example: Proportion of Newborns (1 of 2)

Globally, the long-term proportion of newborns who are male is 51.46%. A researcher believes that the proportion of boys at birth changes under severe economic conditions. To test this belief, randomly selected birth records of 5,000 babies born during a period of economic recession were examined. It was found in the sample that 52.55% of the newborns were boys. Determine whether there is sufficient evidence, at the 10% level of significance, to support the researcher’s belief.

Solution: Since \(n\hat{p}\approx 2628\) and \(n(1-\hat{p})\approx 2372\), a \(Z\)-test is valid. We will use the \(P\)-value to test the hypothesis.

To test the researcher’s claim, we set the null hypothesis as \(H_0\): \(p=0.5146\). Because the researcher believes the proportion changes, without specifying a direction, we set the alternative hypothesis as \(H_a\): \(p\neq 0.5146\).


11. Example: Proportion of Newborns (2 of 2)

The standardized test statistic is $$ z=\dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}=\dfrac{0.5255-0.5146}{\sqrt{0.5146\cdot(1-0.5146)/5000}}\approx1.5422. $$

From \(H_a\), we know that the test is two-tailed. The \(P\)-value is then $$2\cdot(1-P(Z<1.5422))\approx0.123,$$ which can be calculated by the Excel formula 2*(1-NORM.S.DIST(1.5422,TRUE)).

Since the significance level is \(\alpha=0.1\) and the \(P\)-value \(0.123>0.1=\alpha\), we fail to reject the null hypothesis \(H_0\).

At the significance level 0.1, there is not enough evidence to support the researcher’s belief that the proportion of newborns who are male changes.
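
The two-tailed proportion test for the newborn data can be sketched in Python with the standard library. The variable names are chosen here for illustration. The upper-tail area beyond \(|z|\) is doubled because \(H_a\) uses \(\neq\).

```python
from math import sqrt
from statistics import NormalDist

p_hat, p_0, n, alpha = 0.5255, 0.5146, 5000, 0.10

z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)   # standardized test statistic

# Two-tailed P-value, analogous to 2*(1-NORM.S.DIST(|z|, TRUE)).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 4), round(p_value, 3))
print("reject H_0" if p_value < alpha else "fail to reject H_0")
```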


12. A Remark on the SE for Sample Proportion

In some books, the standard error of the sampling distribution of sample proportions, assuming that \(p=p_0\), is calculated using the approximation $$ \sigma_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. $$

A possible explanation is that using the above value for the SE makes the test consistent with the confidence interval approach to hypothesis testing when a two-tailed test is performed.
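
To see how much the choice of SE matters, the sketch below reuses the newborn data (\(\hat{p}=0.5255\), \(p_0=0.5146\), \(n=5000\)) and computes both versions. It also builds a 90% confidence interval from the \(\hat{p}\)-based SE, on the assumption that a 90% interval is the one paired with a two-tailed test at \(\alpha=0.1\). The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

p_hat, p_0, n, alpha = 0.5255, 0.5146, 5000, 0.10

se_null = sqrt(p_0 * (1 - p_0) / n)        # SE computed under H_0: p = p_0
se_sample = sqrt(p_hat * (1 - p_hat) / n)  # SE computed from the sample proportion

z_null = (p_hat - p_0) / se_null
z_sample = (p_hat - p_0) / se_sample

# Confidence interval at level 1 - alpha, built from the sample-based SE.
z_star = NormalDist().inv_cdf(1 - alpha / 2)
lower = p_hat - z_star * se_sample
upper = p_hat + z_star * se_sample

print(round(se_null, 6), round(se_sample, 6))
print(round(z_null, 4), round(z_sample, 4))
print(lower < p_0 < upper)  # True means the interval contains p_0
```

For these data the two standard errors agree to about five decimal places, and the interval contains \(p_0\), which matches the decision not to reject \(H_0\).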


Practice: Testing the Mean GPA with Known \(\sigma\)


Practice: Testing the Mean GPA with Unknown \(\sigma\)


Practice: Testing the Proportion of People Who Own Cats


Practice: Testing the Mean Recovery Time

The average number of days to complete recovery from a particular type of knee operation is 123.7 days. From his experience a physician suspects that use of a topical pain medication might be lengthening the recovery time. He randomly selects the records of seven knee surgery patients who used the topical medication. The times to total recovery were:

128, 135, 121, 142, 126, 151, 123

Assuming a normal distribution of recovery times, perform the relevant test of hypotheses at the 10% level of significance.

Would the decision be the same at the 5% level of significance?

Source: Exercise 15 in Section 8.4 in Introductory Statistics.


Lab Instructions in Excel


13. Normal Distribution for Hypothesis Testing


14. \(t\)-Distributions for Hypothesis Testing

Suppose a Student’s \(t\)-distribution has \(\text{df}=n-1\) degrees of freedom.


Lab Practice: A Car Manufacturer’s Claim

A car manufacturer claims that a new fuel injection design increases the mean mileage on a certain model of car above its current 28.5 miles per gallon level. Twenty-five of the new designs were checked and the mean was recorded as 30.0 miles per gallon with a standard deviation of 3.8 miles per gallon. Assume that mileages are approximately normally distributed. Evaluate this claim at the 5% level of significance.