class: center, middle, inverse, title-slide

.title[
# Lesson 9: Statistical Inference
]
.author[
### Fei Ye
]
.date[
### May, 2024
]

---
class: center, middle
## Unit 6D: Statistical Inference

*Primary Source:* PPT for the book "Using & Understanding Mathematics".

---
## Statistical Significance

A set of measurements or observations in a statistical study is said to be **statistically significant** if it is unlikely to have occurred by chance.

**Levels of significance** are ***pre-chosen*** probabilities used to decide whether a result is likely due to chance or to other factors. Common levels of significance:

- At the 0.05 level: the probability that the observed difference occurred by chance is 5% or less.
- At the 0.01 level: the probability that the observed difference occurred by chance is 1% or less.

---
## Example: Significant Events (1 of 2)

Are the following events statistically significant? Explain.

a. The major league baseball team with the worst win-loss record defeats the team with the best record.

b. In terms of the global average temperature, the 16 years from 2001 to 2016 were 16 of the 17 hottest years on record since 1880.

---
## Example: Significant Events (2 of 2)

**Solution:**

a. We expect a team with a poor win-loss record to lose most of its games. However, we also expect it to win occasionally, even against the team with the best record. This event is not statistically significant.

b. Having 16 of the 17 hottest years on record occur in a single 16-year period is statistically significant. Such a streak of hot years is very unlikely to have occurred by chance alone, so it provides significant evidence of a warming Earth.

---
## Example: Statistical Significance in Experiments (1 of 2)

A researcher conducts a double-blind experiment that tests whether a new herbal formula is effective in preventing colds. During a three-month period, the 100 randomly selected people in a treatment group take the herbal formula, while the 100 randomly selected people in a control group take a placebo. The results show that 30 people in the treatment group get colds, compared to 32 people in the control group.
Can we conclude that the herbal formula is effective in preventing colds?

---
## Example: Statistical Significance in Experiments (2 of 2)

**Solution:** Whether a person gets a cold during any three-month period depends on many unpredictable factors. Therefore, we should not expect the number of people with colds in any two groups of 100 people to be exactly the same. In this case, the difference between 30% of the treatment group and 32% of the control group getting colds is small enough to be explained by chance. That is, the difference is not statistically significant, and we have no grounds to conclude that the treatment made any difference at all.

---
## Margin of Error and Confidence Intervals for a Population Proportion

- Suppose you draw a single sample of size `\(n\)` from a large population and measure its sample proportion. The margin of error for 95% confidence is approximately
  $$
  \text{margin of error}\approx\dfrac{1}{\sqrt{n}}.
  $$
  (The exact margin of error is in fact slightly less than `\(1/\sqrt{n}\)`.)
- The 95% confidence interval runs from the sample proportion minus the margin of error to the sample proportion plus the margin of error.
- Interpretation of the confidence interval: you can be 95% confident that the true population proportion lies within the confidence interval.
- As the sample ***size increases***, the margin of ***error decreases***.

---
## Example: Unemployment Rate (1 of 2)

Suppose the Bureau of Labor Statistics finds 3420 unemployed people in a sample of `\(n = 60{,}000\)` people. Estimate the population unemployment rate and give a 95% confidence interval.

**Solution:** The sample proportion is the unemployment rate for the sample:
$$
\dfrac{3420}{60000}=0.057.
$$
The margin of error is approximately
$$
\dfrac{1}{\sqrt{60000}}\approx 0.004.
$$

---
## Example: Unemployment Rate (2 of 2)

**Solution (continued):** Subtracting the margin of error of 0.004 from the sample proportion of 0.057, and adding it, yields a 95% confidence interval from 0.053 to 0.061, or 5.3% to 6.1%.
We can conclude with 95% confidence that the true unemployment rate for the population is between 5.3% and 6.1%.

---
## Practice: Poll Margins and Confidence Interval

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=677044&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>
]
.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=677044&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---
## Hypothesis Testing

Hypothesis testing in statistics is a way to test a claim about a population parameter, such as a population proportion estimated from a poll. In a hypothesis test, we deal with two claims about the population: the null hypothesis and the alternative hypothesis.

- The **null hypothesis** claims a specific value for a population parameter. (It is often the value expected when there is no special effect.) It takes the form
  $$
  \textbf{null hypothesis: } \text{population parameter} = \text{claimed value}.
  $$
- An **alternative hypothesis** is a claim that the population parameter has a value different from that claimed by the null hypothesis. Depending on the claim, it states that the population parameter is less than, greater than, or not equal to the claimed value.

---
## Two Outcomes of a Hypothesis Test

- **Rejecting the null hypothesis:** we have evidence that supports the alternative hypothesis.
- **Not rejecting the null hypothesis:** we lack sufficient evidence to support the alternative hypothesis.

---
## Example: Gender Choice Outcomes (1 of 2)

A company claimed that its Gender Choice product could increase a woman's chance of giving birth to a baby girl.

1. State the null hypothesis and the alternative hypothesis.
2. Describe the two possible outcomes of a hypothesis test concerning the Gender Choice product.

**Solution:** The expected chance of having a baby girl is 50%. The company's claim is that the chance is greater than 50%.

- Null hypothesis: the proportion of girl babies is the expected 50%.
- Alternative hypothesis: the proportion of girl babies is greater than 50%.
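
---
## Sketch: Gender Choice Under the Null Hypothesis

Under the null hypothesis, each birth is a girl with probability 0.5. The Python sketch below assumes that births are independent. That assumption makes the number of girls a binomial count, a model these slides do not introduce. The sketch then computes the chance of at least `k` girls among `n` births. The function name `prob_at_least` and the sample of 8 girls in 10 births are hypothetical and serve only as an illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance of at least k girls in n independent births when each birth
    is a girl with probability p (exact binomial upper tail).
    Hypothetical helper; p = 0.5 is the null-hypothesis value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical sample: 8 girls among 10 births.
chance = prob_at_least(8, 10)
print(round(chance, 4))  # 0.0547
```

This chance is greater than 0.05. So even though 80% of the births in this hypothetical sample are girls, such a small sample would not be statistically significant at the 0.05 level.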
---
## Example: Gender Choice Outcomes (2 of 2)

There are two possible outcomes:

- We reject the null hypothesis and accept the alternative hypothesis. In this case, we conclude that the percentage of girl babies is greater than 50% for people using Gender Choice and that the product actually works.
- We fail to reject the null hypothesis. This means we lack sufficient grounds for doubting the null hypothesis, but it does not prove that the null hypothesis is true. For Gender Choice, it means that, while we lack evidence that the product works, we have not proven that it does not work.

---
## Practice: 2-year Graduation Rate

A community college president claims that the 2-year graduation rate at the college is higher than the state average of 48%.

1. State the null hypothesis and the alternative hypothesis.
2. Describe the two possible outcomes of a hypothesis test concerning the graduation rate.

---
## Hypothesis Test Decisions

To make a hypothesis test decision, we assume that the null hypothesis is true. We then compare the actual sample result (a mean or a proportion) with the result expected under that assumption, using a chosen significance level.

Let `\(P\)` be the probability, assuming the null hypothesis is true, of getting a sample result at least as extreme as the observed result.

- If `\(P\)` is less than 1%, then the test is significant at the 0.01 level, which means that there is strong evidence to reject the null hypothesis.
- If `\(P\)` is between 1% and 5%, then the test is significant at the 0.05 level (but not at the 0.01 level), which means that there is moderate evidence to reject the null hypothesis.
- If `\(P\)` is greater than 5%, then the test is not significant, which means that there is not sufficient evidence to reject the null hypothesis.

---
## Example: Birth Weight Significance (1 of 3)

A county health official believes that the mean birth weight of male babies at a local hospital is greater than the national average of 3.39 kilograms.
A random sample of 145 male babies born at that hospital has a mean birth weight of 3.61 kilograms. Suppose the mean birth weight of all male babies born at the hospital equals the national average of 3.39 kilograms. Under that assumption, a calculation shows that the probability of selecting a sample with a mean birth weight of at least 3.61 kilograms is 0.032.

1. Formulate the null and alternative hypotheses.
2. Discuss whether the sample provides evidence for rejecting or not rejecting the null hypothesis.

---
## Example: Birth Weight Significance (2 of 3)

**Solution:**

- Null hypothesis: the mean birth weight of all male babies born at this hospital is the national average of 3.39 kilograms, that is,
  $$
  \text{mean birth weight} = 3.39\text{ kg}.
  $$
- Alternative hypothesis: the health official's claim that the hospital mean is higher than the national average, that is,
  $$
  \text{mean birth weight} > 3.39\text{ kg}.
  $$

---
## Example: Birth Weight Significance (3 of 3)

Under the assumption that the null hypothesis is true, the chance of observing a sample with a mean of at least 3.61 kilograms is 0.032. This is less than 0.05 but greater than 0.01, so the result is statistically significant at the 0.05 level but not at the 0.01 level. Statistical significance at the 0.05 level provides moderate evidence for rejecting the null hypothesis, in which case the county official's claim would be supported.

---
## Practice: Economic Trending

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=677235&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>
]
.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=677235&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---
## Practice: Mean Household Income of HDTV Owners

A marketing company claims that the mean household income of HDTV owners across the population is greater than $50,000.
A random sample of 2000 households with HDTVs shows that the mean household income is $51,012. Assuming that the true mean is $50,000, the probability of selecting a sample with a mean income of $51,012 or more is 0.008.

1. Formulate the null and alternative hypotheses.
2. Discuss whether the sample provides evidence for rejecting or not rejecting the null hypothesis.
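
---
## Sketch: Confidence Intervals and Test Decisions in Python

The margin-of-error rule and the decision rule from this lesson fit in a few lines of code. The Python sketch below is a minimal illustration. It assumes the approximation `\(1/\sqrt{n}\)` for the 95% margin of error, and the function names `confidence_interval_95` and `decide` are hypothetical. It reproduces the unemployment interval and the birth weight decision from the examples.

```python
import math

def confidence_interval_95(count, n):
    """Approximate 95% confidence interval for a population proportion,
    using the rule of thumb: margin of error ~ 1/sqrt(n)."""
    p_hat = count / n            # sample proportion
    margin = 1 / math.sqrt(n)    # approximate 95% margin of error
    return p_hat - margin, p_hat + margin

def decide(prob):
    """Classify prob, the chance (assuming the null hypothesis is true)
    of a sample result at least as extreme as the observed one."""
    if prob < 0.01:
        return "significant at the 0.01 level: strong evidence to reject the null hypothesis"
    if prob < 0.05:
        return "significant at the 0.05 level: moderate evidence to reject the null hypothesis"
    # Assumed convention: prob == 0.05 exactly is treated as not significant here.
    return "not significant: not sufficient evidence to reject the null hypothesis"

low, high = confidence_interval_95(3420, 60000)   # unemployment example
print(f"{low:.3f} to {high:.3f}")                 # 0.053 to 0.061

# Birth weight example: prints the "significant at the 0.05 level" message.
print(decide(0.032))
```

The `decide` function uses `<` comparisons so that each range of `\(P\)` matches exactly one of the three outcomes on the decision slide.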