class: center, middle, inverse, title-slide

.title[
# Lesson 5: Correlation and Causality
]
.author[
### Fei Ye
]
.date[
### May, 2024
]

---
class: center middle
## Unit 5E: Correlation and Causality

*Primary Source:* PPT for the book "Using & Understanding Mathematics".

---

## Correlation

A major type of statistical study seeks to determine whether one factor statistically tends to affect another or, more generally, whether the two factors are correlated.

A **correlation** exists between two variables when higher values of one variable consistently go with higher values of another variable or when higher values of one variable consistently go with lower values of another variable.

---

## Examples of Correlations

There is a correlation between the variables **height** and **weight** for people. That is, taller people tend to weigh more than shorter people.

There is a correlation between the variables **demand** for apples and **price** of apples. That is, demand tends to decrease as prices increase.

There is a correlation between **practice time** and **skill** among piano players. That is, those who practice more tend to be more skilled.

---

## Scatterplot

A scatterplot is a graph in which each point represents the values of two variables.

![:resize , 75%](data:image/png;base64,#img/image-20200818142238014.png)

---

## Linear Relationships Between Two Data Variables

- **No linear correlation:** There is no apparent linear relationship between the two variables.
- **Positive linear correlation:** Both variables tend to increase (or decrease) together.
- **Negative linear correlation:** One variable tends to decrease as the other increases.
- **Strength of a linear correlation:** The more closely two variables follow the general trend, the stronger the correlation. In a perfect correlation, all data points lie on a straight line.

---

## Example: Positive Linear Correlation

A scatter diagram shows that higher diamond weight generally goes with higher price.
![image-20200818142315189](data:image/png;base64,#img/image-20200818142315189.png)

---

## Example: Negative Linear Correlation

A scatter diagram shows that higher life expectancy generally goes with lower infant mortality.

![image-20200818142325294](data:image/png;base64,#img/image-20200818142325294.png)

---

## Practice: Type of correlation

For each pair of variables, state whether the correlation is positive or negative if you believe the variables are correlated. Otherwise, explain your reasoning.

- Age of a person and time spent on social networking sites
- A person's height and their favorite color
- Years of education and salary
- The square footage of a home and its price
- The speed of a car and the time it takes to reach its destination

---

## Example: Accuracy of Weather Forecasts (1 of 2)

The scatterplots below show two weeks of data comparing the actual high temperature for the day with the same-day forecast (left diagram) and the three-day forecast (right diagram). Discuss the types of correlation on each diagram.

![image-20200818142337022](data:image/png;base64,#img/image-20200818142337022.png)

---

## Example: Accuracy of Weather Forecasts (2 of 2)

**Solution:** Both scatterplots show a general trend in which higher predicted temperatures go with higher actual temperatures. That is, both show positive correlations.

The points in the left diagram lie more nearly on a straight line, indicating a stronger correlation than in the right diagram. This makes sense, because we expect weather forecasts to be more accurate on the same day than three days in advance.

---

## Possible Explanations for a Correlation

1. The correlation may be a coincidence.
2. Both variables might be directly influenced by some common underlying cause.
3. One of the correlated variables may actually be a cause of the other. Note that, even in this case, it may be only one of several causes.

---

## Example: Explanation for a Correlation (1 of 2)

Consider the negative correlation between infant mortality and life expectancy.
![image-20200818142325294](data:image/png;base64,#img/image-20200818142325294.png)

---

## Example: Explanation for a Correlation (2 of 2)

Which of the three possible explanations for a correlation (on the previous slides) applies?

**Solution:** The negative correlation is probably due to a common underlying cause, the quality of health care. In countries where health care is better in general, infant mortality is lower and life expectancy is higher.

---

## Example: Lurking variable

The scatterplot below shows the relationship between the number of firefighters sent to fires (x) and the amount of damage caused by fires (y) in a certain city.

.pull-left[
![](data:image/png;base64,#img/scatterplot-firefigters.gif)
]

.pull-right[
![resize, 60%](data:image/png;base64,#img/fire-fighter-lurking.gif)
]

Can we conclude that the increase in firefighters causes the increase in damage?

**Answer:** No. The seriousness of the fire is a lurking variable that affects both the number of firefighters sent and the amount of damage.

.small[
Source: [Causation and Lurking Variables in Concepts in Statistics](https://courses.lumenlearning.com/wmopen-concepts-statistics/chapter/causation-and-lurking-variables-1-of-2/)
]

---

## Guidelines for Establishing Causality (1 of 3)

If you suspect that a particular variable (the suspected cause) is causing some effect (consequence):

1. Look for situations in which the effect is correlated with the suspected cause even while other factors vary.

   **Example:** Researchers found correlations between smoking and lung cancer among many groups.

2. Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.

   **Example:** Among groups that seemed identical, lung cancer was found to be rarer in nonsmokers.

---

## Guidelines for Establishing Causality (2 of 3)

3. Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.
   **Example:** People who smoked more and for longer periods were found to have higher rates of lung cancer.

4. If the effect might be produced by other potential causes (besides the suspected cause), make sure that the effect still remains after accounting for these other potential causes.

   **Example:** When researchers accounted for other potential causes, they found that almost all the remaining lung cancer cases occurred among smokers.

---

## Guidelines for Establishing Causality (3 of 3)

5. If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.

   **Example:** An experiment uses randomly chosen treatment and control groups. In the lung cancer case, an experiment can be used to rule out, for instance, genetic factors.

6. Try to determine the physical mechanism by which the suspected cause produces the effect.

   **Example:** Researchers may study samples of human lung tissue.

---

## Case Study: Air Bags and Children (1 of 3)

By the mid-1990s, passenger-side air bags had become commonplace in cars. Statistical studies showed that the air bags saved many lives in moderate- to high-speed collisions.

But a disturbing pattern also appeared. In at least some cases, young children, especially infants and toddlers in child car seats, were killed by air bags in low-speed collisions.

---

## Case Study: Air Bags and Children (2 of 3)

At first, many safety advocates found it difficult to believe that air bags could be the cause of the deaths. But the observational evidence became stronger, meeting the first four guidelines for establishing causality.

For example, the greater risk to infants in child car seats fit Guideline 3, because it indicated that being closer to the air bags increased the risk of death. (A child car seat sits on top of the built-in seat, thereby putting a child closer to the air bags than the child would be otherwise.)
---

## Case Study: Air Bags and Children (3 of 3)

To seal the case, safety experts undertook experiments using dummies. They found that children, because of their small size, often sit where they could be easily hurt by the explosive opening of an air bag.

The experiments also showed that an air bag could impact a child car seat hard enough to cause death, thereby revealing the physical mechanism by which the deaths occurred.

---

## Practice: Correlation and a cause-and-effect relationship

<iframe src="https://www.myopenmath.com/embedq2.php?id=307471&seed=2024&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe>
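
---

## Example: Simulating a Lurking Variable

The firefighter example can be sketched as a small simulation. This is a minimal illustration, not part of the primary source: the names `pearson_r` and `simulate_fires` are hypothetical, and every number in it (the range of fire sizes, firefighters per unit of fire size, damage per unit of fire size, noise levels) is a made-up assumption. In this model, fire size drives both variables and firefighters have no effect on damage. The strength of the linear correlation is measured with the Pearson correlation coefficient `r`, which ranges from -1 to 1.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_fires(n=20000, seed=5):
    """Simulate fires in which fire size is the only cause of both variables.

    All numbers are made-up assumptions for illustration.
    Returns (r over all fires, r among fires of similar size).
    """
    rng = random.Random(seed)
    sizes, firefighters, damage = [], [], []
    for _ in range(n):
        size = rng.uniform(1, 10)                        # lurking variable
        sizes.append(size)
        firefighters.append(2 * size + rng.gauss(0, 1))  # depends on size only
        damage.append(5 * size + rng.gauss(0, 3))        # depends on size only

    r_all = pearson_r(firefighters, damage)

    # Hold the lurking variable roughly fixed: keep only fires of similar size.
    similar = [i for i, s in enumerate(sizes) if 5.0 <= s < 5.5]
    r_similar = pearson_r([firefighters[i] for i in similar],
                          [damage[i] for i in similar])
    return r_all, r_similar


if __name__ == "__main__":
    r_all, r_similar = simulate_fires()
    print(f"r across all fires:          {r_all:.2f}")
    print(f"r among similar-sized fires: {r_similar:.2f}")
```

Under these assumptions, `r` across all fires should come out strongly positive, while `r` among fires of similar size should be close to 0. This mirrors Guideline 4: once the other potential cause (fire size) is accounted for, the apparent effect of firefighters on damage does not remain.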