class: center, middle, inverse, title-slide

.title[
# Lesson 1: Fundamentals of Statistics
]

.author[
### Fei Ye
Department of Mathematics and Computer Science
]

.date[
### May, 2024
]

---

## Unit 5A: Fundamentals of Statistics

.cente[
*Primary Source:* PPT for the book "Using & Understanding Mathematics".
]

---

## Definitions of Statistics

- Statistics is the science of collecting, organizing, and interpreting data, and of using data to make inferences.
- The **population** in a statistical study is the complete set of objects being studied.
- A **sample** is a subset of a population.
- **Population parameters** are specific numbers measuring and/or characterizing the population.
- **Sample statistics** are numbers describing the characteristics of raw data sets collected from the samples.
- A **census** is a survey of an entire population.

---

## Example: Population and Sample (1 of 2)

**Example:** Describe the population, sample, population parameters, and sample statistics.

Agricultural inspectors for Jefferson County measure the average levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.

**Solution:**

- The inspectors seek to learn about levels of residue from the **population** of all ears of corn grown in the county.

---

## Example: Population and Sample (2 of 2)

> Agricultural inspectors for Jefferson County measure the average levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.

- Agricultural inspectors measure the levels of residue from three common pesticides on a **sample** consisting of 25 ears of corn from each of the 104 farms.
- The **population parameters** are the average levels of residue from the three pesticides on all corn grown in the county.
- The **sample statistics** describe the average levels of residue that are actually measured on the corn in the sample.

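---

## Code Sketch: Parameter vs. Statistic

The difference between a population parameter and a sample statistic can be illustrated with a short simulation. This is a minimal sketch, not part of the source material: the residue values are simulated in arbitrary units, and the population size, mean, and spread are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: residue readings (arbitrary units) for 10,000 ears.
# These values are simulated; they are not real inspection data.
population = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# A census measures every member; its mean is a population parameter.
population_mean = statistics.mean(population)

# A sample is a subset of the population; its mean is a sample statistic.
sample = random.sample(population, 25)
sample_mean = statistics.mean(sample)

print(f"population parameter (mean of all ears): {population_mean:.2f}")
print(f"sample statistic (mean of 25 ears):      {sample_mean:.2f}")
```

The two printed means are usually close but not identical, which is why a sample statistic is treated only as an estimate of the population parameter.
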
---

## Practice: Identify basic statistical concepts {.unnumbered}

<iframe src="https://www.myopenmath.com/embedq2.php?id=676993&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=676993&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---

## Statistical Study

In general, a statistical study consists of the following basic steps:

1. State the goal of the study.
2. Choose a representative sample from the population.
3. Collect raw data from the sample and summarize these data by finding sample statistics of interest.
4. Use the sample statistics to infer the population parameters.
5. Draw conclusions.

---

## The Circle of a Statistical Study

![](data:image/png;base64,#img/image4.png)

.footmark[
*Source:* [Concepts of Statistics](https://courses.lumenlearning.com/wmopen-concepts-statistics/chapter/why-it-matters-why-it-matters-types-of-statistical-studies-and-producing-data/)
]

---

## Elements of a Statistical Study

![](data:image/png;base64,#img/image5.png)

---

## Example: Unemployment Survey (1 of 2)

**Example:** Each month, the U.S. Labor Department surveys 60,000 households to determine characteristics of the U.S. work force. One population parameter of interest is the U.S. unemployment rate, defined as the percentage of people who are unemployed among all those who are either employed or actively seeking employment. Describe how the five basic steps of a statistical study apply to this research.

**Solution:**

Step 1. The **goal** of the research is to learn about the employment (or unemployment) within the population of all Americans who are either employed or actively seeking employment.

Step 2. The Labor Department chooses a **sample** consisting of people employed or seeking employment in 60,000 households.

---

## Example: Unemployment Survey (2 of 2)

Step 3.
The Labor Department asks questions of the people in the sample, and their responses constitute the raw data for the research. The department then **consolidates these data into sample statistics**, such as the percentage of people in the sample who are unemployed.

Step 4. Based on the sample statistics, the Labor Department **makes estimates** of the corresponding population parameters, such as the unemployment rate for the entire United States.

Step 5. The Labor Department **draws conclusions** based on the population parameters and other information. For example, it might use the current and past unemployment rates to draw conclusions about whether jobs have been created or lost.

---

## Practice: Statistical Study

The president of a college wants to know the average age of all freshmen at the college. Please design a statistical study to address this question.

---

## Definition: Representative Sample and Common Sampling Methods

- A **representative sample** is a sample in which the relevant characteristics of the sample members are generally the same as those of the population.
- **Simple random sampling**: We choose a sample of items in such a way that every sample of the same size has an equal chance of being selected.
- **Systematic sampling**: We use a simple system to choose the sample, such as selecting every 10th or every 50th member of the population.

---

## Common Sampling Methods

- **Stratified sampling**: We use this method when we are concerned about differences among subgroups, or strata, within a population. We first identify the subgroups and then draw a simple random sample within each subgroup. The total sample consists of all the samples from the individual subgroups.
- **Convenience sampling**: We choose a sample that is convenient to select, such as people who happen to be in the same classroom.
- **Voluntary response sampling**: We collect a sample from volunteers, such as people who choose to answer an online survey.

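---

## Code Sketch: Three Sampling Methods

Simple random, systematic, and stratified sampling can each be written in a few lines. This is a minimal sketch under stated assumptions: the population of 100 members, the two strata `"A"` and `"B"`, and the sample sizes are all hypothetical, and the random starting point for the systematic sample is a common practice rather than something the definitions above require.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 members, each tagged with a stratum.
population = [{"id": i, "stratum": "A" if i < 60 else "B"} for i in range(100)]

# Simple random sampling: every sample of size 10 is equally likely.
simple_random = random.sample(population, 10)

# Systematic sampling: pick a random start (an assumption of this sketch),
# then take every 10th member.
start = random.randrange(10)
systematic = population[start::10]

# Stratified sampling: a simple random sample of 5 within each stratum.
stratified = []
for name in ("A", "B"):
    members = [p for p in population if p["stratum"] == name]
    stratified.extend(random.sample(members, 5))

print("simple random:", sorted(p["id"] for p in simple_random))
print("systematic:   ", [p["id"] for p in systematic])
print("stratified:   ", [(p["id"], p["stratum"]) for p in stratified])
```

Convenience and voluntary response samples are not sketched here because they are defined by who happens to be available or who chooses to respond, not by a selection rule.
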
---

## Common Sampling Techniques

![](data:image/png;base64,#img/image6.png)

---

## Example: Sampling Methods (1 of 4)

**Example:** Identify the type of sampling used in each of the following cases and comment on whether the sample is likely to be representative of the population.

a. You are conducting a survey of students in a dormitory. You choose your sample by knocking on the door of every 10th room.

**Solution:** Choosing every 10th room makes this a **systematic sample**. The sample may be representative, as long as students were randomly assigned to rooms.

---

## Example: Sampling Methods (2 of 4)

b. To survey opinions on a proposed new water line, a research firm randomly draws the addresses of 150 homeowners from a public list of all homeowners.

**Solution:** The records presumably list all homeowners, so drawing randomly from this list produces a **simple random sample**. It has a good chance of being representative of the population.

---

## Example: Sampling Methods (3 of 4)

c. Agricultural inspectors for Jefferson County check the levels of residue from three common pesticides on 25 ears of corn from each of the 104 corn-producing farms in the county.

**Solution:** Each farm may have different pesticide use, so the inspectors consider corn from each farm as a subgroup (stratum) of the full population. By checking 25 ears of corn from each of the 104 farms, the inspectors are using **stratified sampling**. If the ears are collected randomly on each farm, each set of 25 is likely to be representative of its farm.

---

## Example: Sampling Methods (4 of 4)

d. Anthropologists determine the average brain size of early Neanderthals in Europe by studying skulls found at three sites in southern Europe.

**Solution:** By studying skulls found at selected sites, the anthropologists are using a **convenience sample**. They have little choice, because only a few skulls remain from the many Neanderthals who once lived in Europe.
It seems reasonable to assume that these skulls are representative of the larger population. However, the study might not be very reliable.

---

## Practice: Sampling Methods

What sampling method was used?

1. Administrators had a computer generate 40 random student identification numbers and calculated their average GPA.
2. A website randomly selects 30 customers who answered a satisfaction survey.
3. To survey voters in a town, a polling company randomly selects 5 people from every block to interview.

---

## Practice: Match sampling methods

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=676968&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>
]

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=676968&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---

## Definition: Bias

A statistical study suffers from **bias** if its design or conduct tends to favor certain results. That is, not every member of the population has equal likelihood of being in the sample.

Two typical types of statistical bias are **selection (sampling) bias** and **participant (response) bias**.

---

## Example: Selection bias

A college student wants to know the major difficulty that students at the college have with online learning. He asks only his classmates. Is the statistical study reliable? Why or why not?

**Solution:** This sampling method is biased because the results may not represent the experience of students from other classes or departments.

---

## Practice: What type of bias

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=652014&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>
]

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=652014&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]

---

## Types of Statistical Study

- In an **observational study**, researchers observe or measure characteristics of the sample members but do not attempt to influence or modify these characteristics.
- In an **experiment**, researchers apply a treatment to some or all of the sample members and then observe the effects of the treatment.

---

## Practice: Observational or Experimental Study

<iframe src="https://www.myopenmath.com/embedq2.php?id=202319&seed=2024&showansafter&allowregen" width="100%" height="480px" data-external="1"></iframe>

---

## Treatment and Control Groups

- The **treatment group** in an experiment is the group of sample members who *receive the treatment* being tested.
- The **control group** in an experiment is the group of sample members who *do not receive the treatment* being tested.
- It is important for the treatment and control groups to be selected randomly and to be alike in all respects except for the treatment.

---

## Placebos and the Placebo Effect

- A **placebo** lacks the active ingredients of the treatment being tested in a study, but looks or feels enough like the treatment so that participants cannot distinguish whether they are receiving the placebo or the real treatment.
- The **placebo effect** refers to the situation in which patients improve simply because they believe they are receiving a useful treatment.

---

## Blinding in Experiments

- In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called blinding.
- An experiment is **single-blind** if *the participants do not know* whether they are members of the treatment group or members of the control group, but the experimenters do know.
- An experiment is **double-blind** if *neither the participants nor the experimenters* (people administering the treatment) know who belongs to the treatment group and who belongs to the control group.

---

## Example: What's Wrong with This Experiment? (1 of 2)

For each of the experiments described, identify any problems and explain how the problems could have been avoided.

a. A chiropractor performs adjustments on 25 patients with back pain. Afterward, 18 of the patients say they feel better. He concludes that the adjustments are an effective treatment.

**Solution:** The 25 patients who receive adjustments represent a treatment group, but this study *lacks a control group*. The patients may be feeling better because of a placebo effect rather than any real effect of the adjustments. The chiropractor might have improved his study by hiring an actor to do a fake adjustment (one that feels like a real manipulation, but doesn't actually conform to chiropractic guidelines) on a control group. Then he could have compared the results in the two groups to see whether a placebo effect was involved.

---

## Example: What's Wrong with This Experiment? (2 of 2)

b. A new drug for a type of attention deficit disorder is supposed to make the affected children less disruptive. Randomly selected children suffering from the disorder are divided into treatment and control groups. Those in the control group receive a placebo that looks just like the real drug. The experiment is single-blind. Experimenters interview the children one on one to decide whether they became more polite.

**Solution:** Because the experimenters know which children received the real drug, during the interviews they may inadvertently speak differently or interpret behavior differently with these children.
*The experiment should have been double-blind*, so that the experimenters conducting the interviews would not have known which children received the real drug and which children received the placebo.

---

## Practice: Experimental Study

Researchers conduct an experiment to determine whether students will perform better using redesigned materials for a certain course. Two groups of 100 students were selected. One group was randomly selected from classes that were taught in the usual way. The other group was randomly selected from classes whose instructors have many years of teaching experience. Is this experiment well designed? What's wrong?

---

## Retrospective Study

A **retrospective study** (also known as a case-control study) is an observational study that uses data from the past, such as official records or past interviews, and in which the sample naturally divides into a group of cases who engaged in the behavior under study and a group of controls who did not. A retrospective study inspects individuals by outcome.

.footmark[
Further reading: [Difference between Case control study and Retrospective cohort study](https://www.amritaakhouri.com/single-post/2018/02/10/Difference-between-Case-control-study-and-Retrospective-cohort-study)
]

---

## Example: Which Type of Study? (1 of 4)

**Example:** For each of the following questions, what type of statistical study is most likely to lead to an answer? Why?

a. What is the average income of stock brokers?

**Solution:** An observational study can tell us the average income of stock brokers. We need only survey (observe) the brokers.

---

## Example: Which Type of Study? (2 of 4)

b. Do seat belts save lives?

**Solution:** It would be unethical to do an experiment in which some people were told to wear seat belts and others were told not to wear them. Instead, we can conduct a retrospective study. People who wore seat belts in crashes represent the cases and people who did not wear them are the controls.
By comparing the death rates in accidents between cases and controls, we can learn whether seat belts save lives. (They do.)

---

## Example: Which Type of Study? (3 of 4)

c. Can lifting weights improve runners' times in a 10-kilometer race?

**Solution:** We need an experiment to determine whether lifting weights can improve runners' 10K times. One group of runners will be put on a weight-lifting program, and a control group will be asked to stay away from weights. We must try to ensure that all other aspects of their training are similar. Then we can see whether the runners in the lifting group improve their times more than those in the control group. Note that we cannot use blinding in this experiment because there is no way to prevent participants from knowing whether they are lifting weights.

---

## Example: Which Type of Study? (4 of 4)

d. Can a new herbal remedy reduce the severity of colds?

**Solution:** We should use a double-blind experiment, in which some participants get the actual remedy while others get a placebo. We need double-blind conditions because the severity of a cold may be affected by mood or other factors that experimenters might inadvertently influence.

---

## Practice: Type of Study

Is the study observational or experimental?

1. The temperature of a city on randomly selected days throughout the year was measured.
2. Two groups of students are randomly selected. One group is told to listen to music while taking a quiz, and their results are compared to those of the other group, which does not listen to music.

---

## Statistical Inference - Confidence Interval

- The margin of error is used to describe a confidence interval that is likely to contain the true population parameter.
- A confidence interval is an interval `\((a, b)\)`, where

  $$ a=\text{sample statistic} - \text{margin of error}, $$

  $$ b=\text{sample statistic} + \text{margin of error}. $$

- The value of the margin of error depends on
    1. the level of confidence (which is usually taken to be 95%),
    2. the sample size, and
    3. variation (the measure of spread of data).

---

## Example: Close Election

An election eve poll finds that 52% of surveyed voters plan to vote for Smith, and she needs a majority (more than 50%) to win without a runoff. At the 95% level, the margin of error in the poll is 3 percentage points. Will she win?

**Solution:** The confidence interval is from `\(52\% - 3\% = 49\%\)` to `\(52\% + 3\% = 55\%\)`. We can be 95% confident that the actual percentage of people planning to vote for her is between 49% and 55%. Because this confidence interval leaves open the possibility of either a majority or less than a majority, this election is too close to call.

---

## Practice: Confidence interval for average GPA

A random sample of 100 college students has an average GPA of 2.9, with a margin of error of 0.2 at the 95% level of confidence. What can we say about the average GPA of all students in that college?

---

## Practice: Confidence interval for proportion

.iframecontainer[
<iframe src="https://www.myopenmath.com/embedq2.php?id=668843&seed=2021&showansafter" width="100%" height="400px" data-external="1"></iframe>
]

.footmark[
<a href="https://www.myopenmath.com/embedq2.php?id=668843&seed=2021&showansafter" target="_blank">Click here to open the practice in a new window</a>
]
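
---

## Code Sketch: Confidence Interval Arithmetic

The interval formula from the Statistical Inference slide, applied to the Close Election example, can be written as a small function. This is a minimal sketch: the function name `confidence_interval` and the variable `too_close_to_call` are hypothetical names chosen for illustration, and the margin of error is taken as given rather than computed from the sample.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (a, b) = (statistic - margin, statistic + margin)."""
    return (sample_statistic - margin_of_error,
            sample_statistic + margin_of_error)


# Close Election example: 52% support, margin of error 3 percentage points.
low, high = confidence_interval(52, 3)
print(f"95% confidence interval: {low}% to {high}%")

# The race is too close to call when the interval allows both outcomes:
# a share of 50% or less (no majority) and a share above 50% (majority).
too_close_to_call = low <= 50 < high
print("too close to call:", too_close_to_call)
```

The same function applies to any sample statistic reported with a margin of error, such as a sample mean.
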