Topic 1: Statistical Studies

Fei Ye

January 2024

References


Learning Goals


1. Why Study Statistics?


2. Process of Statistical Studies

  1. Understanding the nature of the problem.
  2. Deciding what to measure and how to measure it.
  3. Collecting data.
  4. Data summarization and preliminary analysis.
  5. Formal data analysis.
  6. Interpretation of results.

A picture showing how statistics works

Image source: Concepts in Statistics (lumen learning)


3. Population vs Sample


4. Example: Identify Statistical Concepts

  1. Determine if the group is a population or sample
    1. The grade of all students in a math class.
    2. 10 students in a math class earned “A”.

Answer:

    1. Population
    2. Sample
  2. Identify the statistical concepts in the following study: To learn the percentage of students who go to school by public transportation, 500 students at a college were surveyed. 50% say they go to school by public transportation.

    Answer:

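    • Population: all students at the college
    • Sample: the 500 students who were surveyed
    • Parameter: the percentage of all students at the college who go to school by public transportation (unknown)
    • Statistic: 50%, the percentage of the surveyed students who say they go to school by public transportation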
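
The difference between a parameter and a statistic can be illustrated with a small simulation. The following Python sketch is only an illustration: the population size (20,000) and the true rate (0.47) are made-up assumptions, since in a real study the parameter is unknown.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 students; 1 = uses public transportation.
# The true rate 0.47 is made up. In a real study it would be unknown.
POPULATION_SIZE = 20_000
TRUE_RATE = 0.47
population = [1 if random.random() < TRUE_RATE else 0
              for _ in range(POPULATION_SIZE)]

# Parameter: a number that describes the whole population.
parameter = sum(population) / len(population)

# Sample: 500 students chosen at random, as in the survey above.
sample = random.sample(population, 500)

# Statistic: the same summary, computed from the sample only.
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

The statistic usually lands close to the parameter but is rarely exactly equal to it, and it changes from sample to sample.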

5. Type of Variables


6. Example: College Choice Do-Over

The Higher Education Research Institute at UCLA surveys over 20,000 college seniors each year. One question on the survey asks seniors the following question: If you could make your college choice over, would you still choose to enroll at your current college? Possible responses are definitely yes (DY), probably yes (PY), probably no (PN), and definitely no (DN).

Question:

  1. Identify a variable in the study. Is it categorical or numerical?
  2. Identify a data set. Is it univariate or bivariate or multivariate?

Answer:

  1. A variable in the study is a student's do-over choice. It is a categorical variable.
  2. A data set is the collection of do-over choices of some or all of the students surveyed. The data set is univariate.
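
A categorical, univariate data set like this one is usually summarized by counting how often each category occurs. The Python sketch below shows one way to do that. The 12 responses are made up for illustration and are not data from the UCLA survey.

```python
from collections import Counter

# Hypothetical responses from 12 seniors (made up for illustration).
responses = ["DY", "PY", "DY", "PN", "DY", "PY",
             "DN", "DY", "PY", "PY", "DY", "PN"]

counts = Counter(responses)  # frequency of each category
n = len(responses)

# Print the count and relative frequency in the survey's category order.
for category in ["DY", "PY", "PN", "DN"]:
    share = counts[category] / n
    print(f"{category}: {counts[category]:2d}  ({share:.0%})")
```

Each observation records a single variable (the do-over choice), which is what makes the data set univariate.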

Practice: Basic Statistical Concepts

Identify the population, the sample, the variable of study, the type of the variable, the population parameter, and the sample statistic.

An administrator wishes to estimate the passing rate of a certain course. She takes a random sample of 50 students and obtains their letter grades in that course. Among the 50 students, 32 earned a grade of C or better.


7. Types of Statistical Studies

A statistical study can usually be categorized as an observational study or an experiment, depending on how the study is conducted.

In an observational study, it is not possible to draw clear cause-and-effect conclusions because we cannot rule out the possibility that the observed effect is due to some other variables not being studied, known as extraneous variables.
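
The effect of an extraneous variable can be seen in a small simulation. The Python sketch below rests entirely on made-up assumptions: a hypothetical trait called motivation raises exam scores by 10 points and also makes students more likely to seek tutoring, while tutoring itself has no effect on the score.

```python
import random
from statistics import mean

random.seed(2024)
N = 10_000

# Extraneous variable (hypothetical): half of the students are motivated.
motivated = [random.random() < 0.5 for _ in range(N)]

# Exam score depends on motivation and noise only. Tutoring is NOT an input.
scores = [70 + (10 if m else 0) + random.gauss(0, 5) for m in motivated]

def score_gap(tutored_flags):
    """Mean score of tutored students minus mean score of the others."""
    tutored = [s for t, s in zip(tutored_flags, scores) if t]
    not_tutored = [s for t, s in zip(tutored_flags, scores) if not t]
    return mean(tutored) - mean(not_tutored)

# Observational study: students choose for themselves, and (by assumption)
# motivated students are more likely to choose tutoring.
chose_tutoring = [random.random() < (0.7 if m else 0.3) for m in motivated]
observational_gap = score_gap(chose_tutoring)

# Experiment: a coin flip assigns tutoring, so it is unrelated to motivation.
assigned_tutoring = [random.random() < 0.5 for _ in range(N)]
experimental_gap = score_gap(assigned_tutoring)

print(f"observational gap: {observational_gap:.2f} points")
print(f"experimental gap:  {experimental_gap:.2f} points")
```

Under these assumptions the observational comparison shows a gap of several points even though tutoring does nothing, while the gap under random assignment is close to zero.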


8. Example: Types of Statistical Studies

Which type of study will answer the question?

  1. What proportion of all college students in the United States have taken classes at a community college?

  2. Does use of computer-aided instruction in college math classes improve test scores?

Answer:

  1. Observational
  2. Experimental

See Types of Statistical Studies (2 of 4) in the textbook Concepts in Statistics for more examples.


Practice: Observational vs Experimental I

Identify the type of statistical study:

  1. A study took a random sample of adults and asked them about their bedtime habits. The data showed that people who drank a cup of tea before bedtime were more likely to go to sleep earlier than those who didn’t drink tea.

  2. Another study took a group of adults and randomly divided them into two groups. One group was told to drink tea every night for a week, while the other group was told not to drink tea that week. Researchers then compared when each group fell asleep.

Source: Khan Academy


Practice: Observational vs Experimental II


9. Questions about Population (1 of 2)

Type of research question: Make an estimate about the population (often an estimate about an average value or a proportion with a given characteristic).
Examples:
    • What is the average number of hours that community college students work each week?
    • What proportion of all U.S. college students are enrolled at a community college?

Type of research question: Test a claim about the population (often a claim about an average value or a proportion with a given characteristic).
Examples:
    • Is the average course load for a community college student greater than 12 units?
    • Do the majority of community college students qualify for federal student loans?

10. Questions about Population (2 of 2)

Type of research question: Compare two populations (often a comparison of population averages or proportions with a given characteristic).
Examples:
    • In community colleges, do female students have a higher GPA than male students?
    • Are college athletes more likely than non-athletes to receive academic advising?

Type of research question: Investigate a relationship between two variables in the population.
Examples:
    • Is there a relationship between the number of hours high school students spend each week on Facebook and their GPA?
    • Is academic counseling associated with quicker completion of a college degree?

11. Question on Cause-and-Effect


12. Example: Cause-and-Effect or Correlation

Determine whether each question is a cause-and-effect question. What are the explanatory and response variables?

  1. Does use of computer-aided instruction in college math classes improve test scores?
  2. Does tutoring correlate with improved performance on exams?

Answer:

  1. This question investigates a cause-and-effect relationship. The explanatory variable is computer-aided instruction and the response variable is the test scores.

  2. This question investigates a correlation between variables in a population and is not a cause-and-effect question. The explanatory variable is tutoring, and the response variable is the performance.


13. Example: Appropriate Conclusion

In general, we should not make cause-and-effect statements from observational studies unless the impact of confounding variables can be substantially reduced.

Example: A researcher studies the medical records of 500 randomly selected patients. Based on the information in the records, he divides the patients into two groups: those given the recommendation to take an aspirin every day and those with no such recommendation. He reports the percentage of each group that developed heart disease.
Determine whether the study supports the conclusion that taking aspirin lowers the risk of heart attacks.

Answer: The conclusion claims a cause-and-effect relationship, which requires an experimental study. This study is observational: the researcher only examined existing records and did not control which patients received the aspirin recommendation, so it does not support the conclusion.


Practice: Cause-and-Effect or Correlation

Does higher education attainment lead to higher salary?

  1. Determine whether the question is a cause-and-effect question.
  2. What are the explanatory and response variables?
  3. If a student wants to study this question, what type of statistical study can be used? What kind of conclusion can be drawn?

Practice: Correlation or Causation


14. Sampling Plans

To make accurate inferences, the sample must be representative of the population.
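
One common way to obtain a representative sample is to let chance choose it. The Python sketch below draws a simple random sample from a sampling frame. The roster of ID numbers 1001 through 1800 and the sample size of 40 are hypothetical.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: student ID numbers 1001..1800 (800 students).
roster = list(range(1001, 1801))

# Simple random sample of 40 students: every student is equally likely
# to be chosen, and no student is chosen twice.
srs = random.sample(roster, 40)

print(sorted(srs))
```

Because `random.sample` selects without replacement, the result never contains the same student twice.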


15. Methods of Random Sampling (1 of 2)


16. Methods of Random Sampling (2 of 2)


Practice: Sampling Methods

Determine the type of sampling method.

  1. A market researcher polls every tenth person who walks into a store.

  2. One hundred students are selected because their student ID numbers match 100 numbers generated by a computer randomization program.

  3. The first 30 people who walk into a sporting event are polled on their television preferences.


17. Common Types of Selection Bias in Sampling


18. Example: Appropriate Sampling Design

Suppose that you want to estimate the proportion of students at your college that use the library.

Which sampling plan will produce the most reliable results?

  1. Select 100 students at random from students in the library.
  2. Select 200 students at random from students who use the Tutoring Center.
  3. Select 300 students who have checked out a book from the library.
  4. Select 50 students at random from the college.

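Answer: The 4th sampling plan is the most reliable. The first three plans undercover the college: each one samples only part of the student body (students in the library, students who use the Tutoring Center, or students who have checked out a book).

In general, the larger the sample size, the more accurate the conclusion. However, a large sample cannot make up for a biased sampling plan.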
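
The following Python sketch compares a small random sample with a larger biased one. It is a simulation under made-up assumptions: a hypothetical college of 10,000 students in which 40% use the library.

```python
import random

random.seed(11)

# Hypothetical college: 10,000 students, 40% use the library (made-up rate).
N = 10_000
uses_library = [random.random() < 0.40 for _ in range(N)]
true_proportion = sum(uses_library) / N

# Like plan 4: 50 students chosen at random from the whole college.
random_sample = random.sample(uses_library, 50)
estimate_random = sum(random_sample) / len(random_sample)

# Like plan 1: 100 students chosen at random from students in the library.
# Everyone found in the library is a library user, so non-users
# can never be selected.
in_library = [u for u in uses_library if u]
biased_sample = random.sample(in_library, 100)
estimate_biased = sum(biased_sample) / len(biased_sample)

print(f"true proportion:         {true_proportion:.2f}")
print(f"random sample (n = 50):  {estimate_random:.2f}")
print(f"biased sample (n = 100): {estimate_biased:.2f}")
```

The biased plan reports that every student uses the library, regardless of how large the sample is, while the smaller random sample gives an estimate that varies around the true proportion.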


Practice: Sampling Techniques



19. Elements of Experimental Design (1 of 2)


20. Elements of Experimental Design (2 of 2)


21. Confounding Variable vs Lurking Variable

Both confounding and lurking variables are extraneous variables, that is, variables other than the explanatory variables that may have an effect on the response variable.


Practice: Principles of Experimental Design


Practice: Experimental Design

There is an ongoing debate about how many spaces should be placed after a period in typed documents. Alana read about a study where 100 participants all read the same document typed in Courier New font. Half of the participants were randomly assigned the document with one space after each period, and the other half were given the document with two spaces after each period.

Participants who read the document with two spaces after each period were able to finish reading significantly faster than those with one space after each period. Alana concluded that using two spaces after each period will help people read all documents faster.

Is this study appropriate? Why?

Source: Khan Academy
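
The random assignment step described in this study can be sketched in Python as follows. The participant ID numbers 1 through 100 are hypothetical, and the study itself may have used a different randomization procedure.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical participant IDs 1..100.
participants = list(range(1, 101))

# Shuffle the list, then split it in half: each participant is equally
# likely to end up in either group.
random.shuffle(participants)
one_space_group = participants[:50]
two_space_group = participants[50:]

print("one space: ", sorted(one_space_group))
print("two spaces:", sorted(two_space_group))
```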


Practice: Confounding Variable Definition


Lab Instructions in Excel


22. Introduction to Excel Spreadsheets



23. Random Numbers by Excel


24. Example: Usage of RAND()

Randomly generate a number between 0 and 1.

Howto:

Alternatively, you may type the function =RAND() directly in the cell and press Enter.


25. Example: Usage of RANDBETWEEN()

Generate 10 random integers of 2 digits.

Howto:


26. Example: Usage of RANDARRAY()

Generate 10 random integers of 2 digits without repetition.

Howto:

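In a cell with at least 9 empty cells below it, say A1, apply the Excel function =UNIQUE(RANDARRAY(10, 1, 10, 99, TRUE)). RANDARRAY returns a column of 10 random integers from 10 to 99, and UNIQUE removes any repeated values, so the result has no duplicates but may contain fewer than 10 integers. If that happens, recalculate (press F9) until 10 values appear, or use a formula that cannot produce repeats, such as =INDEX(SORTBY(SEQUENCE(90, 1, 10), RANDARRAY(90)), SEQUENCE(10)), which shuffles the integers 10 through 99 and keeps the first 10.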
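
For readers who want to cross-check the three Excel examples outside of Excel, here is a rough Python analogue using the standard `random` module. It is a sketch, not part of the lab. The mapping to =RANDBETWEEN(10, 99) is an assumption about how the second example is meant to be done.

```python
import random

# Analogue of =RAND(): a real number in the interval [0, 1).
x = random.random()

# Analogue of =RANDBETWEEN(10, 99) filled into 10 cells:
# ten 2-digit integers, repeats allowed.
with_repeats = [random.randint(10, 99) for _ in range(10)]

# Ten distinct 2-digit integers: sample without replacement from 10..99.
without_repeats = random.sample(range(10, 100), 10)

print(x)
print(with_repeats)
print(without_repeats)
```

Sampling without replacement guarantees exactly 10 distinct values, which is the property the third example is after.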


Practice: Random Numbers

  1. Generate a real number between 1 and 2.

  2. Generate 10 integers of 2 digits that are less than 50.

  3. Generate 10 integers of 2 digits that are less than 50 and without duplication.


27. Install the Analysis ToolPak (Optional)

If you have a desktop version of Excel, you may install the Analysis ToolPak, an Excel add-in that is frequently used for analyzing data.