Hypothesis Testing: Making Decisions with Data

From Questions to Statistical Evidence

Narjes Mathlouthi

2025-07-30

Welcome to Hypothesis Testing

Making Decisions with Data: From Questions to Statistical Evidence

“The goal is not to eliminate uncertainty, but to make informed decisions despite it”

📢 Quick Announcements

📝 Quiz 2 Reminder

When:
- 📅 Date: Friday, July 25
- ⏰ Window: 7 AM – 12 AM
- ⏳ Duration: 1 hour once started

Where: 💻 Online via Canvas

📚 Today’s Focus

  • Foundation: Logic of hypothesis testing
  • Practice: Real examples with Python
  • Skills: Making statistical decisions
  • Applications: From medicine to marketing

Learning Journey Today 🎯

🧠 Conceptual Goals

  • Understand the logic of hypothesis testing
  • Get familiar with the language of statistical decisions
  • Recognize different types of errors and their consequences
  • Connect to confidence intervals from last lecture

🛠️ Practical Skills

  • Formulate hypotheses from research questions
  • Calculate and interpret p-values correctly
  • Perform hypothesis tests in Python
  • Make informed decisions using statistical evidence
  • Communicate results effectively

What is Hypothesis Testing?


Hypothesis testing lets us use sample data to weigh competing claims about a population.

Workflow

  1. State H₀ (null) – the status‑quo or “no‑effect” position
  2. State H₁ (alternative) – the research claim you hope to support
  3. Choose α – the tolerable Type I error rate (e.g., 0.05)
  4. Compute a test statistic – compress the data into one number
  5. Find the p‑value – “How unusual is this statistic if H₀ were true?”
  6. Make a decision – reject H₀ if p ≤ α; otherwise fail to reject

The goal is not to prove anything with certainty, but to judge whether the evidence tips the scale away from H₀.

Hypothesis testing helps us answer: “Is what we observed in our sample strong enough evidence to conclude something about the population?”
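The six steps above can be run end-to-end in a few lines of Python. This is a minimal sketch: the sample of 25 scores and the hypothesized mean of 75 are illustrative, generated here with a fixed random seed.

```python
import numpy as np
from scipy import stats

# Illustrative sample: 25 test scores (hypothetical data, fixed seed)
rng = np.random.default_rng(42)
scores = rng.normal(loc=77, scale=8, size=25)

mu0 = 75      # Steps 1-2: H0: mu = 75  vs  H1: mu > 75
alpha = 0.05  # Step 3: tolerable Type I error rate

# Steps 4-5: test statistic and one-tailed p-value
t_stat, p_value = stats.ttest_1samp(scores, popmean=mu0, alternative="greater")

# Step 6: decision rule
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

The `alternative="greater"` argument matches a one-sided research claim; the same call with `"two-sided"` handles a nondirectional alternative.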

The Courtroom Analogy ⚖️

Criminal Justice System

| Aspect | Criminal Court |
|---|---|
| Starting Position | Defendant is innocent |
| Burden of Proof | Prosecution must prove guilt |
| Evidence Standard | Beyond reasonable doubt |
| Decision Options | Guilty or Not Guilty |
| Type I Error | Convict an innocent person |
| Type II Error | Acquit a guilty person |
| Consequences | Balance justice vs. protecting the innocent |

Statistical Hypothesis Testing

| Aspect | Hypothesis Testing |
|---|---|
| Starting Position | Null hypothesis (H₀) is true |
| Burden of Proof | Data must support alternative (H₁) |
| Evidence Standard | p ≤ α (usually 0.05) |
| Decision Options | Reject H₀ or Fail to reject H₀ |
| Type I Error | Reject a true null (false positive) |
| Type II Error | Fail to reject a false null (false negative) |
| Consequences | Balance discovery vs. false claims |

Key Insight 💡 Just like in court, we never “prove” innocence or accept the null hypothesis—we only decide whether the evidence is strong enough to reject it.

The Six Steps of Hypothesis Testing 📋

Types of Alternative Hypotheses 🎯
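The three kinds of alternative hypothesis map directly onto scipy's `alternative` argument. A small sketch (the sample here is hypothetical, generated with a fixed seed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=76, scale=8, size=25)  # hypothetical scores
mu0 = 75

# Two-sided:   H1: mu != 75
_, p_two = stats.ttest_1samp(sample, mu0, alternative="two-sided")
# Right-tailed: H1: mu > 75
_, p_greater = stats.ttest_1samp(sample, mu0, alternative="greater")
# Left-tailed:  H1: mu < 75
_, p_less = stats.ttest_1samp(sample, mu0, alternative="less")

# Because the t distribution is symmetric, the two-sided p-value is
# twice the smaller of the two one-sided p-values
print(p_two, p_greater, p_less)
```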

What is a p-value?


The p-value answers the question
“If the null hypothesis were true, how likely is a result at least this extreme?”

Formally, for an observed test statistic T_obs,

p = P(|T| ≥ |T_obs| ∣ H₀).

Smaller p-values → data less compatible with H₀ → stronger evidence against H₀.

Interpretation Cheat-Sheet

| p-value | Evidence against H₀ |
|---|---|
| p > 0.10 | Little / none |
| 0.05 < p ≤ 0.10 | Weak |
| 0.01 < p ≤ 0.05 | Moderate |
| p ≤ 0.01 | Strong |

(Guidelines, not iron-clad laws.)
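The cheat-sheet can be encoded as a tiny helper, useful for labeling results in reports. The category names are the ones in the table above, not a universal standard:

```python
def evidence_label(p):
    """Map a p-value to the rough evidence categories in the cheat-sheet."""
    if p > 0.10:
        return "little/none"
    elif p > 0.05:
        return "weak"
    elif p > 0.01:
        return "moderate"
    else:
        return "strong"

print(evidence_label(0.006))  # strong
```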

Common Pitfalls

  • p is not the probability that H₀ is true
  • p is not the probability the result occurred “by chance”
  • A non-significant p does not prove H₀
  • Statistical significance ≠ practical importance

Example: In our treatment test, t = 2.5 gave p = 0.006.
If H₀ were true, such an extreme outcome would occur only 0.6% of the time, which is compelling evidence favoring the new treatment.
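The 0.6% figure can be checked directly. Assuming a one-tailed test with a large sample (so the t distribution is close to the standard normal), the tail probability beyond t = 2.5 is:

```python
from scipy import stats

t_obs = 2.5
# One-tailed p-value under the standard normal approximation (large df)
p_normal = stats.norm.sf(t_obs)
print(f"{p_normal:.4f}")  # approximately 0.0062, i.e. about 0.6%
```

With a small sample you would use `stats.t.sf(t_obs, df)` instead, which gives a slightly larger p-value.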

Types of Errors: The Trade-off 🎲

Error Types Matrix

| Decision ↓ / Reality → | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | ✔ Correct (Power) |
| Fail to Reject H₀ | ✔ Correct | Type II Error (β) |

Real‑World Consequences

| Context | Type I Error (False Positive) | Type II Error (False Negative) |
|---|---|---|
| Medical Test | Treat healthy patient | Miss actual disease |
| Drug Approval | Approve ineffective drug | Reject effective drug |
| Fire Alarm | False alarm evacuation | Fail to detect real fire |

Note

Balancing the risks

Lowering the significance level α reduces the chance of Type I errors but increases the risk of Type II errors unless you gather more data or target a larger effect.
Choose α based on which error would be more costly in your scenario.

Statistical Power: Detecting True Effects 💪

Key Insight: Higher power means you’re more likely to detect a true effect when it exists. Aim for power ≥ 0.80!
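Power for a one-sample t-test can be computed from the noncentral t distribution. A sketch (the effect size d = 0.5 and n = 30 are illustrative, and the test is taken to be one-sided at α = 0.05):

```python
import numpy as np
from scipy import stats

def one_sample_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample t-test via the noncentral t."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)   # rejection threshold under H0
    nc = effect_size * np.sqrt(n)         # noncentrality parameter under H1
    return stats.nct.sf(t_crit, df, nc)   # P(T > t_crit | H1 true)

print(f"{one_sample_power(0.5, 30):.3f}")  # roughly 0.85 for d = 0.5, n = 30
```

Raising n (or the effect size) pushes the noncentrality parameter up and with it the power, which is why the ≥ 0.80 target is usually met by planning the sample size in advance.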

Example 1: One-Sample t-test 📊

Problem Setup

Research Question: A new study technique claims to improve test scores. The current average is 75. We test 25 students using the new method.

📊 Sample Data Summary:
========================================
Sample size (n): 25
Sample mean (x̄): 77.19
Sample std (s): 7.65
Current average (μ₀): 75

Complete Hypothesis Test


🎯 DETAILED RESULTS:
==================================================
Test Statistic: t = 1.432
P-value: 0.0825
Critical Value: 1.711
Effect Size (Cohen's d): 0.286

❌ DECISION: Fail to reject H₀
📊 CONCLUSION: There is insufficient evidence (p = 0.0825) that the new study method improves test scores.
💡 PRACTICAL IMPACT: The observed difference could reasonably be due to chance.

Using Python’s Built-in Functions

🐍 PYTHON IMPLEMENTATION:
========================================
Method 1: scipy.stats.ttest_1samp
t-statistic: 1.432
p-value (two-tailed): 0.1650
p-value (one-tailed): 0.0825

Method 2: Manual with 95% Confidence Interval
95% CI: (74.03, 80.35)
Interpretation: We're 95% confident the true mean is between 74.0 and 80.4

Effect Size (Cohen's d): 0.286
Effect size interpretation: small effect
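The numbers above can be reproduced from the summary statistics alone. Since only the summary values are shown here, this sketch computes the statistic by hand rather than with `scipy.stats.ttest_1samp`, which needs the raw scores:

```python
import numpy as np
from scipy import stats

n, xbar, s, mu0 = 25, 77.19, 7.65, 75  # summary values from above

se = s / np.sqrt(n)              # standard error of the mean
t_stat = (xbar - mu0) / se       # t ≈ 1.43 (matches the report up to rounding)
df = n - 1
p_one = stats.t.sf(t_stat, df)   # one-tailed p-value
t_crit = stats.t.ppf(0.95, df)   # critical value at alpha = 0.05
d = (xbar - mu0) / s             # Cohen's d

print(f"t = {t_stat:.3f}, p = {p_one:.4f}, crit = {t_crit:.3f}, d = {d:.3f}")
```

Since t = 1.43 < 1.711 (equivalently, p = 0.083 > 0.05), the decision is to fail to reject H₀, exactly as reported.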

Example 2: Two-Sample t-test 📊

Problem Setup

Research Question: Compare effectiveness of two teaching methods

📊 TWO-GROUP COMPARISON:
========================================
Method A (Traditional):
  n = 30, mean = 75.45, std = 11.87

Method B (New):
  n = 28, mean = 82.72, std = 14.82

Difference in means: 7.27 points
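These summaries can be fed straight into scipy's `ttest_ind_from_stats`. Using Welch's test (`equal_var=False`) is a judgment call here, made because the two standard deviations differ noticeably:

```python
from scipy import stats

# Summary statistics from the two groups above
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=82.72, std1=14.82, nobs1=28,   # Method B (New)
    mean2=75.45, std2=11.87, nobs2=30,   # Method A (Traditional)
    equal_var=False,                     # Welch's t-test
)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # two-tailed p-value
```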

Statistical vs Practical Significance 🤔

🔑 KEY LESSON: Statistical Significance ≠ Practical Importance
============================================================
Left: Tiny effect (0.02) is statistically significant because of the large sample
Right: Large effect (8.7) is not statistically significant because of the small sample

💡 Always consider BOTH statistical significance AND effect size!
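One way a tiny effect can reach significance while a large effect misses it is shown below with hypothetical analytic numbers (a 0.02-point effect with n = 1,000,000 versus an 8.7-point effect with n = 10):

```python
import numpy as np
from scipy import stats

def one_sample_p(mean_diff, sd, n):
    """Two-tailed one-sample t-test p-value from summary statistics."""
    t = mean_diff / (sd / np.sqrt(n))
    return 2 * stats.t.sf(abs(t), df=n - 1)

# Tiny, practically meaningless effect + huge sample: significant
p_tiny = one_sample_p(mean_diff=0.02, sd=1.0, n=1_000_000)

# Large, practically important effect + small sample: not significant
p_large = one_sample_p(mean_diff=8.7, sd=15.0, n=10)

print(f"tiny effect,  n=1e6: p = {p_tiny:.2e}")
print(f"large effect, n=10:  p = {p_large:.3f}")
```

This is why effect sizes and confidence intervals belong in every report: the p-value alone conflates the size of the effect with the size of the sample.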

Best Practices Summary 📋

Summary: Key Takeaways 🎯

🧠 Core Concepts

  • Hypothesis testing provides a framework for making decisions under uncertainty
  • P-values quantify how surprising our data would be if H₀ were true
  • Statistical significance ≠ practical importance - always consider effect size
  • Type I and II errors represent different kinds of mistakes with different costs
  • Power is the ability to detect true effects when they exist

🛠️ Practical Skills

  • Plan before you analyze - specify hypotheses and α level in advance
  • Check assumptions and use appropriate tests
  • Report effect sizes and confidence intervals, not just p-values
  • Consider practical significance alongside statistical significance
  • Be honest about limitations and acknowledge uncertainty

Resources

Thank You! 🎉

Remember: Hypothesis testing is a tool for making informed decisions under uncertainty. Use it wisely, report honestly, and always consider the practical implications of your statistical conclusions.

“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” - John Tukey

Next class: Regression Analysis! 📊

🏠 Back to Main Page