PSTAT 5A: Sampling and Confidence Intervals

Lecture 11 - Making Sense of Uncertainty

Author

Narjes Mathlouthi

Published

July 22, 2025

Welcome to Lecture 11

From Samples to Populations: Understanding Uncertainty


📢 Important Announcements

📝 Quiz 2 Details

When:
- 📅 Date: Friday, July 25
- ⏰ Window: 7 AM – 12 AM
- ⏳ Duration: 1 hour once started

Where: 💻 Online via Canvas

Covers: Material from Weeks 3-4

📚 What to Expect

  • Discrete & continuous distributions
  • Probability calculations
  • Expected value & variance
  • Normal distribution applications
  • Note: Upload photos of written work for calculation problems

What We’ll Learn Today 🎯

Big Ideas:

  • How sample means behave (they’re surprisingly predictable!)
  • Why we use intervals instead of single numbers
  • How confident we can be in our estimates
  • Planning studies for the right precision

Skills You’ll Gain:

  • Build confidence intervals step-by-step
  • Interpret what confidence really means
  • Choose appropriate sample sizes
  • Avoid common mistakes in interpretation

From Observation ➡️ Experimentation: Why Design Matters

  • Observational Study: Passively record what already happens — good for spotting patterns, but hidden confounders can fool us.
  • Controlled Experiment: Actively assign treatments; randomization balances the lurking variables so we can talk about cause.
  • Randomization & Replication: Twin shields that protect us from bias and one‑off flukes.
  • Probability Framework: Sample space, events, and probabilities let us quantify uncertainty.

The Big Picture: From Sample to Population

Think of this like trying to understand a huge library by reading just a few books.

Key Terms Made Simple:

  • Population: Everyone we care about (like all students at UCSB); its mean is written \(\mu\)
  • Sample: The people we actually measured (like 100 students we surveyed); its mean is written \(\bar x\)
  • Parameter: The true answer we want (population mean)
  • Statistic: Our best guess (sample mean)

Why Point Estimates Aren’t Enough

Imagine asking: “What’s the average height of UCSB students?”

The Problem: Each sample gives a slightly different answer!

The Solution: Use confidence intervals to show the range of reasonable values.


Sampling Distributions

Think of sampling distributions like “What if we repeated our study 1000 times?”

Key Insights:

  1. Population can be any shape (skewed, normal, weird)

  2. One sample might not look like the population

  3. Sample means vary from sample to sample

  4. Many sample means form a beautiful, approximately normal distribution!


The Central Limit Theorem (CLT) 🎯

Central Limit Theorem: Larger Samples → More Normal!

The Magic Rule: No matter what shape your population is (as long as its variance is finite), the means of large enough samples will be approximately normal!

The Rule of Thumb: With n ≥ 30, sample means are usually close to normal. Very skewed or heavy-tailed populations may need a larger n.
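To see the CLT in action, here is a minimal simulation sketch. It assumes a right-skewed population (exponential with mean 1 and standard deviation 1), which is an illustrative choice and not data from this course; the function name `simulate_sample_means` is also hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from a right-skewed population
    (exponential, mean 1, SD 1) and return the list of sample means."""
    rng = random.Random(seed)  # local generator so results are reproducible
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=30, reps=2000)
center = statistics.fmean(means)   # should land near the population mean, 1
spread = statistics.stdev(means)   # should land near sigma / sqrt(n)
predicted = 1 / math.sqrt(30)

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (CLT predicts about {predicted:.3f})")
```

Even though individual exponential values are strongly skewed, a histogram of `means` should look roughly bell-shaped, and its spread should sit close to the predicted value.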


Standard Error

What it measures

  • Standard deviation (\(\sigma\) or \(s\)): spread of individual data points
  • Standard error (SE): spread of sample means

\[ SE = \frac{\sigma}{\sqrt{n}} \qquad(\text{if } \sigma \text{ is known}) \]

\[ SE = \frac{s}{\sqrt{n}} \qquad(\text{usual case, } \sigma \text{ unknown}) \]

Key facts

  • SE shrinks at rate \(1/\sqrt{n}\): 4× as many observations ⇒ ½ the SE
  • Smaller SE ⇒ narrower confidence intervals and more powerful tests

Important

Doubling your sample size doesn’t halve the error - you need 4 times the sample for half the error!
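A quick numerical check of the square-root rule, as a sketch. The standard deviation of 10 is a made-up value chosen only to keep the arithmetic easy to read.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical standard deviation, for illustration only
for n in (25, 50, 100):
    print(f"n = {n:3d}  SE = {standard_error(sd, n):.3f}")
```

Going from n = 25 to n = 50 (doubling) only shrinks the SE by a factor of about 1.41, while going from n = 25 to n = 100 (quadrupling) cuts it in half.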

Confidence Intervals: The Intuitive Idea

Imagine you’re trying to guess someone’s height just by looking at their shadow…

Simple Interpretation: “We’re \(95\%\) confident the true average height is between \(64.3\) and \(70.1\) inches.”

What Exactly Is a Confidence Interval? 🤓

  • A confidence interval (CI) is point estimate \(\pm\) margin of error
    \[ \text{CI} = \text{statistic} \;\pm\; \bigl(\text{critical value}\bigr)\times\bigl(\text{SE}\bigr) \]

  • The “critical value” comes from a probability model (e.g., \(z^{\star}\) or \(t^{\star}\)).

  • The standard error (SE) captures sampling variation.

Frequentist meaning

If we repeated the study infinitely many times and built a \(95 \%\) CI each time, about \(95 \%\) of those intervals would cover the true parameter.

(For any one computed interval the parameter is fixed. It is the procedure that has a \(95 \%\) success rate, not the individual interval.)
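The "repeat the study many times" idea can be simulated directly. The sketch below assumes a normal population with a known \(\sigma\), so it uses a \(z\)-interval; the values \(\mu = 67\), \(\sigma = 3\), and \(n = 25\) are hypothetical, and `coverage_rate` is an illustrative name.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, reps, confidence=0.95, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)  # same width every time (sigma known)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage_rate(mu=67.0, sigma=3.0, n=25, reps=2000)
print(f"share of intervals covering mu: {rate:.3f}")
```

The printed share should be close to 0.95. Each individual interval either covers \(\mu\) or misses it; the 95% describes how often the procedure succeeds over many repetitions.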

What controls the width?

  • Variability in the data: larger \(\sigma\) or \(s\) ⇒ wider CI

  • Sample size \(n\): width shrinks at rate \(1/\sqrt{n}\)

  • Confidence level: 99 % CIs are wider than 90 % CIs

Common pitfalls

  • Saying “there is a \(95 \%\) probability that \(\mu\) lies in this interval” (wrong)

  • Interpreting the CI as covering \(95 \%\) of future observations (it does not)

  • Ignoring conditions (normality or CLT) before using the formulae


Building Confidence Intervals Step-by-Step

For Population Means (Most Common Case)

When we DON’T know the population standard deviation (\(\sigma\)):

The Formula: \(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)

Breaking it down:

  • \(\bar{x}\) = our sample average (the center of our guess)

  • \(t^*\) = critical value (how many standard errors to go out)

  • \(\frac{s}{\sqrt{n}}\) = standard error (our uncertainty measure)


The t-Distribution: When \(\sigma\) is Unknown

Why not use the normal distribution? Because when we estimate \(\sigma\) with \(s\), we add extra uncertainty!

Key Points:

  • Small samples (\(n < 30\)): Use t-distribution

  • Large samples (\(n \ge 30\)): \(t\) ≈ normal

  • Degrees of freedom (df) = \(n - 1\)
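To see how \(t^*\) approaches \(z^*\) as the degrees of freedom grow, here is a sketch that uses only the Python standard library. It finds \(t^*\) by numerically integrating the \(t\) density (Simpson's rule) and bisecting, which is an illustrative approach; in practice you would read \(t^*\) from a table or a statistics package. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 by symmetry plus Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)
for df in (5, 15, 30, 100):
    print(f"df = {df:3d}  t* = {t_critical(0.95, df):.3f}")
print(f"normal    z* = {z_star:.3f}")
```

The \(t^*\) values are always a bit larger than \(z^*\), which is the "extra uncertainty" from estimating \(\sigma\) with \(s\), and the gap shrinks as df increases.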


Confidence Intervals for Proportions

For Yes/No questions like: “What percentage of students prefer online classes?”

The Formula: \(\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)

Results: Sample: 60% prefer online (120/200)
95% CI: (53.2%, 66.8%) - We’re 95% confident the true percentage is in this range.
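The interval above can be reproduced with a short sketch. The function name `proportion_ci` is an illustrative choice; the critical value \(z^*\) comes from the standard normal distribution in Python's standard library.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (z) confidence interval for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(120, 200)
print(f"({low:.1%}, {high:.1%})")  # prints (53.2%, 66.8%)
```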


What Does “95% Confident” Really Mean? 🤔

The Biggest Misconception: “There’s a 95% chance the true mean is in our interval”

Actually: “If we repeated this study 100 times, about 95 of our intervals would contain the true mean”

Remember: The interval either contains the true value or it doesn’t - there’s no probability involved for a single interval!


Sample Size Planning: Getting the Precision You Want

The Question: “How many people do we need to survey?”

Key Formula for Means: \(n = \left(\frac{z^* \sigma}{ME}\right)^2\)

Trade-offs:

  • Want smaller margin of error? Need bigger sample

  • Want higher confidence? Need bigger sample

  • Want to save money? Accept wider intervals
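Here is a sketch of the sample size formula for means, rounding up because we cannot survey a fraction of a person. The planning values (a guessed \(\sigma\) of 1.5 hours and a target margin of error of 0.25 hours) are hypothetical, and `sample_size_for_mean` is an illustrative name.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)

n_95 = sample_size_for_mean(sigma=1.5, margin=0.25)                    # 139
n_half = sample_size_for_mean(sigma=1.5, margin=0.125)                 # 554
n_99 = sample_size_for_mean(sigma=1.5, margin=0.25, confidence=0.99)   # 239
print(n_95, n_half, n_99)
```

Halving the margin of error roughly quadruples the required sample, and raising the confidence level from 95% to 99% also pushes the required sample up, matching the trade-offs listed above.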


Real Example: Student Sleep Study 😴

Research Question: How many hours do UCSB students sleep per night?

Bottom Line: We’re 95% confident that UCSB students sleep between 6.73 and 7.47 hours per night on average.


Common Mistakes to Avoid ⚠️

❌ Wrong Interpretations

“95% of students sleep in this range” - NO! This is about the population mean, not individual students

“There’s a 95% chance μ is in our interval” - NO! \(\mu\) is fixed; our interval varies

“We can be 95% certain” - NO! Use “confident” not “certain”

✅ Correct Approach

“We are 95% confident the population mean is in this interval”

Key Reminders:

  • Check conditions before using formulas

  • Use t-distribution when \(\sigma\) is unknown

  • Larger samples give narrower intervals

  • Higher confidence gives wider intervals

Practice Problems 📝


Problem 1: Coffee Shop Revenue

A coffee shop owner samples 36 days and finds average daily revenue of $850 with standard deviation $120.

Your turn: Calculate a 90% confidence interval for the true average daily revenue.

Given (from the prompt)
\(n = 36,\; \bar{x} = \$850,\; s = \$120,\; \text{confidence level} = 90\%\)


Step 1 – Conditions

  • \(n \ge 30\) ⇒ a \(t\)‑interval is justified by the Central Limit Theorem.
  • Assume daily revenues are independent.

Step 2 – Critical value

\(\alpha = 1-0.90 = 0.10 \;\Rightarrow\; \alpha/2 = 0.05\)
Degrees of freedom: \(df = n-1 = 35\)
\(\displaystyle t^{\star}_{0.90,\,35} \approx 1.690\)

Step 3 – Standard error

\[SE = \frac{s}{\sqrt{n}} = \frac{120}{\sqrt{36}} = \frac{120}{6} = \$20\]

Step 4 – Margin of error

\[ME = t^{\star}\; SE = 1.690 \times \$20 = \$33.8\]

Step 5 – Confidence interval

\[\bar{x} \pm ME = 850 \pm 33.8 \;\Longrightarrow\; (\$816.2,\; \$883.8)\]

Interpretation – We are 90 % confident that the true mean daily revenue lies between $816.20 and $883.80.
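The arithmetic in Steps 3 to 5 can be double-checked with a few lines of Python. This sketch takes \(t^* = 1.690\) from Step 2 as given instead of computing it; `t_interval` is an illustrative helper name.

```python
import math

def t_interval(x_bar, s, n, t_star):
    """Return (low, high, standard error, margin of error) for a mean."""
    se = s / math.sqrt(n)
    margin = t_star * se
    return x_bar - margin, x_bar + margin, se, margin

low, high, se, margin = t_interval(x_bar=850, s=120, n=36, t_star=1.690)
print(f"SE = {se:.2f}, ME = {margin:.2f}, CI = ({low:.2f}, {high:.2f})")
```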


Problem 2: Student Survey

In a survey of 400 students, 280 say they would recommend their major to a friend.

Your turn:

  1. Calculate the sample proportion

  2. Build a \(95\%\) confidence interval

  3. Check if conditions are met

Given (from the survey)
\(n = 400,\; x = 280\) “yes” responses


Step 1 – Sample proportion

\[\hat{p} = \frac{x}{n} = \frac{280}{400} = 0.70\]

Step 2 – Conditions for a \(z\)‑interval

\(n\hat{p} = 400(0.70)=280 \ge 10\)
\(n(1-\hat{p}) = 400(0.30)=120 \ge 10\)
Both counts ≥ 10, so the normal approximation is appropriate.

Step 3 – Standard error

\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.70(0.30)}{400}} = \sqrt{0.000525} \approx 0.0229\]

Step 4 – Critical value & margin of error

For 95 % confidence, \(z^{\star} = 1.96\)

\[ME = z^{\star}\; SE = 1.96 \times 0.0229 \approx 0.045\]

Step 5 – Confidence interval

\[\hat{p} \pm ME = 0.70 \pm 0.045 \;\Longrightarrow\; (0.655,\; 0.745)\]

Interpretation – We are 95 % confident that between 65.5 % and 74.5 % of all students would recommend their major to a friend.
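As a check on this solution, the sketch below tests the success-failure condition before computing the interval. It uses the rounded \(z^* = 1.96\) from Step 4, and the helper name `success_failure_ok` is hypothetical.

```python
import math

def success_failure_ok(successes, n, threshold=10):
    """True if both the 'yes' and 'no' counts reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold

x, n = 280, 400
z_star = 1.96  # rounded critical value used in the worked solution

p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se

if success_failure_ok(x, n):
    print(f"p-hat = {p_hat:.2f}, SE = {se:.4f}, ME = {margin:.3f}")
    print(f"95% CI: ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
else:
    print("Counts too small for the normal approximation.")
```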

Looking Ahead: Hypothesis Testing 🔮

Next week we’ll learn:

  • How to test specific claims about populations

  • When to reject or fail to reject hypotheses

  • The connection between confidence intervals and hypothesis tests

Key Takeaways 🎯

Big Ideas:

  1. Samples vary - confidence intervals capture this uncertainty

  2. Larger samples give more precise estimates

  3. Higher confidence means wider intervals

  4. The CLT makes normal-based inference possible

Practical Skills:

  • Build CIs for means and proportions

  • Interpret confidence correctly

  • Plan sample sizes for desired precision

  • Avoid common interpretation mistakes

Comprehensive Resources 📚

📖 Required Reading

  • OpenIntro Statistics
    • Section 1.3: Sampling principles and strategies
    • Section 3.3: Confidence intervals for a mean
    • Section 4.1: Central Limit Theorem
    • Section 7.1.1: The distribution of \(\bar x\)
    • Section 7.1.2: Evaluating the two conditions required for modeling \(\bar x\)
    • Section 7.1.3: Introducing the \(t\)-distribution

🎥 Video Resources

Questions? 🤔

Office Hours: Thursday 11AM (link on Canvas)
Email: nmathlouthi@ucsb.edu

“The goal is not to eliminate uncertainty, but to understand and work with it”
