PSTAT 5A: Sampling and Confidence Intervals
Lecture 11 - Making Sense of Uncertainty
Welcome to Lecture 11
From Samples to Populations: Understanding Uncertainty
📢 Important Announcements
📝 Quiz 2 Details
When:
- 📅 Date: Friday, July 25
- ⏰ Window: 7 AM – 12 AM
- ⏳ Duration: 1 hour once started
Where: 💻 Online via Canvas
Covers: Material from Weeks 3-4
📚 What to Expect
- Discrete & continuous distributions
- Probability calculations
- Expected value & variance
- Normal distribution applications
- Note: Upload photos of written work for calculation problems
What We’ll Learn Today 🎯
Big Ideas:
- How sample means behave (they’re surprisingly predictable!)
- Why we use intervals instead of single numbers
- How confident we can be in our estimates
- Planning studies for the right precision
Skills You’ll Gain:
- Build confidence intervals step-by-step
- Interpret what confidence really means
- Choose appropriate sample sizes
- Avoid common mistakes in interpretation
From Observation ➡️ Experimentation: Why Design Matters
- Observational Study: Passively record what already happens — good for spotting patterns, but hidden confounders can fool us.
- Controlled Experiment: Actively assign treatments; randomization balances the lurking variables so we can talk about cause.
- Randomization & Replication: Twin shields that protect us from bias and one‑off flukes.
- Probability Framework: Sample space, events, and probabilities let us quantify uncertainty.
The Big Picture: From Sample to Population
Think of this like trying to understand a huge library by reading just a few books
Key Terms Made Simple:
- Population: Everyone we care about (like all students at UCSB)
- Sample: The people we actually measured (like 100 students we surveyed)
- Parameter (\(\mu\)): The true answer we want (population mean)
- Statistic (\(\bar x\)): Our best guess (sample mean)
Why Point Estimates Aren’t Enough
Imagine asking: “What’s the average height of UCSB students?”
The Problem: Each sample gives a slightly different answer!
The Solution: Use confidence intervals to show the range of reasonable values.
Sampling Distributions
Think of sampling distributions like “What if we repeated our study 1000 times?”
Key Insights:
Population can be any shape (skewed, normal, weird)
One sample might not look like the population
Sample means vary from sample to sample
Many sample means form a beautiful, approximately normal distribution!
The Central Limit Theorem (CLT) 🎯
Central Limit Theorem: Larger Samples → More Normal!
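The Magic Rule: Whatever shape your population has (as long as its variance is finite), sample means become approximately normal once the sample is large enough!
The Rule of Thumb: \(n \ge 30\) is usually enough for sample means to be approximately normal; strongly skewed or heavy-tailed populations may need more.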
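A quick simulation can make the CLT concrete. The sketch below is only an illustration: the Exponential(1) population (mean 1, standard deviation 1), the seed, and the helper name `sample_means` are assumptions chosen for the demo, not part of the lecture.

```python
import random
import statistics

random.seed(5)  # fixed seed so the run is repeatable


def sample_means(n, reps=2000):
    """Draw `reps` samples of size n from a skewed Exponential(1)
    population and return the mean of each sample."""
    means = []
    for _ in range(reps):
        sample = [random.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


for n in (2, 30, 200):
    means = sample_means(n)
    print(
        f"n={n:>3}  "
        f"mean of sample means={statistics.fmean(means):.3f}  "
        f"SD of sample means={statistics.stdev(means):.3f}  "
        f"theory sigma/sqrt(n)={1 / n ** 0.5:.3f}"
    )
```

The sample means should center near the population mean of 1, and their spread should track \(\sigma/\sqrt{n}\). A histogram of `means` would look more bell-shaped as \(n\) grows.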
Standard Error
What it measures
- Standard deviation (\(\sigma\) or \(s\)): spread of individual data points
- Standard error (SE): spread of sample means
\[ SE = \frac{\sigma}{\sqrt{n}} \qquad(\text{if } \sigma \text{ is known}) \]
\[ SE = \frac{s}{\sqrt{n}} \qquad(\text{usual case, } \sigma \text{ unknown}) \]
Key facts
- SE shrinks at rate \(1/\sqrt{n}\) — every 4× more observations ⇒ ½ the SE
- Smaller SE ⇒ narrower confidence intervals and more powerful tests
Doubling your sample size doesn’t halve the error - you need 4 times the sample for half the error!
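The \(1/\sqrt{n}\) rule is easy to check numerically. In this small sketch the standard deviation of 10 is a hypothetical value picked only to make the arithmetic clean.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


sd = 10.0  # hypothetical standard deviation
for n in (25, 50, 100):
    print(f"n={n:>3}  SE={standard_error(sd, n):.3f}")
```

Going from 25 to 50 observations only cuts the SE from 2.000 to about 1.414; it takes 100 observations to reach 1.000.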
Confidence Intervals: The Intuitive Idea
Imagine you’re trying to guess someone’s height just by looking at their shadow…
Simple Interpretation: “We’re \(95\%\) confident the true average height is between \(64.3\) and \(70.1\) inches.”
What Exactly Is a Confidence Interval? 🤓
A confidence interval (CI) is point estimate \(\pm\) margin of error
\[ \text{CI} = \text{statistic} \;\pm\; \bigl(\text{critical value}\bigr)\times\bigl(\text{SE}\bigr) \]
The “critical value” comes from a probability model (e.g., \(z^{\star}\) or \(t^{\star}\)).
The standard error (SE) captures sampling variation.
Frequentist meaning
If we repeated the study infinitely many times and built a \(95 \%\) CI each time, about \(95 \%\) of those intervals would cover the true parameter.
(The parameter is fixed, so any one computed interval either covers it or does not; the \(95 \%\) success rate belongs to the procedure, not to the individual interval.)
What affects the width
Variability in the data: larger \(\sigma\) or \(s\) ⇒ wider CI
Sample size \(n\): width shrinks at rate \(1/\sqrt{n}\)
Confidence level: 99 % CIs are wider than 90 % CIs
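The three width factors can be seen side by side in a short calculation. This is a sketch under simplifying assumptions: it uses \(z^{\star}\) (as if \(\sigma\) were known), and the values `sd = 12` and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci_width(sd, n, confidence):
    """Full width of a z-interval: 2 * z* * sd / sqrt(n)."""
    return 2 * z_star(confidence) * sd / math.sqrt(n)


sd = 12.0  # hypothetical standard deviation
for confidence in (0.90, 0.95, 0.99):
    for n in (25, 100):
        print(
            f"{confidence:.0%} CI, n={n:>3}: "
            f"z*={z_star(confidence):.3f}, width={ci_width(sd, n, confidence):.2f}"
        )
```

The critical values come out to roughly 1.645, 1.960, and 2.576, so the width grows with the confidence level and shrinks by half when \(n\) goes from 25 to 100.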
Common pitfalls
Saying “there is a \(95 \%\) probability that \(\mu\) lies in this interval” (wrong)
Interpreting the CI as covering \(95 \%\) of future observations (it does not)
Ignoring conditions (normality or CLT) before using the formulae
Building Confidence Intervals Step-by-Step
For Population Means (Most Common Case)
When we DON’T know the population standard deviation (\(\sigma\)):
The Formula: \(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)
Breaking it down:
\(\bar{x}\) = our sample average (the center of our guess)
\(t^*\) = critical value (how many standard errors to go out)
\(\frac{s}{\sqrt{n}}\) = standard error (our uncertainty measure)
The t-Distribution: When \(\sigma\) is Unknown
Why not use the normal distribution? Because when we estimate \(\sigma\) with \(s\), we add extra uncertainty!
Key Points:
Small samples (\(n < 30\)): Use the t-distribution; its heavier tails give wider intervals
Large samples (\(n \ge 30\)): \(t\) ≈ normal, so the two give nearly the same interval
Degrees of freedom: \(df = n - 1\)
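To see how \(t^{\star}\) approaches \(z^{\star}\) as the degrees of freedom grow, the sketch below computes critical values using only the Python standard library. The helper names (`t_pdf`, `t_cdf`, `t_star`) are hypothetical, and the numerical approach (Simpson's rule plus bisection) is just one simple way to do it; in practice a t-table or statistical software gives the same values.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x]
    with Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (5, 9, 29, 100):
    print(f"df={df:>3}  t*={t_star(0.95, df):.3f}  (z*={z:.3f})")
```

For 95 % confidence, \(t^{\star}\) is noticeably larger than 1.96 at small df and gets close to it by df = 100, which is the extra uncertainty from estimating \(\sigma\) with \(s\).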
Confidence Intervals for Proportions
For Yes/No questions like: “What percentage of students prefer online classes?”
The Formula: \(\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
Results: Sample: 60% prefer online (120/200)
95% CI: (53.2%, 66.8%) - We’re 95% confident the true percentage is in this range.
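The interval above can be reproduced in a few lines. This is a minimal sketch of the formula as written; the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """z-interval for a proportion: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(120, 200)
print(f"95% CI: ({low:.1%}, {high:.1%})")  # 95% CI: (53.2%, 66.8%)
```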
What Does “95% Confident” Really Mean? 🤔
The Biggest Misconception: “There’s a 95% chance the true mean is in our interval”
Actually: “If we repeated this study 100 times, about 95 of our intervals would contain the true mean”
Remember: The interval either contains the true value or it doesn’t - there’s no probability involved for a single interval!
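The "repeat the study many times" idea can be simulated directly. Everything numeric in this sketch is an assumption for illustration: a normal population of heights with mean 68 and standard deviation 3, samples of size 40, and \(\sigma\) treated as known so that a simple z-interval applies.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(11)  # fixed seed so the run is repeatable

MU, SIGMA = 68.0, 3.0   # hypothetical "true" population mean and SD
N, REPS = 40, 2000      # sample size and number of repeated studies

z = NormalDist().inv_cdf(0.975)
margin = z * SIGMA / math.sqrt(N)  # same margin every time (sigma known)


def interval_covers_mu():
    """Run one simulated study and report whether its CI contains MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    return xbar - margin <= MU <= xbar + margin


coverage = sum(interval_covers_mu() for _ in range(REPS)) / REPS
print(f"{coverage:.1%} of {REPS} intervals contained the true mean")
```

Each individual interval either hits or misses; the share that hit should land close to 95 %, which is what the confidence level describes.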
Sample Size Planning: Getting the Precision You Want
The Question: “How many people do we need to survey?”
Key Formula for Means: \(n = \left(\frac{z^* \sigma}{ME}\right)^2\)
Trade-offs:
Want smaller margin of error? Need bigger sample
Want higher confidence? Need bigger sample
Want to save money? Accept wider intervals
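The sample size formula and its trade-offs can be sketched as follows. The planning value `sigma_guess = 15` and the target margins are hypothetical; the result is rounded up because a fractional respondent is not possible.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n satisfying z* * sigma / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)


sigma_guess = 15.0  # hypothetical planning value for sigma
for me in (3.0, 1.5):
    for confidence in (0.95, 0.99):
        n = sample_size_for_mean(sigma_guess, me, confidence)
        print(f"ME={me}, confidence={confidence:.0%}: n={n}")
```

At 95 % confidence, cutting the margin of error from 3.0 to 1.5 raises the required sample from 97 to 385, about four times as many people.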
Real Example: Student Sleep Study 😴
Research Question: How many hours do UCSB students sleep per night?
Bottom Line: We’re 95% confident that UCSB students sleep between 6.73 and 7.47 hours per night on average.
Common Mistakes to Avoid ⚠️
❌ Wrong Interpretations
“95% of students sleep in this range” - NO! This is about the population mean, not individual students
“There’s a 95% chance μ is in our interval” - NO! \(\mu\) is fixed; our interval varies
“We can be 95% certain” - NO! Use “confident” not “certain”
✅ Correct Approach
“We are 95% confident the population mean is in this interval”
Key Reminders:
Check conditions before using formulas
Use t-distribution when \(\sigma\) is unknown
Larger samples give narrower intervals
Higher confidence gives wider intervals
Practice Problems 📝
Problem 1: Coffee Shop Revenue
A coffee shop owner samples 36 days and finds average daily revenue of $850 with standard deviation $120.
Your turn: Calculate a 90% confidence interval for the true average daily revenue.
Given (from the prompt)
\(n = 36,\; \bar{x} = \$850,\; s = \$120,\; \text{confidence level} = 90\%\)
Step 1 – Conditions
- \(n \ge 30\) ⇒ a \(t\)‑interval is justified by the Central Limit Theorem.
- Assume daily revenues are independent.
Step 2 – Critical value
\(\alpha = 1-0.90 = 0.10 \;\Rightarrow\; \alpha/2 = 0.05\)
Degrees of freedom: \(df = n-1 = 35\)
\(\displaystyle t^{\star}_{0.90,\,35} \approx 1.690\)
Step 3 – Standard error
\[SE = \frac{s}{\sqrt{n}} = \frac{120}{\sqrt{36}} = \frac{120}{6} = \$20\]
Step 4 – Margin of error
\[ME = t^{\star}\; SE = 1.690 \times \$20 = \$33.8\]
Step 5 – Confidence interval
\[\bar{x} \pm ME = 850 \pm 33.8 \;\Longrightarrow\; (\$816.2,\; \$883.8)\]
Interpretation – We are 90 % confident that the true mean daily revenue lies between $816.20 and $883.80.
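The arithmetic in Problem 1 can be double-checked with a short script. The critical value 1.690 is taken from Step 2 rather than computed, and the variable names are illustrative.

```python
import math

n, xbar, s = 36, 850.0, 120.0
t_star = 1.690  # critical value from Step 2 (90% confidence, df = 35)

se = s / math.sqrt(n)          # Step 3: standard error
me = t_star * se               # Step 4: margin of error
low, high = xbar - me, xbar + me  # Step 5: interval endpoints

print(f"SE = ${se:.2f}, ME = ${me:.2f}")
print(f"90% CI: (${low:.2f}, ${high:.2f})")
```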
Problem 2: Student Survey
In a survey of 400 students, 280 say they would recommend their major to a friend.
Your turn:
Calculate the sample proportion
Build a \(95\%\) confidence interval
Check if conditions are met
Given (from the survey)
\(n = 400,\; x = 280\) “yes” responses
Step 1 – Sample proportion
\[\hat{p} = \frac{x}{n} = \frac{280}{400} = 0.70\]
Step 2 – Conditions for a \(z\)‑interval
\(n\hat{p} = 400(0.70)=280 \ge 10\)
\(n(1-\hat{p}) = 400(0.30)=120 \ge 10\)
Both counts ≥ 10, so the normal approximation is appropriate.
Step 3 – Standard error
\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.70(0.30)}{400}} = \sqrt{0.000525} \approx 0.0229\]
Step 4 – Critical value & margin of error
For 95 % confidence, \(z^{\star} = 1.96\)
\[ME = z^{\star}\; SE = 1.96 \times 0.0229 \approx 0.045\]
Step 5 – Confidence interval
\[\hat{p} \pm ME = 0.70 \pm 0.045 \;\Longrightarrow\; (0.655,\; 0.745)\]
Interpretation – We are 95 % confident that between 65.5 % and 74.5 % of all students would recommend their major to a friend.
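Problem 2 can be checked the same way, with the success-failure condition made explicit before the interval is computed. The helper name `success_failure_ok` and the second, failing example (3 "yes" out of 40) are hypothetical additions for contrast.

```python
import math


def success_failure_ok(successes, n, threshold=10):
    """True when both the 'yes' and 'no' counts reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold


n, x = 400, 280
z_star = 1.96  # critical value from Step 4 (95% confidence)

p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
me = z_star * se
low, high = p_hat - me, p_hat + me

print("conditions met:", success_failure_ok(x, n))
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, ME = {me:.3f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")

# A hypothetical small survey where the normal approximation is doubtful
print("3 of 40 conditions met:", success_failure_ok(3, 40))
```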
Looking Ahead: Hypothesis Testing 🔮
Next week we’ll learn:
How to test specific claims about populations
When to reject or fail to reject hypotheses
The connection between confidence intervals and hypothesis tests
Key Takeaways 🎯
Big Ideas:
Samples vary - confidence intervals capture this uncertainty
Larger samples give more precise estimates
Higher confidence means wider intervals
The CLT makes normal-based inference possible
Practical Skills:
Build CIs for means and proportions
Interpret confidence correctly
Plan sample sizes for desired precision
Avoid common interpretation mistakes
Comprehensive Resources 📚
📖 Required Reading
- OpenIntro Statistics
- Section 1.3: Sampling principles and strategies
- Section 3.3: Confidence intervals for a mean
- Section 4.1: Central Limit Theorem
- Section 7.1.1: The distribution of \(\bar x\)
- Section 7.1.2: Evaluating the two conditions required for modeling \(\bar x\)
- Section 7.1.3: Introducing the \(t\)-distribution
Questions? 🤔
Office Hours: Thursday 11AM (link on Canvas)
Email: nmathlouthi@ucsb.edu
“The goal is not to eliminate uncertainty, but to understand and work with it”