PSTAT 5A: Sampling and Confidence Intervals

Lecture 11 - Making Sense of Uncertainty

Author

Narjes Mathlouthi

Published

July 22, 2025

Welcome to Lecture 11

From Samples to Populations: Understanding Uncertainty

📢 Important Announcements

📝 Quiz 2 Details

When:
- 📅 Date: Friday, July 25
- ⏰ Window: 7 AM – 12 AM
- ⏳ Duration: 1 hour once started

Where: 💻 Online via Canvas

Covers: Material from Weeks 3-4

📚 What to Expect

Discrete & continuous distributions
Probability calculations
Expected value & variance
Normal distribution applications
Note: Upload photos of written work for calculation problems

What We’ll Learn Today 🎯

Big Ideas:

How sample means behave (they’re surprisingly predictable!)
Why we use intervals instead of single numbers
How confident we can be in our estimates
Planning studies for the right precision

Skills You’ll Gain:

Build confidence intervals step-by-step
Interpret what confidence really means
Choose appropriate sample sizes
Avoid common mistakes in interpretation

From Observation ➡️ Experimentation: Why Design Matters

Observational Study: Passively record what already happens — good for spotting patterns, but hidden confounders can fool us.
Controlled Experiment: Actively assign treatments; randomization balances the lurking variables so we can talk about cause.
Randomization & Replication: Twin shields that protect us from bias and one‑off flukes.
Probability Framework: Sample space, events, and probabilities let us quantify uncertainty.

The Big Picture: From Sample to Population

Think of this like trying to understand a huge library by reading just a few books

Key Terms Made Simple:

Population: Everyone we care about (like all students at UCSB)
Sample: The people we actually measured (like 100 students we surveyed)
Parameter ($\mu$): The true answer we want (population mean)
Statistic ($\bar x$): Our best guess (sample mean)

Why Point Estimates Aren’t Enough

Imagine asking: “What’s the average height of UCSB students?”

The Problem: Each sample gives a slightly different answer!

The Solution: Use confidence intervals to show the range of reasonable values.

Sampling Distributions

Think of sampling distributions like “What if we repeated our study 1000 times?”

Key Insights:

Population can be any shape (skewed, normal, weird)
One sample might not look like the population
Sample means vary from sample to sample
Many sample means pile up into an approximately normal distribution (once $n$ is large enough)!

The Central Limit Theorem (CLT) 🎯

Central Limit Theorem: Larger Samples → More Normal!

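The Magic Rule: Whatever shape your population has (as long as its variance is finite), the distribution of sample means gets closer and closer to normal as the sample size grows!

The Rule of Thumb: With n ≥ 30, sample means are usually close to normal. Strongly skewed populations or populations with extreme outliers may need larger samples.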
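To see the CLT at work, here is a minimal simulation sketch. It assumes an Exponential(1) population (right-skewed, mean 1, standard deviation 1), which is our own choice for illustration and not part of the lecture data. The function name `simulate_sample_means` and the seed are likewise hypothetical.

```python
import random
import statistics


def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    (assumed here for illustration) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


for n in (2, 30, 200):
    means = simulate_sample_means(n)
    print(
        f"n={n:>3}  "
        f"mean of sample means={statistics.fmean(means):.3f}  "
        f"sd of sample means={statistics.stdev(means):.3f}  "
        f"theory sd={1 / n ** 0.5:.3f}"
    )
```

The sample means should center near the population mean of 1, and their spread should track $1/\sqrt{n}$. A histogram of `means` for each `n` would show the skew fading as `n` grows.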

Standard Error

What it measures

Standard deviation ($\sigma$ or $s$): spread of individual data points
Standard error (SE): spread of sample means

\[ SE = \frac{\sigma}{\sqrt{n}} \qquad(\text{if } \sigma \text{ is known}) \]

\[ SE = \frac{s}{\sqrt{n}} \qquad(\text{usual case, } \sigma \text{ unknown}) \]

Key facts

SE shrinks at rate $1/\sqrt{n}$ — every 4× more observations ⇒ ½ the SE
Smaller SE ⇒ narrower confidence intervals and more powerful tests

Important

Doubling your sample size doesn’t halve the error - you need 4 times the sample for half the error!
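A quick numeric check of the $1/\sqrt{n}$ rule. The standard deviation of 12 below is a made-up value used only for illustration, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


sd = 12.0  # hypothetical standard deviation, for illustration only
for n in (25, 50, 100):
    print(f"n={n:>3}  SE={standard_error(sd, n):.3f}")
```

Going from 25 to 50 observations shrinks the SE by a factor of $\sqrt{2}$, not 2. It takes 100 observations to cut it in half.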

Confidence Intervals: The Intuitive Idea

Imagine you’re trying to guess someone’s height just by looking at their shadow…

Simple Interpretation: “We’re $95\%$ confident the true average height is between $64.3$ and $70.1$ inches.”

What Exactly Is a Confidence Interval? 🤓

A confidence interval (CI) is a point estimate $\pm$ a margin of error:
\[ \text{CI} = \text{statistic} \;\pm\; \bigl(\text{critical value}\bigr)\times\bigl(\text{SE}\bigr) \]
The “critical value” comes from a probability model (e.g., $z^{\star}$ or $t^{\star}$).
The standard error (SE) captures sampling variation.

Frequentist meaning

If we repeated the study infinitely many times and built a $95 \%$ CI each time, about $95 \%$ of those intervals would cover the true parameter.

(For any one computed interval, the parameter is fixed and the interval either covers it or does not. The $95 \%$ success rate belongs to the procedure, not to the individual interval.)
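The "repeat the study many times" idea can be simulated directly. The sketch below assumes a normal population with a known $\sigma$, so it uses a $z$-interval. The values of `mu`, `sigma`, `n`, and the function name `coverage_rate` are hypothetical choices for illustration.

```python
import random
import statistics
from statistics import NormalDist


def coverage_rate(mu=68.0, sigma=3.0, n=40, reps=4000, level=0.95, seed=1):
    """Build `reps` z-intervals from simulated samples and return the
    fraction of intervals that contain the true mean `mu`."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sigma / n ** 0.5  # sigma is treated as known in this sketch
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z_star * se <= mu <= xbar + z_star * se:
            hits += 1
    return hits / reps


print(f"Share of intervals covering mu: {coverage_rate():.3f}")
```

Each simulated interval either covers `mu` or misses it. Only the long-run share of hits is close to the stated confidence level.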

What controls the width?

Variability in the data: larger $\sigma$ or $s$ ⇒ wider CI
Sample size $n$: width shrinks at rate $1/\sqrt{n}$
Confidence level: 99 % CIs are wider than 90 % CIs

Common pitfalls

Saying “there is a $95 \%$ probability that $\mu$ lies in this interval” (wrong)
Interpreting the CI as covering $95 \%$ of future observations (it does not)
Ignoring conditions (normality or CLT) before using the formulae

Building Confidence Intervals Step-by-Step

For Population Means (Most Common Case)

When we DON’T know the population standard deviation ($\sigma$):

The Formula: $\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$

Breaking it down:

$\bar{x}$ = our sample average (the center of our guess)
$t^*$ = critical value (how many standard errors to go out)
$\frac{s}{\sqrt{n}}$ = standard error (our uncertainty measure)

The t-Distribution: When $\sigma$ is Unknown

Why not use the normal distribution? Because when we estimate $\sigma$ with $s$, we add extra uncertainty!

Key Points:

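Small samples ($n < 30$): $t^*$ is noticeably larger than $z^*$, so using the t-distribution matters most here
Large samples ($n \ge 30$): $t$ ≈ normal, so both give nearly the same interval
Degrees of freedom: $df = n - 1$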
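To watch $t^*$ approach $z^*$ as the degrees of freedom grow, here is a rough standard-library sketch. It integrates the t density with Simpson's rule and finds the critical value by bisection. This is our own illustrative approach (the helper names `t_cdf` and `t_critical` are hypothetical); in practice you would read $t^*$ from a table or use a statistics library such as SciPy's `scipy.stats.t.ppf`.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(level, df):
    """Two-sided critical value t* for the given confidence level."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)
for df in (5, 15, 30, 100, 1000):
    print(f"df={df:>4}  t*={t_critical(0.95, df):.3f}  z*={z_star:.3f}")
```

The gap between $t^*$ and $z^*$ is large for small df and nearly vanishes for large df, which is the extra uncertainty from estimating $\sigma$ with $s$.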

Confidence Intervals for Proportions

For Yes/No questions like: “What percentage of students prefer online classes?”

The Formula: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Results: Sample: 60% prefer online (120/200)
95% CI: (53.2%, 66.8%) - We’re 95% confident the true percentage is in this range.
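The interval above can be reproduced with a short sketch. The function name `proportion_ci` is hypothetical, and the choice to raise an error when the success-failure condition fails (at least 10 in each group) is our own design decision, not something the lecture prescribes.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """z-interval for a proportion: p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    # Success-failure condition: at least 10 "yes" and 10 "no" responses.
    if min(successes, n - successes) < 10:
        raise ValueError("normal approximation not justified")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me


low, high = proportion_ci(120, 200)
print(f"95% CI: ({low:.1%}, {high:.1%})")
```

With 120 of 200 students preferring online classes, this prints the same (53.2%, 66.8%) range shown in the results.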

What Does “95% Confident” Really Mean? 🤔

The Biggest Misconception: “There’s a 95% chance the true mean is in our interval”

Actually: “If we repeated this study 100 times, about 95 of our intervals would contain the true mean”

Remember: The interval either contains the true value or it doesn’t - there’s no probability involved for a single interval!

Sample Size Planning: Getting the Precision You Want

The Question: “How many people do we need to survey?”

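Key Formula for Means: $n = \left(\frac{z^* \sigma}{ME}\right)^2$ (always round up to the next whole number; $\sigma$ is a planning guess, for example from a pilot study or earlier research)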

Trade-offs:

Want smaller margin of error? Need bigger sample
Want higher confidence? Need bigger sample
Want to save money? Accept wider intervals
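These trade-offs are easy to explore numerically. In the sketch below, the planning value $\sigma = 1.5$ hours and the target margins are hypothetical numbers picked for illustration, and `sample_size_for_mean` is our own helper name.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, level=0.95):
    """Smallest n with z* sigma / sqrt(n) <= margin (rounded up)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)


sigma = 1.5  # hypothetical planning guess for the standard deviation (hours)
print("95%, ME 0.25: ", sample_size_for_mean(sigma, 0.25))
print("99%, ME 0.25: ", sample_size_for_mean(sigma, 0.25, level=0.99))
print("95%, ME 0.125:", sample_size_for_mean(sigma, 0.125))
```

Raising the confidence level or shrinking the margin of error both push the required sample size up. Halving the margin roughly quadruples it.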

Real Example: Student Sleep Study 😴

Research Question: How many hours do UCSB students sleep per night?

Bottom Line: We’re 95% confident that UCSB students sleep between 6.73 and 7.47 hours per night on average.

Common Mistakes to Avoid ⚠️

❌ Wrong Interpretations

“95% of students sleep in this range” - NO! This is about the population mean, not individual students

“There’s a 95% chance μ is in our interval” - NO! $\mu$ is fixed; our interval varies

“We can be 95% certain” - NO! Use “confident” not “certain”

✅ Correct Approach

“We are 95% confident the population mean is in this interval”

Key Reminders:

Check conditions before using formulas
Use t-distribution when $\sigma$ is unknown
Larger samples give narrower intervals
Higher confidence gives wider intervals

Practice Problems 📝

Problem 1: Coffee Shop Revenue

A coffee shop owner samples 36 days and finds average daily revenue of $850 with standard deviation $120.

Your turn: Calculate a 90% confidence interval for the true average daily revenue.

Given (from the prompt)
$n = 36,\; \bar{x} = \$850,\; s = \$120,\; \text{confidence level} = 90\%$

Step 1 – Conditions

$n \ge 30$ ⇒ a $t$‑interval is justified by the Central Limit Theorem.
Assume daily revenues are independent.

Step 2 – Critical value

$\alpha = 1-0.90 = 0.10 \;\Rightarrow\; \alpha/2 = 0.05$
Degrees of freedom: $df = n-1 = 35$
$\displaystyle t^{\star}_{0.90,\,35} \approx 1.690$

Step 3 – Standard error

\[SE = \frac{s}{\sqrt{n}} = \frac{120}{\sqrt{36}} = \frac{120}{6} = \$20\]

Step 4 – Margin of error

\[ME = t^{\star}\; SE = 1.690 \times \$20 = \$33.8\]

Step 5 – Confidence interval

\[\bar{x} \pm ME = 850 \pm 33.8 \;\Longrightarrow\; (\$816.2,\; \$883.8)\]

Interpretation – We are 90 % confident that the true mean daily revenue lies between $816.20 and $883.80.
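The five steps can be checked with a few lines of code. This sketch takes $t^{\star}$ as an input (the table value 1.690 from Step 2) and does not compute it. The function name `t_interval` is hypothetical.

```python
import math


def t_interval(xbar, s, n, t_star):
    """Return (lower, upper) for xbar +/- t_star * s / sqrt(n)."""
    se = s / math.sqrt(n)   # Step 3: standard error
    me = t_star * se        # Step 4: margin of error
    return xbar - me, xbar + me  # Step 5: interval


low, high = t_interval(xbar=850, s=120, n=36, t_star=1.690)
print(f"90% CI: (${low:.2f}, ${high:.2f})")
```

Changing `t_star` to the value for another confidence level shows how the interval widens or narrows around the same sample mean.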

Problem 2: Student Survey

In a survey of 400 students, 280 say they would recommend their major to a friend.

Your turn:

Calculate the sample proportion
Build a $95\%$ confidence interval
Check if conditions are met

Given (from the survey)
$n = 400,\; x = 280$ “yes” responses

Step 1 – Sample proportion

\[\hat{p} = \frac{x}{n} = \frac{280}{400} = 0.70\]

Step 2 – Conditions for a $z$‑interval

$n\hat{p} = 400(0.70)=280 \ge 10$
$n(1-\hat{p}) = 400(0.30)=120 \ge 10$
Both counts ≥ 10, so the normal approximation is appropriate.

Step 3 – Standard error

\[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.70(0.30)}{400}} = \sqrt{0.000525} \approx 0.0229\]

Step 4 – Critical value & margin of error

For 95 % confidence, $z^{\star} = 1.96$

\[ME = z^{\star}\; SE = 1.96 \times 0.0229 \approx 0.045\]

Step 5 – Confidence interval

\[\hat{p} \pm ME = 0.70 \pm 0.045 \;\Longrightarrow\; (0.655,\; 0.745)\]

Interpretation – We are 95 % confident that between 65.5 % and 74.5 % of all students would recommend their major to a friend.

Looking Ahead: Hypothesis Testing 🔮

Next week we’ll learn:

How to test specific claims about populations
When to reject or fail to reject hypotheses
The connection between confidence intervals and hypothesis tests

Key Takeaways 🎯

Big Ideas:

Samples vary - confidence intervals capture this uncertainty
Larger samples give more precise estimates
Higher confidence means wider intervals
The CLT makes normal-based inference possible

Practical Skills:

Build CIs for means and proportions
Interpret confidence correctly
Plan sample sizes for desired precision
Avoid common interpretation mistakes

Comprehensive Resources 📚

📖 Required Reading

OpenIntro Statistics
- Section 1.3: Sampling principles and strategies
- Section 3.3: Confidence intervals for a mean
- Section 4.1: Central Limit Theorem
- Section 7.1.1: The distribution of $\bar x$
- Section 7.1.2: Evaluating the two conditions required for modeling $\bar x$
- Section 7.1.3: Introducing the $t$-distribution

🎥 Video Resources

💻 Interactive Tools

Questions? 🤔

Office Hours: Thursday 11AM (link on Canvas)
Email: nmathlouthi@ucsb.edu

“The goal is not to eliminate uncertainty, but to understand and work with it”
