PSTAT 5A: Sampling and Confidence Intervals
Lecture 9
Narjes Mathlouthi
July 29, 2025
Welcome to Lecture 9
Sampling and Confidence Intervals
From samples to populations: making inferences with uncertainty
Today’s Learning Objectives
By the end of this lecture, you will be able to:
- Understand sampling distributions and their properties (Section 1.2)
- Apply the Central Limit Theorem to sampling (Section 1.4)
- Construct confidence intervals for population means (Section 1.6)
- Construct confidence intervals for population proportions (Section 1.8)
- Interpret confidence intervals correctly (Section 1.5)
- Determine appropriate sample sizes for desired precision
- Use Python to calculate confidence intervals
- Distinguish between different types of sampling methods
The Big Picture: Statistical Inference
Population vs Sample
- Population: All individuals of interest
- Sample: Subset we actually observe
- Parameter: Population characteristic (\(\mu\), \(p\))
- Statistic: Sample characteristic (\(\bar{x}\), \(\hat{p}\))
Goal: Use sample statistics to estimate population parameters
Why Confidence Intervals?
- Point estimates are rarely exactly correct
- Interval estimates capture uncertainty
- Confidence level quantifies how often the method captures the parameter
- Margin of error shows precision
Key Insight: We trade precision for confidence
Sampling Distributions
A sampling distribution is the distribution of a statistic (like \(\bar{x}\)) across all possible samples of size \(n\).
Key Properties:
Center:
\(E[\bar{X}] = \mu\) (unbiased)
Spread:
\(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\)
Shape:
Approaches normal as \(n\) increases (Central Limit Theorem)
Standard Error vs Standard Deviation:
- \(\sigma\): spread of individual observations
- \(SE = \frac{\sigma}{\sqrt{n}}\): spread of sample means
Central Limit Theorem in Action
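The properties above can be checked with a small simulation. This is a minimal sketch, not the lecture's own demo: the skewed exponential population, the values of `MU` and `SIGMA`, and the helper name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed population: exponential with mean 10.
# For an exponential distribution the standard deviation equals the mean.
MU = 10.0
SIGMA = 10.0

def simulate_sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the list of sample means."""
    return [
        statistics.fmean([random.expovariate(1 / MU) for _ in range(n)])
        for _ in range(reps)
    ]

results = {}
for n in (5, 30, 100):
    means = simulate_sample_means(n)
    # Center and spread of the simulated sampling distribution
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:>3}: mean of x-bar = {results[n][0]:.2f}, "
          f"SD of x-bar = {results[n][1]:.2f}, "
          f"sigma/sqrt(n) = {SIGMA / n ** 0.5:.2f}")
```

In each row the simulated spread of the sample means should sit close to \(\sigma/\sqrt{n}\), and the center should stay near \(\mu\) even though the population itself is skewed.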
Confidence Intervals: The Concept
What is a Confidence Interval?
A confidence interval provides a range of plausible values for a population parameter.
95% Confidence Interval: If we repeated our sampling process many times, about \(95\%\) of the intervals we construct would contain the true population parameter.
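The repeated-sampling idea can be sketched in a few lines of Python. The population values (`MU`, `SIGMA`), the sample size `N`, and the helper `one_interval` are hypothetical; the sketch assumes a normal population with known \(\sigma\), so it uses a \(z^*\) interval.

```python
import random
import statistics

random.seed(9)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 50.0, 12.0, 40  # hypothetical population mean, SD, sample size
Z_STAR = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

def one_interval():
    """Draw one sample and return its 95% z-interval for the mean."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    margin = Z_STAR * SIGMA / N ** 0.5
    return x_bar - margin, x_bar + margin

intervals = [one_interval() for _ in range(1000)]
# Fraction of the 1000 intervals that contain the true mean
coverage = sum(lo <= MU <= hi for lo, hi in intervals) / len(intervals)
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The printed share should land near 0.95. Any single interval either contains \(\mu\) or it does not; the 95% describes the procedure over many samples.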
Confidence Intervals for Population Means
🎯 When σ is Known:
\[\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\]
When \(\sigma\) is Unknown (more common):
\[\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\]
Key Components:
- \(\bar{x}\): sample mean
- \(t^*\): critical value (df = n-1)
- \(\frac{s}{\sqrt{n}}\): standard error
Common Confidence Levels:
- 90%: z* = 1.645, more precise
- 95%: z* = 1.96, most common
- 99%: z* = 2.576, more confident
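These critical values come from the standard normal distribution. As a quick check, the sketch below recovers them with `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z* so that the central `confidence` area lies within +/- z*."""
    tail = (1 - confidence) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```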
Conditions Required:
- Random sampling
- Nearly normal population OR n ≥ 30
- Independent observations
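A minimal sketch of the \(t\)-interval formula in Python follows. The function name `t_interval` and the sleep data are made up for illustration. The standard library has no \(t\) distribution, so \(t^*\) is passed in from a table; if SciPy is installed, `scipy.stats.t.ppf(0.975, df)` is one way to compute it instead.

```python
import math
import statistics

def t_interval(sample, t_star):
    """Return (lower, upper) for x-bar +/- t* * s / sqrt(n).

    t_star must match the confidence level and df = n - 1.
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD, divides by n - 1
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical data: hours of sleep for 8 students (invented for this sketch)
hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 7.0, 6.5]
T_STAR_DF7_95 = 2.365  # t* for df = 7 at 95% confidence, from a t table

lower, upper = t_interval(hours, T_STAR_DF7_95)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

With only 8 observations, the nearly normal condition would need to be checked before trusting this interval.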
Confidence Intervals for Population Proportions
🎯 Formula:
\[\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Key Components:
- \(\hat{p} = \frac{x}{n}\): sample proportion
- \(z^*\): critical value
- \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\): standard error
Conditions Required:
- Random sampling
- \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
- Independent observations
- Population at least 10× sample size
Conservative Approach:
Use \(\hat{p} = 0.5\) for planning when true proportion unknown (maximizes margin of error)
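The proportion formula and its success-failure condition can be sketched as below. The function name `proportion_interval` and the survey counts are hypothetical. The sketch checks only the success-failure condition; random sampling and independence still have to be judged from the study design.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    # Success-failure condition: at least 10 successes and 10 failures
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("success-failure condition not met")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 130 of 250 respondents answer "yes"
lower, upper = proportion_interval(130, 250)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```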
Practice Problem 1: CI for Mean
A random sample of 25 college students shows a mean daily screen time of 6.2 hours with a standard deviation of 1.8 hours.
(a) Construct a 95% confidence interval for the mean daily screen time.
(b) Interpret the confidence interval in context.
(c) What would happen to the interval width if we used 99% confidence instead?
Solution. (a)
Given: \(n = 25\), \(\bar{x} = 6.2\), \(s = 1.8\), 95% confidence
For \(df = 24\), \(t^* = 2.064\)
\(SE = \frac{s}{\sqrt{n}} = \frac{1.8}{\sqrt{25}} = 0.36\)
\(CI = 6.2 \pm 2.064 \times 0.36 = 6.2 \pm 0.743 = (5.46, 6.94)\) hours
(b)
We are 95% confident that the true mean daily screen time for all college students is between \(5.46\) and \(6.94\) hours.
(c)
For 99% confidence, we use \(t^* = 2.797\), giving a wider interval: \((5.19, 7.21)\) hours.
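The arithmetic in this solution can be reproduced in Python. This sketch takes the \(t^*\) values for \(df = 24\) as quoted in the solution rather than computing them; the variable names are illustrative.

```python
import math

n, x_bar, s = 25, 6.2, 1.8
se = s / math.sqrt(n)  # standard error of the mean

# t* values for df = 24, as quoted in the solution above
t_stars = {"95%": 2.064, "99%": 2.797}

intervals = {}
for level, t_star in t_stars.items():
    margin = t_star * se
    intervals[level] = (x_bar - margin, x_bar + margin)
    print(f"{level} CI: ({intervals[level][0]:.2f}, {intervals[level][1]:.2f}) "
          f"hours, margin of error = {margin:.3f}")
```

The 99% interval is wider because the larger \(t^*\) multiplies the same standard error.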
Practice Problem 2: CI for Proportion
In a survey of 400 voters, 240 support a particular candidate.
(a) Construct a 90% confidence interval for the true proportion of supporters.
(b) Check if the conditions for inference are met.
(c) How large a sample would be needed for a margin of error of 0.03 with 95% confidence?
Solution. (a)
\(\hat{p} = \frac{240}{400} = 0.6\), \(n = 400\), 90% confidence, \(z^* = 1.645\)
\(SE = \sqrt{\frac{0.6 \times 0.4}{400}} = \sqrt{\frac{0.24}{400}} = 0.0245\)
\(CI = 0.6 \pm 1.645 \times 0.0245 = 0.6 \pm 0.0403 = (0.560, 0.640)\)
(b)
Check conditions: \(n\hat{p} = 400 \times 0.6 = 240 \geq 10\) ✓
\(n(1-\hat{p}) = 400 \times 0.4 = 160 \geq 10\) ✓
(c)
Sample size calculation:
\(n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2} = \frac{(1.96)^2 \times 0.6 \times 0.4}{(0.03)^2} = \frac{0.9220}{0.0009} \approx 1024.4\), which rounds up to 1025 people
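A short Python check of this problem follows, as a sketch with illustrative variable names. A required sample size is always rounded up to the next whole person, which `math.ceil` does here. The conservative figure using \(\hat{p} = 0.5\) is included for comparison and is not part of the stated problem.

```python
import math
from statistics import NormalDist

x, n = 240, 400
p_hat = x / n

# (a) 90% confidence interval
z90 = NormalDist().inv_cdf(0.95)  # about 1.645
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z90 * se, p_hat + z90 * se)
print(f"90% CI: ({ci[0]:.3f}, {ci[1]:.3f})")

# (b) success-failure condition
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
print(f"Success-failure condition met: {conditions_ok}")

# (c) sample size for ME = 0.03 at 95% confidence, rounded up
z95 = NormalDist().inv_cdf(0.975)  # about 1.96
target_me = 0.03
n_needed = math.ceil(z95 ** 2 * p_hat * (1 - p_hat) / target_me ** 2)
n_conservative = math.ceil(z95 ** 2 * 0.5 * 0.5 / target_me ** 2)
print(f"n with p-hat = 0.6: {n_needed}; n with p-hat = 0.5: {n_conservative}")
```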
Practice Problem 3: Sample Size Planning
A market researcher wants to estimate the average amount spent on coffee per week by college students.
(a) How large a sample is needed for a 95% CI with margin of error $2 if \(\sigma\) = $8?
(b) If the budget only allows for 100 students, what confidence level gives a $2 margin of error?
(c) What’s the trade-off between sample size, confidence level, and precision?
Solution. (a)
For means:
\(n = \frac{(z^*)^2 \sigma^2}{ME^2} =
\frac{(1.96)^2 \times 8^2}{2^2} =
\frac{245.86}{4} \approx 61.5\), which rounds up to 62 students
(b)
With \(n = 100\):
\(ME = z^* \frac{\sigma}{\sqrt{n}} =
z^* \frac{8}{\sqrt{100}} =
0.8 z^*\)
For \(ME = 2\):
\(z^* = \frac{2}{0.8} = 2.5\),
which corresponds to about 98.8% confidence
(c) Trade-offs:
Higher confidence \(\rightarrow\) wider intervals (less precision)
Larger sample \(\rightarrow\) narrower intervals (more precision)
Lower margin of error \(\rightarrow\) need larger sample or lower confidence
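Parts (a) and (b) can be checked with the standard library. This is a sketch with illustrative variable names; the step from \(z^*\) back to a confidence level uses the two-sided relation \(2\Phi(z^*) - 1\).

```python
import math
from statistics import NormalDist

sigma, target_me = 8.0, 2.0

# (a) sample size for a 95% CI, rounded up to a whole student
z95 = NormalDist().inv_cdf(0.975)
n_needed = math.ceil((z95 * sigma / target_me) ** 2)
print(f"Students needed: {n_needed}")

# (b) with n = 100, solve ME = z* * sigma / sqrt(n) for z*
n_budget = 100
z_star = target_me / (sigma / math.sqrt(n_budget))
# Central area between -z* and +z* under the standard normal curve
confidence = 2 * NormalDist().cdf(z_star) - 1
print(f"z* = {z_star:.2f}, confidence level = {confidence:.1%}")
```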
Common Mistakes and Misconceptions
Interpretation Errors
❌ Wrong: “\(95\%\) of the data falls in this interval”
✅ Right: “We’re \(95\%\) confident the parameter is in this interval”
❌ Wrong: “There’s a \(95\%\) chance \(\mu\) is in this interval”
✅ Right: “\(95\%\) of such intervals contain \(\mu\)”
Technical Errors
Using \(z^*\) when \(\sigma\) is unknown and \(n < 30\)
Forgetting to check conditions
Confusing standard error with standard deviation
Using wrong degrees of freedom for t-distribution
Remember: The confidence level refers to the long-run proportion of intervals that capture the parameter!
Sample Size and Margin of Error Relationships
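The relationships can be tabulated with a short sketch for a mean with known \(\sigma\). The helpers `margin_of_error` and `required_n` are hypothetical names, and \(\sigma = 8\) is an assumed value reused from Practice Problem 3.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """ME = z* * sigma / sqrt(n) for a mean with known sigma."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

def required_n(sigma, me, confidence=0.95):
    """Sample size, rounded up, that gives a margin of error of at most `me`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * sigma / me) ** 2)

SIGMA = 8.0  # assumed population SD

# Margin of error shrinks with n and grows with the confidence level
for n in (25, 100, 400):
    row = ", ".join(
        f"{level:.0%}: {margin_of_error(SIGMA, n, level):.2f}"
        for level in (0.90, 0.95, 0.99)
    )
    print(f"n = {n:>3} -> ME at {row}")

# Halving the target margin of error roughly quadruples the sample size
for me in (4.0, 2.0, 1.0):
    print(f"ME = {me:.0f} -> n = {required_n(SIGMA, me)}")
```

Because \(n\) sits under a square root, quadrupling the sample size only halves the margin of error.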
Types of Sampling Methods
Method | Description | Advantages | Disadvantages |
---|---|---|---|
Simple Random | Every individual has equal chance | Unbiased, simple | May not represent subgroups |
Stratified | Sample from each subgroup | Ensures representation | More complex |
Cluster | Sample entire groups | Cost-effective for spread populations | Higher variability |
Systematic | Every k-th individual | Simple to implement | Biased if the list has a periodic pattern |
Convenience | Easily accessible individuals | Quick and cheap | Highly biased |
Sampling Method Matters: Only probability sampling methods allow for valid statistical inference!
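The four probability methods in the table can be sketched with Python's `random` module. The sampling frame of 120 students, the class-year labels, and the section size are all invented for illustration. Convenience sampling is left out because it involves no random selection.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

YEARS = ("first-year", "sophomore", "junior", "senior")
# Hypothetical sampling frame: 120 students, each tagged with a class year
population = [{"id": i, "year": YEARS[i % 4]} for i in range(120)]

# Simple random: every student has the same chance of selection
simple_random = random.sample(population, 12)

# Stratified: a simple random sample of 3 within each class year
strata = {}
for student in population:
    strata.setdefault(student["year"], []).append(student)
stratified = [s for group in strata.values() for s in random.sample(group, 3)]

# Systematic: random start, then every k-th student on the list
k = len(population) // 12
start = random.randrange(k)
systematic = population[start::k]

# Cluster: split the list into 10 sections of 12 and keep 2 whole sections
sections = [population[i:i + 12] for i in range(0, len(population), 12)]
cluster = [s for section in random.sample(sections, 2) for s in section]

for name, chosen in [("Simple random", simple_random), ("Stratified", stratified),
                     ("Systematic", systematic), ("Cluster", cluster)]:
    print(f"{name:<14} n = {len(chosen)}")
```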
Confidence Intervals in Practice
When to Use Each Type
Means: Continuous data (height, income, test scores)
Proportions: Categorical data (yes/no, success/failure)
Choosing Confidence Level
90%: Quick estimates, less critical decisions
95%: Standard in most research
99%: High-stakes decisions, medical trials
Real-World Applications
Political polls: Proportion confidence intervals
Quality control: Mean confidence intervals
Medical research: Both types with high confidence
Business analytics: Varies by decision importance
Communication Tips
Always include the confidence level
State what the interval estimates
Acknowledge the uncertainty
Consider practical significance
Key Takeaways
Main Concepts
Sampling distributions follow predictable patterns
Confidence intervals quantify uncertainty
Central Limit Theorem makes normal-based inference possible
Sample size directly affects precision
Practical Guidelines
Choose appropriate methods based on:
Data type (continuous vs categorical)
Whether σ is known (use t when σ is unknown)
Desired precision (affects sample size)
Confidence level (affects interval width)
Key Principle: Statistical inference allows us to make informed decisions about populations using sample data, while properly accounting for uncertainty.
Looking Ahead
Next Lecture: Hypothesis Testing
Topics we’ll cover:
Null and alternative hypotheses
Test statistics and p-values
Type I and Type II errors
Connection: Confidence intervals and hypothesis tests are two sides of the same statistical inference coin
Questions?
Office Hours: 11AM on Thursday (link on Canvas)
Email: nmathlouthi@ucsb.edu
Next Class: Hypothesis Testing and Statistical Significance