PSTAT 5A: Sampling and Confidence Intervals

Lecture 9

Author

Narjes Mathlouthi

Published

July 29, 2025


Welcome to Lecture 9

Sampling and Confidence Intervals

From samples to populations: making inferences with uncertainty

Today’s Learning Objectives

By the end of this lecture, you will be able to:

  • Understand sampling distributions and their properties (Section 1.2)
  • Apply the Central Limit Theorem to sampling (Section 1.4)
  • Construct confidence intervals for population means (Section 1.6)
  • Construct confidence intervals for population proportions (Section 1.8)
  • Interpret confidence intervals correctly (Section 1.5)
  • Determine appropriate sample sizes for desired precision
  • Use Python to calculate confidence intervals
  • Distinguish between different types of sampling methods

The Big Picture: Statistical Inference

Population vs Sample

  • Population: All individuals of interest
  • Sample: Subset we actually observe
  • Parameter: Population characteristic (\(\mu\), \(p\))
  • Statistic: Sample characteristic (\(\bar{x}\), \(\hat{p}\))

Goal: Use sample statistics to estimate population parameters

Why Confidence Intervals?

  • Point estimates are rarely exactly correct
  • Interval estimates capture uncertainty
  • Confidence level quantifies our certainty
  • Margin of error shows precision

Key Insight: We trade precision for confidence

Sampling Distributions

A sampling distribution is the distribution of a statistic (like \(\bar{x}\)) across all possible samples of size \(n\).

Key Properties:

Center:
\(E[\bar{X}] = \mu\) (unbiased)

Spread:
\(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\)

Shape:
Approaches normal as \(n\) increases (Central Limit Theorem)

Standard Error vs Standard Deviation:

  • \(\sigma\): spread of individual observations
  • \(SE = \frac{\sigma}{\sqrt{n}}\): spread of sample means
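To make the distinction concrete, here is a minimal Python sketch that evaluates \(SE = \frac{\sigma}{\sqrt{n}}\) for a few sample sizes. The value \(\sigma = 10\) and the helper name standard_error are illustrative assumptions, not values from the lecture.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

for n in (25, 100, 400):
    print(f"n = {n:>3}: sigma = {sigma:.1f}, SE = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size halves the standard error, while \(\sigma\) itself does not change.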


Central Limit Theorem in Action
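One way to see the theorem at work is to simulate it. The sketch below assumes a right-skewed exponential population with \(\mu = 1\) and \(\sigma = 1\) (an illustrative choice), draws many samples of size \(n\), and reports the center and spread of the sample means along with the share that land within 1.96 standard errors of \(\mu\). The function names, the seed, and the number of repetitions are all hypothetical.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so repeated runs give the same numbers

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0

def sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the list of sample means."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def summarize(n, reps=2000):
    """Return (mean of sample means, SD of sample means, sigma/sqrt(n),
    share of sample means within 1.96 standard errors of mu)."""
    means = sample_means(n, reps)
    se = SIGMA / math.sqrt(n)
    within = sum(abs(m - MU) <= 1.96 * se for m in means) / reps
    return statistics.fmean(means), statistics.stdev(means), se, within

for n in (2, 30, 100):
    center, spread, se, within = summarize(n)
    print(f"n = {n:>3}: mean of means = {center:.3f}, "
          f"SD of means = {spread:.3f}, sigma/sqrt(n) = {se:.3f}, "
          f"within 1.96 SE = {within:.3f}")
```

If the theorem is doing its job, the mean of the sample means should sit near \(\mu\), their standard deviation should track \(\frac{\sigma}{\sqrt{n}}\), and the last column should move toward 0.95 as \(n\) grows.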


Confidence Intervals: The Concept

What is a Confidence Interval?

A confidence interval provides a range of plausible values for a population parameter.

95% Confidence Interval: If we repeated our sampling process many times, about \(95\%\) of the intervals we construct would contain the true population parameter.
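The repeated-sampling idea can be checked with a small simulation. This is a sketch under stated assumptions: a normal population with \(\mu = 50\) and known \(\sigma = 10\), samples of size 25, and the interval \(\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}\). All of these numbers, and the function name coverage, are made up for illustration.

```python
import math
import random
import statistics

random.seed(9)  # fixed seed so the run is reproducible

MU, SIGMA, N = 50.0, 10.0, 25  # hypothetical population and sample size
Z_STAR = 1.96                  # critical value for 95% confidence

def coverage(reps=2000):
    """Share of simulated 95% intervals that contain the true mean MU."""
    se = SIGMA / math.sqrt(N)
    hits = 0
    for _ in range(reps):
        sample = [random.gauss(MU, SIGMA) for _ in range(N)]
        xbar = statistics.fmean(sample)
        lower, upper = xbar - Z_STAR * se, xbar + Z_STAR * se
        hits += lower <= MU <= upper  # True counts as 1
    return hits / reps

print(f"Share of intervals containing mu: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains \(\mu\) or it does not; the 95% describes the procedure over many repetitions.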


Confidence Intervals for Population Means

🎯 When σ is Known:

\[\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\]

When \(\sigma\) is Unknown (more common):

\[\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\]

Key Components:

  • \(\bar{x}\): sample mean
  • \(t^*\): critical value (df = n-1)
  • \(\frac{s}{\sqrt{n}}\): standard error

Common Confidence Levels:

  • 90%: z* = 1.645, more precise
  • 95%: z* = 1.96, most common
  • 99%: z* = 2.576, more confident
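These critical values can be reproduced with Python's standard library. The sketch below uses statistics.NormalDist and the fact that a two-sided interval at confidence level \(C\) leaves \(\frac{1 - C}{2}\) in each tail; the helper name z_critical is an assumption for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```

The standard library has no t-distribution, so \(t^*\) still comes from a t-table. If SciPy is installed, scipy.stats.t.ppf(1 - alpha / 2, df) plays the same role for \(t^*\).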

Conditions Required:

  • Random sampling
  • Nearly normal population OR n ≥ 30
  • Independent observations


Confidence Intervals for Population Proportions

🎯 Formula:

\[\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Key Components:

  • \(\hat{p} = \frac{x}{n}\): sample proportion
  • \(z^*\): critical value
  • \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\): standard error

Conditions Required:

  • Random sampling
  • \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
  • Independent observations
  • Population at least 10× sample size

Conservative Approach:

Use \(\hat{p} = 0.5\) for planning when true proportion unknown (maximizes margin of error)
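A quick numerical check of why \(\hat{p} = 0.5\) is the conservative choice: the sketch below evaluates the margin of error \(z^* \sqrt{\frac{p(1-p)}{n}}\) across several values of \(p\). The sample size \(n = 400\) and the 95% level are illustrative assumptions.

```python
import math

Z_STAR = 1.96  # 95% confidence
N = 400        # hypothetical planned sample size

def margin_of_error(p, n=N, z_star=Z_STAR):
    """Margin of error for a proportion: z* * sqrt(p(1 - p) / n)."""
    return z_star * math.sqrt(p * (1 - p) / n)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: ME = {margin_of_error(p):.4f}")
```

The margin of error peaks at \(p = 0.5\), so a sample size planned with that value is large enough for any true proportion.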


Practice Problem 1: CI for Mean

A random sample of 25 college students shows a mean daily screen time of 6.2 hours with a standard deviation of 1.8 hours.

(a) Construct a 95% confidence interval for the mean daily screen time.

(b) Interpret the confidence interval in context.

(c) What would happen to the interval width if we used 99% confidence instead?

Solution. (a)
Given: \(n = 25\), \(\bar{x} = 6.2\), \(s = 1.8\), 95% confidence

For \(df = 24\), \(t^* = 2.064\)

\(SE = \frac{s}{\sqrt{n}} = \frac{1.8}{\sqrt{25}} = 0.36\)

\(CI = 6.2 \pm 2.064 \times 0.36 = 6.2 \pm 0.743 = (5.46, 6.94)\) hours

(b)
We are 95% confident that the true mean daily screen time for all college students is between \(5.46\) and \(6.94\) hours.

(c)
For 99% confidence, we use \(t^* = 2.797\), giving a wider interval: \((5.19, 7.21)\) hours.
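The arithmetic in parts (a) and (c) can be double-checked in Python. In this sketch the critical values \(t^* = 2.064\) and \(t^* = 2.797\) are passed in by hand, exactly as read from a t-table for \(df = 24\); the function name mean_ci is an assumption for illustration.

```python
import math

def mean_ci(xbar, s, n, t_star):
    """Return (lower, upper) for xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

ci95 = mean_ci(6.2, 1.8, 25, t_star=2.064)
ci99 = mean_ci(6.2, 1.8, 25, t_star=2.797)

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})")  # 95% CI: (5.46, 6.94)
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f})")  # 99% CI: (5.19, 7.21)
```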

Practice Problem 2: CI for Proportion

In a survey of 400 voters, 240 support a particular candidate.

(a) Construct a 90% confidence interval for the true proportion of supporters.

(b) Check if the conditions for inference are met.

(c) How large a sample would be needed for a margin of error of 0.03 with 95% confidence?

Solution. (a)
\(\hat{p} = \frac{240}{400} = 0.6\), \(n = 400\), 90% confidence, \(z^* = 1.645\)

\(SE = \sqrt{\frac{0.6 \times 0.4}{400}} = \sqrt{\frac{0.24}{400}} = 0.0245\)

\(CI = 0.6 \pm 1.645 \times 0.0245 = 0.6 \pm 0.0403 = (0.560, 0.640)\)

(b)

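Check conditions: \(n\hat{p} = 400 \times 0.6 = 240 \geq 10\) and \(n(1-\hat{p}) = 400 \times 0.4 = 160 \geq 10\), so the success-failure condition is met. We also need the 400 voters to be a random sample with independent responses, drawn from a population of at least \(10 \times 400 = 4000\) voters.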

(c)

Sample size calculation:

\(n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2} = \frac{(1.96)^2 \times 0.6 \times 0.4}{(0.03)^2} = \frac{0.9220}{0.0009} \approx 1024.4\), so round up to \(n = 1025\) people
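All three parts can be sketched in a few lines of Python. The function names are hypothetical, and the sample-size helper rounds up with math.ceil because a fractional respondent is not possible.

```python
import math

def proportion_ci(successes, n, z_star):
    """Return (lower, upper) for p_hat +/- z* * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def success_failure_ok(successes, n, threshold=10):
    """Check n * p_hat >= 10 and n * (1 - p_hat) >= 10 using raw counts."""
    return successes >= threshold and (n - successes) >= threshold

def sample_size_for_proportion(p, me, z_star):
    """Smallest whole n whose margin of error is at most `me`."""
    return math.ceil(z_star ** 2 * p * (1 - p) / me ** 2)

lower, upper = proportion_ci(240, 400, z_star=1.645)
print(f"(a) 90% CI: ({lower:.3f}, {upper:.3f})")              # (0.560, 0.640)
print(f"(b) success-failure met: {success_failure_ok(240, 400)}")  # True
print(f"(c) n needed: {sample_size_for_proportion(0.6, 0.03, 1.96)}")  # 1025
```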

Practice Problem 3: Sample Size Planning

A market researcher wants to estimate the average amount spent on coffee per week by college students.

(a) How large a sample is needed for a 95% CI with margin of error $2 if \(\sigma\) = $8?

(b) If the budget only allows for 100 students, what confidence level gives a $2 margin of error?

(c) What’s the trade-off between sample size, confidence level, and precision?

Solution. (a)
For means:
\(n = \frac{(z^*)^2 \sigma^2}{ME^2} = \frac{(1.96)^2 \times 8^2}{2^2} = \frac{245.86}{4} \approx 61.5\), so round up to \(n = 62\) students

(b)
With \(n = 100\):
\(ME = z^* \frac{\sigma}{\sqrt{n}} = z^* \frac{8}{\sqrt{100}} = 0.8 z^*\)

For \(ME = 2\):
\(z^* = \frac{2}{0.8} = 2.5\),
which corresponds to about 98.8% confidence

(c) Trade-offs:

  • Higher confidence \(\rightarrow\) wider intervals (less precision)

  • Larger sample \(\rightarrow\) narrower intervals (more precision)

  • Lower margin of error \(\rightarrow\) need larger sample or lower confidence
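Parts (a) and (b) can be checked with the standard library. This is a sketch with hypothetical function names: the first helper rounds the required sample size up, and the second converts a critical value \(z^*\) back into a two-sided confidence level using \(2\Phi(z^*) - 1\).

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, me, z_star):
    """Smallest whole n with z* * sigma / sqrt(n) <= me."""
    return math.ceil((z_star * sigma / me) ** 2)

def confidence_for_z(z_star):
    """Two-sided confidence level that corresponds to a given z*."""
    return 2 * NormalDist().cdf(z_star) - 1

n_needed = sample_size_for_mean(sigma=8, me=2, z_star=1.96)
z_star = 2 / (8 / math.sqrt(100))  # solve ME = z* * sigma / sqrt(n) for z*
level = confidence_for_z(z_star)

print(f"(a) n needed: {n_needed}")                              # 62
print(f"(b) z* = {z_star:.2f}, confidence level = {level:.1%}")  # 2.50, 98.8%
```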

Common Mistakes and Misconceptions

Interpretation Errors

❌ Wrong: “\(95\%\) of the data falls in this interval”

✅ Right: “We’re \(95\%\) confident the parameter is in this interval”

❌ Wrong: “There’s a \(95\%\) chance \(\mu\) is in this interval”

✅ Right: “\(95\%\) of such intervals contain \(\mu\)”

Technical Errors

  • Using \(z^*\) when σ is unknown and \(n < 30\)

  • Forgetting to check conditions

  • Confusing standard error with standard deviation

  • Using wrong degrees of freedom for t-distribution

Remember: The confidence level refers to the long-run proportion of intervals that capture the parameter!

Sample Size and Margin of Error Relationships
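The relationship follows from solving the margin-of-error formula for \(n\). A short derivation, assuming \(\sigma\) is known and using the same symbols as the formulas above:

\[ME = z^* \cdot \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{z^* \sigma}{ME} \quad\Longrightarrow\quad n = \frac{(z^*)^2 \sigma^2}{ME^2}\]

Because \(n\) is proportional to \(1/ME^2\), cutting the margin of error in half requires four times the sample size:

\[n_{\text{new}} = \frac{(z^*)^2 \sigma^2}{(ME/2)^2} = 4 \cdot \frac{(z^*)^2 \sigma^2}{ME^2}\]

For proportions, the same steps with \(\sqrt{\hat{p}(1-\hat{p})}\) in place of \(\sigma\) give \(n = \frac{(z^*)^2 \hat{p}(1-\hat{p})}{ME^2}\). In practice, round \(n\) up to the next whole number.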


Types of Sampling Methods

| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Simple Random | Every individual has equal chance | Unbiased, simple | May not represent subgroups |
| Stratified | Sample from each subgroup | Ensures representation | More complex |
| Cluster | Sample entire groups | Cost-effective for spread populations | Higher variability |
| Systematic | Every k-th individual | Simple to implement | Can miss patterns |
| Convenience | Easily accessible individuals | Quick and cheap | Highly biased |

Note

Sampling Method Matters: Only probability sampling methods allow for valid statistical inference!
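The four probability methods in the table can be sketched with Python's random module. Everything here is illustrative: the population of 100 students, the year and section fields, and the function names are assumptions made up for the example. Convenience sampling is left out because it involves no random selection.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

YEARS = ("freshman", "sophomore", "junior", "senior")

# Hypothetical population: 100 students, each with a class year and a
# section number (10 sections of 10 students).
population = [{"id": i, "year": YEARS[i % 4], "section": i // 10}
              for i in range(100)]

def simple_random(pop, n):
    """Every subset of size n is equally likely."""
    return random.sample(pop, n)

def stratified(pop, key, n_per_stratum):
    """Take a simple random sample of fixed size from each subgroup."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    return [unit for members in strata.values()
            for unit in random.sample(members, n_per_stratum)]

def cluster(pop, key, n_clusters):
    """Pick whole groups at random and keep everyone in them."""
    groups = sorted({unit[key] for unit in pop})
    chosen = set(random.sample(groups, n_clusters))
    return [unit for unit in pop if unit[key] in chosen]

def systematic(pop, k):
    """Start at a random position, then take every k-th individual."""
    start = random.randrange(k)
    return pop[start::k]

print(len(simple_random(population, 12)))       # 12 students
print(len(stratified(population, "year", 3)))   # 3 from each of 4 years
print(len(cluster(population, "section", 2)))   # 2 whole sections of 10
print(len(systematic(population, 10)))          # every 10th student
```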

Confidence Intervals in Practice

When to Use Each Type

Means: Continuous data (height, income, test scores)

Proportions: Categorical data (yes/no, success/failure)

Choosing Confidence Level

  • 90%: Quick estimates, less critical decisions

  • 95%: Standard in most research

  • 99%: High-stakes decisions, medical trials

Real-World Applications

  • Political polls: Proportion confidence intervals

  • Quality control: Mean confidence intervals

  • Medical research: Both types with high confidence

  • Business analytics: Varies by decision importance

Communication Tips

  • Always include the confidence level

  • State what the interval estimates

  • Acknowledge the uncertainty

  • Consider practical significance

Key Takeaways

Main Concepts

  • Sampling distributions follow predictable patterns

  • Confidence intervals quantify uncertainty

  • Central Limit Theorem makes normal-based inference possible

  • Sample size directly affects precision

Practical Guidelines

Choose appropriate methods based on:

  • Data type (continuous vs categorical)

  • Sample size (use t when σ unknown)

  • Desired precision (affects sample size)

  • Confidence level (affects interval width)

Key Principle: Statistical inference allows us to make informed decisions about populations using sample data, while properly accounting for uncertainty.

Looking Ahead

Next Lecture: Hypothesis Testing

Topics we’ll cover:

  • Null and alternative hypotheses

  • Test statistics and p-values

  • Type I and Type II errors

Connection: Confidence intervals and hypothesis tests are two sides of the same statistical inference coin

Questions?

  • Office Hours: 11AM on Thursday (link on Canvas)

  • Email: nmathlouthi@ucsb.edu

  • Next Class: Hypothesis Testing and Statistical Significance

Resources