PSTAT 5A: Course Wrap-Up and Quiz Review

Narjes Mathlouthi

2025-07-31

Course Overview 📊

What We’ve Learned

  • Understanding Data: Types, visualization, summary statistics
  • Sampling & Distributions: How samples represent populations
  • Statistical Inference: Confidence intervals and hypothesis testing
  • Relationships: Correlation and linear regression
  • Real-world Applications: Making data-driven decisions

Section 1: Understanding Data 📈

Types of Data

Quantitative

  • Numerical values

  • Can perform arithmetic

  • Examples: height, income, test scores

Continuous vs Discrete

  • Continuous: can take any value in range

  • Discrete: countable values

Qualitative (Categorical)

  • Non-numerical categories

  • Examples: color, major, satisfaction level

Nominal vs Ordinal

  • Nominal: no natural order

  • Ordinal: natural ordering exists

Descriptive Statistics

Measures of Center

  • Mean: \(\bar{x} = \frac{\sum x_i}{n}\)
  • Median: Middle value when ordered
  • Mode: Most frequent value

Measures of Spread

  • Range: Max - Min
  • Standard Deviation: \(s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}\)
  • IQR: Q3 - Q1

Shape

  • Skewness: Left-skewed, right-skewed, or symmetric
  • Outliers: Values far from typical
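
For a quick check of these formulas, here is a minimal Python sketch (assuming numpy is available; the scores below are made-up illustration data):

```python
import numpy as np
from collections import Counter

# Hypothetical quiz scores (illustrative data only)
scores = np.array([72, 85, 90, 66, 78, 95, 88, 70, 85, 91])

mean = scores.mean()                                   # x-bar = sum(x_i) / n
median = np.median(scores)                             # middle value when ordered
mode = Counter(scores.tolist()).most_common(1)[0][0]   # most frequent value
data_range = scores.max() - scores.min()               # Range = Max - Min
s = scores.std(ddof=1)                                 # sample SD (divides by n - 1)
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1                                          # IQR = Q3 - Q1

print(f"mean={mean:.1f}, median={median}, mode={mode}, sd={s:.2f}, IQR={iqr}")
```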

Data Visualization

Common Plots

  • Histogram: Distribution of quantitative data
  • Boxplot: Five-number summary, outliers
  • Scatterplot: Relationship between two quantitative variables
  • Bar chart: Frequency of categorical data

Key Principles

  • Choose appropriate plot for data type
  • Clear labels and titles
  • Avoid misleading scales
  • Highlight important patterns
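
As a quick illustration of these plot types, here is a short matplotlib sketch (the variables and data are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
heights = rng.normal(67, 3, size=200)                 # quantitative variable
scores = 50 + 0.5 * heights + rng.normal(0, 2, 200)   # second quantitative variable
majors = ["Stats", "Bio", "Econ", "CS"]               # categorical variable
counts = [40, 55, 30, 75]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].hist(heights, bins=15)      # histogram: distribution of one quantitative variable
axes[0, 0].set_title("Histogram of heights")
axes[0, 1].boxplot(heights)            # boxplot: five-number summary + outliers
axes[0, 1].set_title("Boxplot of heights")
axes[1, 0].scatter(heights, scores)    # scatterplot: two quantitative variables
axes[1, 0].set_title("Heights vs. scores")
axes[1, 1].bar(majors, counts)         # bar chart: categorical frequencies
axes[1, 1].set_title("Counts by major")
plt.tight_layout()
plt.show()
```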

Section 2: Sampling & Sampling Distributions 🎯

Why Sample?

  • Populations often too large/expensive to study completely
  • Good samples can represent populations well
  • Random sampling reduces bias

Sampling Methods

  • Simple Random: Every individual has equal chance
  • Stratified: Divide into groups, sample from each
  • Cluster: Sample entire groups
  • Systematic: Every nth individual
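
For instance, a simple random sample and a systematic sample can each be drawn in a line or two of Python (a sketch; the "population" here is just a hypothetical list of ID numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
population = np.arange(1, 1001)                         # hypothetical population of 1000 IDs

srs = rng.choice(population, size=50, replace=False)    # simple random sample of 50
start = rng.integers(0, 20)                             # random starting point
systematic = population[start::20]                      # every 20th individual

print(srs[:10])
print(systematic[:10])
```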

Central Limit Theorem

The Magic of Averages ✨

For large samples (n ≥ 30), the sampling distribution of \(\bar{X}\) is approximately normal, regardless of the population distribution

Key Results

  • \(E[\bar{X}] = \mu\) (unbiased)
  • \(SD[\bar{X}] = \frac{\sigma}{\sqrt{n}}\) (standard error)
  • \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) for large n

Implications

  • Larger samples → more precise estimates
  • Can make probability statements about sample means
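
A small simulation makes the CLT concrete: draw many samples from a clearly non-normal population and look at the behavior of the sample mean (a sketch assuming numpy; the exponential population is just one example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 36, 10_000

# Population: exponential with mean 2 (strongly right-skewed, not normal at all)
sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("mean of x-bar:", sample_means.mean())       # close to mu = 2 (unbiased)
print("SD of x-bar:  ", sample_means.std(ddof=1))  # close to sigma/sqrt(n) = 2/6 ≈ 0.33
# A histogram of sample_means would look roughly normal even though the
# population is skewed -- that's the Central Limit Theorem at work.
```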

Standard Error

  • Formula and Interpretation \[SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \text{ or } \frac{s}{\sqrt{n}}\]

  • For Proportions \[SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]

Key Points

  • Measures variability of sample statistic
  • Decreases as sample size increases
  • Foundation for confidence intervals and hypothesis tests

Section 3: Confidence Intervals 🎯

The Big Idea

“We are X% confident that the true parameter lies between [lower bound, upper bound]”

General Form

\[\text{Estimate} \pm \text{(Critical Value)} \times \text{(Standard Error)}\]

CI for Population Mean

When σ is Known (Z-interval) - \[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

When σ is Unknown (t-interval) - \[\bar{x} \pm t_{\alpha/2,df} \cdot \frac{s}{\sqrt{n}}\]

where \(df = n - 1\)

Common Critical Values

  • 90% CI: \(z_{0.05} = 1.645\), \(t_{0.05}\) (depends on df)

  • 95% CI: \(z_{0.025} = 1.96\), \(t_{0.025}\) (depends on df)

  • 99% CI: \(z_{0.005} = 2.576\), \(t_{0.005}\) (depends on df)
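
Here is how a t-interval might be computed with scipy.stats (a sketch; the data are made up):

```python
import numpy as np
from scipy import stats

data = np.array([98.2, 101.5, 99.7, 100.4, 97.9, 102.3, 100.8, 99.1])
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)
se = s / np.sqrt(n)

# 95% t-interval: xbar ± t_{0.025, n-1} * s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# Same interval using the built-in helper
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se))
```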

CI for Population Proportion

Formula

\[\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Conditions

  • \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
  • Random sample
  • Independent observations

Interpretation

  • Same logic as mean: we’re confident the true proportion lies in this interval
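
A sketch of the same calculation for a proportion (the counts are hypothetical):

```python
import numpy as np
from scipy import stats

x, n = 54, 120                         # 54 "successes" out of 120 observations
p_hat = x / n

# Check the success/failure condition first
assert n * p_hat >= 10 and n * (1 - p_hat) >= 10

se = np.sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)              # z_{0.025} = 1.96 for 95% confidence
lower, upper = p_hat - z * se, p_hat + z * se
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```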

Sample Size Determination

For Means \[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{ME}\right)^2\]

For Proportions \[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \cdot \hat{p}(1-\hat{p})\]

Key Trade-offs

  • Higher confidence → larger sample needed

  • Smaller margin of error → larger sample needed

  • Use \(\hat{p} = 0.5\) for the most conservative (largest) sample size
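
Worked numerically (a sketch with hypothetical targets; always round n up):

```python
import math
from scipy import stats

z = stats.norm.ppf(0.975)          # z_{0.025} ≈ 1.96 for 95% confidence

# Proportion: margin of error ME = 0.03, using the conservative p_hat = 0.5
n_prop = (z / 0.03) ** 2 * 0.5 * 0.5
print(math.ceil(n_prop))           # round UP: 1068

# Mean: ME = 2 with sigma ≈ 12 (say, from a pilot study)
n_mean = (z * 12 / 2) ** 2
print(math.ceil(n_mean))           # round UP: 139
```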

Section 4: Hypothesis Testing ⚖️

The Scientific Method in Statistics

  1. Formulate hypotheses
  2. Collect data
  3. Calculate test statistic
  4. Find p-value
  5. Make decision
  6. State conclusion in context

Setting Up Hypotheses

Null Hypothesis (\(H_0\))

  • Status quo, no effect, no difference

  • Contains equality (=, ≤, ≥)

  • What we assume is true

Alternative Hypothesis (\(H_a\) or \(H_1\))

  • What we want to prove

  • Contains inequality (<, >, ≠)

  • Represents change or difference

Example

  • \(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\) (two-tailed)

  • \(H_0: p \leq 0.5\) vs \(H_a: p > 0.5\) (one-tailed)

Test Statistics

For Population Mean

  • When \(\sigma\) known: \(z = (\bar x - \mu_0)/(\sigma/\sqrt{n})\)

  • When \(\sigma\) unknown: \(t = (\bar x - \mu_0)/(s/\sqrt{n})\), \(df = n-1\)

For Population Proportion

  • \(z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}\)
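
Both tests can be carried out in a few lines of Python (a sketch assuming scipy; the data and hypothesized values are made up):

```python
import numpy as np
from scipy import stats

# One-sample t-test: H0: mu = 100 vs Ha: mu != 100
data = np.array([104, 98, 101, 107, 96, 103, 99, 105, 102, 100])
t_stat, p_val = stats.ttest_1samp(data, popmean=100)   # two-tailed by default
print(f"t = {t_stat:.3f}, p-value = {p_val:.3f}")

# One-sample z-test for a proportion: H0: p = 0.5 vs Ha: p > 0.5
x, n, p0 = 68, 120, 0.5
p_hat = x / n
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_val_z = 1 - stats.norm.cdf(z)                         # upper-tail p-value
print(f"z = {z:.3f}, p-value = {p_val_z:.3f}")
```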

P-values and Decision Making

P-value Definition

The probability of observing a test statistic as extreme or more extreme than what we observed, assuming \(H_0\) is true

Decision Rules

  • If p-value ≤ \(\alpha\): Reject \(H_0\) (statistically significant)
  • If p-value > \(\alpha\): Fail to reject \(H_0\) (not statistically significant)

Common Significance Levels

  • \(\alpha\) = \(0.05\) (most common)
  • \(\alpha\) = \(0.01\) (more stringent)
  • \(\alpha\) = \(0.10\) (less stringent)

Types of Errors

Type I Error

  • Rejecting true \(H_0\)
  • Probability = \(\alpha\) (significance level)
  • “False positive”

Type II Error

  • Failing to reject false \(H_0\)
  • Probability = \(\beta\)
  • “False negative”
  • Power = \(1 - \beta\)

Trade-off

  • Decreasing \(\alpha\) increases \(\beta\) (for a fixed sample size)
  • Need larger samples to decrease both

Section 5: Linear Regression 📈

Correlation Coefficient (r) \[r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}}\]

  • Ranges from -1 to +1
  • Measures strength and direction of linear relationship
  • r = 0: no linear relationship
  • |r| close to 1: strong linear relationship

Regression Line

Equation

\[\hat{y} = a + bx\]

Slope (b)

\[b = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2}\]

Intercept (a)

\[a = \bar{y} - b\bar{x}\]

Interpretation

  • Slope: Predicted change in y for a 1-unit increase in x

  • Intercept: Predicted value of y when x = 0
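
A least-squares fit in Python with scipy.stats.linregress (a sketch; x and y below are made-up data):

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied (x) and exam score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])

fit = stats.linregress(x, y)
print(f"slope b = {fit.slope:.2f}")           # predicted change in y per unit of x
print(f"intercept a = {fit.intercept:.2f}")   # predicted y when x = 0
print(f"r = {fit.rvalue:.3f}")                # correlation coefficient
```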

Coefficient of Determination

R-squared (\(r^2\))

  • Proportion of variance in y explained by x
  • Ranges from 0 to 1
  • Higher values indicate better fit

Residuals

  • \(e_i = y_i - \hat{y}_i\)
  • Vertical distance from point to line
  • Used to assess model fit

Standard Error of Regression

  • \[s_e = \sqrt{\frac{\sum(y-\hat{y})^2}{n-2}}\]
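
Continuing with the same made-up data as above, the residuals, \(r^2\), and \(s_e\) follow directly from the fitted values (a self-contained sketch):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])
fit = stats.linregress(x, y)

y_hat = fit.intercept + fit.slope * x                   # fitted values
residuals = y - y_hat                                   # e_i = y_i - y_hat_i
s_e = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))    # standard error of regression
print(f"r^2 = {fit.rvalue ** 2:.3f}, s_e = {s_e:.2f}")
```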

Inference for Regression

Testing Slope Significance

  • \(H_0: \beta = 0\) vs \(H_a: \beta \neq 0\)
  • Test statistic: \(t = \frac{b}{SE_b}\)
  • If significant: x is useful for predicting y
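
linregress also reports this slope test directly (same made-up data as the earlier sketches):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])
fit = stats.linregress(x, y)

t_stat = fit.slope / fit.stderr          # t = b / SE_b
print(f"t = {t_stat:.2f}, two-sided p-value = {fit.pvalue:.4f}")
# If the p-value is below alpha, reject H0: beta = 0 (x is useful for predicting y)
```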

Conditions for Regression Inference

  • Linear relationship
  • Independent observations
  • Normal residuals
  • Equal variance (homoscedasticity)

Section 6: Quiz Review Strategy 🎯

What to Focus On

  1. Know your formulas (but understand when to use them)
  2. Practice identifying appropriate procedures
  3. Check conditions before applying methods
  4. Interpret results in context
  5. Show your work clearly

Common Mistakes to Avoid ⚠️

Confidence Intervals

  • Saying “there is a 95% probability the parameter is in this interval”
  • Confusing the confidence level (a long-run property of the method) with certainty about one computed interval
  • Using the wrong critical value (z vs. t)

Hypothesis Testing

  • Confusing “fail to reject” with “accept”
  • Wrong tail for p-value calculation
  • Not stating conclusions in context

Regression

  • Claiming causation from correlation
  • Extrapolating beyond data range
  • Ignoring model assumptions

Problem-Solving Approach 🧠

Step-by-Step Method

  1. Identify the type of problem (mean vs. proportion; interval vs. test)
  2. Check conditions/assumptions
  3. State hypotheses (if testing)
  4. Calculate test statistic or interval
  5. Find p-value or interpret interval
  6. Make decision (if testing)
  7. Conclude in context of problem

Key Formulas Summary

Standard Errors

  • \(SE_{\bar x} = \sigma/\sqrt{n}\) or \(s/\sqrt{n}\)

  • \(SE_{\hat p} = \sqrt{p(1-p)/n}\)

Confidence Intervals

  • Mean: \(\bar x \pm t_{\alpha/2} \times (s/\sqrt{n})\)

  • Proportion: \(\hat p \pm z_{\alpha/2} \times \sqrt{\hat p(1-\hat p)/n}\)

Test Statistics

  • \(t = (\bar x - \mu_{0})/(s/\sqrt{n})\)

  • \(z = (\hat p - p_{0})/\sqrt{p_0(1-p_0)/n}\)

Calculator/Software Tips 💻

Know How To:

  • Calculate descriptive statistics
  • Find normal/t probabilities
  • Perform hypothesis tests
  • Create confidence intervals
  • Do regression analysis

Double-Check:

  • Input values correctly
  • Choose right test/interval
  • Interpret output properly

Final Tips for Success 🌟

Before the Quiz

  • Review worksheet problems
  • Practice with different scenarios
  • Make a cheat sheet of key formulas
  • Understand when to use each method

During the Quiz

  • Read problems carefully
  • Show all work
  • Check your answers make sense
  • Manage your time wisely

Note

  • Statistics is about making decisions with uncertainty
  • Context matters as much as calculations
  • Practice makes perfect!

Questions? 🤔

Remember:

  • Confidence intervals: estimate unknown parameters
  • Hypothesis tests: make decisions about claims
  • Regression: model relationships between variables
  • Sampling: connect samples to populations

Good luck on Quiz 3! 🍀
