PSTAT 5A: Course Wrap-Up and Quiz Review

Narjes Mathlouthi

2025-07-31

Course Overview 📊

What We’ve Learned

  • Understanding Data: Types, visualization, summary statistics
  • Sampling & Distributions: How samples represent populations
  • Statistical Inference: Confidence intervals and hypothesis testing
  • Relationships: Correlation and linear regression
  • Real-world Applications: Making data-driven decisions

Section 1: Understanding Data 📈

Types of Data

Quantitative

  • Numerical values

  • Can perform arithmetic

  • Examples: height, income, test scores

Continuous vs Discrete

  • Continuous: can take any value in range

  • Discrete: countable values

Qualitative (Categorical)

  • Non-numerical categories

  • Examples: color, major, satisfaction level

Nominal vs Ordinal

  • Nominal: no natural order

  • Ordinal: natural ordering exists

Descriptive Statistics

Measures of Center

  • Mean: \(\bar{x} = \frac{\sum x_i}{n}\)
  • Median: Middle value when ordered
  • Mode: Most frequent value

Measures of Spread

  • Range: Max - Min
  • Standard Deviation: \(s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}\)
  • IQR: Q3 - Q1

Shape

  • Skewness: Left-skewed, right-skewed, or symmetric
  • Outliers: Values far from typical
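
For a quick check of these formulas, here is a minimal Python sketch (assuming numpy is available; the scores below are made-up illustration data):

```python
import numpy as np
from collections import Counter

# Hypothetical quiz scores (illustrative data only)
scores = np.array([72, 85, 90, 66, 78, 95, 88, 70, 85, 91])

mean = scores.mean()                                   # x-bar = sum(x_i) / n
median = np.median(scores)                             # middle value when ordered
mode = Counter(scores.tolist()).most_common(1)[0][0]   # most frequent value
data_range = scores.max() - scores.min()               # Range = Max - Min
s = scores.std(ddof=1)                                 # sample SD (divides by n - 1)
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1                                          # IQR = Q3 - Q1

print(f"mean={mean:.1f}, median={median}, mode={mode}, sd={s:.2f}, IQR={iqr}")
```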

Data Visualization

Common Plots

  • Histogram: Distribution of quantitative data
  • Boxplot: Five-number summary, outliers
  • Scatterplot: Relationship between two quantitative variables
  • Bar chart: Frequency of categorical data

Key Principles

  • Choose appropriate plot for data type
  • Clear labels and titles
  • Avoid misleading scales
  • Highlight important patterns
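
As a quick illustration of these plot types, here is a short matplotlib sketch (the variables and data are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
heights = rng.normal(67, 3, size=200)                 # quantitative variable
scores = 50 + 0.5 * heights + rng.normal(0, 2, 200)   # second quantitative variable
majors = ["Stats", "Bio", "Econ", "CS"]               # categorical variable
counts = [40, 55, 30, 75]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].hist(heights, bins=15)      # histogram: distribution of one quantitative variable
axes[0, 0].set_title("Histogram of heights")
axes[0, 1].boxplot(heights)            # boxplot: five-number summary + outliers
axes[0, 1].set_title("Boxplot of heights")
axes[1, 0].scatter(heights, scores)    # scatterplot: two quantitative variables
axes[1, 0].set_title("Heights vs. scores")
axes[1, 1].bar(majors, counts)         # bar chart: categorical frequencies
axes[1, 1].set_title("Counts by major")
plt.tight_layout()
plt.show()
```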

Section 2: Sampling & Sampling Distributions 🎯

Why Sample?

  • Populations often too large/expensive to study completely
  • Good samples can represent populations well
  • Random sampling reduces bias

Sampling Methods

  • Simple Random: Every individual has equal chance
  • Stratified: Divide into groups, sample from each
  • Cluster: Sample entire groups
  • Systematic: Every nth individual
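
For instance, a simple random sample and a systematic sample can each be drawn in a line or two of Python (a sketch; the "population" here is just a hypothetical list of ID numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
population = np.arange(1, 1001)                         # hypothetical population of 1000 IDs

srs = rng.choice(population, size=50, replace=False)    # simple random sample of 50
start = rng.integers(0, 20)                             # random starting point
systematic = population[start::20]                      # every 20th individual

print(srs[:10])
print(systematic[:10])
```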

Central Limit Theorem

The Magic of Averages ✨

For large samples (n ≥ 30), the sampling distribution of \(\bar{X}\) is approximately normal, regardless of the population distribution

Key Results

  • \(E[\bar{X}] = \mu\) (unbiased)
  • \(SD[\bar{X}] = \frac{\sigma}{\sqrt{n}}\) (standard error)
  • \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) for large n

Implications

  • Larger samples → more precise estimates
  • Can make probability statements about sample means
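
A small simulation makes the CLT concrete: draw many samples from a clearly non-normal population and look at the behavior of the sample mean (a sketch assuming numpy; the exponential population is just one example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 36, 10_000

# Population: exponential with mean 2 (strongly right-skewed, not normal at all)
sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("mean of x-bar:", sample_means.mean())       # close to mu = 2 (unbiased)
print("SD of x-bar:  ", sample_means.std(ddof=1))  # close to sigma/sqrt(n) = 2/6 ≈ 0.33
# A histogram of sample_means would look roughly normal even though the
# population is skewed -- that's the Central Limit Theorem at work.
```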

Standard Error

  • Formula and Interpretation \[SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \text{ or } \frac{s}{\sqrt{n}}\]

  • For Proportions \[SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]

Key Points

  • Measures variability of sample statistic
  • Decreases as sample size increases
  • Foundation for confidence intervals and hypothesis tests

Section 3: Confidence Intervals 🎯

The Big Idea

“We are X% confident that the true parameter lies between [lower bound, upper bound]”

General Form

\[\text{Estimate} \pm \text{(Critical Value)} \times \text{(Standard Error)}\]

CI for Population Mean

When σ is Known (Z-interval) - \[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

When σ is Unknown (t-interval) - \[\bar{x} \pm t_{\alpha/2,df} \cdot \frac{s}{\sqrt{n}}\]

where \(df = n - 1\)

Common Critical Values

  • 90% CI: \(z_{0.05} = 1.645\), \(t_{0.05}\) (depends on df)

  • 95% CI: \(z_{0.025} = 1.96\), \(t_{0.025}\) (depends on df)

  • 99% CI: \(z_{0.005} = 2.576\), \(t_{0.005}\) (depends on df)
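
Here is how a t-interval might be computed with scipy.stats (a sketch; the data are made up):

```python
import numpy as np
from scipy import stats

data = np.array([98.2, 101.5, 99.7, 100.4, 97.9, 102.3, 100.8, 99.1])
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)
se = s / np.sqrt(n)

# 95% t-interval: xbar ± t_{0.025, n-1} * s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")

# Same interval using the built-in helper
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se))
```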

CI for Population Proportion

Formula

\[\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Conditions

  • \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)
  • Random sample
  • Independent observations

Interpretation

  • Same logic as mean: we’re confident the true proportion lies in this interval
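
A sketch of the same calculation for a proportion (the counts are hypothetical):

```python
import numpy as np
from scipy import stats

x, n = 54, 120                         # 54 "successes" out of 120 observations
p_hat = x / n

# Check the success/failure condition first
assert n * p_hat >= 10 and n * (1 - p_hat) >= 10

se = np.sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)              # z_{0.025} = 1.96 for 95% confidence
lower, upper = p_hat - z * se, p_hat + z * se
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```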

Sample Size Determination

For Means \[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{ME}\right)^2\]

For Proportions \[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \cdot \hat{p}(1-\hat{p})\]

Key Trade-offs

  • Higher confidence → larger sample needed

  • Smaller margin of error → larger sample needed

  • Use \(\hat{p} = 0.5\) for the most conservative (largest) sample size
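
Worked numerically (a sketch with hypothetical targets; always round n up):

```python
import math
from scipy import stats

z = stats.norm.ppf(0.975)          # z_{0.025} ≈ 1.96 for 95% confidence

# Proportion: margin of error ME = 0.03, using the conservative p_hat = 0.5
n_prop = (z / 0.03) ** 2 * 0.5 * 0.5
print(math.ceil(n_prop))           # round UP: 1068

# Mean: ME = 2 with sigma ≈ 12 (say, from a pilot study)
n_mean = (z * 12 / 2) ** 2
print(math.ceil(n_mean))           # round UP: 139
```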

Section 4: Hypothesis Testing ⚖️

The Scientific Method in Statistics

  1. Formulate hypotheses
  2. Collect data
  3. Calculate test statistic
  4. Find p-value
  5. Make decision
  6. State conclusion in context

Setting Up Hypotheses

Null Hypothesis (\(H_0\))

  • Status quo, no effect, no difference

  • Contains equality (=, ≤, ≥)

  • What we assume is true

Alternative Hypothesis (\(H_a\) or \(H_1\))

  • What we want to prove

  • Contains inequality (<, >, ≠)

  • Represents change or difference

Example

  • \(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\) (two-tailed)

  • \(H_0: p \leq 0.5\) vs \(H_a: p > 0.5\) (one-tailed)

Test Statistics

For Population Mean

  • When \(\sigma\) known: \(z = (\bar x - \mu_0)/(\sigma/\sqrt{n})\)

  • When \(\sigma\) unknown: \(t = (\bar x - \mu_0)/(s/\sqrt{n})\), \(df = n-1\)

For Population Proportion

  • \(z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}\)
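
Both tests can be carried out in a few lines of Python (a sketch assuming scipy; the data and hypothesized values are made up):

```python
import numpy as np
from scipy import stats

# One-sample t-test: H0: mu = 100 vs Ha: mu != 100
data = np.array([104, 98, 101, 107, 96, 103, 99, 105, 102, 100])
t_stat, p_val = stats.ttest_1samp(data, popmean=100)   # two-tailed by default
print(f"t = {t_stat:.3f}, p-value = {p_val:.3f}")

# One-sample z-test for a proportion: H0: p = 0.5 vs Ha: p > 0.5
x, n, p0 = 68, 120, 0.5
p_hat = x / n
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_val_z = 1 - stats.norm.cdf(z)                         # upper-tail p-value
print(f"z = {z:.3f}, p-value = {p_val_z:.3f}")
```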

P-values and Decision Making

P-value Definition

The probability of observing a test statistic as extreme or more extreme than what we observed, assuming \(H_0\) is true

Decision Rules

  • If p-value ≤ \(\alpha\): Reject \(H_0\) (statistically significant)
  • If p-value > \(\alpha\): Fail to reject \(H_0\) (not statistically significant)

Common Significance Levels

  • \(\alpha\) = \(0.05\) (most common)
  • \(\alpha\) = \(0.01\) (more stringent)
  • \(\alpha\) = \(0.10\) (less stringent)

Types of Errors

Type I Error

  • Rejecting true \(H_0\)
  • Probability = \(\alpha\) (significance level)
  • “False positive”

Type II Error

  • Failing to reject false \(H_0\)
  • Probability = \(\beta\)
  • “False negative”
  • Power = \(1 - \beta\)

Trade-off

  • Decreasing \(\alpha\) increases \(\beta\) (for a fixed sample size)
  • Need larger samples to decrease both

Section 5: Linear Regression 📈

Correlation Coefficient (r) \[r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}}\]

  • Ranges from -1 to +1
  • Measures strength and direction of linear relationship
  • r = 0: no linear relationship
  • |r| close to 1: strong linear relationship

Regression Line

Equation

\[\hat{y} = a + bx\]

Slope (b)

\[b = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2}\]

Intercept (a)

\[a = \bar{y} - b\bar{x}\]

Interpretation

  • Slope: Predicted change in y for a 1-unit increase in x

  • Intercept: Predicted value of y when x = 0
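
A least-squares fit in Python with scipy.stats.linregress (a sketch; x and y below are made-up data):

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied (x) and exam score (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])

fit = stats.linregress(x, y)
print(f"slope b = {fit.slope:.2f}")           # predicted change in y per unit of x
print(f"intercept a = {fit.intercept:.2f}")   # predicted y when x = 0
print(f"r = {fit.rvalue:.3f}")                # correlation coefficient
```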

Coefficient of Determination

R-squared (\(r^2\))

  • Proportion of variance in y explained by x
  • Ranges from 0 to 1
  • Higher values indicate better fit

Residuals

  • \(e_i = y_i - \hat{y}_i\)
  • Vertical distance from point to line
  • Used to assess model fit

Standard Error of Regression

  • \[s_e = \sqrt{\frac{\sum(y-\hat{y})^2}{n-2}}\]
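
Continuing with the same made-up data as above, the residuals, \(r^2\), and \(s_e\) follow directly from the fitted values (a self-contained sketch):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])
fit = stats.linregress(x, y)

y_hat = fit.intercept + fit.slope * x                   # fitted values
residuals = y - y_hat                                   # e_i = y_i - y_hat_i
s_e = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))    # standard error of regression
print(f"r^2 = {fit.rvalue ** 2:.3f}, s_e = {s_e:.2f}")
```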

Inference for Regression

Testing Slope Significance

  • \(H_0: \beta = 0\) vs \(H_a: \beta \neq 0\)
  • Test statistic: \(t = \frac{b}{SE_b}\)
  • If significant: x is useful for predicting y
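
linregress also reports this slope test directly (same made-up data as the earlier sketches):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([55, 60, 62, 68, 71, 75, 78, 84])
fit = stats.linregress(x, y)

t_stat = fit.slope / fit.stderr          # t = b / SE_b
print(f"t = {t_stat:.2f}, two-sided p-value = {fit.pvalue:.4f}")
# If the p-value is below alpha, reject H0: beta = 0 (x is useful for predicting y)
```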

Conditions for Regression Inference

  • Linear relationship
  • Independent observations
  • Normal residuals
  • Equal variance (homoscedasticity)

Section 6: Quiz Review Strategy 🎯

What to Focus On

  1. Know your formulas (but understand when to use them)
  2. Practice identifying appropriate procedures
  3. Check conditions before applying methods
  4. Interpret results in context
  5. Show your work clearly

Common Mistakes to Avoid ⚠️

Confidence Intervals

  • Saying “there is a 95% probability the parameter is in this interval”
  • Confusing the confidence level (a long-run property of the method) with certainty about one computed interval
  • Using the wrong critical value (z vs. t)

Hypothesis Testing

  • Confusing “fail to reject” with “accept”
  • Wrong tail for p-value calculation
  • Not stating conclusions in context

Regression

  • Claiming causation from correlation
  • Extrapolating beyond data range
  • Ignoring model assumptions

Problem-Solving Approach 🧠

Step-by-Step Method

  1. Identify the type of problem (mean vs. proportion; interval vs. test)
  2. Check conditions/assumptions
  3. State hypotheses (if testing)
  4. Calculate test statistic or interval
  5. Find p-value or interpret interval
  6. Make decision (if testing)
  7. Conclude in context of problem

Key Formulas Summary

Standard Errors

  • \(SE_{\bar x} = \sigma/\sqrt{n}\) or \(s/\sqrt{n}\)

  • \(SE_{\hat p} = \sqrt{p(1-p)/n}\)

Confidence Intervals

  • Mean: \(\bar x \pm t_{\alpha/2} \times (s/\sqrt{n})\)

  • Proportion: \(\hat p \pm z_{\alpha/2} \times \sqrt{\hat p(1-\hat p)/n}\)

Test Statistics

  • \(t = (\bar x - \mu_{0})/(s/\sqrt{n})\)

  • \(z = (\hat p - p_{0})/\sqrt{p_0(1-p_0)/n}\)

Calculator/Software Tips 💻

Know How To:

  • Calculate descriptive statistics
  • Find normal/t probabilities
  • Perform hypothesis tests
  • Create confidence intervals
  • Do regression analysis

Double-Check:

  • Input values correctly
  • Choose right test/interval
  • Interpret output properly

Final Tips for Success 🌟

Before the Quiz

  • Review worksheet problems
  • Practice with different scenarios
  • Make a cheat sheet of key formulas
  • Understand when to use each method

During the Quiz

  • Read problems carefully
  • Show all work
  • Check your answers make sense
  • Manage your time wisely

Note

  • Statistics is about making decisions with uncertainty
  • Context matters as much as calculations
  • Practice makes perfect!

Questions? 🤔

Remember:

  • Confidence intervals: estimate unknown parameters
  • Hypothesis tests: make decisions about claims
  • Regression: model relationships between variables
  • Sampling: connect samples to populations

Good luck on Quiz 3! 🍀
