2025-07-31
Quantitative
Numerical values
Can perform arithmetic
Examples: height, income, test scores
Continuous vs Discrete
Continuous: can take any value in a range
Discrete: countable values
Qualitative (Categorical)
Non-numerical categories
Examples: color, major, satisfaction level
Nominal vs Ordinal
Nominal: no natural order
Ordinal: natural ordering exists
Central Limit Theorem: for large samples (n ≥ 30 is a common rule of thumb), the sampling distribution of \(\bar{X}\) is approximately normal, regardless of the shape of the population distribution
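A small simulation can make this statement concrete. The sketch below is illustrative only: the Exponential(1) population (mean 1, standard deviation 1), the number of replications, the seed, and the function name `simulate_sample_means` are all assumptions chosen for the example, not part of the course material.

```python
import math
import random
import statistics

def simulate_sample_means(n=30, reps=5000, seed=0):
    """Draw `reps` samples of size `n` from a skewed Exponential(1)
    population and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

n = 30
sample_means = simulate_sample_means(n=n)

center = statistics.fmean(sample_means)  # expected near the population mean, 1
spread = statistics.stdev(sample_means)  # expected near sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for an approximately normal sampling distribution this is close to 0.95.
se = 1.0 / math.sqrt(n)
share_within = statistics.fmean(
    1.0 if abs(m - 1.0) <= 1.96 * se else 0.0 for m in sample_means
)

print(round(center, 3), round(spread, 3), round(share_within, 3))
```

Even though the population is strongly right-skewed, the sample means cluster around 1 with a spread close to \(1/\sqrt{30}\).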
Formula and Interpretation \[SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \text{ or } \frac{s}{\sqrt{n}}\]
For Proportions \[SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
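The two standard error formulas above translate directly into code. This is a minimal sketch; the function names and the example numbers (a standard deviation of 15 with n = 25, and p = 0.5 with n = 100) are hypothetical.

```python
import math

def se_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).
    Pass sigma if it is known, otherwise the sample standard deviation s."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

se_x = se_mean(15, 25)          # 15 / 5
se_p = se_proportion(0.5, 100)  # sqrt(0.25 / 100)
print(se_x, se_p)
```

Quadrupling the sample size halves either standard error, because n enters through its square root.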
Key Points
“We are X% confident that the true parameter lies between [lower bound, upper bound]”
\[\text{Estimate} \pm \text{(Critical Value)} \times \text{(Standard Error)}\]
When σ is Known (Z-interval) - \[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
When σ is Unknown (t-interval) - \[\bar{x} \pm t_{\alpha/2,df} \cdot \frac{s}{\sqrt{n}}\]
where \(df = n - 1\)
Common Critical Values
90% CI: \(z_{0.05} = 1.645\), \(t_{0.05}\) (depends on df)
95% CI: \(z_{0.025} = 1.96\), \(t_{0.025}\) (depends on df)
99% CI: \(z_{0.005} = 2.576\), \(t_{0.005}\) (depends on df)
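As a rough illustration of the interval formulas and critical values above, the sketch below computes z critical values with the standard library's `statistics.NormalDist` and builds an interval for a mean. The standard library has no t distribution, so the t critical value is passed in by hand; the value 2.064 for df = 24 is taken from a standard t-table. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_interval(xbar, sd, n, critical):
    """Estimate +/- (critical value) * (standard error) for a mean."""
    margin = critical * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# z-interval: sigma assumed known (hypothetical numbers)
z_low, z_high = mean_interval(xbar=100, sd=15, n=36, critical=z_critical(0.95))

# t-interval: sigma unknown, n = 25 so df = 24; 2.064 is read from a t-table
t_low, t_high = mean_interval(xbar=100, sd=15, n=25, critical=2.064)

print(round(z_low, 2), round(z_high, 2))
print(round(t_low, 3), round(t_high, 3))
```

With the same confidence level, the t-interval is wider than a z-interval on the same data because the t critical value exceeds 1.96 for any finite df.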
\[\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
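The proportion interval above can be sketched the same way. The function name `proportion_interval` and the data (60 successes in 100 trials) are made up for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample z-interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

p_low, p_high = proportion_interval(successes=60, n=100)
print(round(p_low, 3), round(p_high, 3))
```

This interval relies on the normal approximation, so it is best suited to samples with a reasonable number of both successes and failures.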
For Means \[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{ME}\right)^2\]
For Proportions \[n = \left(\frac{z_{\alpha/2}}{ME}\right)^2 \cdot \hat{p}(1-\hat{p})\]
Key Trade-offs
Higher confidence → larger sample needed
Smaller margin of error → larger sample needed
Use \(\hat{p} = 0.5\) for the most conservative (largest) sample size when no prior estimate of the proportion is available
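The sample size formulas and trade-offs above can be checked numerically. In this sketch the function names and the planning values (σ = 15, margins of error of 2 and 0.03) are hypothetical; results are rounded up because a sample size must be a whole number that meets or beats the target margin.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)

def n_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n for a proportion; p_guess = 0.5 is the conservative choice."""
    z = z_critical(confidence)
    return math.ceil((z / margin) ** 2 * p_guess * (1 - p_guess))

n_mean = n_for_mean(sigma=15, margin=2)
n_prop = n_for_proportion(margin=0.03)
print(n_mean, n_prop)

# Trade-offs: higher confidence or a smaller margin both require more data.
print(n_for_mean(15, 2, confidence=0.99), n_for_mean(15, 1))
```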
Null Hypothesis (\(H_0\))
Status quo, no effect, no difference
Contains equality (=, ≤, ≥)
What we assume is true
Alternative Hypothesis (\(H_a\) or \(H_1\))
What we seek evidence for (the research claim)
Contains inequality (<, >, ≠)
Represents change or difference
Example
\(H_0: \mu = 100\) vs \(H_a: \mu \neq 100\) (two-tailed)
\(H_0: p \leq 0.5\) vs \(H_a: p > 0.5\) (one-tailed)
For Population Mean
When \(\sigma\) known: \(z = (\bar x - \mu_0)/(\sigma/\sqrt{n})\)
When \(\sigma\) unknown: \(t = (\bar x - \mu_0)/(s/\sqrt{n})\), \(df = n-1\)
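Both test statistics for a mean share the same form: the distance between the sample mean and the hypothesized mean, measured in standard errors. A minimal sketch follows, with a hypothetical function name and made-up sample values.

```python
import math

def mean_test_statistic(xbar, mu0, sd, n):
    """(xbar - mu0) / (sd / sqrt(n)).
    This is a z statistic when sd is the known sigma,
    and a t statistic with df = n - 1 when sd is the sample s."""
    return (xbar - mu0) / (sd / math.sqrt(n))

# Hypothetical sample: xbar = 104, s = 10, n = 25, testing H0: mu = 100
t_stat = mean_test_statistic(xbar=104, mu0=100, sd=10, n=25)
df = 25 - 1
print(t_stat, df)
```

Here the sample mean sits two standard errors above the hypothesized value; the statistic would then be compared against a t distribution with 24 degrees of freedom.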
For Population Proportion
\(z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}\)
P-value: the probability of observing a test statistic as extreme or more extreme than what we observed, assuming \(H_0\) is true
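To show how this definition becomes a number, the sketch below computes the z statistic for a proportion, \(z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}\), and converts it to a p-value with the standard normal CDF. The function names, the `tail` argument, and the data (56 successes in 100 trials, testing \(H_0: p \leq 0.5\) against \(H_a: p > 0.5\)) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """z statistic for a proportion; the standard error uses p0 from H0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def p_value_from_z(z, tail="two-sided"):
    """Probability, under H0, of a z at least as extreme as the one observed."""
    cdf = NormalDist().cdf
    if tail == "greater":   # Ha: parameter > null value
        return 1 - cdf(z)
    if tail == "less":      # Ha: parameter < null value
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))  # Ha: parameter != null value

z = proportion_z(p_hat=0.56, p0=0.5, n=100)
p_upper = p_value_from_z(z, tail="greater")
p_two = p_value_from_z(z)
print(round(z, 2), round(p_upper, 4), round(p_two, 4))
```

"More extreme" depends on the alternative hypothesis, which is why the two-sided p-value is twice the one-sided value for the same statistic.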
Correlation Coefficient (r) \[r = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^2 \sum(y-\bar{y})^2}}\]
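The correlation formula can be computed term by term. The sketch below uses a small made-up data set and a hypothetical function name.

```python
import math

def correlation(xs, ys):
    """Pearson r: sum of cross-deviations divided by the square root of
    the product of the two sums of squared deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

For this data the sums are 6, 10, and 6, so \(r = 6/\sqrt{60}\), a moderately strong positive linear association.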
\[\hat{y} = a + bx\]
\[b = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2}\]
\[a = \bar{y} - b\bar{x}\]
Slope (b): predicted change in y for a 1-unit increase in x
Intercept (a): predicted value of y when x = 0
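The slope and intercept formulas above can be sketched as follows. The data set and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y-hat = a + b x.
    b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), a = y_bar - b * x_bar."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

# Prediction for a new x value using the fitted line
y_hat_at_6 = a + b * 6
print(round(a, 2), round(b, 2), round(y_hat_at_6, 2))
```

Here each 1-unit increase in x raises the predicted y by 0.6, and the line passes through the point \((\bar{x}, \bar{y}) = (3, 4)\).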
Standard Errors
\(SE_{\bar x} = \sigma/\sqrt{n}\) or \(s/\sqrt{n}\)
\(SE_{\hat p} = \sqrt{p(1-p)/n}\)
Confidence Intervals
Mean: \(\bar x \pm t_{\alpha/2,df} \times (s/\sqrt{n})\)
Proportion: \(\hat p \pm z_{\alpha/2} \times \sqrt{\hat p(1-\hat p)/n}\)
Test Statistics
\(t = (\bar x - \mu_{0})/(s/\sqrt{n})\)
\(z = (\hat p - p_{0})/\sqrt{p_0(1-p_0)/n}\)
PSTAT 5A - Understanding Data | Course Wrap Up and Quiz Review