# Install any missing packages (will skip those already installed)
# %pip install --quiet numpy matplotlib scipy pandas statsmodels seaborn
# Load our tools (libraries)
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
# Make our graphs look nice
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Set random seed for reproducible results
np.random.seed(42)

print("✅ All tools loaded successfully!")
Lab 6: Basic Hypothesis Testing & Simple Regression
PSTAT 5A - Summer Session A 2025
⏱️ Total Lab Time: 50 minutes
Welcome to Lab 6! This lab focuses on two fundamental areas of statistical analysis that you'll use throughout your data science journey: hypothesis testing and simple linear regression. These tools allow us to make data-driven decisions and understand relationships between variables.
What You'll Learn Today
By the end of this lab, you'll be able to:
- Conduct hypothesis tests to determine if sample data provides evidence against a claim
- Model relationships between variables using simple linear regression
- Make predictions based on data patterns
- Interpret statistical results in plain English for real-world applications
Getting Started
⏱️ Estimated time: 5 minutes
Navigate to our class JupyterHub instance. Create a new notebook and rename it "lab6" (see Lab 1 for detailed instructions).
First, let's load our tools! Copy the code below to get started. We'll be using the following core libraries:
- NumPy: Fundamental package for fast array-based numerical computing.
- Matplotlib (pyplot): Primary library for creating static 2D plots and figures.
- SciPy (stats): Collection of scientific algorithms, including probability distributions and statistical tests.
- Pandas: High-performance data structures (DataFrame) and tools for data wrangling and analysis.
- Statsmodels: Econometric and statistical modeling for regression analysis, time series, and more.
- Seaborn: Python data visualization library based on matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.
Task 1: One-Sample T-Test
⏱️ Estimated time: 20 minutes
What is a One-Sample T-Test?
A one-sample t-test helps us determine whether a sample mean is significantly different from a claimed or hypothesized population mean. It's one of the most common statistical tests you'll encounter.
Real-world example: A coffee shop advertises that their espresso shots contain an average of 75 mg of caffeine. As a health-conscious consumer (or maybe a caffeine researcher!), you want to test this claim. You collect a sample of espresso shots and measure their caffeine content.
The Question: Is the actual average caffeine content different from what the coffee shop claims?
Scenario
A coffee shop claims their average espresso shot contains 75 mg of caffeine. You suspect it's actually higher. You test 20 shots and want to test at \(\alpha = 0.05\) significance level.
Your Goal: Determine if there's sufficient evidence that the actual caffeine content exceeds the coffee shop's claim.
Step 1: Explore the Data
# Generate caffeine data for our analysis
np.random.seed(123)
caffeine_data = np.random.normal(78, 8, 20)  # Sample data: n=20 espresso shots

print("☕ Coffee Shop Caffeine Analysis")
print("=" * 40)
print(f"📊 Sample size: {len(caffeine_data)}")
print(f"📈 Sample mean: {np.mean(caffeine_data):.2f} mg")
print(f"📏 Sample std dev: {np.std(caffeine_data, ddof=1):.2f} mg")
print(f"🏪 Coffee shop's claim: 75 mg")

# Let's look at our raw data
print(f"\n🔍 First 10 caffeine measurements:")
print([f"{x:.1f}" for x in caffeine_data[:10]])
Step 2: Set Up Your Hypotheses
Think about this carefully:
- What does the coffee shop claim? (This becomes your null hypothesis)
- What do you suspect? (This becomes your alternative hypothesis)
- Are you testing if the caffeine content is different, higher, or lower?
print("📋 STEP 1: Setting Up Hypotheses")
print("=" * 35)

# TODO: Complete these hypotheses
print("$H_0$ (Null Hypothesis): $\\mu$ = _____ mg")  # What is the coffee shop's claim?
print("$H_1$ (Alternative Hypothesis): $\\mu$ _____ _____ mg")  # What do you suspect? (>, <, or ≠)

# TODO: What type of test is this?
print("Test type: _____-tailed test")  # Right, left, or two-tailed?

print("\n💡 Explanation:")
print("• $H_0$ represents the coffee shop's claim (status quo)")
print("• $H_1$ represents what we suspect is actually true")
print("• We use $\\alpha$ = 0.05 as our significance level")
Step 3: Calculate the Test Statistic
The t-statistic formula is: \(t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\)
print("๐ข STEP 2: Calculating Test Statistic")
print("=" * 38)
# Calculate the components
= np.mean(caffeine_data)
sample_mean = np.std(caffeine_data, ddof=1) # ddof=1 for sample std dev
sample_std = len(caffeine_data)
n = 75
claimed_mean
print(f"Sample mean ($\\bar{{x}}$): {sample_mean:.3f} mg")
print(f"Sample std dev (s): {sample_std:.3f} mg")
print(f"Sample size (n): {n}")
print(f"Claimed mean ($\\mu_0$): {claimed_mean} mg")
# TODO: Calculate the t-statistic using the formula above
# Hint: t = (sample_mean - claimed_mean) / (sample_std / sqrt(n))
= (sample_mean - claimed_mean) / (sample_std / np.sqrt(n))
t_statistic
= n - 1
degrees_freedom
print(f"\n๐ Formula: $t = \\frac{{\\bar{{x}} - \\mu_0}}{{s / \\sqrt{{n}}}}$")
print(f"๐ Calculation: t = ({sample_mean:.3f} - {claimed_mean}) / ({sample_std:.3f} / โ{n})")
print(f"๐ t-statistic: {t_statistic:.3f}")
print(f"๐ Degrees of freedom: {degrees_freedom}")
Step 4: Find the P-Value
For a right-tailed test, the p-value is the probability of getting a t-statistic as extreme as, or more extreme than, the one we observed.
Loosely speaking, the p-value answers the question:
"If the null hypothesis were true, how surprising would my sample be?"
Formally, it is the probability, calculated under the assumption that the null hypothesis is correct, of obtaining a test statistic as extreme as or more extreme than the one observed.
- Small p-value (e.g., < 0.05) → data are rare under \(H_0\) → strong evidence against \(H_0\).
- Large p-value → data are plausible under \(H_0\) → little or no evidence against \(H_0\).
Important: A p-value does not give the probability that the null hypothesis is true; it quantifies how incompatible your data are with \(H_0\).
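The direction of the alternative hypothesis determines which tail(s) you integrate over. Here is a minimal sketch (not a required lab step) using the t_statistic and degrees_freedom computed above; it uses scipy's survival function stats.t.sf, which equals 1 - cdf but is more numerically stable in the far tail:
# How the p-value depends on the alternative hypothesis (illustrative sketch)
p_right = stats.t.sf(t_statistic, degrees_freedom)          # H1: mu > mu_0 (our case)
p_left = stats.t.cdf(t_statistic, degrees_freedom)          # H1: mu < mu_0
p_two = 2 * stats.t.sf(abs(t_statistic), degrees_freedom)   # H1: mu != mu_0
print(f"right: {p_right:.4f}, left: {p_left:.4f}, two-sided: {p_two:.4f}")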
print("๐ STEP 3: Finding the P-Value")
print("=" * 32)
# TODO: Calculate p-value for right-tailed test
# Hint: For right-tailed test, p-value = 1 - stats.t.cdf(t_statistic, df)
= 1 - stats.t.cdf(t_statistic, degrees_freedom)
p_value
print(f"๐ฏ P-value calculation:")
print(f" P(t > {t_statistic:.3f}) = {p_value:.4f}")
print(f"\n๐ญ Interpretation:")
print(f" If the coffee shop's claim is true (ฮผ = 75),")
print(f" there's a {p_value:.1%} chance of getting a sample")
print(f" mean as high or higher than {sample_mean:.2f} mg")
Step 5: Make Your Decision
Compare your p-value to \(\alpha = 0.05\) and make a statistical decision.
print("โ๏ธ STEP 4: Making the Decision")
print("=" * 31)
= 0.05
alpha print(f"๐ฏ Significance level ($\\alpha$): {alpha}")
print(f"๐ P-value: {p_value:.4f}")
print(f"๐ Decision rule: Reject $H_0$ if p-value < $\\alpha$")
print(f"\n๐ Comparison:")
if p_value < alpha:
print(f" {p_value:.4f} < {alpha} โ
")
print(f" Decision: **REJECT $H_0$**")
print(f" Conclusion: There IS sufficient evidence that")
print(f" the average caffeine content > 75 mg")
print(f" ๐ช The coffee shop's claim appears to be FALSE")
else:
print(f" {p_value:.4f} โฅ {alpha} โ")
print(f" Decision: **FAIL TO REJECT $H_0$**")
print(f" Conclusion: There is NOT sufficient evidence that")
print(f" the average caffeine content > 75 mg")
print(f" ๐ช We cannot conclude the coffee shop's claim is false")
# TODO: Write your conclusion in your own words
print(f"\n๐ Your conclusion in plain English:")
print(f" _________________________________________________")
print(f" _________________________________________________")
Step 6: Verify with Python
Let's double-check our work using Python's built-in statistical functions.
print("โ
VERIFICATION using scipy.stats")
print("=" * 35)
# Use scipy's one-sample t-test function
= stats.ttest_1samp(caffeine_data, 75, alternative='greater')
t_stat_scipy, p_val_scipy
print(f"๐ Your calculations:")
print(f" t-statistic: {t_statistic:.3f}")
print(f" p-value: {p_value:.4f}")
print(f"\n๐ Python's calculations:")
print(f" t-statistic: {t_stat_scipy:.3f}")
print(f" p-value: {p_val_scipy:.4f}")
print(f"\n๐ฏ Match? {abs(t_statistic - t_stat_scipy) < 0.001 and abs(p_value - p_val_scipy) < 0.001}")
Step 7: Visualize Your Results
# Create visualizations to understand our test
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Sample data histogram with means
ax1.hist(caffeine_data, bins=8, density=True, alpha=0.7, color='lightblue',
         edgecolor='black', label='Sample Data')
ax1.axvline(sample_mean, color='red', linestyle='-', linewidth=3,
            label=f'Sample Mean = {sample_mean:.1f}mg')
ax1.axvline(claimed_mean, color='orange', linestyle='--', linewidth=3,
            label=f'Claimed Mean = {claimed_mean}mg')
ax1.set_xlabel('Caffeine Content (mg)', fontsize=12)
ax1.set_ylabel('Density', fontsize=12)
ax1.set_title('☕ Sample vs Claimed Caffeine Content', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: t-distribution with test statistic and p-value
x = np.linspace(-4, 4, 1000)
y = stats.t.pdf(x, degrees_freedom)
ax2.plot(x, y, 'b-', linewidth=2, label=f't-distribution (df={degrees_freedom})')
ax2.fill_between(x, y, alpha=0.3, color='lightblue')

# Shade the rejection region (right tail)
x_reject = x[x >= t_statistic]
y_reject = stats.t.pdf(x_reject, degrees_freedom)
ax2.fill_between(x_reject, y_reject, alpha=0.7, color='red',
                 label=f'p-value = {p_value:.4f}')

ax2.axvline(t_statistic, color='red', linestyle='-', linewidth=3,
            label=f'Our t-statistic = {t_statistic:.3f}')
ax2.set_xlabel('t-value', fontsize=12)
ax2.set_ylabel('Density', fontsize=12)
ax2.set_title('📊 T-Distribution with Test Statistic', fontsize=14, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
🤔 Reflection Questions
Answer these questions to check your understanding:
- Hypotheses: What were your null and alternative hypotheses? Why did you choose a right-tailed test?
- Test Choice: Why did you use a t-test instead of a z-test for this problem?
- Results: What was your t-statistic and p-value? What do these numbers mean?
- Decision: What was your final conclusion at \(\alpha = 0.05\)? Do you reject or fail to reject the null hypothesis?
- Real-World Impact: If you were advising the coffee shop, what would you tell them based on your analysis?
Task 2: Simple Linear Regression
⏱️ Estimated time: 25 minutes
Simple linear regression helps us understand and model the relationship between two continuous variables. Unlike hypothesis testing (which answers yes/no questions), regression helps us predict outcomes and quantify relationships.
Real-world example: As a student, you've probably wondered: "If I study more hours, how much will my exam score improve?" Linear regression can help answer this question by finding the relationship between study time and exam performance.
The Question: Can we predict exam scores based on hours studied? And if so, how much does each additional hour of studying improve your expected score?
- Explore & visualize the data
- Measure correlation (r) and \(R^2\)
- Fit the regression line \(\hat{y} = \beta_0 + \beta_1 x\)
- Test if the slope is significant
- Predict new values & quantify error
- Check model assumptions
- Visualize diagnostics
- Write a plain-English conclusion
- Correlation: How strongly two variables move together (-1 to +1)
- Slope: How much \(Y\) changes when \(X\) increases by 1 unit
- Intercept: The predicted value of \(Y\) when \(X = 0\)
- \(R^2\): What percentage of the variation in \(Y\) is explained by \(X\)
Remember: Correlation does not imply causation! Just because two variables are related doesn't mean one causes the other.
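These quantities are tightly linked in simple linear regression: \(R^2\) is exactly \(r^2\), and the fitted slope equals \(r \cdot s_y / s_x\). A minimal sketch with made-up toy arrays (x and y here are hypothetical, not the lab data):
# Link between correlation, slope, and R^2 in simple regression (toy data)
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
r = np.corrcoef(x, y)[0, 1]                        # correlation coefficient
slope = r * np.std(y, ddof=1) / np.std(x, ddof=1)  # beta_1 = r * s_y / s_x
print(f"r = {r:.3f}, R^2 = r^2 = {r**2:.3f}, implied slope = {slope:.3f}")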
Scenario
You want to investigate the relationship between study hours and exam performance. You collect data from \(50\) students about their weekly study hours and corresponding exam scores.
Your Goal: Create a statistical model to predict exam scores based on study hours and determine how much each additional hour of studying helps.
Step 1: Explore the Data
# Generate realistic study data
np.random.seed(101)
n_students = 50

# Study hours (predictor variable X)
study_hours = np.random.uniform(1, 20, n_students)

# Exam scores with linear relationship plus noise
# True relationship: score = 65 + 2*hours + noise
true_intercept = 65
true_slope = 2
noise = np.random.normal(0, 8, n_students)
exam_scores = true_intercept + true_slope * study_hours + noise

# Create DataFrame for easier handling
study_data = pd.DataFrame({
    'hours_studied': study_hours,
    'exam_score': exam_scores
})

print("📚 Study Hours vs Exam Scores Analysis")
print("=" * 45)
print(f"👥 Sample size: {len(study_data)} students")
print(f"⏰ Study hours range: {study_hours.min():.1f} to {study_hours.max():.1f} hours")
print(f"📊 Exam scores range: {exam_scores.min():.1f} to {exam_scores.max():.1f} points")
print(f"\n📋 First 10 students:")
print(study_data.head(10).round(2))
🤔 Quick Questions:
Do you see any obvious pattern in the data?
Which variable is the predictor (X) and which is the response (Y)?
Step 2: Calculate and Interpret Correlation
Correlation measures how strongly two variables move together.
print("๐ STEP 1: Measuring the Relationship")
print("=" * 40)
# TODO: Calculate the correlation coefficient
# Hint: Use np.corrcoef(x, y)[0, 1] to get correlation between x and y
= ________________
correlation
print(f"๐ Correlation coefficient: r = {correlation:.3f}")
# TODO: Interpret the correlation strength
print(f"\n๐ญ Interpretation:")
if abs(correlation) < 0.3:
= "weak"
strength elif abs(correlation) < 0.7:
= "moderate"
strength else:
= "strong"
strength
= "positive" if correlation > 0 else "negative"
direction print(f" This indicates a {strength} {direction} relationship")
print(f" between study hours and exam scores.")
print(f"\n๐ What this means:")
print(f" r = {correlation:.3f} means the variables are strongly related")
print(f" As study hours increase, exam scores tend to increase")
print(f" About {correlation**2:.1%} of the variation in scores")
print(f" can be explained by study hours alone")
❓ Check Your Understanding:
What does r = 0.8 vs r = 0.3 tell you?
If r = -0.9, what would that mean?
Step 3: Fit the Linear Regression Model
Now we'll find the "line of best fit" through our data points.
print("๐ STEP 2: Fitting the Regression Line")
print("=" * 42)
# Set up the regression (add constant for intercept)
= sm.add_constant(study_hours) # Add intercept term
X
# TODO: Fit the OLS (Ordinary Least Squares) model
# Hint: Use sm.OLS(y_variable, X_variable).fit()
= __________
model
print(f"๐ฏ Regression Equation:")
print(f" Exam Score = $\\beta_0$ + $\\beta_1$ ร Hours Studied")
print(f" Exam Score = {model.params[0]:.2f} + {model.params[1]:.2f} ร Hours")
print(f"\n๐ Model Coefficients:")
print(f" Intercept ($\\beta_0$): {model.params[0]:.3f}")
print(f" Slope ($\\beta_1$): {model.params[1]:.3f}")
print(f" R-squared ($R^2$): {model.rsquared:.3f}")
# TODO: Complete these interpretations
print(f"\n๐ก What These Numbers Mean:")
print(f" ๐ Intercept ({model.params[0]:.1f}): Expected score with 0 hours of study")
print(f" ๐ Slope ({model.params[1]:.2f}): Each additional hour increases score by {model.params[1]:.2f} points")
print(f" ๐ $R^2$ ({model.rsquared:.3f}): Study hours explain {model.rsquared:.1%} of score variation")
Step 4: Test Statistical Significance
Is the relationship we found statistically significant, or could it be due to chance?
print("๐ฌ STEP 3: Testing Statistical Significance")
print("=" * 46)
# Check if the slope is significantly different from zero
= model.pvalues[1] # p-value for the slope
slope_pvalue = 0.05
alpha
print(f"๐งช Hypothesis Test for Slope:")
print(f" $H_0$: $\\beta_1$ = 0 (no relationship)")
print(f" $H_1$: $\\beta_1$ โ 0 (there is a relationship)")
print(f" $\\alpha$ = {alpha}")
print(f"\n๐ Test Results:")
print(f" Slope p-value: {slope_pvalue:.6f}")
# TODO: Make the decision
if slope_pvalue < alpha:
print(f" Decision: REJECT $H_0$")
print(f" Conclusion: The relationship IS statistically significant")
= "IS"
significance else:
print(f" Decision: FAIL TO REJECT $H_0$")
print(f" Conclusion: The relationship is NOT statistically significant")
= "IS NOT"
significance
print(f"\nโ
Bottom Line:")
print(f" Study hours {significance} a significant predictor of exam scores")
# Show confidence intervals
= model.conf_int(alpha=0.05)
conf_int print(f"\n๐ 95% Confidence Intervals:")
print(f" Intercept: [{conf_int[0,0]:.2f}, {conf_int[0,1]:.2f}]")
print(f" Slope: [{conf_int[1,0]:.2f}, {conf_int[1,1]:.2f}]")
Step 5: Make Predictions
Now let's use our model to predict exam scores for different study scenarios.
print("๐ฎ STEP 4: Making Predictions")
print("=" * 32)
# TODO: Calculate predictions for different study hours
# Hint: prediction = intercept + slope * hours
= [5, 10, 15, 20]
example_hours
print(f"๐ฏ Prediction Examples:")
for hours in example_hours:
# TODO: Calculate predicted score
= model.params[0] + model.params[1] * hours
pred_score print(f" ๐ {hours:2d} hours โ Predicted score: {pred_score:.1f} points")
print(f"\n๐ค Your Turn:")
# TODO: Pick your own study hours and make a prediction
= ______ # Enter a number between 1-20
your_hours = model.params[0] + model.params[1] * your_hours
your_prediction print(f" ๐ {your_hours} hours โ Predicted score: {your_prediction:.1f} points")
# Calculate residuals for analysis
= model.predict(X)
y_predicted = exam_scores - y_predicted
residuals = np.std(residuals, ddof=2)
residual_std
print(f"\n๐ Prediction Accuracy:")
print(f" Average prediction error: ยฑ{residual_std:.1f} points")
print(f" This means most predictions are within ยฑ{residual_std:.1f} points of actual scores")
Step 6: Check Model Assumptions
Before trusting our model, we need to verify it meets the assumptions of linear regression.
print("โ
STEP 5: Checking Model Assumptions")
print("=" * 42)
print("๐ Linear Regression Assumptions:")
print(" 1๏ธโฃ Linear relationship between X and Y")
print(" 2๏ธโฃ Residuals are normally distributed")
print(" 3๏ธโฃ Residuals have constant variance (homoscedasticity)")
print(" 4๏ธโฃ Residuals are independent")
# Calculate residuals
= model.predict(X)
y_predicted = exam_scores - y_predicted
residuals
print(f"\n๐ Residual Analysis:")
print(f" Mean residual: {np.mean(residuals):.6f} (should be โ 0)")
print(f" Std of residuals: {np.std(residuals, ddof=2):.3f}")
# TODO: Check normality of residuals using Shapiro-Wilk test
from scipy.stats import shapiro
= shapiro(residuals)
shapiro_stat, shapiro_p print(f"\n๐งช Normality Test (Shapiro-Wilk):")
print(f" p-value: {shapiro_p:.4f}")
if shapiro_p > 0.05:
print(" โ
Residuals appear normally distributed")
else:
print(" โ ๏ธ Residuals may not be normally distributed")
Step 7: Visualize Your Results
# Create comprehensive visualization
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Scatter plot with regression line
ax1.scatter(study_hours, exam_scores, alpha=0.6, color='blue', s=60,
            label='Student Data')
sorted_hours = np.sort(study_hours)
sorted_predictions = model.params[0] + model.params[1] * sorted_hours
ax1.plot(sorted_hours, sorted_predictions, color='red', linewidth=3,
         label=f'y = {model.params[0]:.1f} + {model.params[1]:.2f}x')

ax1.set_xlabel('Study Hours', fontsize=12)
ax1.set_ylabel('Exam Score', fontsize=12)
ax1.set_title(f'📈 Study Hours vs Exam Scores\n$R^2$ = {model.rsquared:.3f}',
              fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: Residuals vs Fitted values
ax2.scatter(y_predicted, residuals, alpha=0.6, color='purple', s=50)
ax2.axhline(y=0, color='red', linestyle='--', linewidth=2)
ax2.set_xlabel('Fitted Values', fontsize=12)
ax2.set_ylabel('Residuals', fontsize=12)
ax2.set_title('🔍 Residuals vs Fitted\n(Should show no pattern)',
              fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

# Plot 3: Q-Q plot for normality of residuals
stats.probplot(residuals, dist="norm", plot=ax3)
ax3.set_title('📊 Q-Q Plot of Residuals\n(Should be roughly linear)',
              fontsize=14, fontweight='bold')
ax3.grid(True, alpha=0.3)

# Plot 4: Histogram of residuals
ax4.hist(residuals, bins=12, density=True, alpha=0.7, color='lightgreen',
         edgecolor='black')
ax4.set_xlabel('Residuals', fontsize=12)
ax4.set_ylabel('Density', fontsize=12)
ax4.set_title('📉 Distribution of Residuals\n(Should look normal)',
              fontsize=14, fontweight='bold')
ax4.grid(True, alpha=0.3)

# Overlay normal curve
x_norm = np.linspace(residuals.min(), residuals.max(), 100)
y_norm = stats.norm.pdf(x_norm, np.mean(residuals), np.std(residuals))
ax4.plot(x_norm, y_norm, 'r-', linewidth=2, label='Normal curve')
ax4.legend()

plt.tight_layout()
plt.show()
Step 8: Interpret Your Model
print("๐ FINAL INTERPRETATION")
print("=" * 25)
print(f"๐ฏ Our Model: Exam Score = {model.params[0]:.1f} + {model.params[1]:.2f} ร Study Hours")
print(f"\nโ
Key Findings:")
print(f" ๐ Strong positive relationship (r = {correlation:.3f})")
print(f" ๐ Study hours explain {model.rsquared:.1%} of score variation")
print(f" ๐ฏ Each extra hour โ {model.params[1]:.1f} point increase")
print(f" ๐ฌ Relationship is statistically significant (p < 0.001)")
print(f"\n๐ก Practical Insights:")
print(f" ๐โโ๏ธ Going from 5 to 10 hours of study:")
= model.params[0] + model.params[1] * 5
pred_5 = model.params[0] + model.params[1] * 10
pred_10 = pred_10 - pred_5
improvement print(f" Expected score improvement: {improvement:.1f} points")
print(f"\nโ ๏ธ Important Limitations:")
print(f" โข Correlation โ Causation")
print(f" โข Model only explains {model.rsquared:.1%} of variation")
print(f" โข Other factors matter too (sleep, prior knowledge, etc.)")
print(f" โข Predictions have uncertainty: ยฑ{residual_std:.1f} points")
# Show full model summary
print(f"\n๐ Full Statistical Summary:")
print("=" * 30)
print(model.summary())
🤔 Reflection Questions
Test your understanding by answering these questions:
- Correlation vs Causation:
  - What was your correlation coefficient?
  - Does this prove that studying more causes higher exam scores? Why or why not?
- Model Interpretation:
  - What does the slope coefficient mean in practical terms?
  - What does the intercept represent, and does it make sense?
- Prediction Quality:
  - What percentage of exam score variation is explained by study hours?
  - How accurate are your predictions (what's the typical error)?
- Statistical Significance:
  - Is the relationship statistically significant?
  - What would it mean if the p-value for the slope was 0.20?
- Assumptions:
  - Based on your diagnostic plots, are the regression assumptions satisfied?
  - What would you do if the assumptions were violated?
- Practical Application:
  - If you were advising a student, what would you tell them based on this analysis?
  - What other variables might improve your prediction model?
Congratulations! You've successfully completed Lab 6 🎯 and learned two fundamental statistical analysis techniques: hypothesis testing and simple linear regression.