PSTAT5A

Course Resources

Your comprehensive guide to learning materials and references

Week 1: Foundations of Data Science

Getting Started with Data - Data types, basic statistics, and Python tools

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapters 1 & 2

Essential foundations covering data types, variables, and descriptive statistics. Provides the theoretical foundation for understanding how data is structured and analyzed.

📖 PDF • ⏱️ 2-3 hours • 🎯 Beginner

Python for Statistics

Think Stats: Exploratory Data Analysis

Perfect introduction to Python for statistics students. Covers probability, descriptive statistics, and statistical inference using Python with real datasets.

🎓 UCSB Access: Library Database → Search “O’Reilly” → Login with NetID → Search “Think Stats”

💻 Online via UCSB Library • ⏱️ 4-6 hours • 🐍 Python + Statistics

Supplementary Reading

Python for Data Analysis, Ch. 5

Deep dive into pandas operations including describe(), groupby(), and essential aggregation functions for real-world datasets.

🎓 UCSB Access: Library Database → Search “O’Reilly” → Login with NetID → Search “Python for Data Analysis”

💻 Online via UCSB Library • ⏱️ 1-2 hours • 🎯 Intermediate

Library Access

O’Reilly Learning Platform

Access thousands of technology and programming books including Python, statistics, and data science titles. Essential for supplementary reading and advanced topics.

🏛️ UCSB Library • 📖 50,000+ titles • 🔐 NetID Required

💻 Interactive Tools & Practice

Hands-on Workshop

UCSB Data Lab: Data Types & Format

Comprehensive hands-on workshop covering Python data types, pandas DataFrame structures, and input/output operations with downloadable datasets.

🛠️ Interactive • ⏱️ 2-3 hours • 📁 Sample datasets

API Documentation

pandas.DataFrame.describe()

Official documentation for descriptive statistics in pandas. Essential reference for understanding central tendency, dispersion, and shape analysis.

📚 API Docs • 🔗 Quick Reference • 🎯 All levels
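As a rough illustration of what describe() and groupby() report, the sketch below runs them on a tiny DataFrame. The column names and values are made up for this example and do not come from the course datasets.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "section": ["A", "A", "B", "B", "B"],
    "score": [70, 80, 90, 60, 75],
})

# Count, mean, std, min, quartiles, and max for the numeric column
summary = df["score"].describe()
print(summary)

# Mean score within each section
section_means = df.groupby("section")["score"].mean()
print(section_means)
```

The result of describe() is a pandas Series, so individual statistics can be looked up by label, for example summary["mean"] or summary["50%"].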

Statistical Functions

NumPy Statistical Functions

Complete reference for NumPy’s statistical toolkit including mean(), median(), std(), percentile(), and advanced statistical measures.

📚 API Docs • 🔧 25+ methods • 🎯 Beginner-Advanced
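A minimal sketch of the four functions named above, applied to a small invented sample. One detail worth noticing is that np.std() divides by n by default; passing ddof=1 gives the sample standard deviation that divides by n - 1.

```python
import numpy as np

# Hypothetical sample, invented for illustration only
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)                 # arithmetic mean
median = np.median(data)             # middle value of the sorted data
pop_std = np.std(data)               # population standard deviation (ddof=0)
sample_std = np.std(data, ddof=1)    # sample standard deviation (divides by n - 1)
q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles

print(mean, median, pop_std, sample_std, q1, q3)
```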

Advanced Reference

SciPy Stats Module

Comprehensive statistical analysis toolkit covering probability distributions, hypothesis testing, and advanced descriptive statistics for research-grade analysis.

📚 API Docs • 🎯 Advanced • 🔬 Research-grade

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 1 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Beginner • 🐍 Python Focus

Study Schedule

Recommended Learning Path

Days 1-2: Core Reading (OpenIntro Chapters 1-2)

Day 3: Python for Statistics (Think Stats)

Day 4: Supplementary Reading (Python for Data Analysis)

Day 5: Interactive tutorials and hands-on practice

Day 6: Get familiar with the Python documentation


Day 7: Review and concept integration

📅 7-day plan • ⏱️ 2-3 hours daily • 🎯 Beginner-friendly

🔍 Additional Resources

Extended Learning

Supplementary Materials by Topic

Python Basics: Python.org Tutorial • Codecademy Python • Python for Everybody

Data Science Foundations: Kaggle Learn • DataCamp Intro to Python • Coursera Python for Data Science

Statistical Computing: Think Python • Automate the Boring Stuff • Real Python Tutorials

🎯 All skill levels • 📈 Beginner to Advanced • 🆓 Many free options

Week 2: Introduction to Probability

Understanding Uncertainty - Sample spaces, conditional probability, and Bayes’ theorem

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapter 3

Essential foundations covering probability definitions, sample spaces, events, conditional probability, and independence.

📖 PDF • ⏱️ 3-4 hours • 🎯 Beginner-Intermediate

Supplementary Reading

Elements of Set Theory for Probability

Interactive guide to set theory operations with Venn diagrams and visual explanations of probability concepts.

🌐 Online Book • ⏱️ 1-2 hours • 📊 Interactive diagrams

Video Lectures

MIT: Introduction to Probability

High-quality video lectures covering probability axioms, sample spaces, and basic probability rules with problem sets.

🎬 Video + Notes • ⏱️ 1 hour • 🎓 University Level

Quick Reference

Probability Formulas Cheat Sheet

Comprehensive reference sheet covering all major probability formulas including conditional probability and independence.

📄 PDF • ⚡ Quick reference • 🎯 All levels

💻 Interactive Tools & Practice

Visualizations

Seeing Theory: Probability

Interactive visual introduction with animations for conditional probability, Bayes’ theorem, and independence.

🎮 Interactive Web App • ⏱️ 1-2 hours

Simulations

StatKey: Probability Simulations

Interactive probability simulation tool with tree diagrams, conditional probability, and Bayes’ theorem calculators.

🌳 Interactive Tool • ⏱️ 1 hour

Practice Problems

Khan Academy: Probability

Comprehensive practice problems covering basic probability, conditional probability, and independence with instant feedback.

📝 Interactive Problems • ⏱️ 3-4 hours

Python Code

Matplotlib Statistical Gallery

Gallery of statistical visualizations including probability distributions, Venn diagrams, and tree diagrams.

📊 Code Examples • 🛠️ Python, Matplotlib

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 2 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Beginner-Intermediate

Study Schedule

Recommended Learning Path

Days 1-2: Core Reading (OpenIntro Chapter 3)

Day 3: Interactive tutorials and visualizations

Day 4: Video lectures and supplementary readings

Day 5: Practice problems and exercises

Day 6: Review and concept integration

Day 7: Assessment preparation

📅 7-day plan • ⏱️ 2-3 hours daily

🔍 Additional Resources

Extended Learning

Supplementary Materials by Learning Style

Visual Learners: Khan Academy Videos • Treena Notes • Math is Fun Stats

Practical Applications: Medical Diagnosis Examples • Real-world Problems

Advanced Study: MIT 6.041 Full Course • Probability Fallacies Guide

🎯 All learning styles • 📈 Beginner to Advanced

Week 3: Conditional Probability, Counting & Discrete Random Variables

Advanced Probability & Discrete Distributions

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapter 3 sections 3.3-3.4

Essential coverage of conditional probability, Bayes’ theorem, counting principles, and discrete random variables. Includes probability mass functions and expected values.

📖 PDF • ⏱️ 3-4 hours • 🎯 Intermediate

Bayes’ Theorem

Seeing Theory: Bayes’ Theorem

Interactive visualization of Bayes’ theorem with medical testing examples, false positives/negatives, and real-world applications. Essential for understanding conditional probability.

🎮 Interactive • ⏱️ 1 hour • 🔍 Visual Learning

Combinatorics

Khan Academy: Counting & Probability

Comprehensive coverage of counting principles, permutations, combinations, and their applications to probability with step-by-step examples and practice problems.

🎬 Video + Practice • ⏱️ 2-3 hours • 🎯 Beginner-Friendly

Discrete Distributions

Seeing Theory: Random Variables

Interactive exploration of discrete random variables, probability mass functions, and common distributions (Bernoulli, Binomial, Geometric, Poisson) with parameter adjustments.

🎮 Interactive • ⏱️ 1-2 hours • 📊 Distribution Explorer

💻 Interactive Tools & Practice

Bayes Calculator

Interactive Bayes’ Theorem Calculator

Step-by-step Bayes’ theorem calculator with medical testing examples, tree diagrams, and visual representations of prior and posterior probabilities.

🧮 Calculator • ⏱️ 30 minutes • 🎯 All Levels
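The medical-testing calculation that such a calculator performs can be sketched in a few lines. The function name and the prevalence, sensitivity, and specificity values below are hypothetical, chosen only to show how a low prior keeps the posterior small even for a fairly accurate test.

```python
def posterior_positive(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    true_pos = sensitivity * prior                 # P(positive and disease)
    false_pos = (1 - specificity) * (1 - prior)    # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 90% specificity
p = posterior_positive(prior=0.01, sensitivity=0.95, specificity=0.90)
print(round(p, 4))
```

With these inputs, fewer than one in ten positive results corresponds to a true case, because false positives from the large healthy group outnumber true positives.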

Combinatorics Tool

Permutation & Combination Calculator

Online calculator for permutations, combinations, and factorial calculations with explanations and step-by-step solutions for complex counting problems.

🧮 Calculator • ⚡ Quick Results • 📊 Problem Solving
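The same quantities are available in Python's standard library (math.perm and math.comb require Python 3.8 or later). A short sketch with arbitrarily chosen n and k:

```python
import math

n, k = 5, 3  # arbitrary example values

permutations = math.perm(n, k)   # ordered selections: n! / (n - k)!
combinations = math.comb(n, k)   # unordered selections: n! / (k! (n - k)!)

print(permutations, combinations)

# Each unordered selection can be arranged in k! different orders
assert permutations == combinations * math.factorial(k)
```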

Python Documentation

SciPy Stats: Discrete Distributions

Complete reference for discrete probability distributions in Python including Bernoulli, Binomial, Geometric, and Poisson with PMF, CDF, and random generation functions.

📚 API Docs • 🐍 Python • 🎯 Intermediate-Advanced
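A brief sketch of the PMF, CDF, and random generation functions for one of these distributions, using scipy.stats.binom. The parameters n = 10 and p = 0.5 are arbitrary choices for illustration.

```python
from scipy import stats

n, p = 10, 0.5                # arbitrary example parameters
binom = stats.binom(n, p)     # "frozen" Binomial(n, p) distribution

pmf_5 = binom.pmf(5)          # P(X = 5)
cdf_5 = binom.cdf(5)          # P(X <= 5)
mean, var = binom.mean(), binom.var()   # n*p and n*p*(1 - p)

# Five simulated draws; random_state makes the draws reproducible
draws = binom.rvs(size=5, random_state=0)

print(pmf_5, cdf_5, mean, var, draws)
```

The other discrete distributions mentioned here (stats.bernoulli, stats.geom, stats.poisson) expose the same pmf, cdf, and rvs methods.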

Interactive Simulations

Treena: Probability Distributions

Interactive probability distribution notes and overview for discrete random variables.

🎮 Interactive Tool • ⏱️ 1 hour • 📈 Visual Distributions

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 3 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Intermediate • 🎲 Probability & Counting

Study Schedule

Recommended Learning Path

Days 1-2: Conditional probability and Bayes’ theorem review

Day 3: Counting principles: permutations and combinations

Day 4: Introduction to discrete random variables and PMFs

Day 5: Expected values, variance, and common distributions

Day 6: Python implementation and interactive practice

Day 7: Real-world applications and problem integration

📅 7-day plan • ⏱️ 3-4 hours daily • 🔄 Building on Week 2

🔍 Additional Resources

Extended Learning

Supplementary Materials by Topic

Bayes’ Theorem Applications: Medical Diagnosis Examples • Spam Filtering Kaggle Python Example • Spam Filtering • Legal Evidence

Combinatorics: Art of Problem Solving • Brilliant Combinatorics • Pascal’s Triangle

Discrete Distributions: Wolfram MathWorld • NIST Engineering Statistics • Real-world Examples

🎯 All skill levels • 📈 Beginner to Advanced • 🔢 Mathematical Applications

Week 4: Continuous Random Variables & Intro to Confidence Intervals

From Discrete to Continuous: Understanding Density and Intervals

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapter 4

Essential coverage of continuous random variables, probability density functions, normal distribution, and Central Limit Theorem. Foundation for understanding statistical inference.

📖 PDF • ⏱️ 3-4 hours • 🎯 Intermediate

Required Reading

OpenIntro Statistics, Chapter 5 sections 5.1-5.2

Introduction to confidence intervals, interpretation, and construction. Essential for understanding how sample statistics relate to population parameters.

📖 PDF • ⏱️ 2-3 hours • 🎯 Intermediate

Central Limit Theorem

Seeing Theory: Central Limit Theorem

Interactive visualization of the Central Limit Theorem with adjustable sample sizes and population distributions. See how the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the shape of the original population (provided it has finite variance).

🎮 Interactive • ⏱️ 1 hour • 📊 Visual Learning
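One way to check this behavior numerically is a small simulation with the standard library. The sketch below assumes an exponential population with mean 1 (a strongly right-skewed shape chosen for illustration); the sample size and number of repetitions are also arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

def sample_mean(n):
    # Mean of n draws from an exponential population with mean 1 and sd 1
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 50                                   # size of each sample
means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(means)         # should be close to the population mean, 1
spread = statistics.stdev(means)         # should be close to 1 / sqrt(n)

print(center, spread, 1 / n ** 0.5)
```

Plotting a histogram of means (for example with Matplotlib) would show a roughly bell-shaped curve even though the individual draws are heavily skewed.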

Continuous Distributions

Think Stats: Continuous Distributions

Python-focused introduction to continuous distributions including normal, exponential, and Pareto distributions with real data examples and implementation.

🎓 UCSB Access: Library Database → Search “O’Reilly” → Login with NetID → Search “Think Stats”

💻 Online via UCSB Library • ⏱️ 2-3 hours • 🐍 Python Focus

💻 Interactive Tools & Practice

Distribution Explorer

Seeing Theory: Probability Distributions

Interactive exploration of continuous distributions including Normal, Exponential, and Uniform distributions. Adjust parameters and see real-time changes in PDFs and CDFs.

🎮 Interactive Web App • ⏱️ 1-2 hours • 📈 Parameter Exploration

Confidence Intervals

StatKey: Confidence Intervals

Interactive confidence interval construction tool. Upload data or use built-in datasets to create and interpret confidence intervals with different confidence levels.

🔧 Interactive Tool • ⏱️ 1 hour • 📊 Data Upload

Normal Distribution

GeoGebra: Normal Distribution

Interactive normal distribution calculator with Z-score calculations, area under curve, and probability computations. Essential for understanding standardization.

🧮 Calculator • ⏱️ 30 minutes • 📐 Z-scores & Areas
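The Z-score and area calculations can also be reproduced with statistics.NormalDist from the standard library (Python 3.8 or later). The mean, standard deviation, and observed value below are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # hypothetical population mean and standard deviation
x = 130                  # hypothetical observed value

z = (x - mu) / sigma     # standardization: how many sd's x lies above the mean

std_normal = NormalDist()                  # standard normal: mean 0, sd 1
area_below = std_normal.cdf(z)             # P(Z <= z), area to the left of z
area_within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)   # about 68%
z_crit = std_normal.inv_cdf(0.975)         # critical value for a 95% interval

print(z, area_below, area_within_1sd, z_crit)
```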

Python Documentation

SciPy Stats: Continuous Distributions

Complete reference for continuous probability distributions in Python including Normal, Exponential, Uniform, and T-distributions with PDF, CDF, and random generation functions.

📚 API Docs • 🐍 Python • 🎯 Intermediate-Advanced

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 4 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Intermediate • 📊 Continuous Probability

Study Schedule

Recommended Learning Path

Days 1-2: Continuous random variables and PDFs (OpenIntro Ch. 4)

Day 3: Normal distribution and standardization

Day 4: Central Limit Theorem and sampling distributions

Day 5: Introduction to confidence intervals (OpenIntro Ch. 5.1-5.2)

Day 6: Python implementation and interactive practice

Day 7: Real-world applications and interpretation practice

📅 7-day plan • ⏱️ 3-4 hours daily • 🔄 Building on Discrete Foundation

🔍 Additional Resources

Extended Learning

Supplementary Materials by Topic

Normal Distribution: Standard Normal Table • 68-95-99.7 Rule • Z-score Calculator

Central Limit Theorem: Khan Academy CLT • Interactive CLT Demo • Rice University Simulations

Confidence Intervals: Interpretation Guide • Common Misconceptions • Sample Size Calculator

🎯 All skill levels • 📈 Beginner to Advanced • 🔬 Statistical Inference

Week 5: Statistical Methods & Testing

Confidence Intervals, Hypothesis Testing, and Statistical Inference

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapter 5 sections 5.3-5.4

Comprehensive coverage of confidence intervals for means and proportions, including t-distribution applications when population standard deviation is unknown.

📖 PDF • ⏱️ 3-4 hours • 🎯 Intermediate-Advanced

Required Reading

OpenIntro Statistics, Chapter 6 sections 6.1-6.3

Introduction to hypothesis testing fundamentals including null and alternative hypotheses, p-values, Type I and Type II errors, and statistical significance.

📖 PDF • ⏱️ 4-5 hours • 🎯 Advanced

T-Distribution

Khan Academy: t-statistics

Video series explaining the t-distribution, degrees of freedom, and when to use t vs z distributions in confidence intervals and hypothesis testing.

🎬 Video Series • ⏱️ 2-3 hours • 🎯 Beginner-Friendly

Hypothesis Testing

Seeing Theory: Frequentist Inference

Interactive exploration of frequentist inference concepts, including point estimation, confidence intervals, and the bootstrap method, with real-time visualizations.

🎮 Interactive • ⏱️ 2 hours • 📊 Visual Learning

💻 Interactive Tools & Practice

T-Distribution Calculator

StatKey: T-Distribution

Interactive t-distribution calculator with adjustable degrees of freedom, tail probability calculations, and critical value finder for confidence intervals.

🧮 Calculator • ⏱️ 30 minutes • 📊 Interactive Visualization

Hypothesis Testing Simulator

StatKey: Hypothesis Testing

Step-by-step hypothesis testing tool with built-in datasets or data upload capability. Automatically calculates test statistics, p-values, and conclusions.

🔧 Interactive Tool • ⏱️ 1-2 hours • 📈 Real Data Analysis

P-Value Visualization

Rpsychologist: P-Values

Interactive visualization of p-values, showing the relationship between test statistics, distributions, and probability calculations for different hypothesis tests.

📊 Visual Tool • ⏱️ 45 minutes • 🎯 Conceptual Understanding

Python Implementation

SciPy Stats: Statistical Tests

Complete reference for statistical tests in Python including t-tests, z-tests, and chi-square tests with confidence interval functions and effect size calculations.

📚 API Docs • 🐍 Python • 🎯 Intermediate-Advanced

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 5 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Advanced • 🔬 Statistical Inference

Study Schedule

Recommended Learning Path

Days 1-2: Advanced confidence intervals with t-distribution (OpenIntro Ch. 5.3-5.4)

Day 3: Confidence intervals for proportions and sample size calculations

Day 4: Hypothesis testing fundamentals (OpenIntro Ch. 6.1-6.2)

Day 5: P-values, significance levels, and Type I/II errors

Day 6: One-sample tests implementation and practice

Day 7: Real-world applications and result interpretation

📅 7-day plan • ⏱️ 4-5 hours daily • 🎯 Intensive Study Week

🔍 Additional Resources

Extended Learning

Real-World Applications & Case Studies

Medical Research: Clinical Trial Analysis Notebook • Drug Effectiveness Testing • Medical Device Validation

Business Analytics: A/B Testing Guide • Marketing Campaign Analysis • Customer Satisfaction Testing

Quality Control: Manufacturing Process Control • Product Testing Analysis • Six Sigma Applications

Educational Research: Student Performance Analysis • Teaching Method Effectiveness • Standardized Test Scores

🎯 Applied Statistics • 📊 Real Data • 🔬 Research Applications

Essential Formulas for Week 5

Confidence Intervals:
- For Means (σ unknown): \(\bar{x} \pm t_{\alpha/2,df} \cdot \frac{s}{\sqrt{n}}\)
- For Proportions: \(\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
- Degrees of Freedom: \(df = n - 1\)
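The two interval formulas can be worked through numerically as follows. This is a sketch at the 95% confidence level; the sample values and the success count are invented for illustration, and critical values come from scipy.stats.

```python
import math
import statistics
from scipy import stats

# Hypothetical sample, invented for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)          # sample standard deviation (n - 1)
df = n - 1

# Mean, sigma unknown: xbar +/- t_{alpha/2, df} * s / sqrt(n)
t_crit = stats.t.ppf(0.975, df)
margin = t_crit * s / math.sqrt(n)
ci_mean = (xbar - margin, xbar + margin)

# Proportion: p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)
successes, trials = 40, 100           # hypothetical counts
p_hat = successes / trials
z_crit = stats.norm.ppf(0.975)
margin_p = z_crit * math.sqrt(p_hat * (1 - p_hat) / trials)
ci_prop = (p_hat - margin_p, p_hat + margin_p)

print(ci_mean, ci_prop)
```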

Hypothesis Testing:
- Test Statistic (means): \(t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\)
- Test Statistic (proportions): \(z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\)
- P-value: Probability, assuming H₀ is true, of observing a test statistic at least as extreme as the one actually observed
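As a sketch of how the test statistic for a mean and its two-sided p-value fit together, the example below computes both by hand and compares them with scipy.stats.ttest_1samp. The sample values and the null value mu0 are hypothetical.

```python
import math
import statistics
from scipy import stats

# Hypothetical sample and null-hypothesis mean, invented for illustration
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
mu0 = 12.0

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)

# t = (xbar - mu0) / (s / sqrt(n))
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Two-sided p-value: probability in both tails beyond |t| with n - 1 df
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# The same test using SciPy's built-in one-sample t-test
result = stats.ttest_1samp(sample, popmean=mu0)

print(t_stat, p_value)
print(result.statistic, result.pvalue)
```

The manual and built-in results should agree up to floating-point rounding.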

Error Types:
- Type I Error (α): Reject true H₀
- Type II Error (β): Fail to reject false H₀
- Power: \(1 - \beta\) = Probability of correctly rejecting false H₀

Week 6: Linear Regression Basics

Statistical Modeling and Relationship Analysis

📚 Core Materials

Required Reading

OpenIntro Statistics, Chapter 7 sections 7.1-7.3 and Chapter 8

Comprehensive introduction to linear regression including correlation, least squares method, regression equations, and interpretation of slope and intercept.

📖 PDF • ⏱️ 4-5 hours • 🎯 Advanced

Python for Regression

Think Stats: Regression

Python-focused approach to linear regression with real data examples, including correlation analysis, fitting regression lines, and residual analysis.

🎓 UCSB Access: Library Database → Search “O’Reilly” → Login with NetID → Search “Think Stats”

💻 Online via UCSB Library • ⏱️ 2-3 hours • 🐍 Python Implementation

Correlation Analysis

Khan Academy: Correlation and Regression

Comprehensive video series covering correlation coefficients, scatterplots, regression lines, and interpretation of relationships between quantitative variables.

🎬 Video Series • ⏱️ 3-4 hours • 🎯 Beginner-Friendly

Regression Assumptions

Penn State: Regression Methods

Detailed coverage of linear regression assumptions including linearity, independence, normality, and equal variance with diagnostic methods and remedies.

🎓 University Course • ⏱️ 2-3 hours • 🔬 Statistical Theory

💻 Interactive Tools & Practice

Regression Visualization

Seeing Theory: Linear Regression

Interactive exploration of linear regression with adjustable data points, real-time least squares fitting, and visualization of residuals and R-squared values.

🎮 Interactive Web App • ⏱️ 1-2 hours • 📈 Real-time Fitting

Correlation Calculator

StatKey: Correlation

Interactive tool for calculating correlation coefficients and fitting regression lines with built-in datasets or data upload capability.

🔧 Interactive Tool • ⏱️ 1 hour • 📊 Data Analysis

Residual Analysis

GeoGebra: Regression Analysis

Interactive regression analysis tools including scatterplot creation, line fitting, residual plots, and regression diagnostics for assumption checking.

📊 Analysis Tool • ⏱️ 45 minutes • 🔍 Diagnostic Plots

Python Libraries

Scikit-learn: Linear Regression

Complete documentation for linear regression in scikit-learn including model fitting, prediction, coefficient interpretation, and performance metrics.

📚 API Docs • 🐍 Python • 🎯 Intermediate-Advanced

🎯 Learning Objectives & Study Plan

Mastery Checklist

Week 6 Learning Goals

After completing this week, you should be able to:

✅ Self-Assessment • 🎯 Advanced • 📈 Statistical Modeling

Study Schedule

Recommended Learning Path

Days 1-2: Correlation analysis and scatterplots (OpenIntro Ch. 7.1)

Day 3: Linear regression theory and least squares method (Ch. 7.2)

Day 4: Regression equations, interpretation, and R-squared (Ch. 7.3)

Day 5: Residual analysis and assumption checking

Day 6: Python implementation with real datasets

Day 7: Advanced topics and course integration review

📅 7-day plan • ⏱️ 4-5 hours daily • 🎯 Capstone Week

🔍 Additional Resources & Real-World Applications

Business Analytics

Sales and Marketing Analysis

Kaggle Notebooks: Marketing Campaign ROI Analysis • Sales Forecasting with Regression • Customer Lifetime Value Prediction

Applications: Advertising spend vs. sales revenue, price elasticity analysis, customer acquisition cost modeling

Datasets: Marketing Data • Sales Data • E-commerce Data

💼 Business • 📈 ROI Analysis • 💰 Revenue Prediction

Health & Medicine

Medical Research Applications

Research Examples: BMI vs Health Outcomes • Drug Dosage Effectiveness • Treatment Response Prediction

Applications: Dose-response relationships, biomarker analysis, treatment outcome prediction, epidemiological studies

Datasets: Heart Disease Data • Diabetes Prediction • Cancer Research Data

🏥 Healthcare • 🔬 Medical Research • 📊 Clinical Analysis

Environmental Science

Climate and Environmental Modeling

Climate Analysis: Temperature vs Time Trends • Air Quality Prediction • Renewable Energy Analysis

Applications: Climate change modeling, pollution correlation analysis, energy consumption prediction, environmental impact assessment

Datasets: Global Temperature • Air Quality • Energy Consumption

🌍 Environmental • 🌡️ Climate Science • ♻️ Sustainability

Economics & Finance

Economic Analysis & Financial Modeling

Financial Applications: Stock Price Prediction • Economic Indicators Analysis • Housing Price Modeling

Applications: Portfolio optimization, risk assessment, economic forecasting, market trend analysis, real estate valuation

Datasets: Stock Market Data • Housing Prices • Economic Indicators

💹 Finance • 🏠 Real Estate • 📊 Economic Modeling

📊 Advanced Regression Topics Preview

Beyond Simple Regression

Topics for Further Study

Multiple Regression: Adding more predictor variables, interpreting coefficients in multivariate models, dealing with multicollinearity

Polynomial Regression: Modeling non-linear relationships, choosing appropriate degree, overfitting concerns

Logistic Regression: Binary outcome variables, odds ratios, classification problems

Model Diagnostics: Advanced residual analysis, influence measures, model selection criteria (AIC, BIC)

Regularization: Ridge regression, Lasso regression, dealing with high-dimensional data

Time Series: Regression with temporal data, autocorrelation, trend analysis

🎓 Recommended Next Courses: PSTAT 109 (Statistics for Economics), PSTAT 126 (Regression Analysis), PSTAT 131 (Statistical Machine Learning)

🚀 Advanced Topics • 📈 Next Level • 🎓 Further Study

Essential Regression Formulas

Correlation:
- Pearson’s r: \(r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \sum(y_i - \bar{y})^2}}\)

Simple Linear Regression:
- Slope: \(b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}\)
- Intercept: \(b_0 = \bar{y} - b_1\bar{x}\)
- Regression Line: \(\hat{y} = b_0 + b_1 x\)

Model Evaluation:
- R-squared: \(R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}\)
- Residual: \(e_i = y_i - \hat{y}_i\)
- Standard Error: \(s_e = \sqrt{\frac{\sum e_i^2}{n-2}}\)

Key Relationships:
- SST = SSR + SSE (Total = Regression + Error)
- Correlation and R²: \(R^2 = r^2\) (for simple linear regression)
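The regression formulas can be checked by hand on a tiny dataset. The sketch below uses only the standard library; the five (x, y) pairs are invented for illustration, and the variable names sxy, sxx, and syy are shorthand introduced here for the sums of cross-products and squares.

```python
import math

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)      # Pearson's r
b1 = sxy / sxx                      # slope
b0 = ybar - b1 * xbar               # intercept

y_hat = [b0 + b1 * xi for xi in x]                  # fitted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # e_i = y_i - y_hat_i

sse = sum(e ** 2 for e in residuals)                # error sum of squares
ssr = sum((yh - ybar) ** 2 for yh in y_hat)         # regression sum of squares
sst = syy                                           # total sum of squares

r_squared = 1 - sse / sst
s_e = math.sqrt(sse / (n - 2))      # standard error of the residuals

print(b0, b1, r, r_squared, s_e)
```

For this data the fitted line is ŷ = 2.2 + 0.6x, and both key relationships hold up to floating-point rounding: SST = SSR + SSE and R² = r².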