From Random Variables to Sampling & Confidence Intervals

Review of Concepts, Applications & Confidence Intervals Intro

Narjes Mathlouthi

Today’s Learning Objectives

By the end of this session, you will be able to:

  • Define what a random variable is
  • Distinguish between different types of random variables
  • Identify examples of random variables in your field of study
  • Connect probability concepts to real-world applications

What is a Random Variable?

  • A random variable (r.v.) is a function that assigns numerical values to the outcomes of a random experiment
  • Notation: Usually denoted by capital letters (X, Y, Z)
  • It’s a bridge between the sample space and real numbers
  • Think of it as a “rule” that translates outcomes into numbers

Key Point: The function itself is not “random”; it is a deterministic rule applied to random outcomes!
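
To make the "deterministic function" idea concrete, here is a minimal sketch. The two-coin-toss experiment and the names `sample_space` and `X` are illustrative assumptions, not part of the lecture material.

```python
import random

# Assumed experiment: toss a coin twice; these are all possible outcomes
sample_space = ["HH", "HT", "TH", "TT"]

def X(outcome: str) -> int:
    """Random variable: the number of heads in an outcome."""
    return outcome.count("H")

# The rule is deterministic: the same outcome always maps to the same number
values = {outcome: X(outcome) for outcome in sample_space}
print(values)

# Randomness enters only through which outcome the experiment produces
rng = random.Random(0)
outcome = rng.choice(sample_space)
print(outcome, "->", X(outcome))
```

The only random step is drawing `outcome`; applying `X` to it involves no chance at all.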

Real-World Connection

Activity: Your Research Field

Think About Your Major/Research Area

Take 2 minutes to brainstorm:

  1. What random phenomena occur in your field?
  2. How might you assign numbers to these outcomes?
  3. What questions could you answer with this data?

Examples by Field

  • Psychology: Reaction times, survey responses

  • Biology: Species counts, gene expression levels

  • Economics: Stock prices, unemployment rates

  • Engineering: System failures, signal strength

Types of Random Variables

Discrete Random Variables

  • Definition: Takes on countable values (finite or countably infinite)
  • Examples:
    • Number of emails received per day
    • Number of defective products in a batch
    • Number of students enrolled in a course
    • Number of research papers published per year

Note: If X is discrete, then X can take values x_1, x_2, x_3, \ldots, where we can list all possible values.

Continuous Random Variables

  • Definition: Takes on uncountably many values (any value in an interval)
  • Examples:
    • Height of students
    • Time until equipment failure
    • Temperature measurements
    • GPA (technically discrete, but often treated as continuous)

Note: If X is continuous, then X can take any value in an interval [a,b] or (-\infty, \infty).

Sampling and Random Variables

Central Limit Theorem Connection

  • When we repeatedly sample from a population, the sample mean becomes a random variable
  • Formula: \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
  • Each time we sample, we get a different \bar{X}
  • The distribution of \bar{X} has special properties!

Central Limit Theorem Connection

These properties are:

  • Center (Unbiased): E[\bar{X}] = \mu.
  • Spread Shrinks with n: \mathrm{Var}(\bar{X}) = \sigma^2/n; \mathrm{SE}(\bar{X}) = \sigma/\sqrt{n} (estimate with s/\sqrt{n}).
  • Shape:
    • If the population is Normal, then \bar{X} \sim \text{Normal}(\mu, \sigma^2/n) exactly.
    • Otherwise, CLT: for large n, \bar{X} is approximately Normal even when the data aren’t.
  • Consistency / Law of Large Numbers: \bar{X} \xrightarrow{P} \mu as n \to \infty (estimates get closer to the truth with more data).
  • If sampling without replacement from a finite population of size N (and n is not a negligible fraction of N), apply the finite population correction (FPC):
    \mathrm{SE}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}.
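
The properties above can be checked by simulation. The sketch below is illustrative only: the Exponential population with mean 2, the sample size, the number of repetitions, and the helper name `fpc_se` are all assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed population: Exponential with mean 2 (so mu = 2 and sigma = 2); clearly not Normal
mu, sigma = 2.0, 2.0
n = 50           # size of each sample
reps = 20_000    # number of repeated samples

# Draw many samples and compute the sample mean of each one
samples = rng.exponential(scale=mu, size=(reps, n))
xbars = samples.mean(axis=1)

mean_of_xbars = xbars.mean()          # should be close to mu (unbiased)
sd_of_xbars = xbars.std(ddof=1)       # should be close to sigma / sqrt(n)
theoretical_se = sigma / np.sqrt(n)

print(f"mean of sample means: {mean_of_xbars:.3f} (mu = {mu})")
print(f"SD of sample means:   {sd_of_xbars:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")

def fpc_se(sigma: float, n: int, N: int) -> float:
    """Standard error of the mean with the finite population correction."""
    return sigma / np.sqrt(n) * np.sqrt((N - n) / (N - 1))

# The correction matters only when n is a sizeable fraction of N
print(fpc_se(sigma, n, N=100), fpc_se(sigma, n, N=1_000_000))
```

A histogram of `xbars` would look roughly bell-shaped even though the underlying Exponential data are strongly right-skewed, which is the CLT at work.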

Common Discrete Distributions

Binomial Distribution - Characteristics

  • Fixed number of trials (n)

  • Each trial has two outcomes

  • Constant probability of success

  • Trials are independent

Example: Number of successful research grants out of 10 applications
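
As a rough sketch of the grant example, the snippet below uses `scipy.stats.binom`. The success probability p = 0.2 is an assumed value for illustration, and the example presumes the 10 applications succeed or fail independently.

```python
from scipy import stats

n = 10     # number of applications (from the example)
p = 0.2    # assumed probability that any one application is funded

grants = stats.binom(n, p)

p_exactly_2 = grants.pmf(2)         # P(X = 2)
p_at_least_1 = 1 - grants.pmf(0)    # P(X >= 1) = 1 - P(X = 0)
expected = grants.mean()            # E[X] = n * p

print(f"P(exactly 2 funded)  = {p_exactly_2:.4f}")
print(f"P(at least 1 funded) = {p_at_least_1:.4f}")
print(f"Expected number funded = {expected:.1f}")
```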

Poisson Distribution - Characteristics

  • Models the number of events in a fixed interval (classically, rare events)

  • Events occur independently

  • Constant average rate

  • Useful for counts over time/space

Example: Number of emails received per hour, number of mutations in DNA sequences
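
A minimal sketch of the email example using `scipy.stats.poisson`. The rate of 3 emails per hour is an assumption (it matches the value used in the appendix code).

```python
import math
from scipy import stats

lam = 3    # assumed average number of emails per hour

emails = stats.poisson(lam)

p_zero = emails.pmf(0)          # P(X = 0) = exp(-lam)
p_more_than_5 = emails.sf(5)    # P(X > 5), the survival function at 5

print(f"P(no emails in an hour)  = {p_zero:.4f}")
print(f"P(more than 5 in an hour) = {p_more_than_5:.4f}")

# A distinctive Poisson property: the mean and the variance both equal lam
print(emails.mean(), emails.var())
print(math.isclose(p_zero, math.exp(-lam)))
```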

Common Continuous Distributions

Normal Distribution - Characteristics

  • Bell-shaped curve

  • Symmetric around mean

  • Parameters: \mu (mean), \sigma (standard deviation)

  • Many natural phenomena approximately follow this pattern

Example: Heights, test scores, measurement errors
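
To illustrate the role of \mu and \sigma, the sketch below computes the probability of falling within one and two standard deviations of the mean. The height parameters (170 cm, 10 cm) are made-up values for illustration.

```python
from scipy import stats

mu, sigma = 170.0, 10.0    # assumed mean and standard deviation of heights, in cm

heights = stats.norm(loc=mu, scale=sigma)

# Probability of being within k standard deviations of the mean
within_1sd = heights.cdf(mu + sigma) - heights.cdf(mu - sigma)
within_2sd = heights.cdf(mu + 2 * sigma) - heights.cdf(mu - 2 * sigma)

print(f"P(within 1 SD) = {within_1sd:.4f}")
print(f"P(within 2 SD) = {within_2sd:.4f}")

# Symmetry around the mean: half the probability lies below mu
print(heights.cdf(mu))
```

These probabilities are the same for every Normal distribution, whatever \mu and \sigma are.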

Exponential Distribution - Characteristics

  • Models waiting times

  • Memoryless property

  • Parameter: \lambda (rate)

  • Right-skewed

Example: Time between arrivals, equipment lifespan, time to next earthquake
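
The memoryless property says P(X > s + t | X > s) = P(X > t): having already waited s units does not change the distribution of the remaining wait. A minimal numerical check, with an assumed rate \lambda = 1.5 and arbitrary times s and t:

```python
from scipy import stats

lam = 1.5                          # assumed rate (events per unit time)
wait = stats.expon(scale=1 / lam)  # SciPy parameterizes by scale = 1 / rate

s, t = 1.0, 0.5                    # arbitrary illustrative times

# sf(x) is the survival function P(X > x)
p_conditional = wait.sf(s + t) / wait.sf(s)   # P(X > s + t | X > s)
p_unconditional = wait.sf(t)                  # P(X > t)

print(f"P(X > s+t | X > s) = {p_conditional:.6f}")
print(f"P(X > t)           = {p_unconditional:.6f}")
print(f"Mean waiting time  = {wait.mean():.4f} (1/lambda)")
```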

Interactive Activity: Choose Your Distribution

Group Discussion (5 minutes)

For each scenario, identify:

  1. Is the random variable discrete or continuous?
  2. What distribution might it follow?
  3. What are the parameters?

Scenarios:

  • Number of students attending office hours per week
  • Time spent studying for an exam
  • Number of typos in a research paper
  • Body temperature of patients in a hospital

Application: Research Design

Consider your research question:

  1. Identify your random variable(s)
    • What are you measuring?
    • What values can it take?
  2. Choose appropriate distribution
    • Based on the nature of your data
    • Consider the underlying process
  3. Plan your analysis
    • How will you collect data?
    • What statistical tests are appropriate?

Probability Mass vs. Density Functions

Discrete: Probability Mass Function (PMF)

  • P(X = x) for specific values

  • Sums to 1 over all possible values

  • Gives the probability of each exact value directly

Example: P(X = 3) = 0.2

Continuous: Probability Density Function (PDF)

  • f(x) represents density

  • Area under curve = 1

  • P(X = x) = 0 for any specific value

  • Find probabilities over intervals

Example: P(a < X < b) = \int_{a}^{b} f(x)dx
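
The contrast between a PMF and a PDF can be sketched numerically. The Binomial(10, 0.3) and standard Normal distributions, and the interval endpoints, are assumptions picked for illustration.

```python
import numpy as np
from scipy import integrate, stats

# Discrete: the PMF gives probabilities directly, and they sum to 1
X = stats.binom(10, 0.3)
pmf_total = X.pmf(np.arange(0, 11)).sum()
print(f"P(X = 3) = {X.pmf(3):.4f}, sum of PMF = {pmf_total:.4f}")

# Continuous: probabilities are areas under the PDF
Z = stats.norm(0, 1)
a, b = -1.0, 2.0
area, _ = integrate.quad(Z.pdf, a, b)   # numerical integral of f(x) from a to b
via_cdf = Z.cdf(b) - Z.cdf(a)           # same probability from the CDF
print(f"P(a < Z < b): integral = {area:.6f}, CDF difference = {via_cdf:.6f}")

# A density is not a probability: it can exceed 1
narrow_peak = stats.norm(0, 0.1).pdf(0)
print(f"Density of Normal(0, 0.1^2) at 0 = {narrow_peak:.3f}")
```

The last line shows why f(x) should be read as density rather than probability: only areas under the curve are constrained to lie between 0 and 1.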

Comparing Distributions Side-by-Side

Confidence Intervals for Means

  • Problem: We have one sample mean, but want to estimate the population mean
  • Solution: Use the sampling distribution to create a confidence interval
  • Key Insight: If we know how \bar{X} varies, we can make probabilistic statements about μ

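95% Confidence Interval Formula (σ known): \bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}. When σ is unknown, replace it with the sample standard deviation s and, for small n, replace 1.96 with a t critical value.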

Interpretation: “We are 95% confident that the true population mean lies within this interval”
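
A minimal sketch of computing the interval. The data are simulated, and the population values (mean 100, \sigma = 15, n = 36) are assumptions made only so that the example runs; the t-based interval is shown as the usual alternative when \sigma is unknown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed setup: simulated data standing in for a real sample
mu_true, sigma, n = 100.0, 15.0, 36
sample = rng.normal(mu_true, sigma, size=n)
xbar = sample.mean()

# Known sigma: xbar +/- z * sigma / sqrt(n), with z ~ 1.96 for 95%
z = stats.norm.ppf(0.975)
half_width = z * sigma / np.sqrt(n)
ci_z = (xbar - half_width, xbar + half_width)

# Unknown sigma: use s and a t critical value with n - 1 degrees of freedom
s = sample.std(ddof=1)
ci_t = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=s / np.sqrt(n))

print(f"sample mean = {xbar:.2f}")
print(f"z interval  = ({ci_z[0]:.2f}, {ci_z[1]:.2f})")
print(f"t interval  = ({ci_t[0]:.2f}, {ci_t[1]:.2f})")
```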

Visualizing Confidence Intervals

Confidence Interval Interpretation

Common Misconceptions

❌ WRONG: “There’s a 95% probability that μ is in this specific interval”

✅ CORRECT: “If we repeated this process many times, 95% of the intervals we construct would contain the true μ”

Note

  • The interval is random, not the population parameter
  • Before collecting data: 95% chance that the interval we are about to construct will contain μ
  • After collecting data: The interval either contains μ or it doesn’t
  • Confidence level = Long-run success rate of the method
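
The "long-run success rate" reading can be checked by simulation. In this sketch the true mean is known (which never happens in practice), and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed population and design; mu is known here only because we simulate
mu, sigma, n = 50.0, 10.0, 25
reps = 10_000    # number of repeated studies

# One sample mean per simulated study
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Build a 95% interval around each sample mean (sigma treated as known)
half_width = 1.96 * sigma / np.sqrt(n)
lower, upper = xbars - half_width, xbars + half_width

# Each interval either contains mu or it does not
covered = (lower <= mu) & (mu <= upper)
coverage = covered.mean()

print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

Each individual interval is simply right or wrong; the 95% describes how often the procedure succeeds across repetitions.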

Factors Affecting Confidence Interval Width
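
From the formula \bar{x} \pm z \times \sigma/\sqrt{n}, the full width of the interval is 2z\sigma/\sqrt{n}, so it depends on the confidence level (through z), the population spread \sigma, and the sample size n. A minimal sketch, where `ci_width` is a hypothetical helper name and the input values are illustrative:

```python
import numpy as np
from scipy import stats

def ci_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Full width of a z-based confidence interval for the mean (sigma known)."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / np.sqrt(n)

base = ci_width(sigma=10, n=100)

# Larger n -> narrower interval (quadrupling n halves the width)
print(f"n = 100: {base:.3f}   n = 400: {ci_width(10, 400):.3f}")

# Higher confidence -> wider interval
print(f"95%: {base:.3f}   99%: {ci_width(10, 100, confidence=0.99):.3f}")

# More variable population -> wider interval (width is proportional to sigma)
print(f"sigma = 10: {base:.3f}   sigma = 20: {ci_width(20, 100):.3f}")
```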

Key Takeaways

  1. Random variables translate random outcomes into numbers
  2. Discrete variables have countable values; continuous variables have uncountable values
  3. Distributions describe the probability patterns of random variables
  4. Choosing the right distribution depends on understanding your data’s nature
  5. Real applications exist in every field - think about your research!

Next Steps

For Your Research/Interests

  1. Identify random variables in your field
  2. Think about appropriate distributions
  3. Consider data collection methods
  4. Plan statistical analyses
  5. Connect theory to practice

Questions and Discussion

Share with the class:

  • What random variables are important in your field of study/major?

  • Which distributions might be most relevant?

  • What challenges do you anticipate in data collection?

Thank you for your participation!

Appendix: Python Code Examples

import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the figures are reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Generate random samples from different distributions

# Binomial: 10 trials, success probability 0.3
binom_data = rng.binomial(n=10, p=0.3, size=100)

# Poisson: average rate of 3 events per interval
poisson_data = rng.poisson(lam=3, size=100)

# Normal: mean 0, standard deviation 1
normal_data = rng.normal(loc=0, scale=1, size=100)

# Exponential: rate 1.5, so scale = 1/rate
exp_data = rng.exponential(scale=1/1.5, size=100)

# Create histograms
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

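# Integer-centered bins so each bar corresponds to exactly one count value
axes[0,0].hist(binom_data, bins=np.arange(-0.5, 11.5), alpha=0.7, color='steelblue')
axes[0,0].set_title('Binomial Sample')

axes[0,1].hist(poisson_data, bins=np.arange(-0.5, poisson_data.max() + 1.5), alpha=0.7, color='coral')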
axes[0,1].set_title('Poisson Sample')

axes[1,0].hist(normal_data, bins=20, alpha=0.7, color='lightblue')
axes[1,0].set_title('Normal Sample')

axes[1,1].hist(exp_data, bins=20, alpha=0.7, color='lightgreen')
axes[1,1].set_title('Exponential Sample')

plt.tight_layout()
plt.show()

Additional Resources

# Useful Python libraries for statistics and probability
import numpy as np           # Numerical computing
import scipy.stats as stats  # Statistical functions
import matplotlib.pyplot as plt  # Plotting
import seaborn as sns        # Statistical visualization
import pandas as pd          # Data manipulation

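# Quick reference for common distributions:
# stats.binom.pmf(k, n, p)                # Binomial PMF
# stats.poisson.pmf(k, mu)                # Poisson PMF (mu is the rate, often written lambda)
# stats.norm.pdf(x, loc=mu, scale=sigma)  # Normal PDF
# stats.expon.pdf(x, scale=1/lam)         # Exponential PDF (pass scale by keyword; the 2nd positional argument is loc)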

# Generate random samples:
# np.random.binomial(n, p, size)
# np.random.poisson(lam, size)
# np.random.normal(mu, sigma, size)
# np.random.exponential(scale, size)
