Proving Relationships Between X and Y = aX
2025-06-26
Given: Let \(X = \{x_i\}_{i=1}^{n}\) and define \(Y = \{y_i\}_{i=1}^{n}\) by \(y_i = a x_i\) for some fixed constant \(a \neq 0\).
Prove the following relationships:
\[\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i, \quad \bar{Y} = a \bar{X}, \quad S_Y^2 = a^2 S_X^2\]
\(X = \{x_i\}_{i=1}^{n}\) means:
\(X\) is a dataset containing \(n\) observations
The observations are labeled \(x_1, x_2, x_3, \ldots, x_n\)
\(i\) is an index that runs from 1 to \(n\)
This is read as: “X is the set of \(x_i\) for \(i\) from 1 to \(n\)”
\(\sum_{i=1}^{n} x_i\) means:
Add up all the \(x_i\) values
Start with \(i = 1\) and go up to \(i = n\)
This equals: \(x_1 + x_2 + x_3 + \cdots + x_n\)
If \(X = \{2, 4, 6, 8, 10\}\), then: \[\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 2 + 4 + 6 + 8 + 10 = 30\]
\(Y = \{y_i\}_{i=1}^{n}\) where \(y_i = a x_i\) means:
Each element of \(Y\) is obtained by multiplying the corresponding element of \(X\) by the constant \(a\)
This is called a linear transformation
\(a\) is called the scaling factor
If \(X = \{2, 4, 6, 8, 10\}\) and \(a = 3\), then:
\(y_1 = 3 \times 2 = 6\)
\(y_2 = 3 \times 4 = 12\)
\(y_3 = 3 \times 6 = 18\)
\(y_4 = 3 \times 8 = 24\)
\(y_5 = 3 \times 10 = 30\)
So \(Y = \{6, 12, 18, 24, 30\}\)
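The same construction can be sketched in a few lines of Python. The variable names `x`, `a`, and `y` are illustrative choices, not part of the original material; the data are the values from the example above.

```python
# Data and scaling factor from the worked example (names are assumptions).
x = [2, 4, 6, 8, 10]
a = 3

# Apply y_i = a * x_i to every observation.
y = [a * x_i for x_i in x]

print(y)       # [6, 12, 18, 24, 30]
print(sum(x))  # 30
```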
Sample Mean: \(\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i\)
This means:
Add up all observations: \(\sum_{i=1}^{n} x_i\)
Divide by the number of observations: \(n\)
The “bar” over \(X\) indicates the mean
For \(X = \{2, 4, 6, 8, 10\}\): \[\bar{X} = \frac{1}{5}(2 + 4 + 6 + 8 + 10) = \frac{30}{5} = 6\]
Sample Variance: \(S_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2\)
This means:
For each observation, find its deviation from the mean: \((x_i - \bar{X})\)
Square each deviation: \((x_i - \bar{X})^2\)
Add up all squared deviations: \(\sum_{i=1}^{n} (x_i - \bar{X})^2\)
Divide by \((n-1)\): This gives the sample variance
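The four steps above map directly onto code. The following is a minimal sketch, assuming a hypothetical helper named `sample_variance`; it is compared against `statistics.variance` from the Python standard library, which also uses the \(n-1\) divisor.

```python
from statistics import variance

def sample_variance(data):
    """Sample variance with the n - 1 divisor, following the four steps above."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x_i - mean for x_i in data]   # step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]      # step 2: square each deviation
    total = sum(squared)                        # step 3: add them up
    return total / (n - 1)                      # step 4: divide by n - 1

x = [2, 4, 6, 8, 10]
print(sample_variance(x))                 # 10.0
print(sample_variance(x) == variance(x))  # True
```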
Prove: \(\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i\)
Step 1: Start with the definition of \(y_i\) \[y_i = a x_i \text{ for all } i = 1, 2, \ldots, n\]
Step 2: Write out the sum of all \(y_i\) \[\sum_{i=1}^{n} y_i = y_1 + y_2 + y_3 + \cdots + y_n\]
Step 3: Substitute the definition \(y_i = a x_i\) \[\sum_{i=1}^{n} y_i = a x_1 + a x_2 + a x_3 + \cdots + a x_n\]
Step 4: Factor out the constant \(a\) \[\sum_{i=1}^{n} y_i = a(x_1 + x_2 + x_3 + \cdots + x_n)\]
Step 5: Recognize the sum notation \[\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i\]
Therefore: \(\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i\) ✓
Key Property Used: Constants can be factored out of sums \[\sum_{i=1}^{n} (c \cdot x_i) = c \sum_{i=1}^{n} x_i\]
Given: \(X = \{2, 4, 6\}\) and \(a = 5\)
Then: \(Y = \{10, 20, 30\}\) (since \(y_i = 5x_i\))
Check our formula:
\(\sum_{i=1}^{3} x_i = 2 + 4 + 6 = 12\)
\(\sum_{i=1}^{3} y_i = 10 + 20 + 30 = 60\)
\(a \sum_{i=1}^{3} x_i = 5 \times 12 = 60\) ✓
Verification: \(60 = 60\) ✓
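A numeric check is not a proof, but it can catch algebra slips. The sketch below tries the sum rule with several scaling factors, including a negative and a fractional one; the helper name `sum_rule_holds` is an assumption made for illustration, and `Fraction` is used so the comparison is exact.

```python
from fractions import Fraction

def sum_rule_holds(x, a):
    """Return True when sum(a * x_i) equals a * sum(x_i)."""
    y = [a * x_i for x_i in x]
    return sum(y) == a * sum(x)

x = [2, 4, 6]
for a in [5, -3, Fraction(1, 2)]:
    print(a, sum_rule_holds(x, a))
```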
Prove: \(\bar{Y} = a \bar{X}\)
Step 1: Start with the definition of sample mean \[\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} y_i\]
Step 2: Use our result from Proof 1. We proved that \(\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i\).
Step 3: Substitute this result \[\bar{Y} = \frac{1}{n} \cdot a \sum_{i=1}^{n} x_i\]
Step 4: Rearrange the constants \[\bar{Y} = a \cdot \frac{1}{n} \sum_{i=1}^{n} x_i\]
Step 5: Recognize the definition of \(\bar{X}\) \[\bar{Y} = a \bar{X}\]
Therefore: \(\bar{Y} = a \bar{X}\) ✓
Key Property Used: Multiplication is commutative, so the constant \(a\) can be moved in front of \(\frac{1}{n}\) \[\frac{1}{n} \cdot a \sum_{i=1}^{n} x_i = a \cdot \frac{1}{n} \sum_{i=1}^{n} x_i\]
Given: \(X = \{2, 4, 6\}\) and \(a = 5\)
Then: \(Y = \{10, 20, 30\}\)
Check our formula:
\(\bar{X} = \frac{1}{3}(2 + 4 + 6) = \frac{12}{3} = 4\)
\(\bar{Y} = \frac{1}{3}(10 + 20 + 30) = \frac{60}{3} = 20\)
\(a \bar{X} = 5 \times 4 = 20\) ✓
Verification: \(20 = 20\) ✓
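To see that the mean rule is not specific to this small dataset, one can test it on arbitrary data. This is a sketch under stated assumptions: the helper `mean_rule_holds` and the seeded random data are made up for illustration, and values are converted to `Fraction` so the equality is exact rather than approximate.

```python
import random
from fractions import Fraction
from statistics import mean

def mean_rule_holds(x, a):
    """Return True when mean(a * x) equals a * mean(x), using exact arithmetic."""
    x = [Fraction(x_i) for x_i in x]
    a = Fraction(a)
    y = [a * x_i for x_i in x]
    return mean(y) == a * mean(x)

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.randint(-50, 50) for _ in range(20)]

print(mean_rule_holds(x, 5))   # True
print(mean_rule_holds(x, -3))  # True
```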
Prove: \(S_Y^2 = a^2 S_X^2\)
Step 1: Start with the definition of sample variance \[S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{Y})^2\]
Step 2: Substitute \(y_i = a x_i\) and \(\bar{Y} = a \bar{X}\) \[S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (a x_i - a \bar{X})^2\]
Step 3: Factor out \(a\) from the parentheses \[S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} [a(x_i - \bar{X})]^2\]
Step 4: Use the property \((ab)^2 = a^2 b^2\) \[S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} a^2 (x_i - \bar{X})^2\]
Step 5: Factor out the constant \(a^2\) from the sum \[S_Y^2 = \frac{a^2}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2\]
Step 6: Recognize the definition of \(S_X^2\) \[S_Y^2 = a^2 \cdot \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 = a^2 S_X^2\]
Therefore: \(S_Y^2 = a^2 S_X^2\) ✓
Properties Used:
Factoring: \(ax_i - a\bar{X} = a(x_i - \bar{X})\)
Squaring: \([a(x_i - \bar{X})]^2 = a^2(x_i - \bar{X})^2\)
Constants in sums: \(\sum_{i=1}^{n} a^2 f(x_i) = a^2 \sum_{i=1}^{n} f(x_i)\)
Given: \(X = \{1, 3, 5\}\) and \(a = 2\)
Calculate \(S_X^2\):
\(\bar{X} = \frac{1+3+5}{3} = 3\)
\(S_X^2 = \frac{1}{2}[(1-3)^2 + (3-3)^2 + (5-3)^2] = \frac{1}{2}[4 + 0 + 4] = 4\)
For \(Y = \{2, 6, 10\}\):
\(\bar{Y} = \frac{2+6+10}{3} = 6 = 2 \times 3\) ✓
\(S_Y^2 = \frac{1}{2}[(2-6)^2 + (6-6)^2 + (10-6)^2] = \frac{1}{2}[16 + 0 + 16] = 16\)
Check: \(a^2 S_X^2 = 2^2 \times 4 = 16\) ✓
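The same verification can be run with the standard library. This sketch assumes `statistics.variance`, which computes the sample variance with the \(n-1\) divisor used throughout; the variable names are illustrative.

```python
from statistics import variance

x = [1, 3, 5]
a = 2
y = [a * x_i for x_i in x]

s2_x = variance(x)  # sample variance of X
s2_y = variance(y)  # sample variance of Y, computed directly from Y

# The direct computation should match a^2 * S_X^2.
assert s2_x == 4
assert s2_y == a ** 2 * s2_x == 16
```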
For the linear transformation \(Y = \{y_i\}_{i=1}^{n}\) where \(y_i = a x_i\):
Sum: \(\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i\)
Mean: \(\bar{Y} = a \bar{X}\)
Variance: \(S_Y^2 = a^2 S_X^2\)
Note: The standard deviation relationship is \(S_Y = |a| S_X\)
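The absolute value matters when \(a\) is negative, since a standard deviation can never be negative. A small sketch with an assumed value \(a = -2\), using `statistics.stdev` and `math.isclose` for the floating-point comparison:

```python
import math
from statistics import stdev

x = [1, 3, 5]
a = -2  # a negative scaling factor, chosen for illustration
y = [a * x_i for x_i in x]

s_x = stdev(x)
s_y = stdev(y)

# S_Y equals |a| * S_X, not a * S_X (which would be negative here).
print(math.isclose(s_y, abs(a) * s_x))  # True
print(math.isclose(s_y, a * s_x))       # False
```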
Scaling Up (a > 1):
Sums and means increase by factor \(a\)
Variance increases by factor \(a^2\)
Data becomes more spread out
Example: Converting feet to inches (\(a = 12\))
Scaling Down (0 < a < 1):
Sums and means decrease by factor \(a\)
Variance decreases by factor \(a^2\)
Data becomes less spread out
Example: Converting cents to dollars (\(a = 0.01\))
Temperature Conversion:
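Celsius to Fahrenheit: \(F = \frac{9}{5}C + 32\) (not of the form \(y = ax\): it also adds a constant)
Celsius to Kelvin: \(K = C + 273.15\) (not of the form \(y = ax\): it only adds a constant)
But scaling: \(C_{\text{doubled}} = 2C\) has the form \(y = ax\) with \(a = 2\)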
Unit Conversions:
Meters to centimeters: \(a = 100\)
Dollars to cents: \(a = 100\)
Hours to minutes: \(a = 60\)
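As a concrete illustration of a unit conversion, the sketch below converts heights from meters to centimeters. The height values are made up for this example; because they are floating-point numbers, the comparisons use `math.isclose` rather than exact equality.

```python
import math
from statistics import mean, variance

heights_m = [1.52, 1.60, 1.68, 1.75, 1.83]  # hypothetical heights in meters
a = 100                                      # meters to centimeters
heights_cm = [a * h for h in heights_m]

# The mean scales by a; the variance scales by a^2 = 10,000.
mean_ok = math.isclose(mean(heights_cm), a * mean(heights_m))
var_ok = math.isclose(variance(heights_cm), a ** 2 * variance(heights_m))
print(mean_ok, var_ok)  # True True
```

The variance in centimeters is 10,000 times the variance in meters, which is why variances are hard to compare across units.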
Statistical Significance:
Key Insight: Linear transformations preserve the shape of the distribution while changing location and scale.
Try This: A dataset has mean \(\bar{X} = 15\) and variance \(S_X^2 = 9\).
If we transform the data using \(y_i = 3x_i - 2\), what are the new mean and variance?
Hint: This is not a pure scaling of the form \(y_i = a x_i\). Rewrite it as \(y_i = 3x_i - 2 = 3(x_i - \frac{2}{3})\): the constant shifts the mean but cancels in the deviations \(y_i - \bar{Y}\).
Answer:
New mean: \(\bar{Y} = 3(15) - 2 = 43\)
New variance: \(S_Y^2 = 3^2 \times 9 = 81\)
(The constant \(-2\) doesn’t affect variance!)
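The answer can be checked on any dataset with the stated mean and variance. The three values below are an assumed example chosen so that \(\bar{X} = 15\) and \(S_X^2 = 9\); they are not from the original exercise.

```python
from statistics import mean, variance

x = [12, 15, 18]  # hypothetical data: mean 15, sample variance 9
assert mean(x) == 15 and variance(x) == 9

# Apply y_i = 3 * x_i - 2 to every observation.
y = [3 * x_i - 2 for x_i in x]

new_mean = mean(y)
new_variance = variance(y)
assert new_mean == 43      # 3 * 15 - 2
assert new_variance == 81  # 3^2 * 9; the shift of -2 has no effect
```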
❌ Wrong: \(S_Y^2 = a S_X^2\)
✅ Correct: \(S_Y^2 = a^2 S_X^2\)
Why: Variance involves squared deviations, so the scaling factor gets squared too.
❌ Wrong: Adding constants affects variance
✅ Correct: Only multiplication affects variance; addition only shifts the mean
What We Proved:
For the linear transformation \(y_i = ax_i\) with constant \(a \neq 0\): \[\sum_{i=1}^{n} y_i = a \sum_{i=1}^{n} x_i, \quad \bar{Y} = a \bar{X}, \quad S_Y^2 = a^2 S_X^2\]
Key Takeaway: Understanding these relationships is fundamental for:
Data transformations
Statistical modeling
Unit conversions
Standardization procedures
Key Concepts Covered:
Summation notation and indexing
Linear transformations
Properties of means and variances
Step-by-step mathematical proofs
Next Steps:
Apply to real datasets
Explore non-linear transformations
Practice with different scaling factors
Exercise 1: If \(X = \{10, 20, 30, 40\}\) and \(Y = \{-5, -10, -15, -20\}\), what is the value of \(a\)?
Exercise 2: A dataset has \(\bar{X} = 50\) and \(S_X = 10\). After transformation \(y_i = 0.5x_i\), find \(\bar{Y}\) and \(S_Y\).
Exercise 3: Prove that if \(y_i = ax_i + b\) (adding a constant), then \(S_Y^2 = a^2 S_X^2\) (the constant doesn’t affect variance).
Linear Transformation Properties © Narjes Mathlouthi 2025