
Central Limit Theorem Example

Faisal Qureshi
faisal.qureshi@ontariotechu.ca
http://www.vclab.ca

Copyright information

© Faisal Qureshi

In [1]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42) 

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Mean of dice throws

We know that an unbiased six-sided die has six equally likely outcomes: $1, 2, 3, 4, 5, 6$.

Therefore, the mean (expected) value of a dice throw is: $$ \frac{1+2+3+4+5+6}{6} = 3.5 $$
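
The same calculation can be sketched in code as a probability-weighted sum. This snippet is illustrative only: the names `outcomes`, `probabilities`, `expected_value` and `variance` are not part of the original notebook. It also computes the variance of a single throw, which equals $35/12 \approx 2.92$ for a fair die.

```python
import numpy as np

# Outcomes of a fair six-sided die and their (equal) probabilities.
outcomes = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# Expected value: probability-weighted sum of the outcomes.
expected_value = np.sum(outcomes * probabilities)

# Variance: probability-weighted squared deviation from the expected value.
variance = np.sum(probabilities * (outcomes - expected_value) ** 2)

print('Expected value:', expected_value)
print('Variance:      ', variance)
```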

Estimating Sample Means

Suppose we want to estimate the mean value of a dice throw, but we do not have access to the formula shown above. How would we do it?

One approach is to throw a die multiple times (i.e., sample the outcomes) and compute the mean of the observed outcomes. Suppose we throw a die $10$ times and get the following:

In [2]:
def dice_throw(times):
    # Draw `times` outcomes uniformly from {1, ..., 6}; the upper bound 7 is exclusive.
    return np.random.randint(1, 7, times)
In [3]:
samples = dice_throw(10)
print('Samples: ', samples)
Samples:  [4 5 3 5 5 2 3 3 3 5]

We can compute the mean of this sample as follows:

In [4]:
samples_mean = np.mean(samples)
print('Sample mean:', samples_mean)
print('True mean:  ', 3.5)
Sample mean: 3.8
True mean:   3.5
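
A single sample of $10$ throws gives a noisy estimate. As a rough illustration (not part of the original notebook), the sketch below repeats the estimate with more throws. It assumes a separate generator `rng` with an arbitrary seed so that the global random state used by the other cells is left untouched.

```python
import numpy as np

# Local generator (an assumption made for this illustration) so that the
# notebook's global np.random state is not disturbed.
rng = np.random.default_rng(0)

true_mean = 3.5
for n_throws in (10, 100, 1000, 100000):
    # rng.integers excludes the upper bound, so this draws from {1, ..., 6}.
    throws = rng.integers(1, 7, size=n_throws)
    estimate = throws.mean()
    print(f'{n_throws:>6} throws: estimate = {estimate:.3f}, '
          f'error = {abs(estimate - true_mean):.3f}')
```

The error tends to shrink as the number of throws grows, although it need not decrease at every step for a particular seed.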

Central Limit Theorem

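The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original population's distribution, provided that the observations are independent and identically distributed with finite variance. Concretely, for a population with mean $\mu$ and variance $\sigma^2$, the mean of $n$ observations is approximately distributed as $\mathcal{N}(\mu, \sigma^2/n)$ when $n$ is large.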

Central Limit Theorem in Action

  1. Let's draw many samples, each consisting of $10$ dice throws.
  2. Next, compute the mean of each sample.
  3. Plot the means (as per the Central Limit Theorem, the histogram should look approximately normal).
In [5]:
num_samples = 10000
samples_means = np.empty(num_samples)

for i in range(num_samples):
    samples = dice_throw(10)
    samples_mean = np.mean(samples)
    samples_means[i] = samples_mean
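
The loop above can also be written without an explicit Python loop. The following is a minimal sketch, not the notebook's own method: it assumes a local generator `rng` with an arbitrary seed, and the names `throws_per_sample`, `all_throws` and `vectorized_means` are hypothetical. Because it uses a different random stream, its numbers will not match the loop's output exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for illustration

num_samples = 10000
throws_per_sample = 10

# One row per sample, one column per throw; values are in {1, ..., 6}.
all_throws = rng.integers(1, 7, size=(num_samples, throws_per_sample))

# Averaging across the columns gives one mean per sample.
vectorized_means = all_throws.mean(axis=1)

print(vectorized_means.shape)
```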
In [6]:
plt.hist(samples_means, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.title("Distribution of Sample Means (Central Limit Theorem)")
plt.axvline(np.mean(samples_means), color='red', linestyle='dashed', linewidth=2, label="Mean of Sample Means")
plt.legend()
plt.show()

print('Estimated mean: ', np.mean(samples_means))
print('True mean: ', 3.5)
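[Figure: histogram of the sample means, titled "Distribution of Sample Means (Central Limit Theorem)", with a dashed red line marking the mean of the sample means]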
Estimated mean:  3.5031199999999996
True mean:  3.5

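The important point is that the distribution of the sample means is approximately normal and centered on the true mean. The mean of the sample means approaches the true mean as the number of samples increases, and the normal approximation itself improves as the number of throws per sample increases.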
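
One way to check this numerically is to compare the spread of the sample means with the value $\sigma/\sqrt{n}$ suggested by the Central Limit Theorem, where $\sigma = \sqrt{35/12}$ is the standard deviation of a single fair-die throw. The sketch below is illustrative only and assumes a local generator `rng` with an arbitrary seed; the names `mu`, `sigma`, `means` and `predicted_std` are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for illustration

mu = 3.5                   # mean of a single fair-die throw
sigma = np.sqrt(35 / 12)   # standard deviation of a single fair-die throw

num_samples = 10000
for n in (2, 10, 50):
    # num_samples samples of n throws each, reduced to one mean per sample.
    means = rng.integers(1, 7, size=(num_samples, n)).mean(axis=1)
    predicted_std = sigma / np.sqrt(n)
    print(f'n = {n:>2}: mean of means = {means.mean():.3f}, '
          f'empirical std = {means.std():.3f}, '
          f'predicted std = {predicted_std:.3f}')
```

The empirical and predicted standard deviations should agree closely, and both should shrink as the number of throws per sample `n` grows.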
