
Central Limit Theorem Example¶
Faisal Qureshi
faisal.qureshi@ontariotechu.ca
http://www.vclab.ca
Copyright information¶
© Faisal Qureshi
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
License¶

Mean of dice throws¶
We know that an unbiased six-sided die has six equally likely outcomes: $1, 2, 3, 4, 5, 6$.
This suggests that the mean value for a dice throw is: $$ \frac{1+2+3+4+5+6}{6} = 3.5 $$
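The same calculation can be written as an expected value over the six faces. The following is a minimal sketch; the names `faces`, `probs`, `true_mean`, and `true_var` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

# Faces of a fair six-sided die and their (equal) probabilities.
faces = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Expected value: each outcome weighted by its probability.
true_mean = np.sum(faces * probs)

# Variance: expected squared deviation from the mean.
true_var = np.sum(probs * (faces - true_mean) ** 2)

print('True mean:    ', true_mean)
print('True variance:', true_var)
```

The variance is not needed to compute the mean, but it determines how widely sample means scatter around $3.5$.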
Estimating Sample Means¶
Say we want to estimate the mean value of a dice throw, but we do not have access to the formula shown above. How would we do it?
One scheme is to throw a die multiple times (i.e., sample the outcomes) and compute the mean of the outcomes. Say we decide to throw a die $10$ times and get the following:
def dice_throw(times):
    # randint excludes the upper bound, so this draws integers from 1 to 6.
    return np.random.randint(1, 7, times)
samples = dice_throw(10)
print('Samples: ', samples)
Samples: [4 5 3 5 5 2 3 3 3 5]
We can compute the mean of this sample as follows:
samples_mean = np.mean(samples)
print('Sample mean:', samples_mean)
print('True mean: ', 3.5)
Sample mean: 3.8
True mean:  3.5
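A single sample of $10$ throws gives only a rough estimate. The sketch below shows how the estimate tends to tighten as the number of throws grows. It uses a separate `np.random.default_rng` generator with an arbitrarily chosen seed, and the helper `sample_mean_of_throws` is a hypothetical name, so the printed numbers will not match the notebook's own output.

```python
import numpy as np

# Assumption: an independent generator with an arbitrary seed,
# not the np.random.seed(42) state used elsewhere in the notebook.
rng = np.random.default_rng(0)

def sample_mean_of_throws(times):
    # rng.integers draws from [1, 7), i.e. the faces 1..6.
    return rng.integers(1, 7, size=times).mean()

for times in (10, 100, 1000, 100000):
    estimate = sample_mean_of_throws(times)
    error = abs(estimate - 3.5)
    print(f'{times:>6} throws: sample mean = {estimate:.3f}, error = {error:.3f}')
```

The error does not shrink monotonically from one run to the next, but on average it falls in proportion to $1/\sqrt{n}$ for $n$ throws.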
Central Limit Theorem¶
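The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the original population's distribution (provided the samples are independent and the population has finite variance).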
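For reference, one common formal statement of the theorem is given here, under the standard assumptions that the throws $X_1, \dots, X_n$ are independent and identically distributed with mean $\mu$ and finite variance $\sigma^2$. These symbols are introduced for illustration and do not appear in the original notebook.
$$ \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1), \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i $$
For a fair die, $\mu = 3.5$ and $\sigma^2 = \frac{35}{12}$, so the mean of $n = 10$ throws should be approximately normal with mean $3.5$ and standard deviation $\sqrt{35/120} \approx 0.54$.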
Central Limit Theorem in Action¶
- Let's draw many samples, each consisting of $10$ dice throws.
- Next, compute the mean of each sample.
- Plot the distribution of these means (as per the Central Limit Theorem, it should be approximately normal).
num_samples = 10000
samples_means = np.empty(num_samples)
for i in range(num_samples):
    samples = dice_throw(10)
    samples_mean = np.mean(samples)
    samples_means[i] = samples_mean
plt.hist(samples_means, bins=30, density=True, alpha=0.7, edgecolor='black')
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.title("Distribution of Sample Means (Central Limit Theorem)")
plt.axvline(np.mean(samples_means), color='red', linestyle='dashed', linewidth=2, label="Mean of Sample Means")
plt.legend()
plt.show()
print('Estimated mean: ', np.mean(samples_means))
print('True mean: ', 3.5)
Estimated mean:  3.5031199999999996
True mean:  3.5
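The important point is that the distribution of the sample means is approximately normal and centred on the true mean. The estimate of the mean improves as the number of samples increases, while the normal approximation itself improves as the number of throws per sample increases.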
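One way to check this numerically, rather than by eye, is to compare the simulated sample means with the mean and standard deviation that a fair die implies. This is a minimal sketch, not the notebook's own method: it draws all throws in one vectorized call, uses an independent generator with an arbitrary seed, and introduces the names `throws_per_sample`, `predicted_mean`, `predicted_std`, and `within_one_std`.

```python
import numpy as np

# Assumption: arbitrary seed, independent of the notebook's np.random.seed(42).
rng = np.random.default_rng(0)

throws_per_sample = 10
num_samples = 10000

# Draw all throws at once: one row per sample, one column per throw.
throws = rng.integers(1, 7, size=(num_samples, throws_per_sample))
samples_means = throws.mean(axis=1)

# Values implied by a fair die: mean 3.5, variance 35/12 per throw.
predicted_mean = 3.5
predicted_std = np.sqrt(35 / 12 / throws_per_sample)

# Fraction of sample means within one predicted standard deviation of 3.5;
# a normal distribution puts roughly 68% of its mass in that range.
within_one_std = np.mean(np.abs(samples_means - predicted_mean) <= predicted_std)

print('Empirical mean:', samples_means.mean(), ' predicted:', predicted_mean)
print('Empirical std: ', samples_means.std(), ' predicted:', predicted_std)
print('Fraction within one std:', within_one_std)
```

Because the mean of $10$ throws can only take values in steps of $0.1$, the fraction will not match $68\%$ exactly; it should move closer as the number of throws per sample grows.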
