No description has been provided for this image

On MNIST Dataset¶

Faisal Qureshi
faisal.qureshi@ontariotechu.ca
http://www.vclab.ca

What is MNIST?¶

  • MNIST (Modified National Institute of Standards and Technology) dataset consists of 70,000 grayscale images of handwritten digits (0-9).
  • Each image is 28 $\times$ 28 pixels and labeled with a corresponding digit.
  • Used extensively for benchmarking machine learning and deep learning models.

History of MNIST¶

  • Originates from the NIST dataset used for US postal service digit recognition.
  • Yann LeCun, Corinna Cortes, and Christopher Burges modified the dataset to create MNIST in 1998.
  • Became a standard dataset for evaluating image classification algorithms.

MNIST Dataset Composition¶

  • Training Set: 60,000 images
  • Test Set: 10,000 images
  • Classes: 10 (digits 0-9)
  • Data Format:
    • Images: 28 $\times$ 28 pixel grayscale.
    • Labels: Single digit (0-9).

Benefits¶

  • Small and Simple: Easy to train models quickly.
  • Well-Structured: Standardized format enables consistent benchmarking.
  • Broadly Used: Suitable for deep learning, feature extraction, and traditional machine learning models.

Using MNIST in Python¶

In [1]:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Load MNIST dataset
dataset = datasets.MNIST(root='../datasets/common', train=True, transform=transforms.ToTensor(), download=True)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

Sample MNIST Images¶

In [2]:
import matplotlib.pyplot as plt
import numpy as np

# Load dataset
mnist = datasets.MNIST(root='../datasets/common', train=True, transform=transforms.ToTensor(), download=True)
images, labels = zip(*[mnist[i] for i in range(10)])

# Plot images
plt.suptitle('MNIST Samples')
fig, axes = plt.subplots(1, 10, figsize=(10, 2))
for i, ax in enumerate(axes):
    ax.imshow(images[i].numpy().squeeze(), cmap='gray')
    ax.set_title(labels[i])
    ax.axis('off')
plt.show()
<Figure size 640x480 with 0 Axes>
No description has been provided for this image

Applications of MNIST¶

  • Benchmarking Image Classification Algorithms
  • Feature Learning and Representation
  • Handwriting Recognition Systems
  • Experimenting with Neural Networks (CNNs, Autoencoders, GANs, VAEs)

MNIST Extensions¶

  • Fashion-MNIST: Clothing images instead of digits.
  • EMNIST: Extended MNIST with letters.
  • KMNIST: Kuzushiji characters (Japanese handwriting dataset).

Conclusions¶

  • MNIST remains a fundamental dataset for learning and benchmarking image classification models.
  • Despite being solved by modern deep learning, it serves as an excellent starting point for understanding neural networks.
No description has been provided for this image