
On CIFAR Dataset
Faisal Qureshi
faisal.qureshi@ontariotechu.ca
http://www.vclab.ca
What is the CIFAR dataset?
- CIFAR (Canadian Institute For Advanced Research) datasets are widely used in computer vision.
- Two main versions:
  - CIFAR-10: 60,000 images (10 classes, 6,000 per class).
  - CIFAR-100: 60,000 images (100 classes, 600 per class).
- Developed by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
CIFAR-10 Dataset
- 10 classes: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck.
- Image format: 32x32 RGB images.
- Training set: 50,000 images.
- Test set: 10,000 images.
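The figures above determine the raw size of the dataset. A minimal sketch of that arithmetic, assuming 8-bit channels (one byte per channel value); the constant names are illustrative only and do not come from any library.

```python
# Illustrative arithmetic only; the names below are hypothetical.
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
TRAIN_IMAGES, TEST_IMAGES = 50_000, 10_000
NUM_CLASSES = 10

# Assumption: each channel value is stored as a single byte (uint8).
bytes_per_image = HEIGHT * WIDTH * CHANNELS
total_images = TRAIN_IMAGES + TEST_IMAGES
total_bytes = total_images * bytes_per_image

print(f"values per image : {bytes_per_image}")
print(f"images per class : {total_images // NUM_CLASSES}")
print(f"raw pixel data   : {total_bytes / 2**20:.1f} MiB")
```

Each image holds 3,072 values, so the uncompressed pixel data for all 60,000 images comes to roughly 176 MiB, small enough to keep in memory on most machines.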
CIFAR-100 Dataset
- 100 classes grouped into 20 superclasses.
- Each class has 600 images.
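- Same image format and split as CIFAR-10: 32x32 RGB images, 50,000 for training and 10,000 for testing (500 training and 100 test images per class).
- Each image has a fine label (its class) and a coarse label (its superclass).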
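The class-to-superclass grouping can be pictured as a simple lookup table. The sketch below uses a hand-typed excerpt of three of the 20 superclasses, with names written as in the dataset's published description; the identifiers stored in the dataset files may be spelled differently, and the variable names here are hypothetical.

```python
# Hand-typed excerpt of the CIFAR-100 hierarchy (3 of the 20 superclasses).
# Assumption: names follow the dataset description, not the label files.
SUPERCLASS_EXCERPT = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
}

# Invert the mapping so a fine class name looks up its superclass.
fine_to_coarse = {
    fine: coarse
    for coarse, fine_names in SUPERCLASS_EXCERPT.items()
    for fine in fine_names
}

IMAGES_PER_CLASS = 600
CLASSES_PER_SUPERCLASS = 100 // 20  # 100 classes spread evenly over 20 superclasses

print(fine_to_coarse["shark"])
print(CLASSES_PER_SUPERCLASS * IMAGES_PER_CLASS)  # images per superclass
```

With five classes per superclass, each superclass covers 3,000 images, which makes the coarse labels a much easier 20-way classification task over the same pixels.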
Why Use CIFAR?
- Challenging dataset for deep learning.
- Diverse classes for object recognition.
- Standard benchmark for comparing CNN architectures.
CIFAR-10 in Python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
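# Convert images to tensors in [0, 1], then normalize each channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
# Load the CIFAR-10 training set (downloaded into root if not already present)
dataset = datasets.CIFAR10(root='../datasets/common', train=True, transform=transform, download=True)
# Serve shuffled mini-batches of 32 images
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)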
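To make the effect of the transform and loader settings above concrete, here is a small plain-Python sketch of the same arithmetic. It does not call torchvision; `normalize` and `denormalize` are hypothetical helper names that mirror the per-channel formula `(value - mean) / std`, and the batch count assumes the loader keeps the final partial batch (the `DataLoader` default).

```python
import math

def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying a normalized image."""
    return value * std + mean

# ToTensor yields values in [0, 1]; with mean = std = 0.5 they land in [-1, 1].
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))

# Batches per epoch over the 50,000 training images with batch_size=32.
num_images, batch_size = 50_000, 32
num_batches = math.ceil(num_images / batch_size)
last_batch = num_images - (num_batches - 1) * batch_size
print(num_batches, last_batch)
```

Since 50,000 is not a multiple of 32, an epoch consists of 1,563 batches and the last one contains only 16 images. Applying `denormalize` is the usual step before passing a normalized image to a plotting function that expects values in [0, 1].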
Sample CIFAR-10 Images
import matplotlib.pyplot as plt
import numpy as np
# Load the training set without normalization so pixel values stay in [0, 1] for display
cifar = datasets.CIFAR10(root='../datasets/common', train=True, transform=transforms.ToTensor(), download=True)
images, labels = zip(*[cifar[i] for i in range(10)])
# Plot the first 10 images with their class names
fig, axes = plt.subplots(1, 10, figsize=(20, 4))
for i, ax in enumerate(axes):
    # Tensors are (channels, height, width); imshow expects (height, width, channels)
    ax.imshow(np.transpose(images[i].numpy(), (1, 2, 0)))
    ax.set_title(cifar.classes[labels[i]])
    ax.axis('off')
plt.show()
Applications of CIFAR
- Convolutional Neural Network (CNN) training
- Object recognition and classification
- Transfer learning and fine-tuning
- Image augmentation experiments
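As one illustration of an augmentation experiment, a common recipe for CIFAR is to pad each image by 4 pixels, take a random 32x32 crop, and flip it horizontally with probability 0.5. In torchvision this is typically written with `transforms.RandomCrop(32, padding=4)` and `transforms.RandomHorizontalFlip()`. The sketch below is not torchvision's implementation; it re-creates the idea in plain Python on a single channel, and the function names are hypothetical.

```python
import random

def pad_and_random_crop(image, padding, rng):
    """Zero-pad a 2-D grid on every side, then crop a window of the original size."""
    height, width = len(image), len(image[0])
    padded_width = width + 2 * padding
    padded = (
        [[0] * padded_width for _ in range(padding)]
        + [[0] * padding + list(row) + [0] * padding for row in image]
        + [[0] * padded_width for _ in range(padding)]
    )
    top = rng.randint(0, 2 * padding)   # randint bounds are inclusive
    left = rng.randint(0, 2 * padding)
    return [row[left:left + width] for row in padded[top:top + height]]

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the grid left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [list(row) for row in image]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Stand-in for one 32x32 channel; real data would come from the dataset.
channel = [[r * 32 + c for c in range(32)] for r in range(32)]
augmented = random_horizontal_flip(pad_and_random_crop(channel, 4, rng), rng)
print(len(augmented), len(augmented[0]))
```

The augmented image keeps the original 32x32 shape, so it can be fed to the same network; only its content is shifted by up to 4 pixels and possibly mirrored.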
CIFAR Challenges
- Small image size (32 $\times$ 32) limits detail.
- Intra-class variability (e.g., different dog breeds in the same class).
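- Susceptible to overfitting: with only 50,000 small training images, high-capacity models can memorize the training set, so augmentation and regularization are commonly needed.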
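The small image size also constrains network design, because every downsampling stage halves an already small feature map. A minimal sketch using the standard output-size formula for convolution and pooling layers (without dilation); the four-stage layout is a hypothetical example, not a specific published architecture.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32
sizes = [size]
for _ in range(4):
    # Hypothetical stage: a 3x3 convolution with padding 1 keeps the size...
    size = conv_output_size(size, kernel=3, stride=1, padding=1)
    # ...and a 2x2 pooling layer with stride 2 halves it.
    size = conv_output_size(size, kernel=2, stride=2)
    sizes.append(size)

print(sizes)
```

After four such stages the feature map is only 2x2, so architectures designed for larger inputs usually need fewer or gentler downsampling steps when applied to CIFAR.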
Variants and Extensions
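- Tiny ImageNet: 200 classes at 64x64 resolution, with 500 training images per class.
- SVHN (Street View House Numbers): 32x32 color images of digits cropped from house-number photos; same image size and class count as CIFAR-10, but a digit recognition task.
- ImageNet: much larger dataset of higher-resolution images; the widely used ILSVRC subset has 1,000 classes and about 1.28 million training images.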
Conclusion
- CIFAR remains a fundamental dataset for training and evaluating deep learning models.
- Used in academic research and industry applications.
- Serves as a stepping stone for working with larger datasets like ImageNet.
