Computer Vision

Lecture notes


This course introduces students to computer vision – the science and technology of making computers "see." The goal of computer vision is to develop computational machinery that extracts useful information from images and videos. The course studies the steps of the overall image analysis pipeline. Topics covered include image formation, image representation, segmentation, feature extraction, motion analysis, object detection, camera calibration, and 3D visual reconstruction.


Faisal Qureshi

Email: faisal.qureshi@ontariotechu.ca
Web: http://vclab.science.ontariotechu.ca

Foreword

Doing well in a computer vision course requires both mathematical and computational skills. A poor grasp of undergraduate-level linear algebra or calculus will surely be an impediment. Strong programming skills are also needed to design "useful" computer vision systems. There are many great resources (books, tutorials, lecture notes, videos, etc.) that will help you learn computer vision theory and methods. Often, however, these resources either assume strong programming skills or leave it up to the students to develop the needed skills. The lecture notes included below are aimed at individuals who may benefit from seeing computer vision theory and methods in action. I have attempted to provide Python code examples that make computer vision theory tangible.

Python is now the de facto language of scientific computing. There are, of course, many situations where Python is a poor choice for system development; studying computer vision at an undergraduate level is not one of them. Even if scientific computing is not your primary focus, it is probably a good idea to have an above-average working knowledge of Python.

Yes, even in the age of deep learning, it is important for you to learn computer vision fundamentals. This is especially true if you want to work at the edge of discovery. Many recent computer vision papers leverage "old computer vision knowledge" to develop deep learning systems that achieve state-of-the-art performance on some very challenging computer vision tasks. Knowing computer vision fundamentals will be your competitive advantage when it comes time to interview for a job.

You can reach me via e-mail with suggestions, comments, corrections, etc.

Notes


Image formation
  • Pinhole camera model
  • Homogeneous coordinates
  • Intrinsic and extrinsic camera matrices
  • Lens effects
  • Camera calibration
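
As a taste of the image formation topics above, here is a minimal sketch of the pinhole camera model in homogeneous coordinates, x ~ K [R | t] X. The function name `project_point` and the focal length and principal point values are assumptions chosen for illustration; lens effects are ignored.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates using x ~ K [R | t] X."""
    # Extrinsics: move the world point into the camera frame, Xc = R X + t.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Intrinsics: map the camera-frame point to homogeneous image coordinates.
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    if x[2] == 0:
        raise ValueError("point lies on the camera's principal plane")
    # Divide by the third coordinate to obtain pixel coordinates.
    return x[0] / x[2], x[1] / x[2]


# Illustrative (assumed) intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Camera placed at the world origin, looking down the +Z axis.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v = project_point(K, R, t, [0.5, -0.25, 2.0])
print(u, v)  # 520.0 140.0
```

Doubling the depth of the point halves its offset from the principal point, which is the perspective foreshortening the pinhole model predicts.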

Camera calibration
  • OpenCV checkerboard-based camera calibration
  • Image undistortion

Linear filtering
  • Linear filtering in 1D
  • Cross-correlation
  • Convolution
  • Gaussian filter
  • Gaussian blurring
  • Separability
  • Relationship to Fourier transform
  • Integral images
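
To make the last item concrete, the following is a minimal sketch of an integral image (summed-area table) in plain Python. The helper names `integral_image` and `box_sum` are assumptions for illustration; in practice one would likely use NumPy or OpenCV instead of nested lists.

```python
def integral_image(img):
    """Return S with S[y][x] = sum of img[0:y][0:x]; S has one extra row and column."""
    h, w = len(img), len(img[0])
    # The zero-padded first row and column remove special cases at the border.
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def box_sum(S, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] (upper bounds exclusive) using four lookups."""
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
print(box_sum(S, 1, 1, 3, 3))  # 28, i.e. 5 + 6 + 8 + 9
```

Once the table is built, any box sum costs four lookups regardless of the box size, which is why integral images are useful for fast box filtering.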

Image pyramids
  • Gaussian image pyramids
  • Laplacian image pyramids
  • Laplacian blending

Frequency analysis
  • Frequency analysis of images
  • Fourier transform
  • Inverse Fourier transform
  • Discrete Fourier transform
    • Fast Fourier transform (FFT)
  • Nyquist theorem
  • Convolution theorem
  • Properties of Fourier transform
  • Fourier transform of an image
  • Why FFT?

Template matching
  • Sum of squared differences
  • Normalized sum of squared differences
  • Cross-correlation
  • Normalized cross-correlation
  • Correlation coefficient
  • Normalized correlation coefficient

Image derivatives
  • Why do we care about image gradients?
  • Computing image derivatives
  • Sobel filters
  • Gradient magnitude and directions
  • Visualizing image gradients

Edge detection
  • Origin of edges
  • Uses of edge detection
  • Canny edge detector
    • Identifying edge pixels using image gradient
    • Non-maxima suppression
    • Edge linking via hysteresis
    • Difference of Gaussian
    • Implementation in Python

Histograms
  • Histograms in 1D and 2D
    • Construction
    • Visualization
  • Non-uniform bins

Interest points
  • Uses of interest point detection
  • Interest point detection and its relationship to feature descriptors and feature matching
  • Interest point detection
  • Corner detection

This notebook focuses on interest point detection. We leave feature descriptors and feature matching for another time.


Image sampling
  • Interpolation basics
  • Image sampling
  • Bilinear sampling
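
A minimal sketch of bilinear sampling follows, assuming a grayscale image stored as a list of rows and sample coordinates given as (x, y) with x indexing columns. The function name `bilinear_sample` is hypothetical, and the boundary handling shown (reject out-of-range coordinates) is only one of several reasonable choices.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y) by blending the four nearest pixels."""
    h, w = len(img), len(img[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        raise ValueError("sample location is outside the image")
    # Top-left neighbour; clamp the bottom-right neighbour to stay in bounds.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    # Fractional offsets act as the interpolation weights.
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


img = [[0, 10],
       [20, 30]]
print(bilinear_sample(img, 0.5, 0.5))  # 15.0, the average of the four pixels
```

At integer coordinates the function returns the pixel value itself, so it agrees with ordinary indexing wherever both are defined.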

Local features
  • Characteristics of a good local feature
  • Raw patches as local features
  • SIFT descriptor
  • Feature detection and matching in OpenCV
  • Blob detection
  • MSER in OpenCV
  • Applications of local features

Median filtering
  • Median filtering

Bilateral filtering
  • Bilateral filtering

Texture analysis
  • Texture analysis
  • Filter banks
    • Leung-Malik Filter (LM) Bank
      • LM filter construction
    • Schmid Filter Bank
    • Maximum Response Filter Bank

Least squares
  • Model fitting: Why?
  • Linear regression
  • Least squares
    • 2D line fitting example
  • Total least squares
  • Aside: Singular Value Decomposition (SVD)
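
For the 2D line fitting example listed above, a minimal sketch of ordinary least squares in closed form is shown here. It minimizes vertical residuals for the model y = m x + b; total least squares, which minimizes perpendicular distances, would generally give a different line. The function name `fit_line` and the sample points are assumptions for illustration.

```python
def fit_line(points):
    """Fit y = m*x + b to (x, y) pairs by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums: s_xx is the spread of x, s_xy the co-variation of x and y.
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = s_xy / s_xx
    b = mean_y - m * mean_x
    return m, b


points = [(0, 1), (1, 3), (2, 5), (3, 7)]
m, b = fit_line(points)
print(m, b)  # 2.0 1.0
```

Because every residual is squared, a single gross outlier can pull this fit far from the bulk of the data, which motivates the robust variants covered next.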

Robust least squares
  • Robust least squares
  • Outliers
  • Loss functions
    • Linear loss
    • Soft L1 loss
    • Huber loss
    • Cauchy loss
    • arctan loss
    • Bisquare loss
  • Incomplete data
  • Mixed data

RANSAC
  • RANSAC
  • RANSAC for 2D line fitting
  • RANSAC algorithm
    • Pros
    • Cons
    • Uses
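
The following is a minimal sketch of RANSAC for 2D line fitting, under several assumptions: the model is y = m x + b (so vertical lines are skipped), inliers are judged by vertical residual, and the iteration count, tolerance, and fixed seed are illustrative values rather than recommended settings. The function name `ransac_line` is hypothetical.

```python
import random


def ransac_line(points, n_iters=100, tol=0.5, seed=0):
    """Fit y = m*x + b with RANSAC; returns (m, b, inliers)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best = (None, None, [])
    for _ in range(n_iters):
        # 1. Sample a minimal set: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, not representable as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Collect the points whose vertical residual is within the tolerance.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) < tol]
        # 3. Keep the model with the largest consensus set so far.
        if len(inliers) > len(best[2]):
            best = (m, b, inliers)
    return best


# Ten points on y = 2x + 1 plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(1, 10), (4, -3), (7, 30)]
m, b, inliers = ransac_line(points)
print(m, b, len(inliers))
```

A common refinement, not shown here, is to refit the line by least squares using only the returned inliers.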

Hough transform
  • Hough transform
  • Fitting lines to data
  • Polar representation of a line
  • Counting votes
  • Applications

Homography
  • Homography
  • Homography application: Image stitching
  • Solving for homography

Feature tracking and optical flow
  • Motion cues
  • Recovering motion
  • Feature tracking
    • Challenges
  • Lucas-Kanade tracker
  • Aperture problem
  • Motion estimation and its relationship to corner detection
  • Actual and perceived motion
  • Dealing with large motions
    • Coarse-to-fine registration
  • Shi-Tomasi feature tracker
  • Perception of motion
  • Uses of motion
  • Optical flow
  • Lucas-Kanade optical flow

Epipolar geometry
  • The need for multiple views
  • Depth ambiguity
  • Estimating scene shape
    • Shape from shading
    • Shape from defocus
    • Shape from texture
    • Shape from perspective cues
    • Shape from motion
  • Stereograms, human stereopsis, and disparity
  • Imaging geometry for a simple stereo system
  • Epipolar geometry
  • Fundamental matrix
  • Essential matrix
  • Rectification
  • Stereo matching
  • Active stereo

Action recognition
  • Spatio-temporal interest points
  • Videos as a bag of visual words
  • Localization in space and in time
  • Early deep learning models for action recognition

Expectation maximization and latent semantic analysis
  • Finite mixture models
  • Gaussian Mixture Models (GMM)
  • EM for GMM
  • Probabilistic latent semantic analysis

Reference material

Other useful items

Sam Roweis Notes

Sam Roweis was a Canadian computer scientist specializing in machine learning. He is sadly no longer with us. In addition to being a superlative machine learning researcher, Sam was a passionate educator. Below I include his notes on linear algebra and probability. I re-discovered these notes thanks to Inmar Givoni.

Programming Resources

Copyright and License

© Faisal Z. Qureshi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Last update: 2024-02-26 18:15