Lecture notes

This course introduces students to computer vision – the science and technology to make computers "see." The goal of computer vision is to develop computational machinery to extract useful information from images and videos. The course will study various steps of the overall image analysis pipeline. Topics covered will include: image formation, image representation, segmentation, feature extraction, motion analysis, object detection, camera calibration, and 3D visual reconstruction.

Email: faisal.qureshi@ontariotechu.ca

Web: http://vclab.science.ontariotechu.ca

Doing well in a computer vision course requires developing both mathematical and computational skills. A poor grasp of undergraduate-level linear algebra or calculus will surely be an impediment. Strong programming skills are also needed to design "useful" computer vision systems. There are many great resources---books, tutorials, lecture notes, videos, etc.---that will help you learn computer vision theory and methods. Oftentimes, however, these resources either assume strong programming skills or leave it up to the students to develop the needed skills. The lecture notes included below are aimed at individuals who may benefit from seeing computer vision theory and methods *in action*. I have attempted to provide Python code examples that make computer vision theory tangible.

Python is now the *de facto* scientific computing language. There are, of course, many situations where Python is a poor choice for system development; studying computer vision at the undergraduate level is not one of them. Even if scientific computing is not your primary focus, it is probably a good idea to have an above-average working knowledge of Python.

Yes, even in the *age of deep learning*, it is important for you to learn computer vision fundamentals. This is especially true if you want to work at the edge of discovery. Many recent computer vision papers leverage "old computer vision knowledge" to develop deep learning systems that achieve state-of-the-art performance on some very challenging computer vision tasks. Knowing computer vision fundamentals will be your competitive advantage when it comes time for you to interview for a job.

You can reach me via e-mail with suggestions, comments, corrections, etc.

- Pinhole camera model
- Homogeneous coordinates
- Intrinsic and extrinsic camera matrices
- Lens effects
- Camera calibration
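As a small sketch of the pinhole model in homogeneous coordinates, the following projects a 3D point through an intrinsic matrix with an identity pose. The focal lengths and principal point here are made-up values for illustration, not parameters of any real camera:

```python
import numpy as np

# Pinhole projection sketch: project a 3D point with assumed intrinsics K
# and an identity extrinsic pose (camera at the origin, looking down +z).
K = np.array([[800., 0., 320.],        # fx, skew, cx  (made-up values)
              [0., 800., 240.],        # fy, cy
              [0., 0., 1.]])
X = np.array([0.5, -0.25, 2.0])        # 3D point in camera coordinates

# Homogeneous projection: x ~ K [R | t] X, with R = I and t = 0 here.
x_h = K @ X
u, v = x_h[:2] / x_h[2]                # divide by depth to get pixel coordinates
print(u, v)  # → 520.0 140.0
```

Note that the perspective divide at the end is exactly where the nonlinearity of projection lives; everything before it is a linear map in homogeneous coordinates.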

- OpenCV checkerboard-based camera calibration
- Image undistortion

- Linear Filtering in 1D
- Cross-correlation
- Convolution
- Gaussian filter
- Gaussian blurring
- Separability
- Relationship to Fourier transform
- Integral images
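The difference between cross-correlation and convolution in 1D is just a kernel flip, which NumPy makes easy to verify. A minimal sketch with a toy signal (the values are arbitrary, for illustration only):

```python
import numpy as np

# Toy 1D signal and a small filter (arbitrary values, for illustration).
signal = np.array([0., 0., 1., 2., 3., 0., 0.])
kernel = np.array([1., 2., 3.])

# Cross-correlation slides the kernel as-is; convolution flips it first.
corr = np.correlate(signal, kernel, mode='same')
conv = np.convolve(signal, kernel, mode='same')

# Convolving with the flipped kernel reproduces cross-correlation.
assert np.allclose(conv, np.correlate(signal, kernel[::-1], mode='same'))
```

For a symmetric kernel such as a Gaussian, the flip changes nothing, which is why the two operations are often used interchangeably in that case.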

- Gaussian image pyramids
- Laplacian image pyramids
- Laplacian blending
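One level of a Gaussian pyramid is just blur-then-downsample. Below is a minimal sketch using a 5-tap binomial kernel (a common Gaussian approximation) applied separably; a random image stands in for real data:

```python
import numpy as np

def pyr_down(img):
    """One Gaussian pyramid level: blur with a 5-tap binomial filter
    (a common approximation to a Gaussian), then keep every other pixel."""
    kernel = np.array([1., 4., 6., 4., 1.]) / 16.0
    # Separable filtering: filter along rows, then along columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, blurred)
    return blurred[::2, ::2]

img = np.random.rand(64, 64)            # stand-in for a real image
levels = [img]
for _ in range(3):                      # build a 4-level Gaussian pyramid
    levels.append(pyr_down(levels[-1]))
print([l.shape for l in levels])        # (64,64), (32,32), (16,16), (8,8)
```

A Laplacian pyramid level can then be formed as the difference between a Gaussian level and its upsampled coarser neighbour.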

- Frequency analysis of images
- Fourier transform
- Inverse Fourier transform
- Discrete Fourier transform
- Fast Fourier transform (FFT)

- Nyquist theorem
- Convolution theorem
- Properties of Fourier transform
- Fourier transform of an image
- Why FFT?
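The convolution theorem can be checked numerically in a few lines: circular convolution in the spatial domain equals pointwise multiplication in the frequency domain. A small sketch with random signals:

```python
import numpy as np

x = np.random.rand(128)
h = np.random.rand(128)

# Convolution theorem: multiply spectra, then invert the FFT.
freq = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# Direct circular convolution for comparison.
direct = np.array([np.sum(x * np.roll(h[::-1], k + 1)) for k in range(128)])

assert np.allclose(freq, direct)
```

This identity is one answer to "Why FFT?": an O(N log N) transform turns an O(N^2) convolution into an O(N) multiplication.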

- Sum of squared differences
- Normalized sum of squared differences
- Cross-correlation
- Normalized cross-correlation
- Correlation coefficient
- Normalized correlation coefficient
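The normalized correlation coefficient (mean-subtracted, normalized cross-correlation) can be sketched directly: it is the cosine of the angle between two mean-centered patches. Here a template is cut from a known location in a random toy image and then relocated by exhaustive search:

```python
import numpy as np

def ncc(patch, template):
    """Normalized correlation coefficient between equally sized patches:
    subtract each mean, then take the cosine of the angle between them."""
    a = patch - patch.mean()
    b = template - template.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:10, 8:13]            # template cut from a known location

# Slide the template over the image and keep the best-scoring position.
h, w = template.shape
scores = {(r, c): ncc(image[r:r+h, c:c+w], template)
          for r in range(20 - h + 1) for c in range(20 - w + 1)}
best = max(scores, key=scores.get)
print(best)  # → (5, 8), the location the template was taken from
```

The mean subtraction and normalization are what make the score invariant to additive and multiplicative intensity changes, which plain SSD is not.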

- Why do we care about image gradients?
- Computing image derivatives
- Sobel filters
- Gradient magnitude and directions
- Visualizing image gradients
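A rough sketch of Sobel gradients, magnitude, and direction follows, using an explicit (slow) loop over 3x3 windows so the arithmetic is visible; border pixels are simply dropped:

```python
import numpy as np

def sobel_gradients(img):
    """Image gradients via 3x3 Sobel filters (valid-mode correlation;
    a teaching sketch, not an efficient implementation)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # d/dx
    ky = kx.T                                                   # d/dy
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            win = img[r:r+3, c:c+3]
            gx[r, c] = np.sum(win * kx)
            gy[r, c] = np.sum(win * ky)
    mag = np.hypot(gx, gy)              # gradient magnitude
    theta = np.arctan2(gy, gx)          # gradient direction
    return mag, theta

# A vertical step edge: the gradient should point along +x (theta = 0).
img = np.zeros((8, 8)); img[:, 4:] = 1.0
mag, theta = sobel_gradients(img)
print(mag[3, 3], theta[3, 3])  # → 4.0 0.0
```

In practice one would use a vectorized or library routine, but the window-by-window version makes the filter weights and the magnitude/direction computation explicit.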

- Origin of edges
- Uses of edge detection
- Canny edge detector
- Identifying edge pixels using image gradient
- Non-maxima suppression
- Edge linking via hysteresis
- Difference of Gaussian
- Implementation in Python

- Histograms in 1D and 2D
- Construction
- Visualization

- Non-uniform bins
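Histogram construction in 1D is a one-liner with NumPy; the snippet below bins a random 8-bit "image" into 8 uniform bins (the image is synthetic, for illustration):

```python
import numpy as np

# 1D intensity histogram sketch: count 8-bit pixel values into 8 bins.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
counts, edges = np.histogram(img, bins=8, range=(0, 256))
print(counts.sum())  # → 256, one count per pixel
```

Non-uniform bins are obtained by passing an explicit array of bin edges as `bins` instead of a count.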

- Uses of interest point detection
- Interest point detection and its relationship to feature descriptors and feature matching
- Interest point detection
- Corner detection

*This notebook focuses on interest point detection. We leave
feature descriptors and feature matching for another time.*

- Interpolation basics
- Image sampling
- Bilinear sampling
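Bilinear sampling at a real-valued location is a weighted mix of the four surrounding pixels. A minimal sketch (no boundary handling beyond clamping the far corner):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at real-valued (x, y): blend the four neighbouring
    pixels by the fractional offsets (a sketch; edges are clamped)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    ax, ay = x - x0, y - y0             # fractional offsets
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bot = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bot

img = np.array([[0., 1.], [2., 3.]])
print(bilinear_sample(img, 0.5, 0.5))   # → 1.5, the average of all four pixels
```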

- Characteristics of a good local feature
- Raw patches as local features
- SIFT descriptor
- Feature detection and matching in OpenCV
- Blob detection
- MSER in OpenCV
- Applications of local features

- Median filtering

- Bilateral filtering

- Texture analysis
- Filter banks
- Leung-Malik (LM) Filter Bank
- LM filter construction

- Schmid Filter Bank
- Maximum Response Filter Bank

- Leung-Malik (LM) Filter Bank

- Model fitting: Why?
- Linear regression
- Least squares
- 2D line fitting example
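A 2D line fit by ordinary least squares reduces to solving a small linear system. The sketch below recovers slope and intercept from noise-free points on an assumed line y = 2x + 1:

```python
import numpy as np

# Noise-free points on the (assumed) line y = 2x + 1; least squares
# should recover the slope and intercept exactly.
x = np.array([0., 1., 2., 3., 4.])
y = 2.0 * x + 1.0

# Stack the design matrix [x, 1] and solve min ||A p - y||^2 for p = (m, b).
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(m, b)  # → 2.0 1.0
```

Note that this formulation minimizes vertical errors only, which is exactly the limitation that motivates total least squares next.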

- Total least squares
- Aside: Singular Value Decomposition (SVD)
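Total least squares minimizes perpendicular distances, and the SVD gives it directly: the line normal is the right singular vector of the centered points with the smallest singular value. A minimal sketch:

```python
import numpy as np

def tls_line(points):
    """Total least squares line fit via SVD: the line normal is the
    direction of least variance of the mean-centered points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                     # right singular vector, smallest sigma
    d = normal @ centroid               # line in normal form: normal . x = d
    return normal, d

pts = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])  # points on y = x
normal, d = tls_line(pts)
# The normal should be perpendicular to (1, 1) and the line passes through 0.
print(normal, d)
```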

- Robust least squares
- Outliers
- Loss functions
- Linear loss
- Soft L1 loss
- Huber loss
- Cauchy loss
- arctan loss
- Bisquare loss

- Incomplete data
- Mixed data

- RANSAC
- RANSAC for 2D line fitting
- RANSAC Algorithm
- Pros
- Cons
- Uses
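The RANSAC loop for 2D line fitting can be sketched in a few lines: sample a minimal set (two points), score by inlier count, keep the best. The data below mixes exact points on y = 2x + 1 with uniform outliers; the threshold and iteration count are illustrative choices:

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Minimal RANSAC sketch for 2D line fitting: repeatedly fit a line
    through two random points and keep the hypothesis with most inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm < 1e-12:                 # degenerate sample, skip
            continue
        normal = np.array([-direction[1], direction[0]]) / norm
        dist = np.abs((points - p) @ normal)   # point-to-line distances
        inliers = dist < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
inlier_pts = np.column_stack([x, 2 * x + 1])   # exact points on y = 2x + 1
outliers = rng.uniform(0, 20, size=(10, 2))    # gross outliers
points = np.vstack([inlier_pts, outliers])
mask = ransac_line(points)
print(mask[:50].sum())  # most of the 50 true inliers should be recovered
```

A final least-squares refit on the returned inlier set is the usual last step in practice.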

- Hough transform
- Fitting lines to data
- Polar representation of a line
- Counting votes
- Applications

- Homography
- Homography application: Image stitching
- Solving for Homography
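Solving for a homography from four or more point correspondences is commonly done with the Direct Linear Transform (DLT). A sketch, verified here on a pure translation so the expected matrix is obvious:

```python
import numpy as np

def homography_dlt(src, dst):
    """DLT sketch: build two equations per correspondence for the 3x3
    homography H mapping src -> dst, then take the SVD null vector."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                  # fix the overall scale ambiguity

# Unit square mapped by a known homography (a pure translation here).
src = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
dst = src + np.array([2., 3.])
H = homography_dlt(src, dst)
print(np.round(H, 6))
```

With noisy correspondences one would normalize the points first and refine with a robust or nonlinear method; the sketch omits both.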

- Motion cues
- Recovering motion
- Feature tracking
- Challenges

- Lucas-Kanade tracker
- Aperture problem
- Motion estimation and its relationship to corner detection
- Actual and perceived motion
- Dealing with large motions
- Coarse-to-fine registration

- Shi-Tomasi feature tracker
- Perception of motion
- Uses of motion
- Optical flow
- Lucas-Kanade optical flow

- The need for multiple views
- Depth ambiguity
- Estimating scene shape
- Shape from shading
- Shape from defocus
- Shape from texture
- Shape from perspective cues
- Shape from motion

- Stereograms, human stereopsis, and disparity
- Imaging geometry for a simple stereo system
- Epipolar geometry
- Fundamental matrix
- Essential matrix
- Rectification
- Stereo matching
- Active stereo
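For a rectified pair, stereo matching reduces to a 1D search along each scanline. A naive SSD block-matching sketch, tested on a synthetic pair where the right image is the left shifted by a known amount:

```python
import numpy as np

def block_match_disparity(left, right, block=3, max_disp=8):
    """Naive stereo block matching sketch: for each pixel, slide a window
    along the same row of the right image and pick the SSD-minimizing shift."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), int)
    for r in range(half, h - half):
        for c in range(half + max_disp, w - half):
            patch = left[r-half:r+half+1, c-half:c+half+1]
            costs = [np.sum((patch - right[r-half:r+half+1,
                                           c-d-half:c-d+half+1]) ** 2)
                     for d in range(max_disp)]
            disp[r, c] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right image is the left shifted by 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -4, axis=1)
disp = block_match_disparity(left, right)
print(disp[10, 20])  # → 4 for this synthetic shift
```

Real stereo pipelines add cost aggregation, sub-pixel refinement, and occlusion handling; the sketch shows only the core matching step.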

- Spatio-temporal interest points
- Videos as a bag of visual words
- Localization in space and in time
- Early deep learning models for action recognition

- Finite mixture models
- Gaussian Mixture Models (GMM)
- EM for GMM
- Probabilistic latent semantic analysis
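EM for a Gaussian mixture can be shown compactly in 1D with two components. This is a bare-bones sketch on synthetic data (no convergence check, no numerical safeguards), not a robust implementation:

```python
import numpy as np

def em_gmm_1d(x, n_iters=50):
    """Bare-bones EM for a two-component 1D Gaussian mixture.
    A teaching sketch: crude init, fixed iteration count, no safeguards."""
    mu = np.array([x.min(), x.max()])   # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: responsibilities (the 1/sqrt(2*pi) constant cancels out).
        lik = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, spreads, and mixing weights.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
mu, sigma, pi = em_gmm_1d(x)
print(np.sort(mu))  # the two means should land near -5 and 5
```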

- *Computer Vision: Algorithms and Applications* by Richard Szeliski
- *Fundamentals of Computer Vision* by Mubarak Shah
- *Multiple View Geometry in Computer Vision* by Richard Hartley and Andrew Zisserman
- *Computer Vision: Models, Learning, and Inference* by Simon Prince

Sam Roweis was a Canadian computer scientist specializing in Machine Learning. He is sadly no longer with us. In addition to being a superlative machine learning researcher, Sam was a passionate educator. Below I include his notes on linear algebra and probability. I re-discovered these notes thanks to Inmar Givoni.

- python
- numpy
- scipy
- opencv
- matplotlib
- jupyter notebook

© Faisal Z. Qureshi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Last update: 2024-02-26 18:15

Webify version: 4.1
