Faisal Qureshi
Professor
Faculty of Science
Ontario Tech University
Oshawa ON Canada
http://vclab.science.ontariotechu.ca
© Faisal Qureshi
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Consider image stitching. It requires that we find corresponding "locations" in two images. Given these corresponding locations, we can compute a homography, which allows us to stitch the two images together to construct a panorama.
We need to find at least some of the same points in both images to have any chance of finding true matches. There is little chance that we can find corresponding locations given the following two images.
The detection process, run independently on the two images, should return at least some of the corresponding locations, as seen below.
Recall that we have attempted to address this issue by interest point detection. These are locations in the image that are (somewhat) "invariant" to geometric and photometric changes. Specifically, we identified corner locations as those that are covariant to translation and rotation and partially invariant to changes in intensity. Recall also that corner detection is not invariant to changes in scale.
Observation 1: identify interest point locations (say, through corner detection) and construct local features around these locations.
Available interest point detectors
What do you want it for?
The best choice is often application dependent.
Take home lesson
We will soon see that deep learning has revolutionized image feature construction. More on this later.
We want to reliably determine which location in one image goes with which location in the second image. The computed features should be invariant to geometric and photometric differences between the two images. Consider the following figure.
Our task is to find the corresponding locations in the two images. This means that we need to figure out which of the two locations in the image on the right matches with the location shown in the image on the left.
Observation 2: compute descriptors that encode the area surrounding an interest point. These descriptors should be compact (for computational reasons), have local support, and be invariant to geometric and photometric changes.
Why do we want to encode only the local region around an interest point? Why not encode the entire image?
Encode the area around each interest point as a vector. We can then easily match these features to identify corresponding locations between the two images. The following figure illustrates this idea. Here, the local area around three interest point locations (one in the left image and two in the right image) is encoded as a $d$-dimensional vector.
We can find the corresponding location by matching these $d$-dimensional vectors. There are many options for doing so. For example, we can use the sum-of-squared differences (SSD) to match these vectors. Alternatively, we can use cosine similarity. Many other techniques for matching vectors exist.
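As a quick illustration with two made-up descriptor vectors (the exercises later in these notes explore these metrics further), SSD and cosine similarity take only a few lines of NumPy. This is a minimal sketch; the values of f1 and f2 are arbitrary.
import numpy as np

def ssd(a, b):
    """Sum-of-squared differences: smaller means more similar."""
    return np.sum((a - b) ** 2)

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: closer to 1 means more similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two made-up d-dimensional descriptors
f1 = np.array([0.2, 0.8, 0.1, 0.5])
f2 = np.array([0.25, 0.75, 0.0, 0.6])

print('SSD =', ssd(f1, f2))
print('cosine similarity =', cosine_similarity(f1, f2))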
Invariance to translation, rotation, and scale.
Invariance to changes in intensity and color.
(Figures courtesy T. Tuytelaars ECCV 2006 tutorial)
The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector.
Consider the figure below.
The image patches around the interest point locations (depicted by the red circles) are shown below.
Let's write down the list of intensities in these patches to form feature vectors.
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
left_patch = cv.imread('data/local-features-construction-2.jpg')
left_patch = cv.cvtColor(left_patch, cv.COLOR_BGR2RGB)
left_patch = cv.resize(left_patch, (32, 32), interpolation=cv.INTER_NEAREST)
right_patch = cv.imread('data/local-features-construction-1.jpg')
right_patch = cv.cvtColor(right_patch, cv.COLOR_BGR2RGB)
right_patch = cv.resize(right_patch, (32, 32), interpolation=cv.INTER_NEAREST)
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.imshow(left_patch)
plt.subplot(122)
plt.imshow(right_patch)
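Continuing with the patches loaded above, "writing down the list of intensities" simply means flattening each patch into one long vector. A rough sketch of comparing the two resulting descriptors with SSD:
# Flatten each 32x32x3 patch into a single intensity vector (the raw feature descriptor)
f_left = left_patch.astype(np.float32).ravel()
f_right = right_patch.astype(np.float32).ravel()
print('descriptor length:', f_left.shape[0])   # 32 * 32 * 3 = 3072

# Sum-of-squared differences between the two raw-intensity descriptors
ssd = np.sum((f_left - f_right) ** 2)
print('SSD between the two patches:', ssd)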
Notice that a list of raw intensities is very sensitive to changes in rotation, scale, intensity, etc. Raw intensity lists therefore make poor feature descriptors.
Description taken from various places, including https://sbme-tutorials.github.io/2019/cv/notes/7_week7.html
Construct the SIFT scale-space pyramid, which consists of octaves and scales. Octaves correspond to different image resolutions (pyramid levels), and scales correspond to different Gaussian windows (different $\sigma$) within each octave.
At each scale, compare the cornerness value with the neighbouring scales (the scales directly above and below) and pick the scale with the maximum cornerness value. Not all corners in an image are localized at the same scale.
Rotate each patch according to its dominant gradient orientation. This puts the patches into a canonical orientation.
See below for how to find the dominant gradient orientation.
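Here is a rough sketch of one common way to find it, in the spirit of SIFT's orientation assignment (the image path, patch location, and patch size below are arbitrary stand-ins): build a histogram of gradient orientations over the patch, weighted by gradient magnitude, and take the peak bin as the dominant orientation.
import cv2 as cv
import numpy as np

# Stand-in patch: an arbitrary 32x32 grayscale crop (pretend it surrounds an interest point)
img = cv.imread('data/box.png', cv.IMREAD_GRAYSCALE).astype(np.float32)
patch = img[100:132, 100:132]

# Image gradients over the patch
gx = cv.Sobel(patch, cv.CV_32F, 1, 0, ksize=3)
gy = cv.Sobel(patch, cv.CV_32F, 0, 1, ksize=3)
magnitude = np.sqrt(gx**2 + gy**2)
orientation = np.rad2deg(np.arctan2(gy, gx)) % 360   # gradient directions in [0, 360)

# 36-bin orientation histogram (10 degrees per bin), weighted by gradient magnitude
hist, edges = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
dominant = edges[np.argmax(hist)] + 5                # centre of the strongest bin
print('dominant gradient orientation (degrees):', dominant)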
After localizing a keypoint in our scale space, we can compute its SIFT descriptor as follows.
SIFT was initially included in OpenCV; however, because SIFT was patented, for some time it was only available through the non-free opencv-contrib module. One option is to use the VLFeat library, which includes a SIFT implementation. VLFeat currently doesn't have a "stable" Python binding, but you are welcome to try it using pip install pyvlfeat. Note that the SIFT patent has since expired, and OpenCV 4.4 and later include SIFT in the main library (cv.SIFT_create).
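Assuming a recent OpenCV build (4.4 or newer) and reusing the data/box.png image that appears later in these notes, a minimal sketch of computing SIFT keypoints and descriptors looks like this:
import cv2 as cv

img = cv.imread('data/box.png', cv.IMREAD_GRAYSCALE)

# SIFT is part of the main OpenCV module in versions 4.4+
sift = cv.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print('number of keypoints:', len(keypoints))
print('descriptor shape:', descriptors.shape)  # one 128-dimensional descriptor per keypoint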
Consider the figures below. "SIFT patches" are overlaid on the two images. Our goal is to generate candidate matches.
Classical feature descriptors, such as SIFT and SURF, are usually compared and matched using the Euclidean distance (L2 norm). Other techniques for matching these features include cosine similarity and the Earth Mover's Distance (also known as the Wasserstein distance).
Compute the cosine and Euclidean distance matrices between the four vectors $[1,0,0]$, $[0,1,0]$, $[1,1,0]$, and $[10,-2,1]$.
# %load solutions/local-features/solution-01.py
How do we convert distance values to similarity values? For cosine distance, we can simply subtract the cosine distance from 1.0. In general, if a distance metric returns values between 0 and 1, this trick works.
Compute the cosine similarity matrix between the vectors $[1,0,0]$, $[0,1,0]$, $[1,1,0]$, and $[10,-2,1]$.
# %load solutions/local-features/solution-02.py
For other distances, we can use, say, a Gaussian kernel as follows:
$$ K(d) = \exp \left( -\frac{d^2}{2 \sigma^2} \right), $$
where $d$ is the distance between two vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, and $\sigma$ is a tuning (or scaling) parameter. If $\sigma$ is large, $K(d)$ stays close to $1$ (i.e., high similarity) even for large values of $d$. If $\sigma$ is small, even a small $d$ will reduce the similarity score of the two vectors.
Compute the similarity matrix between the vectors $[1,0,0]$, $[0,1,0]$, $[1,1,0]$, and $[10,-2,1]$. Assume the Euclidean distance metric.
# %load solutions/local-features/solution-03.py
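The solution file is not shown here, but a minimal sketch of this computation, using scipy.spatial.distance.cdist for the pairwise Euclidean distances and an arbitrarily chosen $\sigma$, could look as follows:
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [10, -2, 1]], dtype=np.float64)

D = cdist(X, X, metric='euclidean')   # pairwise Euclidean distance matrix

sigma = 2.0                           # tuning parameter; chosen arbitrarily
K = np.exp(-D**2 / (2 * sigma**2))    # Gaussian kernel turns distances into similarities

print(np.round(K, 3))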
The Wasserstein distance is computed between two probability distributions (represented below as histograms). Check out the scipy.stats module for methods for computing the Wasserstein distance.
from scipy.stats import wasserstein_distance
wasserstein_distance([0, 1, 3], [5, 6, 8])
We now also have binary feature descriptors, such as ORB and BRISK, which are matched using the Hamming distance.
$$ d_{\mathrm{hamming}} (\mathbf{a}, \mathbf{b}) = \sum_{i=0}^{n-1} (a_i \oplus b_i) $$
Aside: $\oplus$ denotes the bitwise XOR operation, so the Hamming distance counts the bit positions at which the two binary descriptors differ.
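As a quick illustration with two made-up packed binary descriptors (ORB stores its descriptors as 32 uint8 values in the same packed form), XOR followed by a bit count gives the Hamming distance directly:
import numpy as np

# Two made-up binary descriptors, stored as packed bytes
a = np.array([0b10110010, 0b01101100], dtype=np.uint8)
b = np.array([0b10010010, 0b01101001], dtype=np.uint8)

# XOR marks the differing bits; unpackbits lets us count them
hamming = np.count_nonzero(np.unpackbits(np.bitwise_xor(a, b)))
print('Hamming distance:', hamming)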
The simplest matching strategy is brute force: compare them all and take the closest match (or the closest $k$ matches, or all matches within a thresholded distance).
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('data/box.png',cv.IMREAD_GRAYSCALE)
img2 = cv.imread('data/box_in_scene.png',cv.IMREAD_GRAYSCALE)
print(img1.shape)
orb = cv.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None) # locations and descriptor
kp2, des2 = orb.detectAndCompute(img2, None)
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)   # brute-force matcher; Hamming distance suits binary ORB descriptors
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)   # smallest-distance (best) matches first
img3 = cv.drawMatches(img1, kp1, img2, kp2, matches[:10], None, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)   # draw the 10 best matches
plt.figure(figsize=(10,10))
plt.imshow(img3)
plt.show()
# %load solutions/local-features/kdtree.py
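The contents of kdtree.py are not reproduced here. As a stand-in, here is a minimal sketch of nearest-neighbour descriptor matching with scipy.spatial.cKDTree and made-up real-valued descriptors (kd-trees suit descriptors like SIFT; binary descriptors are better served by the Hamming-distance matchers above):
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Made-up 128-dimensional descriptors standing in for SIFT features from two images
des1 = rng.random((200, 128))
des2 = rng.random((250, 128))

tree = cKDTree(des2)               # index the second image's descriptors once
dist, idx = tree.query(des1, k=1)  # nearest neighbour in image 2 for every descriptor in image 1

print(idx[:5])   # indices of the closest descriptors in des2
print(dist[:5])  # corresponding Euclidean distances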
Let's consider the SSD metric for finding matches. How do we threshold on SSD? One approach is to compute the ratio of the distance to the best match to the distance to the second-best match. If this ratio is low, the best match is a good candidate; if this ratio is high, the best match may be ambiguous. (The FLANN example below uses this ratio test with a threshold of 0.75.)
(From the OpenCV documentation) FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of algorithms, such as KDTree and Locality Sensitive Hashing, optimized for fast nearest neighbour search in large datasets and for high-dimensional features. It works much faster than BFMatcher for large datasets.
For OpenCV implementation, possible values are:
FLANN_INDEX_LINEAR = 0
FLANN_INDEX_KDTREE = 1
FLANN_INDEX_KMEANS = 2
FLANN_INDEX_COMPOSITE = 3
FLANN_INDEX_KDTREE_SINGLE = 4
FLANN_INDEX_HIERARCHICAL = 5
FLANN_INDEX_LSH = 6
FLANN_INDEX_SAVED = 254
FLANN_INDEX_AUTOTUNED = 255
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('data/box.png',cv.IMREAD_GRAYSCALE)
img2 = cv.imread('data/box_in_scene.png',cv.IMREAD_GRAYSCALE)
orb = cv.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None) # locations and descriptor
kp2, des2 = orb.detectAndCompute(img2, None)
FLANN_INDEX_LSH = 6                                              # LSH index is appropriate for binary descriptors such as ORB
index_params = dict(algorithm=FLANN_INDEX_LSH, table_number=6)
search_params = dict(checks=50)                                  # more checks -> better matches, slower search
flann = cv.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)                        # two nearest neighbours per query descriptor
good = []
for match in matches:
    if len(match) < 2: continue                                  # LSH can return fewer than k neighbours
    m, n = match[0], match[1]
    if m.distance < 0.75 * n.distance:                           # ratio test: keep only unambiguous matches
        good.append([m])
img3 = cv.drawMatchesKnn(img1, kp1, img2, kp2, good, None, flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.figure(figsize=(10,10))
plt.imshow(img3)
plt.show()
The Laplacian of Gaussian is a circularly symmetric operator for blob detection in 2D.
$$ \nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} $$
We define the characteristic scale as the scale that produces the peak Laplacian response.
We can approximate the Laplacian with the Difference of Gaussians (DoG), which is much more efficient to compute.
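A minimal sketch of the DoG idea, reusing data/butterfly.jpg from the blob-detection example below and arbitrarily chosen $\sigma$ values: blur the image at two nearby scales and subtract.
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

img = cv.imread('data/butterfly.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32)

sigma, k = 2.0, 1.6                      # base scale and the ratio between the two scales
g1 = cv.GaussianBlur(img, (0, 0), sigma)
g2 = cv.GaussianBlur(img, (0, 0), k * sigma)

dog = g2 - g1                            # Difference of Gaussians ~ Laplacian of Gaussian

plt.figure(figsize=(10, 5))
plt.subplot(121); plt.imshow(g1, cmap='gray'); plt.title('Gaussian blur')
plt.subplot(122); plt.imshow(dog, cmap='gray'); plt.title('DoG (approx. LoG)')
plt.show()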
The relevant parameters of OpenCV's SimpleBlobDetector are described below:
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
filename = "data/butterfly.jpg"
#filename = "data/BlobTest.jpg"
im = cv.imread(filename)
im = cv.cvtColor(im, cv.COLOR_BGR2RGB)
params = cv.SimpleBlobDetector_Params()
params.minThreshold = 10              # thresholds used to binarize the image internally
params.maxThreshold = 250
params.filterByArea = False           # filter blobs by area (disabled here)
params.minArea = 100
params.filterByCircularity = False    # filter blobs by circularity (disabled here)
params.minCircularity = 0.1
params.filterByConvexity = False      # filter blobs by convexity (disabled here)
params.minConvexity = 0.9
params.filterByInertia = False        # filter blobs by inertia ratio (disabled here)
params.minInertiaRatio = 0.9
detector = cv.SimpleBlobDetector_create(params)
keypoints = detector.detect(im)
im_with_keypoints = cv.drawKeypoints(im, keypoints, np.array([]), (255,0,0), cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
plt.figure(figsize=(15,15))
plt.imshow(im_with_keypoints)