Epipolar Geometry¶

Faisal Qureshi
Professor
Faculty of Science
Ontario Tech University
Oshawa ON Canada
http://vclab.science.ontariotechu.ca

Copyright information¶

License¶

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Lesson Plan¶

The need for multiple views
Depth ambiguity
Estimating scene shape
- Shape from shading
- Shape from defocus
- Shape from texture
- Shape from perspective cues
- Shape from motion
Stereograms, human stereopsis, and disparity
Imaging geometry for a simple stereo system
Epipolar geometry
Fundamental matrix
Essential matrix
Rectification
Stereo matching
Active stereo

The need for multiple views¶

Structure and depth are inherently ambiguous from single views.


Figures from Lana Lazebnik.

Depth ambiguity¶

Notice that the 3D points $a$, $b$, and $c$ all project to the same image location $a'=b'=c'$. A single image therefore gives no straightforward way to estimate depth.
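The following is a minimal Python sketch of this ambiguity, assuming an ideal pinhole camera at the origin looking down the $Z$ axis with focal length `f`. The function name `project` and the point coordinates are illustrative and are not taken from the figure.

```python
def project(point, f=1.0):
    """Ideal pinhole projection of a 3D point (X, Y, Z) with Z > 0."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


# Three hypothetical points on one ray through the optical center:
# b and c are scaled copies of a, so they lie farther along the same ray.
a = (1.0, 0.5, 2.0)
b = tuple(2.0 * v for v in a)
c = tuple(5.0 * v for v in a)

projections = [project(p) for p in (a, b, c)]
print(projections)  # all three entries are the same image location
```

Scaling a point by any positive factor leaves $X/Z$ and $Y/Z$ unchanged, which is why the image alone cannot tell the three points apart.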

Estimating scene shape¶

"Shape from X", where X is one of shading, texture, focus, or motion.

Shape from shading¶

Shape from defocus¶

Shape from texture¶

Shape using perspective cues¶

Shape from motion¶

Often referred to as "structure from motion"

Human eye¶

Pupil/iris: controls the amount of light reaching the retina
Retina: contains photo-sensitive cells, where image is formed
Fovea: highest concentration of cones

Human stereopsis: disparity¶

Human eyes fixate on points in space: they rotate so that the corresponding images form at the centers of the foveae

From Bruce and Green, Visual Perception, Physiology, Psychology and Ecology

Disparity occurs when eyes fixate on one object; others appear at different visual angles

From Bruce and Green, Visual Perception, Physiology, Psychology and Ecology

Specifically, disparity $d$ is given by the following relation

$$ d = r - l = D - F $$

Forsyth and Ponce

Random dot stereograms¶

Julesz 1960: Do we identify local brightness patterns before fusion (monocular process) or after (binocular)?
- Pair of synthetic images obtained by randomly spraying black dots on white objects

Findings: when viewed monocularly, the images appear random; when viewed stereoscopically, the viewer sees 3D structure.

Conclusion¶

Human binocular fusion is not directly associated with the physical retinas; it must involve the central nervous system
Imaginary “cyclopean retina” that combines the left and right image stimuli as a single unit

Stereo photography and stereo viewers¶

Take two pictures of the same subject from two slightly different viewpoints and display so that each eye sees only one of the images.

Invented by Sir Charles Wheatstone, 1838.

Image from fisher-price.com

3D cinema¶

http://www.johnsonshawmuseum.org

Stereo glasses needed to watch 3D movies.

http://www.johnsonshawmuseum.org

Brain fuses information from left/right images when these are shown in quick succession to give an appearance of depth.

http://www.well.com/~jimg/stereo/stereo_list.html

Autostereograms¶

Exploit disparity as depth cue using single image.
- Sometimes referred to as single image random dot stereogram or single image stereogram

Seeing 3D structure¶

Try to see beyond the image, merging the two dots in the process. You should perceive a 3D structure. See if you can spot it.

magiceye.com

If it works out, you will see the following 3D structure.

magiceye.com

Estimating depth with stereo¶

Shape from "motion" between two views
Infer 3D shape of scene from two (multiple) images from different viewpoints

Find the same point in two images (local feature descriptors?)

Stereo vision¶

Two camera setup (left) or a single moving camera (right)

Key idea¶

A second camera can resolve the ambiguity, enabling measurement of depth via triangulation

Imaging geometry¶

Extrinsic params
- Rotation matrix and translation vector
Intrinsic params
- Focal length, pixel sizes (mm), image center point, radial distortion

We’ll assume for now that these parameters are given and fixed.

Here extrinsic parameters describe the relationship between two cameras or a single moving camera.

Geometry for a simple stereo system¶

Assuming parallel optical axes and known camera parameters (i.e., calibrated cameras), we get the following setup:

Consider $\triangle (p_l, \mathbf{p}, p_r)$ shown in red and $\triangle (O_l, \mathbf{p}, O_r)$ shown in blue in the following figure

Use the pinhole model to map world point $\mathbf{p}$ to image point $\mathbf{x}_l$ in the left camera:

$$ \mathbf{x}_l = f \frac{X}{Z}. $$

Now, use the pinhole model to map world point $\mathbf{p}$ to image point $\mathbf{x}_r$ in the right camera:

$$ \mathbf{x}_r = f \frac{X-T}{Z}. $$

Here we employ the fact that if the world point $\mathbf{p}$ is $(X,Z)$ in the left camera then the same world point will be $(X-T,Z)$ in the right camera because the right camera is shifted by $T$.

We define disparity as $$ d = \mathbf{x}_l - \mathbf{x}_r. $$

This gives us

$$ \begin{align} d &= \mathbf{x}_l - \mathbf{x}_r \\ &= f \frac{X}{Z} - f \frac{X-T}{Z} \\ &= \frac{fX - fX + fT}{Z} \\ &= f \frac{T}{Z} \end{align} $$

This suggests that we are able to estimate the depth of point $\mathbf{p}$ by using disparity $(\mathbf{x}_l - \mathbf{x}_r)$:

$$ Z = f \frac{T}{d} $$
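As a rough numerical check of this relation, the sketch below projects a point into both cameras and recovers its depth from the disparity. The function name `depth_from_disparity` and the values of `f`, `T`, `X`, and `Z_true` are assumptions chosen for illustration (focal length in pixels, baseline in meters).

```python
def depth_from_disparity(x_left, x_right, f, T):
    """Depth Z = f * T / d for a parallel-axis stereo pair, with d = x_left - x_right."""
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f * T / d


# Hypothetical setup: focal length in pixels, baseline in meters.
f, T = 700.0, 0.1
X, Z_true = 0.5, 2.0

x_left = f * X / Z_true          # left image coordinate
x_right = f * (X - T) / Z_true   # right image coordinate
Z = depth_from_disparity(x_left, x_right, f, T)
print(x_left - x_right, Z)       # disparity and recovered depth
```

Depth is inversely proportional to disparity, so nearby points produce large disparities and distant points produce small ones.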

Depth from disparity¶

If we can find corresponding points (locations) in two images (top and middle), we can compute the disparity for these locations (bottom). We can then use the disparity to calculate relative depth.

Top figure
- Red dot with arrow is $(x,y)$
Middle figure
- Red dot with arrow is $(x',y')$
Bottom figure
- Red dot is the disparity $d(x,y)$ for location $(x,y)$ in the top image

Then

$$ (x',y') = (x + d(x,y), y) $$

Aside: the red dot without arrows in top figure is $(x',y')$ and the red dot without arrows in bottom figure is $(x,y)$. This confirms that the same 3D point appears at different locations in the two images.

Depth from stereo: key idea¶

Triangulate from corresponding image points in two or more images.
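Triangulation can be sketched in the 2D $X$-$Z$ plane as the intersection of two viewing rays, one per camera. This is only an illustration under simplifying assumptions: the helper `intersect_rays` is hypothetical, the left camera sits at the origin, the right camera at $(T, 0)$, both look along $Z$, and the measurements are noise free so the rays intersect exactly.

```python
def intersect_rays(o1, d1, o2, d2):
    """Intersect 2D rays o1 + s*d1 and o2 + t*d2 in the X-Z plane.

    Solves s*d1 - t*d2 = o2 - o1 for s with Cramer's rule.
    """
    bx, bz = o2[0] - o1[0], o2[1] - o1[1]
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (d2[0] * bz - bx * d2[1]) / det
    return (o1[0] + s * d1[0], o1[1] + s * d1[1])


# Hypothetical values: focal length f, baseline T, and the image
# coordinates of the same scene point in the left and right cameras.
f, T = 1.0, 0.2
x_left, x_right = 0.25, 0.15

# Each ray starts at a camera center and passes through its image point.
point = intersect_rays((0.0, 0.0), (x_left, f), (T, 0.0), (x_right, f))
print(point)  # recovered (X, Z) of the scene point
```

For parallel cameras this reduces to $Z = fT/d$; the ray-intersection form also covers cameras whose optical axes are not parallel, provided each ray direction is expressed in a common frame.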

Stereo matching¶

Given a point in one image (say left image below), where do we find the corresponding point in the second image (say right image below)?

Epipolar geometry¶

Epipolar geometry is defined by two cameras. Given an image location in one camera, the epipolar constraint identifies the location(s) in the other camera where the corresponding image point must lie.
- If extrinsic parameters are available for both cameras, the epipolar constraint is often used to speed up correspondence search (stereo matching) by reducing it from a 2D search over the image to a 1D search along a line.
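To illustrate how the constraint narrows the search, here is a small sketch of window matching along a single image row. It assumes the parallel-axis setup described earlier, where corresponding points share the same row, and it uses the sum of squared differences (SSD) as the matching cost. SSD, the function name `match_along_row`, and the toy intensity rows are assumptions made for this example, not something the lecture prescribes. The disparity follows the convention $d = \mathbf{x}_l - \mathbf{x}_r$.

```python
def match_along_row(left_row, right_row, x, half_window=1, max_disparity=4):
    """Return the disparity d in [0, max_disparity] whose window in right_row,
    centered at x - d, has the smallest SSD to the window in left_row centered at x."""
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window around x does not fit inside the row")
    offsets = range(-half_window, half_window + 1)
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # window would fall off the left edge of the right row
        cost = sum((left_row[x + k] - right_row[xr + k]) ** 2 for k in offsets)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy rows: the right row is the left row shifted two pixels to the left.
left_row = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]
right_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]

disparity = match_along_row(left_row, right_row, x=5)
print(disparity)  # 2
```

Only `max_disparity + 1` candidate positions on one row are examined, rather than every pixel of the second image.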

Epipolar constraint¶

Given an image point $p$ in the left image, its corresponding point in the right image must lie on the line $\overline{e'p'}$ (the epipolar line). This line is the intersection of two planes: 1) the right image plane and 2) the plane $oo'p$, which contains the point $p$ in the left image and the two optical centers $o$ and $o'$.
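The coplanarity behind this constraint can be checked numerically. The sketch below assumes the simplest case, in which the right camera is a pure translation `T` of the left camera with no rotation, and represents image points as 3D vectors $(x, y, f)$ in each camera's frame. All names (`cross`, `dot`, `project`, `line_right`) and all numbers are illustrative assumptions. In this setting the epipolar line in the right image has coefficients $T \times p$, and a corresponding point $p'$ satisfies $p' \cdot (T \times p) = 0$.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(u * v for u, v in zip(a, b))


def project(P, f):
    """Pinhole projection, returned as the 3D vector (x, y, f) on the image plane."""
    X, Y, Z = P
    return (f * X / Z, f * Y / Z, f)


f = 1.0
T = (0.2, 0.0, 0.0)                  # right optical center in the left camera frame
P_left = (0.5, 0.3, 2.0)             # scene point in the left camera frame
P_right = tuple(u - v for u, v in zip(P_left, T))  # same point, right camera frame

p_left = project(P_left, f)
p_right = project(P_right, f)

# Epipolar line in the right image for p_left (translation-only case).
line_right = cross(T, p_left)
residual = dot(p_right, line_right)
print(line_right, residual)          # residual is zero up to rounding
```

With `T` along the $x$ axis the line reduces to $y' = y$, so the match lies on the same row, which agrees with the simple stereo system above. A rotation between the cameras would change the line coefficients and is not covered by this sketch.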