Least squares

Faisal Qureshi
Professor
Faculty of Science
Ontario Tech University
Oshawa ON Canada
http://vclab.science.ontariotechu.ca

© Faisal Qureshi

In [208]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

Lesson plan

  • Linear regression
  • Least squares
  • Total least squares

Why model fitting?

  • Finding objects/elements in an image
  • Image analysis and scene understanding

Least squares

2D line fitting example

Given $n$ data points $(x_i, y_i)$, we want to fit a line $y = m x + c$. The sum of squared differences is

$$ \epsilon = \sum_{i=1}^n (y_i - m x_i - c)^2 $$

We want to find values of the slope $m$ and intercept $c$ that minimize the sum of squared differences $\epsilon$. Note that $\epsilon$ is a quadratic function of $m$ and $c$ (bowl shaped).
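
To see this bowl shape concretely, here is a small sketch (not part of the original notebook; it reuses the imports from the first cell) that evaluates $\epsilon(m, c)$ on a grid for a few synthetic points.
In [ ]:
# Synthetic points from y = 0.75x + 1 (values chosen only for illustration)
x_demo = np.linspace(1, 9, 10)
y_demo = 0.75 * x_demo + 1

# Evaluate the sum of squared differences over a grid of (m, c) values
ms, cs = np.meshgrid(np.linspace(-1, 3, 100), np.linspace(-4, 6, 100))
eps = np.zeros_like(ms)
for xi, yi in zip(x_demo, y_demo):
    eps += (yi - ms * xi - cs)**2

plt.figure(figsize=(5,5))
plt.contourf(ms, cs, eps, levels=30)
plt.colorbar()
plt.xlabel('m')
plt.ylabel('c')
plt.title(r'$\epsilon(m, c)$ is bowl shaped');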

Quadratic function in 1D

$$ y = f(x) = (x + 4)^2 $$
In [209]:
plotting_x = np.linspace(-10,10,100)
quadratic_fn_1d = (plotting_x+4)**2

plt.figure(figsize=(5,5))
plt.title("Quadractic function in 1d")
plt.plot(plotting_x, quadratic_fn_1d)
plt.yticks([0])
plt.xticks([-4])
Out[209]:
([<matplotlib.axis.XTick at 0x1210e9e10>],
 <a list of 1 Text major ticklabel objects>)

The derivative of $f(x)$ is $$ \frac{d y}{d x} = 2(x + 4). $$

By setting $$ \frac{d y}{d x} = 0 $$ we can find the value of $x$ that minimizes function $f(x)$. In this case it is $x=-4$.
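
As a quick sanity check (not in the original notes), we can confirm numerically that the sampled function attains its minimum near $x = -4$.
In [ ]:
# Grid point at which the sampled quadratic is smallest; should be close to -4
print(plotting_x[np.argmin(quadratic_fn_1d)])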

In [210]:
deriv_quadratic_fn_1d = 2*(plotting_x + 4)

plt.figure(figsize=(5,5))
plt.title("Quadractic function in 1d")
plt.plot(plotting_x, quadratic_fn_1d, label='$f(x)$')
plt.plot(plotting_x, deriv_quadratic_fn_1d, label="$f'(x)$")
plt.legend()
plt.yticks([0])
plt.xticks([-4])
Out[210]:
([<matplotlib.axis.XTick at 0x11f2d1f10>],
 <a list of 1 Text major ticklabel objects>)

Quadratic function in 2D

In [211]:
plotting_x = np.linspace(-2,2,11)
plotting_y = np.linspace(-2,2,11)

xx, yy = np.meshgrid(plotting_x, plotting_y)

quadratic_fn_2d = xx**2 + yy**2

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(14,6))
ax = fig.add_subplot(121, projection='3d')
ax.plot_surface(xx, yy, quadratic_fn_2d)
plt.xlabel('x')
plt.ylabel('y')
plt.xticks([])
plt.yticks([])
plt.subplot(122)
plt.imshow(quadratic_fn_2d, extent=[-2,2,-2,2])
plt.xlabel('x')
plt.ylabel('y')
plt.xticks([0.0])
plt.yticks([0.0])
plt.suptitle('2D quadratic function');

Solving for $m$ and $c$

$$ \DeclareMathOperator*{\argmin}{arg\,min} m^*,c^* = \argmin_{m,c} \sum_{i=1}^n \left(y_i - m x_i - c \right)^2 $$

We can achieve this by setting $$ \frac{\partial \epsilon}{\partial m} = 0 $$ and $$ \frac{\partial \epsilon}{\partial c} = 0. $$

From calculus, we know that $$ \frac{\partial \epsilon}{\partial m} = - \sum_{i=1}^n 2 x_i \left(y_i - m x_i - c \right) $$ and $$ \frac{\partial \epsilon}{\partial c} = - \sum_{i=1}^n 2 \left(y_i - m x_i - c \right). $$

Setting these to 0, we get $$ \sum_{i=1}^n 2 x_i \left(y_i - m x_i - c \right) = 0 $$ and $$ \sum_{i=1}^n 2 \left(y_i - m x_i - c \right) = 0. $$

Let $$\langle y \rangle = \frac{1}{n} \sum_{i=1}^n y_i$$ and $$\langle x \rangle = \frac{1}{n} \sum_{i=1}^n x_i.$$

We can then re-write the second equation above as $$ c = \langle y \rangle - m \langle x \rangle. $$

Re-writing the first equation above: $$ \begin{align} & \sum_{i=1}^n x_i y_i - m \sum_{i=1}^n x_i^2 - c \sum_{i=1}^n x_i = 0 \\ \Rightarrow & \frac{1}{n} \sum_{i=1}^n x_i y_i - \frac{m}{n} \sum_{i=1}^n x_i^2 - \frac{c}{n} \sum_{i=1}^n x_i = 0 \\ \Rightarrow & \langle x y \rangle - m \langle x^2 \rangle = c \langle x \rangle \end{align} $$ where $$ \langle x y \rangle = \frac{1}{n} \sum_{i=1}^n x_i y_i $$ and $$ \langle x^2 \rangle = \frac{1}{n} \sum_{i=1}^n x_i^2. $$

Substitute the value of $c$ $$ \begin{align} & \langle x y \rangle - m \langle x^2 \rangle = \langle x \rangle \langle y \rangle - m \langle x \rangle^2 \\ \Rightarrow & m \left( \langle x \rangle^2 - \langle x^2 \rangle \right) = \langle x \rangle \langle y \rangle - \langle x y \rangle \\ \Rightarrow & m = \frac{\langle x \rangle \langle y \rangle - \langle x y \rangle}{\langle x \rangle^2 - \langle x^2 \rangle} \end{align} $$

Use this value of $m$ to solve for $c$.
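
Before working through the numbers, the two closed-form expressions can be wrapped in a small helper function; `fit_line_means` is just an illustrative name, not part of the original notebook.
In [ ]:
def fit_line_means(x, y):
    """Least-squares line fit y = m*x + c using the closed-form expressions above."""
    mean_x, mean_y = np.mean(x), np.mean(y)
    mean_x2, mean_xy = np.mean(x**2), np.mean(x * y)
    m = (mean_x * mean_y - mean_xy) / (mean_x**2 - mean_x2)
    c = mean_y - m * mean_x
    return m, c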

2D line fitting example 1

In [238]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))
data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line');
In [213]:
x = data[:,0]
y = data[:,1]

print(f'x =\n{x.reshape(n,1)}')
print(f'y =\n{y.reshape(n,1)}')
x =
[[3.  ]
 [3.5 ]
 [4.  ]
 [4.5 ]
 [5.  ]
 [7.  ]
 [7.25]
 [7.5 ]
 [7.75]
 [8.  ]]
y =
[[3.7988135 ]
 [4.34018937]
 [4.60276338]
 [4.91988318]
 [5.1736548 ]
 [6.89589411]
 [6.87508721]
 [7.516773  ]
 [7.77616276]
 [7.38344152]]
In [214]:
mean_x = np.mean(x)
mean_y = np.mean(y)
mean_x2 = np.mean(np.square(x))
mean_xy = np.mean(x * y)

print(f'<x> = {mean_x}')
print(f'<y> = {mean_y}')
print(f'<x2> = {mean_x2}')
print(f'<xy> = {mean_xy}')
<x> = 5.75
<y> = 5.928266283314543
<x2> = 36.4375
<xy> = 36.68301372420285
In [215]:
m = (mean_x * mean_y - mean_xy) / (mean_x**2 - mean_x2)
print(f'm = {m}')
m = 0.7690318800427336
In [216]:
c = mean_y - mean_x * m
print(f'c = {c}')
c = 1.506332973068825
In [217]:
x_plotting = np.linspace(0,10,100)
y_plotting = m * x_plotting + c

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.plot(x_plotting, y_plotting, 'b-')
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted to data using least squares.')
Out[217]:
Text(0.5, 1.0, 'Line fitted to data using least squares.')

2D line fitting - vector notation

$$ \begin{align} \epsilon &= \sum_{i=1}^n (y_i - m x_i - c)^2 \\ &= \| \left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right] - \left[ \begin{array}{cc} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots \\ x_n & 1 \end{array} \right] \left[ \begin{array}{c} m\\ c \end{array} \right] \|^2 \\ &= \| \mathbf{y} - \mathbf{A} \mathbf{p} \|^2 \\ &= \mathbf{y}^T\mathbf{y} - 2 ( \mathbf{A} \mathbf{p} )^T \mathbf{y} + (\mathbf{A} \mathbf{p})^T \mathbf{A} \mathbf{p} \end{align} $$

We know that $$ \frac{\partial \epsilon}{\partial \mathbf{p}} = -2 \mathbf{A}^T \mathbf{y} + 2 \mathbf{A}^T \mathbf{A} \mathbf{p} $$

Setting $$ \frac{\partial \epsilon}{\partial \mathbf{p}} = 0, $$ we get $$ \begin{align} & -2 \mathbf{A}^T \mathbf{y} + 2 \mathbf{A}^T \mathbf{A} \mathbf{p} = 0 \\ \Rightarrow & \mathbf{A}^T\mathbf{A} \mathbf{p} = \mathbf{A}^T \mathbf{y} \\ \Rightarrow & \mathbf{p} = (\mathbf{A}^T\mathbf{A})^{-1} \mathbf{A}^T \mathbf{y} \end{align} $$

$(\mathbf{A}^T\mathbf{A})^{-1} \mathbf{A}^T$ is called the pseudo-inverse of $\mathbf{A}$.
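
In practice, forming and inverting $\mathbf{A}^T\mathbf{A}$ explicitly is usually avoided for numerical reasons; below is a sketch using NumPy's built-in least-squares solver as an alternative to the explicit normal equations (the helper name `fit_line_lstsq` is only illustrative). On well-conditioned data such as the examples here, both routes give the same $m$ and $c$.
In [ ]:
def fit_line_lstsq(x, y):
    """Solve y ≈ A p for p = (m, c) without explicitly inverting A^T A."""
    A = np.column_stack([x, np.ones_like(x)])
    p, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
    return p[0], p[1]   # slope m, intercept c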

2D line fitting - intuition - linear algebra

2D line fitting example 2

In [218]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))

#x = np.linspace(1,9,10)
#y = np.ones(10)+5

#y = np.linspace(1,9,10)
#x = np.ones(10)+5

data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line')
Out[218]:
Text(0.5, 1.0, 'Generating some points from a line')
In [219]:
print(f'data =\n{data}')
n = data.shape[0]
print(f'Number of data items = {n}')
data =
[[3.         3.7988135 ]
 [3.5        4.34018937]
 [4.         4.60276338]
 [4.5        4.91988318]
 [5.         5.1736548 ]
 [7.         6.89589411]
 [7.25       6.87508721]
 [7.5        7.516773  ]
 [7.75       7.77616276]
 [8.         7.38344152]]
Number of data items = 10
In [220]:
A = np.ones([n, 2])
A[:,0] = data[:,0]
print(f'A =\n{A}')
A =
[[3.   1.  ]
 [3.5  1.  ]
 [4.   1.  ]
 [4.5  1.  ]
 [5.   1.  ]
 [7.   1.  ]
 [7.25 1.  ]
 [7.5  1.  ]
 [7.75 1.  ]
 [8.   1.  ]]
In [221]:
AtA = np.dot(A.T,A)
print(f'AtA =\n {AtA}')
AtA =
 [[364.375  57.5  ]
 [ 57.5    10.   ]]
In [222]:
AtY = np.dot(A.T, data[:,1].reshape(n,1))
print(f'AtY = \n{AtY}')
AtY = 
[[366.83013724]
 [ 59.28266283]]
In [223]:
# m = slope
# c = intercept
m_and_c = np.dot(np.linalg.inv(AtA), AtY)

m = m_and_c[0]
c = m_and_c[1]

print(f'm = {m}')
print(f'c = {c}')
m = [0.76903188]
c = [1.50633297]
In [224]:
x_plotting = np.linspace(0,10,100)
y_plotting = m * x_plotting + c

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.plot(x_plotting, y_plotting, 'b-')
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted to data using least squares.');

Total least squares

2D line fitting example

Solving for $a$, $b$, and $c$

Here the line is written as $a x + b y + c = 0$; with the constraint $a^2 + b^2 = 1$, each residual $a x_i + b y_i + c$ is the perpendicular distance of $(x_i, y_i)$ from the line. Minimize $$ \epsilon = \sum_{i=1}^n (a x_i + b y_i + c)^2 \mathrm{\ s.t.\ } a^2 + b^2 = 1. $$

From calculus $$ \frac{\partial \epsilon}{\partial c} = 2 \sum_{i=1}^n (a x_i + b y_i + c). $$ Setting $$ \frac{\partial \epsilon}{\partial c} = 0, $$ we get $$ \begin{align} & \sum_{i=1}^n (a x_i + b y_i + c) = 0 \\ \Rightarrow & a \sum_{i=1}^n x_i + b \sum_{i=1}^n y_i + n c = 0 \\ \Rightarrow & c = - \frac{a \sum_{i=1}^n x_i}{n} - \frac{b \sum_{i=1}^n y_i}{n}\\ \Rightarrow & c = - a \langle x \rangle - b \langle y \rangle \end{align} $$

Substituting $c$ back in $\epsilon$ $$ \begin{align} \epsilon &= \sum_{i=1}^n \left[a x_i + b y_i + \left(- a \langle x \rangle - b \langle y \rangle\right)\right]^2 \\ &= \sum_{i=1}^n \left[ a \left( x_i - \langle x \rangle \right) + b \left( y_i - \langle y \rangle \right) \right]^2 \\ &= \| \left[ \begin{array}{cc} x_1 - \langle x \rangle & y_1 - \langle y \rangle \\ x_2 - \langle x \rangle & y_2 - \langle y \rangle \\ \vdots & \vdots \\ x_n - \langle x \rangle & y_n - \langle y \rangle \\ \end{array} \right] \left[ \begin{array}{c} a \\ b \end{array} \right] \|^2 \\ &= \| \mathbf{A} \mathbf{p} \|^2 \end{align} $$

Note that this is not the same $\mathbf{A}$ as the one that we constructed for linear least squares.

We can thus re-write our problem as $$ \DeclareMathOperator*{\argmin}{arg\,min} \mathbf{p}^* = \argmin_\mathbf{p} \| \mathbf{A} \mathbf{p} \|^2 \mathrm{\ s.t.\ }\mathbf{p}^T\mathbf{p} = 1. $$

The solution is the eigenvector corresponding to the smallest eigenvalue of $\mathbf{A}^T\mathbf{A}$.
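
Putting the pieces together, here is a compact sketch of the total least squares fit (the helper name `fit_line_tls` is illustrative, not from the original notes):
In [ ]:
def fit_line_tls(x, y):
    """Total least squares fit of a*x + b*y + c = 0 with a^2 + b^2 = 1."""
    mean_x, mean_y = np.mean(x), np.mean(y)
    A = np.column_stack([x - mean_x, y - mean_y])
    e, v = np.linalg.eigh(A.T @ A)   # eigh returns eigenvalues in ascending order
    a, b = v[:, 0]                   # eigenvector of the smallest eigenvalue
    c = -a * mean_x - b * mean_y
    return a, b, c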

2D line fitting example 3

In [225]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))

#x = np.linspace(1,9,10)
#y = np.ones(10)+5

#y = np.linspace(1,9,10)
#x = np.ones(10)+5

data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line')
Out[225]:
Text(0.5, 1.0, 'Generating some points from a line')
In [226]:
print(f'x =\n {data[:,0].reshape(n,1)}')
x =
 [[3.  ]
 [3.5 ]
 [4.  ]
 [4.5 ]
 [5.  ]
 [7.  ]
 [7.25]
 [7.5 ]
 [7.75]
 [8.  ]]
In [227]:
mean_x = np.mean(data[:,0])
print(f'mean_x = {mean_x}')
mean_x = 5.75
In [228]:
print(f'y =\n {data[:,1].reshape(n,1)}')
y =
 [[3.7988135 ]
 [4.34018937]
 [4.60276338]
 [4.91988318]
 [5.1736548 ]
 [6.89589411]
 [6.87508721]
 [7.516773  ]
 [7.77616276]
 [7.38344152]]
In [229]:
mean_y = np.mean(data[:,1])
print(f'mean_y = {mean_y}')
mean_y = 5.928266283314543
In [230]:
A = np.hstack([(x-mean_x).reshape(n,1), (y-mean_y).reshape(n,1)])
print(f'A =\n{A}')
A =
[[-2.75       -2.12945278]
 [-2.25       -1.58807692]
 [-1.75       -1.32550291]
 [-1.25       -1.0083831 ]
 [-0.75       -0.75461148]
 [ 1.25        0.96762783]
 [ 1.5         0.94682093]
 [ 1.75        1.58850672]
 [ 2.          1.84789648]
 [ 2.25        1.45517524]]
In [231]:
AtA = np.dot(A.T, A)
print(f'AtA =\n{AtA}')
AtA =
[[33.75       25.95482595]
 [25.95482595 20.28817379]]
In [232]:
e, v = np.linalg.eig(AtA)
print(f'eigenvalues = \n{e}')
print(f'eigenvectors = \n{v}')
eigenvalues = 
[53.83248246  0.20569134]
eigenvectors = 
[[ 0.79089443 -0.61195261]
 [ 0.61195261  0.79089443]]
In [233]:
smallest_eigen_value_index = np.argmin(e)
print(f'Index for smallest eigenvalue = {smallest_eigen_value_index}')
Index for smallest eigenvalue = 1
In [234]:
p = v[:, smallest_eigen_value_index]
print(f'Eigenvector corresponding to the smallest eigenvalue =\n{p}')
Eigenvector corresponding to the smallest eigenvalue =
[-0.61195261  0.79089443]
In [235]:
a = p[0]
b = p[1]
c = - a * mean_x - b * mean_y

print(f'a = {a}, b={b}, and c={c}')
a = -0.6119526118929433, b=0.7908944308802566, and c=-1.1699052698642456
In [236]:
plotting_x = np.linspace(0, 10, 100)
plotting_y = np.linspace(0, 10, 100)
xx, yy = np.meshgrid(plotting_x, plotting_y)
z = a * xx + b * yy + c

fig = plt.figure(figsize=(7,7))
plt.imshow(z, origin='lower', extent=(0,10,0,10))
plt.colorbar()
plt.contour(xx, yy, z, [0])
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted using total least squares')
Out[236]:
Text(0.5, 1.0, 'Line fitted using total least squares')

Aside: Singular Value Decomposition (SVD)

SVD of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is defined as $$ \mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T $$ such that $\mathbf{U}$ is an $m \times n$ matrix with orthonormal columns, $\mathbf{D}$ is an $n \times n$ diagonal matrix containing the singular values, and $\mathbf{V}$ is an $n \times n$ orthogonal matrix.

The column of $\mathbf{V}$ corresponding to the smallest singular value in $\mathbf{D}$ is the eigenvector of $\mathbf{A}^T \mathbf{A}$ corresponding to its smallest eigenvalue.
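
A short check (not in the original notebook), reusing `data` from the example above, that the SVD route agrees with the eigen-decomposition used earlier, up to an overall sign:
In [ ]:
# Centred data matrix, constructed the same way as in the example above
x_c = data[:,0] - np.mean(data[:,0])
y_c = data[:,1] - np.mean(data[:,1])
A = np.column_stack([x_c, y_c])

# Rows of Vt are the right singular vectors; singular values come sorted in
# descending order, so the last row corresponds to the smallest singular value.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
p_svd = Vt[-1]

# Eigen-decomposition of A^T A for comparison
e, v = np.linalg.eig(A.T @ A)
p_eig = v[:, np.argmin(e)]

print(p_svd)
print(p_eig)   # same direction, possibly with an overall sign flip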
