# Least squares¶

Faisal Qureshi
Professor
Faculty of Science
Ontario Tech University
http://vclab.science.ontariotechu.ca

In [208]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm


## Lesson plan¶

• Linear regression
• Least squares
• Total least squares

## Least squares¶

### 2D line fitting example¶

Sum of square differences is

$$\epsilon = \sum_{i=1}^n (y_i - m x_i - c)^2$$

We want to find values of slope $m$ and intercept $c$ so as to minimize the sum of squared differences $\epsilon$. Note that $\epsilon$ is a quadratic function (bowl shaped).

$$y = f(x) = (x + 4)^2$$
In [209]:
plotting_x = np.linspace(-10,10,100)

plt.figure(figsize=(5,5))
plt.yticks([0])
plt.xticks([-4])

Out[209]:
([<matplotlib.axis.XTick at 0x1210e9e10>],
<a list of 1 Text major ticklabel objects>)

The derivation of $f(x)$ is $$\frac{d y}{d x} = 2(x + 4)$$

By setting $$\frac{d y}{d x} = 0$$ we can find the value of $x$ that minimizes function $f(x)$. In this case it is $x=-4$.

In [210]:
deriv_quadratic_fn_1d = 2*(plotting_x + 4)

plt.figure(figsize=(5,5))
plt.plot(plotting_x, quadratic_fn_1d, label='$f(x)$')
plt.plot(plotting_x, deriv_quadratic_fn_1d, label="$f'(x)$")
plt.legend()
plt.yticks([0])
plt.xticks([-4])

Out[210]:
([<matplotlib.axis.XTick at 0x11f2d1f10>],
<a list of 1 Text major ticklabel objects>)

In [211]:
plotting_x = np.linspace(-2,2,11)
plotting_y = np.linspace(-2,2,11)

xx, yy = np.meshgrid(plotting_x, plotting_y)

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(14,6))
plt.xlabel('x')
plt.ylabel('y')
plt.xticks([])
plt.yticks([])
plt.subplot(122)
plt.xlabel('x')
plt.ylabel('y')
plt.xticks([0.0])
plt.yticks([0.0])


### Solving for $m$ and $c$.¶

$$\DeclareMathOperator*{\argmin}{arg\,min} m^*,c^* = \argmin_{m,c} \sum_{i=1}^n \left(y_i - m x_i - c \right)^2$$

We can achieve this by setting $$\frac{\partial \epsilon}{\partial m} = 0$$ and $$\frac{\partial \epsilon}{\partial c} = 0.$$

From calculus, we know that $$\frac{\partial \epsilon}{\partial m} = - \sum_{i=1}^n 2 x_i \left(y_i - m x_i - c \right)$$ and $$\frac{\partial \epsilon}{\partial c} = - \sum_{i=1}^n 2 \left(y_i - m x_i - c \right).$$

Setting these to 0, we get $$\sum_{i=1}^n 2 x_i \left(y_i - m x_i - c \right) = 0$$ and $$\sum_{i=1}^n 2 \left(y_i - m x_i - c \right) = 0.$$

Let $$\langle y \rangle = \frac{1}{n} \sum_{i=1}^n y_i$$ and $$\langle x \rangle = \frac{1}{n} \sum_{i=1}^n x_i.$$

We can then re-write equation 2 above as $$c = \langle y \rangle - m \langle x \rangle.$$

Re-write equation 1 above. \begin{align} & \sum_{i=1}^n x_i y_i - m \sum_{i=1}^n x_i^2 - c \sum_{i=1}^n x_i = 0 \\ \Rightarrow & \frac{1}{n} \sum_{i=1}^n x_i y_i - \frac{m}{n} \sum_{i=1}^n x_i^2 - \frac{c}{n} \sum_{i=1}^n x_i = 0 \\ \Rightarrow & \langle x y \rangle - m \langle x^2 \rangle = c \langle x \rangle \end{align} where $$\langle x y \rangle = \frac{1}{n} \sum_{i=1}^n x_i y_i$$ and $$\langle x^2 \rangle = \frac{1}{n} \sum_{i=1}^n x_i^2.$$

Substitute the value of $c$ \begin{align} & \langle x y \rangle - m \langle x^2 \rangle = \langle x \rangle \langle y \rangle - m \langle x \rangle^2 \\ \Rightarrow & m \left( \langle x \rangle^2 - \langle x^2 \rangle \right) = \langle x \rangle \langle y \rangle - \langle x y \rangle \\ \Rightarrow & m = \frac{\langle x \rangle \langle y \rangle - \langle x y \rangle}{\langle x \rangle^2 - \langle x^2 \rangle} \end{align}

Use this value of $m$ to solve for $c$.

### 2D line fitting example 1¶

In [238]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))
data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line');

In [213]:
x = data[:,0]
y = data[:,1]

print(f'x =\n{x.reshape(n,1)}')
print(f'y =\n{y.reshape(n,1)}')

x =
[[3.  ]
[3.5 ]
[4.  ]
[4.5 ]
[5.  ]
[7.  ]
[7.25]
[7.5 ]
[7.75]
[8.  ]]
y =
[[3.7988135 ]
[4.34018937]
[4.60276338]
[4.91988318]
[5.1736548 ]
[6.89589411]
[6.87508721]
[7.516773  ]
[7.77616276]
[7.38344152]]

In [214]:
mean_x = np.mean(x)
mean_y = np.mean(y)
mean_x2 = np.mean(np.square(x))
mean_xy = np.mean(x * y)

print(f'<x> = {mean_x}')
print(f'<y> = {mean_y}')
print(f'<x2> = {mean_x2}')
print(f'<xy> = {mean_xy}')

<x> = 5.75
<y> = 5.928266283314543
<x2> = 36.4375
<xy> = 36.68301372420285

In [215]:
m = (mean_x * mean_y - mean_xy) / (mean_x**2 - mean_x2)
print(f'm = {m}')

m = 0.7690318800427336

In [216]:
c = mean_y - mean_x * m
print(f'c = {c}')

c = 1.506332973068825

In [217]:
x_plotting = np.linspace(0,10,100)
y_plotting = m * x_plotting + c

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.plot(x_plotting, y_plotting, 'b-')
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted to data using least squares.')

Out[217]:
Text(0.5, 1.0, 'Line fitted to data using least squares.')

### 2D line fitting - vector notation¶

\begin{align} \epsilon &= \sum_{i=1}^n (y_i - m x_i - c)^2 \\ &= \| \left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right] - \left[ \begin{array}{cc} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots \\ x_n & 1 \end{array} \right] \left[ \begin{array}{c} m\\ c \end{array} \right] \|^2 \\ &= \| \mathbf{y} - \mathbf{A} \mathbf{p} \|^2 \\ &= \mathbf{y}^T\mathbf{y} - 2 ( \mathbf{A} \mathbf{p} )^T \mathbf{y} + (\mathbf{A} \mathbf{p})^T \mathbf{A} \mathbf{p} \end{align}

We know that $$\frac{\partial \epsilon}{\partial \mathbf{p}} = -2 \mathbf{A}^T \mathbf{y} + 2 \mathbf{A}^T \mathbf{A} \mathbf{p}$$

Setting $$\frac{\partial \epsilon}{\partial \mathbf{p}}=0.$$ we get \begin{align} & -2 \mathbf{A}^T \mathbf{y} + 2 \mathbf{A}^T \mathbf{A} \mathbf{p} = 0 \\ \Rightarrow & \mathbf{A}^T\mathbf{A} \mathbf{p} = \mathbf{A}^T \mathbf{y} \\ \Rightarrow & \mathbf{p} = (\mathbf{A}^T\mathbf{A})^{-1} \mathbf{A}^T \mathbf{y} \end{align}

$(\mathbf{A}^T\mathbf{A})^{-1}$ is called the psuedo inverse $\mathbf{A}$.

### 2D line fitting example 2¶

In [218]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))

#x = np.linspace(1,9,10)
#y = np.ones(10)+5

#y = np.linspace(1,9,10)
#x = np.ones(10)+5

data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line')

Out[218]:
Text(0.5, 1.0, 'Generating some points from a line')
In [219]:
print(f'data =\n{data}')
n = data.shape[0]
print(f'Number of data items = {n}')

data =
[[3.         3.7988135 ]
[3.5        4.34018937]
[4.         4.60276338]
[4.5        4.91988318]
[5.         5.1736548 ]
[7.         6.89589411]
[7.25       6.87508721]
[7.5        7.516773  ]
[7.75       7.77616276]
[8.         7.38344152]]
Number of data items = 10

In [220]:
A = np.ones([n, 2])
A[:,0] = data[:,0]
print(f'A =\n{A}')

A =
[[3.   1.  ]
[3.5  1.  ]
[4.   1.  ]
[4.5  1.  ]
[5.   1.  ]
[7.   1.  ]
[7.25 1.  ]
[7.5  1.  ]
[7.75 1.  ]
[8.   1.  ]]

In [221]:
AtA = np.dot(A.T,A)
print(f'AtA =\n {AtA}')

AtA =
[[364.375  57.5  ]
[ 57.5    10.   ]]

In [222]:
AtY = np.dot(A.T, data[:,1].reshape(n,1))
print(f'AtY = \n{AtY}')

AtY =
[[366.83013724]
[ 59.28266283]]

In [223]:
# m = slope
# c = intercept
m_and_c = np.dot(np.linalg.inv(AtA), AtY)

m = m_and_c[0]
c = m_and_c[1]

print(f'm = {m}')
print(f'c = {c}')

m = [0.76903188]
c = [1.50633297]

In [224]:
x_plotting = np.linspace(0,10,100)
y_plotting = m * x_plotting + c

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.plot(x_plotting, y_plotting, 'b-')
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted to data using least squares.');


## Total least squares¶

### Solving for $a$, $b$, and $c$¶

Minimize $$\epsilon = \sum_{i=1}^n (a x_i + b y_i + c)^2\mathrm{\ s.t.\ } (a,b)^T(a,b) = 1$$

From calculus $$\frac{\partial \epsilon}{\partial c} = 2 \sum_{i=1}^n (a x_i + b y_i + c).$$ Setting $$\frac{\partial \epsilon}{\partial c} = 0,$$ we get \begin{align} & \sum_{i=1}^n (a x_i + b y_i + c) = 0 \\ \Rightarrow & a \sum_{i=1}^n x_i + b \sum_{i=1}^n y_i + n c = 0 \\ \Rightarrow & c = - \frac{a \sum_{i=1}^n x_i}{n} - \frac{b \sum_{i=1}^n y_i}{n}\\ \Rightarrow & c = - a \langle x \rangle - b \langle y \rangle \end{align}

Substituting $c$ back in $\epsilon$ \begin{align} \epsilon &= \sum_{i=1}^n \left[a x_i + b y_i + \left(- a \langle x \rangle - b \langle y \rangle\right)\right]^2 \\ &= \sum_{i=1}^n \left[ a \left( x_i - \langle x \rangle \right) + b \left( y_i - \langle y \rangle \right) \right]^2 \\ &= \| \left[ \begin{array}{cc} x_1 - \langle x \rangle & y_1 - \langle y \rangle \\ x_2 - \langle x \rangle & y_2 - \langle y \rangle \\ \vdots & \vdots \\ x_n - \langle x \rangle & y_n - \langle y \rangle \\ \end{array} \right] \left[ \begin{array}{c} a \\ b \end{array} \right] \|^2 \\ &= \| \mathbf{A} \mathbf{p} \|^2 \end{align}

Note that this is not the same $\mathbf{A}$ as the one that we constructed for linear least squares.

We can thus re-write our problem as $$\DeclareMathOperator*{\argmin}{arg\,min} \mathbf{p}^* = \argmin_\mathbf{p} \| \mathbf{A} \mathbf{p} \|^2 \mathrm{\ s.t.\ }\mathbf{p}^T\mathbf{p} = 1.$$

Solution is the eigenvector corresponding to the smallest eigenvalues of $\mathbf{A}^T\mathbf{A}$.

### 2D line fitting example 3¶

In [225]:
# Generate some data
np.random.seed(0)

m_true = .75
c_true = 1

x = np.hstack([np.linspace(3, 5, 5), np.linspace(7, 8, 5)])
y = m_true * x + c_true + np.random.rand(len(x))\

#x = np.linspace(1,9,10)
#y = np.ones(10)+5

#y = np.linspace(1,9,10)
#x = np.ones(10)+5

data = np.vstack([x, y]).T
n = data.shape[0]

plt.figure(figsize=(7,7))
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Generating some points from a line')

Out[225]:
Text(0.5, 1.0, 'Generating some points from a line')
In [226]:
print(f'x =\n {data[:,0].reshape(n,1)}')

x =
[[3.  ]
[3.5 ]
[4.  ]
[4.5 ]
[5.  ]
[7.  ]
[7.25]
[7.5 ]
[7.75]
[8.  ]]

In [227]:
mean_x = np.mean(data[:,0])
print(f'mean_x = {mean_x}')

mean_x = 5.75

In [228]:
print(f'y =\n {data[:,1].reshape(n,1)}')

y =
[[3.7988135 ]
[4.34018937]
[4.60276338]
[4.91988318]
[5.1736548 ]
[6.89589411]
[6.87508721]
[7.516773  ]
[7.77616276]
[7.38344152]]

In [229]:
mean_y = np.mean(data[:,1])
print(f'mean_x = {mean_y}')

mean_x = 5.928266283314543

In [230]:
A = np.hstack([(x-mean_x).reshape(n,1), (y-mean_y).reshape(n,1)])
print(f'A =\n{A}')

A =
[[-2.75       -2.12945278]
[-2.25       -1.58807692]
[-1.75       -1.32550291]
[-1.25       -1.0083831 ]
[-0.75       -0.75461148]
[ 1.25        0.96762783]
[ 1.5         0.94682093]
[ 1.75        1.58850672]
[ 2.          1.84789648]
[ 2.25        1.45517524]]

In [231]:
AtA = np.dot(A.T, A)
print(f'AtA =\n{AtA}')

AtA =
[[33.75       25.95482595]
[25.95482595 20.28817379]]

In [232]:
e, v = np.linalg.eig(AtA)
print(f'eigenvalues = \n{e}')
print(f'eigenvectors = \n{v}')

eigenvalues =
[53.83248246  0.20569134]
eigenvectors =
[[ 0.79089443 -0.61195261]
[ 0.61195261  0.79089443]]

In [233]:
smallest_eigen_value_index = np.argmin(e)
print(f'Index for smallest eigenvalue = {smallest_eigen_value_index}')

Index for smallest eigenvalue = 1

In [234]:
p = v[:, smallest_eigen_value_index]
print(f'Eigenvector corresponding to the smallest eigenvector =\n{p}')

Eigenvector corresponding to the smallest eigenvector =
[-0.61195261  0.79089443]

In [235]:
a = p[0]
b = p[1]
c = - a * mean_x - b * mean_y

print(f'a = {a}, b={b}, and c={c}')

a = -0.6119526118929433, b=0.7908944308802566, and c=-1.1699052698642456

In [236]:
plotting_x = np.linspace(0, 10, 100)
plotting_y = np.linspace(0, 10, 100)
xx, yy = np.meshgrid(plotting_x, plotting_y)
z = a * xx + b * yy + c

fig = plt.figure(figsize=(7,7))
plt.imshow(z, origin='lower', extent=(0,10,0,10))
plt.colorbar()
plt.contour(xx, yy, z, [0])
plt.scatter(data[:,0], data[:,1], c='red', alpha=0.9)
plt.xlim([0,10])
plt.ylim([0,10])
plt.title('Line fitted using total least squares')

Out[236]:
Text(0.5, 1.0, 'Line fitted using total least squares')

## Aside: Singular Value Decomposition (SVD)¶

SVD of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is defined as $$\mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T$$ such that $\mathbf{U}$ is an $m \times n$ matrix with orthognal columns, $\mathbf{D}$ is an $n \times n$ diagonal matrix containing singular values, and $\mathbf{V}$ is an $n \times n$ orthognal matrix.

The column of $\mathbf{V}$ corresponding to the smallest singular value in $\mathbf{D}$ is the smallest eigenvector of $\mathbf{A}^T \mathbf{A}$.

In [ ]: