
Project: Shape and motion from image streams: A factorization method

Structure from motion - recovering scene geometry and camera motion from a sequence of images - is an important problem with wide applicability in tasks such as navigation and robot manipulation. Tomasi and Kanade [1] first developed a factorization method to recover shape and motion under an orthographic projection model, and obtained robust and accurate results. Poelman and Kanade [2] extended the factorization method to paraperspective projection. Triggs [3] further extended it to fully perspective projection; this method recovers a consistent set of projective depths (projective scale factors) for the image points.

In this project, we implemented these three factorization methods and present comparisons.

The report is available at fileFactorization.pdf

(Figure: tracked feature points in the 1st, 50th, and 100th frames)


First Edition: Nov 2006. Last Modified Nov 2006
Tag: Scientific ComputerVision SfM Matlab

Factorization Method

Orthographic Case [1]

Given an image stream, suppose that we have tracked P feature points over F frames. We then have trajectories of image coordinates \{ (u_{fp}, v_{fp}) | f = 1, ..., F, p = 1, ..., P \}. We write the horizontal feature coordinates u_{fp} into an F \times P matrix \mathbf{U}. Similarly, an F \times P matrix \mathbf{V} is built from the vertical coordinates v_{fp}.

Define the measurement matrix of size 2F \times P

\mathbf{W} = \[ \mathbf{U} \\ \mathbf{V} \].

Compute the 2F \times 1 translation vector \mathbf{t}, each entry of which is the mean of the entries in the corresponding row of the measurement matrix \mathbf{W}.

Define the registered measurement matrix

\tilde{\mathbf{W}} = \mathbf{W} - \mathbf{t} [1 \cdots 1].
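The registration step above can be sketched in a few lines; our implementation is in MATLAB, but we illustrate with NumPy here (the function name register_measurements is ours, not from the references):

```python
import numpy as np

def register_measurements(W):
    """Center the 2F x P measurement matrix W.

    Returns the translation vector t (the row means) and the
    registered matrix W_tilde = W - t [1 ... 1].
    """
    t = W.mean(axis=1, keepdims=True)  # 2F x 1 vector of row means
    W_tilde = W - t                    # broadcasting subtracts t from every column
    return t, W_tilde
```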

The \tilde{\mathbf{W}} can be expressed in a matrix form:

\tilde{\mathbf{W}} = \mathbf{R} \mathbf{S},

where the matrix of size 2F \times 3

\mathbf{R} = \[\mathbf{i}_1^T \\\vdots \\\mathbf{i}_F^T \\\mathbf{j}_1^T \\\vdots \\\mathbf{j}_F^T \]

represents the camera rotation, and the matrix of size 3 \times P

\mathbf{S} = [\mathbf{s}_1  ... \mathbf{s}_P]

is the shape matrix. The rows of \mathbf{R} represent the orientations of the horizontal and vertical camera reference axes throughout the stream, while the columns of \mathbf{S} are the coordinates of the P feature points with respect to their centroid. The goal of the factorization method is to compute the matrices \mathbf{R} and \mathbf{S}.

Step 1

Compute the singular-value decomposition

\tilde{\mathbf{W}} = \mathbf{O_1} \mathbf{\Sigma} \mathbf{O_2}.

Step 2

Define \mathbf{O_1}' which is the first three columns of \mathbf{O_1}, \mathbf{\Sigma}' which is the first 3 \times 3 submatrix of \mathbf{\Sigma}, and \mathbf{O_2}' which is the first three rows of \mathbf{O_2}.

Define \hat{R} = \mathbf{O_1}'(\mathbf{\Sigma}')^{1/2} and \hat{S} = (\mathbf{\Sigma}')^{1/2}\mathbf{O_2}'.
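Steps 1 and 2 amount to a truncated rank-3 SVD; a minimal NumPy sketch (the name affine_factorize is ours, for illustration):

```python
import numpy as np

def affine_factorize(W_tilde):
    """Steps 1-2: rank-3 factorization of the registered measurement matrix.

    Returns R_hat (2F x 3) and S_hat (3 x P) with W_tilde ~= R_hat @ S_hat.
    The factorization is unique only up to an invertible 3x3 matrix Q,
    which is fixed later by the metric constraints of Step 3.
    """
    O1, sigma, O2 = np.linalg.svd(W_tilde, full_matrices=False)
    sqrt_sig = np.sqrt(sigma[:3])
    R_hat = O1[:, :3] * sqrt_sig        # O1' (Sigma')^{1/2}
    S_hat = sqrt_sig[:, None] * O2[:3]  # (Sigma')^{1/2} O2'
    return R_hat, S_hat
```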

Step 3 - Metric Constraints

Impose the metric constraints,

\hat{\mathbf{i}}_f^T Q Q^T  \hat{\mathbf{i}}_f = 1, \ \ \ \hat{\mathbf{j}}_f^T Q Q^T  \hat{\mathbf{j}}_f = 1, \ \ \ \hat{\mathbf{i}}_f^T Q Q^T  \hat{\mathbf{j}}_f = 0

where Q is a 3 \times 3 matrix, and \hat{\mathbf{i}}_f^T, \hat{\mathbf{j}}_f^T are elements of \hat{\mathbf{R}}.

To solve Q, we can

  • use Newton's method, or
  • define L = Q Q^T and solve the linear system of equations for L and use Cholesky decomposition to get Q.

Step 3.1 - Solving the linear system

We used the second way. Morita and Kanade [4] explain the steps to compute Q. By denoting

L = \[ I_1 I_2 I_3 \\ I_2 I_4 I_5 \\ I_3 I_5 I_6 \],

the quadratic system can be rewritten as

GI = c,

where the 3F \times 6 matrix G, the 6 \times 1 vector I, and the 3F \times 1 vector c are defined by

G = \[ \mathbf{g}^T(\mathbf{i}_1, \mathbf{i}_1) \\\vdots \\\mathbf{g}^T(\mathbf{i}_F, \mathbf{i}_F) \\\mathbf{g}^T(\mathbf{j}_1, \mathbf{j}_1) \\\vdots \\\mathbf{g}^T(\mathbf{j}_F, \mathbf{j}_F) \\\mathbf{g}^T(\mathbf{i}_1, \mathbf{j}_1) \\\vdots \\\mathbf{g}^T(\mathbf{i}_F, \mathbf{j}_F) \], \ \ \ I = \[ I_1 \\\vdots \\I_6 \], \ \ \ c = \[ 1 \\\vdots \\1 \\0 \\\vdots \\0 \],

where c has 2F ones and F zeros, and

\mathbf{g}(\mathbf{a}, \mathbf{b}) = \[a_1 b_1 \\a_1 b_2 + a_2 b_1 \\a_1 b_3 + a_3 b_1 \\a_2 b_2 \\a_2 b_3 + a_3 b_2 \\a_3 b_3 \].

The simplest solution of the system is given by the pseudo-inverse:

I = G^* c.

The vector I determines the symmetric matrix L. The Cholesky decomposition, an eigen decomposition, or the matrix square root of L then gives Q, as long as L is positive definite.
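The construction of G and the pseudo-inverse solution can be sketched as follows; we use NumPy for illustration (the names g_vec and solve_L are ours, not from [4]):

```python
import numpy as np

def g_vec(a, b):
    """The 6-vector g(a, b) from the text, matching the ordering of I."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def solve_L(R_hat):
    """Solve G I = c for the entries of L = Q Q^T by pseudo-inverse.

    R_hat (2F x 3) stacks the F rows i_f^T followed by the F rows j_f^T.
    """
    F = R_hat.shape[0] // 2
    i_hat, j_hat = R_hat[:F], R_hat[F:]
    # 3F x 6 constraint matrix: unit-norm rows, then orthogonality rows
    G = np.vstack([[g_vec(i, i) for i in i_hat],
                   [g_vec(j, j) for j in j_hat],
                   [g_vec(i, j) for i, j in zip(i_hat, j_hat)]])
    c = np.concatenate([np.ones(2 * F), np.zeros(F)])
    I = np.linalg.pinv(G) @ c
    # Rebuild the symmetric matrix L from the 6-vector I
    return np.array([[I[0], I[1], I[2]],
                     [I[1], I[3], I[4]],
                     [I[2], I[4], I[5]]])
```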

Step 3.2 - Enforcing positive definiteness

We might obtain a matrix that is not positive definite when estimating L with the linear method. Higham [5] introduced a method for computing a 'nearest' symmetric positive semidefinite matrix to a given matrix. The lecture note [6] gives a concise summary.

First, we compute the symmetric part as

L = \frac{L + L^T}{2}.

In our case, the matrix L is already a symmetric matrix. Now we eigen decompose L:

L = \mathbf{U} \mathbf{\Sigma} \mathbf{U}^T

and form the matrix \mathbf{\Sigma_+} by setting any negative eigenvalues to zero. The positive semidefinite matrix closest to L is then given by

L_{psd} = \mathbf{U} \mathbf{\Sigma_+} \mathbf{U}^T.

However, we want a positive definite matrix, so we instead set any negative eigenvalues to a small \epsilon > 0; the reason is apparent from the definition of a positive definite matrix. Furthermore, since we ultimately want the matrix Q, it can be obtained directly as \mathbf{U} \mathbf{\Sigma_+}^{1/2} without computing L.

As a result,

  1. Eigen decompose L = \mathbf{U} \mathbf{\Sigma} \mathbf{U}^T
  2. Form a matrix \mathbf{\Sigma_+} by setting any negative eigenvalues to \epsilon > 0,
  3. Compute Q = \mathbf{U} \mathbf{\Sigma_+}^{1/2}
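The three steps above can be sketched in NumPy (the function name q_from_L is ours, for illustration):

```python
import numpy as np

def q_from_L(L, eps=1e-6):
    """Eigendecompose the symmetric L, clamp any negative eigenvalues
    to a small eps > 0, and return Q = U Sigma_+^{1/2}, so that
    Q Q^T is the repaired positive definite matrix."""
    L = 0.5 * (L + L.T)                    # enforce exact symmetry
    eigval, U = np.linalg.eigh(L)
    eigval_plus = np.maximum(eigval, eps)  # clamp negative eigenvalues
    return U * np.sqrt(eigval_plus)        # equals U @ diag(sqrt(eigval_plus))
```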

Step 4

Compute the rotation matrix \mathbf{R} and the shape matrix \mathbf{S} as

\mathbf{R} = \hat{\mathbf{R}}Q, \ \ \ \mathbf{S} = Q^{-1}\hat{\mathbf{S}}.

Step 5

Align the first camera reference system with the world reference system by forming the products

\mathbf{R} = \mathbf{R}\mathbf{R_0}, \ \ \ \mathbf{S} = \mathbf{R_0}^T \mathbf{S},

where the orthonormal matrix \mathbf{R_0} = [\mathbf{i_1} \mathbf{j_1} \mathbf{k_1}] rotates the first camera's reference system onto the identity matrix.
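Step 5 can be sketched as follows, assuming \mathbf{i_1} and \mathbf{j_1} are orthonormal so that \mathbf{k_1} = \mathbf{i_1} \times \mathbf{j_1} completes the rotation (the name align_first_camera is ours, for illustration):

```python
import numpy as np

def align_first_camera(R, S):
    """Rotate so the first camera frame coincides with the world frame.

    R is 2F x 3 with rows i_1..i_F, j_1..j_F; S is 3 x P.
    The product (R R0)(R0^T S) = R S, so the reconstruction is unchanged.
    """
    F = R.shape[0] // 2
    i1, j1 = R[0], R[F]
    k1 = np.cross(i1, j1)                # third axis of the first camera
    R0 = np.column_stack([i1, j1, k1])   # orthonormal when i1, j1 are
    return R @ R0, R0.T @ S
```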

Paraperspective case [2]

Only Step 3 - the metric constraints - differs from the orthographic case.

Let \mathbf{M} be the estimated motion matrix

\mathbf{M} = \[\mathbf{m}_1^T \\\vdots \\\mathbf{m}_F^T \\\mathbf{n}_1^T \\\vdots \\\mathbf{n}_F^T \]

Let T be the translation vector

\mathbf{T} = \[x_1 \\\vdots \\x_F \\y_1 \\\vdots \\y_F \]

The metric constraints are as follows:

\frac{|\mathbf{m}_f|^2}{1+x_f^2} - \frac{|\mathbf{n}_f|^2}{1+y_f^2} = 0
\mathbf{m}_f \cdot \mathbf{n}_f - \frac{x_f y_f}{2} \left( \frac{|\mathbf{m}_f|^2}{1+x_f^2} + \frac{|\mathbf{n}_f|^2}{1+y_f^2} \right) = 0
| \mathbf{m}_1 | = 1.

We impose

|\mathbf{m}_f|^2 = \hat{\mathbf{m}}_f^T Q Q^T \hat{\mathbf{m}}_f, \ \ \ |\mathbf{n}_f|^2 = \hat{\mathbf{n}}_f^T Q Q^T \hat{\mathbf{n}}_f, \ \ \ \mathbf{m}_f \cdot \mathbf{n}_f = \hat{\mathbf{m}}_f^T Q Q^T \hat{\mathbf{n}}_f

where Q is a 3 \times 3 matrix, and \hat{\mathbf{m}}_f, \hat{\mathbf{n}}_f are elements of the estimated motion matrix \hat{\mathbf{M}}. The matrix Q can be solved in the same way as in the orthographic case.

Projective case [3]

See the report fileFactorization.pdf

Source Codes

Orthographic and Paraperspective

Options: 'orthographic' and 'paraperspective'


For the projective case, a MATLAB code set is provided by Bill Triggs - Software. Furthermore, you may find normalise2dpts.m, provided by Peter Kovesi, useful.

Data Preparation - KLT Feature Tracker

I used the KLT library [1], version 1.3.2, to extract feature tracking points.


Download [2].

The KLT library supports PGM files, so convert PNG files into PGM. You can do this with

mogrify -format pgm *.png

if you have a Linux-like environment (I worked with Cygwin on Windows).

Source codes

Compile kltrun.c as Makefile does.

make # compile klt library if not yet
gcc -O3 -DNDEBUG  -o kltrun kltrun.c -L. -lklt -L/usr/local/lib -L/usr/lib -lm

How to use

Usage: ./kltrun  <basename_of_pgm_files>  [-fmt <pgm_sequence_format = %d>]  
           [-ef <index_of_end_frame = 0>]  [-np <#_of_tracking_points = 430>]  
           [-sf <index_of_start_frame = 0>]
Ex) ./kltrun ../hotel/hotel.seq -fmt %d -ef 100 -np 430 -sf 0
Ex) ./kltrun ../castle/castle. -fmt %03d -ef 27 -np 110 -sf 0
Ex) ./kltrun ../medusa/medusa -fmt %03d -sf 110 -ef 180 -np 830

Example of cvuKltPlot.m


Experiments and Results

Orthographic case

>> cvFactorizationDemo(1)
(Figure: reconstruction results for the 1st, 50th, and 100th frames)


Paraperspective case

Unfortunately, the paraperspective method failed for this particular image sequence. The matrix L = Q Q^T (see the Factorization Method section) turned out to be non-positive definite, so we could not simply take its square root via an eigen (Jacobi) decomposition. One could instead solve the metric constraints with an iterative scheme such as Newton's method [4]; due to time constraints, we did not explore this option in the project. We also suspect that the image sequence does not contain enough translation along the optical axis, or that it contains too much measurement noise.

>> cvFactorizationDemo(3)

See report fileFactorization.pdf

Projective case

>> cvProjectiveFactorizationDemo

See report fileFactorization.pdf