Introduction to Factor Graphs

Here we give an introduction to non-linear least squares optimization and the connection between sparse least squares and factor graphs.

Non-linear Least Squares Optimization

Non-linear least squares optimization is defined by

(1)\[x^{*} = \operatorname*{argmin}_{x} \sum_i \rho_i \big( \parallel f_i(x) \parallel^{2}_{{\Sigma}_i} \big),\]

where \(x \in \mathcal{M}\) is a point on a general \(n\)-dimensional manifold, \(x^{*} \in \mathcal{M}\) is the solution, \(f_i \in \mathbb{R}^m\) is an \(m\)-dimensional vector-valued error function, \(\rho_i\) is a robust kernel function, and \({\Sigma}_i \in \mathbb{R}^{m \times m}\) is a covariance matrix. The Mahalanobis distance is defined by \(\parallel v \parallel^{2}_{{\Sigma}} \doteq v^T {\Sigma}^{-1} v\) where \(v \in \mathbb{R}^m\) and \({\Sigma}^{-1}\) is the information matrix. If we factorize the information matrix by Cholesky factorization \({\Sigma}^{-1} = {R}^T {R}\), where \({R}\) is upper triangular, we have

(2)\[\parallel v \parallel^{2}_{{\Sigma}} = v^T {\Sigma}^{-1} v = v^T {R}^T {R} v = \parallel {R} v \parallel^{2}.\]

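As a quick numerical check of Eq. (2), here is a minimal sketch using numpy (the covariance and residual values are made up for illustration): it whitens a residual with the Cholesky factor of the information matrix and verifies that the two norms agree.

```python
import numpy as np

# Example covariance and residual (illustrative values only).
Sigma = np.array([[0.5, 0.1],
                  [0.1, 0.2]])
v = np.array([1.0, -2.0])

# Information matrix and its Cholesky factorization Sigma^{-1} = R^T R,
# with R upper triangular (numpy returns the lower factor L, so R = L^T).
info = np.linalg.inv(Sigma)
R = np.linalg.cholesky(info).T

# Mahalanobis norm computed two ways, as in Eq. (2).
mahalanobis = v @ info @ v
whitened = np.linalg.norm(R @ v) ** 2
assert np.isclose(mahalanobis, whitened)
```
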
If we consider the simplified case where \(\rho_i\) is the identity function and define \(h_i(x) \doteq {R}_i f_i(x)\), then Eq. (1) is equivalent to

(3)\[x^{*} = \operatorname*{argmin}_{x} \sum_i \parallel h_i(x) \parallel^{2}\]

as per Eq. (2). If we define a linearization point \(x_0 \in \mathcal{M}\), and the Jacobian matrix of \(h_i(x)\)

(4)\[{J}_i \doteq \frac{\partial h_i(x)}{\partial x} \Big|_{x=x_0}\]

then the Taylor expansion is given by

(5)\[h_i(x_0 + \Delta x) = h_i(x_0) + {J}_i \Delta x + O(\Delta x^2),\]

which we can use to solve the least squares problem by searching in a local region near \(x_0\), finding the solution by iteratively solving the linearized least squares problem

(6)\[\Delta x^{*} = \operatorname*{argmin}_{\Delta x} \sum_i \parallel {J}_i \Delta x + h_i(x_0) \parallel^{2},\]

where \(\Delta x^{*} \in \mathbb{R}^n\), and the solution is updated by

(7)\[x^{*} = x_0 + \Delta x^{*}.\]

If \(\mathcal{M}\) is simply a vector space \(\mathbb{R}^n\), the above procedure is in general performed iteratively, setting \(x_0\) of the next iteration to \(x^{*}\) of the current iteration until \(x^{*}\) converges. Trust-region strategies like Levenberg-Marquardt can also be applied when searching for \(\Delta x^{*}\).

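To make the iteration concrete, below is a minimal Gauss-Newton sketch for the vector-space case \(\mathcal{M} = \mathbb{R}^n\). It assumes the caller supplies functions returning the stacked whitened residual and Jacobian; the names `gauss_newton`, `h`, and `J` are illustrative and not part of any particular library.

```python
import numpy as np

def gauss_newton(h, J, x0, num_iterations=20, tol=1e-9):
    """Minimize sum_i ||h_i(x)||^2, with h and J returning stacked residuals/Jacobians."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iterations):
        b = h(x)                      # stacked residuals h_i(x0), as in Eq. (9)
        A = J(x)                      # stacked Jacobians J_i, as in Eq. (4)
        # Solve the linearized problem argmin ||A dx + b||^2 via the normal equation.
        dx = np.linalg.solve(A.T @ A, -A.T @ b)
        x = x + dx                    # vector-space update, Eq. (7)
        if np.linalg.norm(dx) < tol:  # stop when the update is negligible
            break
    return x

# Usage: fit y = exp(a t) to data with a single scalar unknown a.
t = np.linspace(0.0, 1.0, 10)
y = np.exp(0.7 * t)
h = lambda x: np.exp(x[0] * t) - y             # residuals
J = lambda x: (t * np.exp(x[0] * t))[:, None]  # d residual / d a
print(gauss_newton(h, J, np.array([0.0])))     # approaches 0.7
```
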
When \(\mathcal{M}\) is a general manifold, we need to define a local coordinate chart of \(\mathcal{M}\) near \(x_0\), which is an invertible map between a local region of \(\mathcal{M}\) around \(x_0\) and the local Euclidean space, together with an operator \(\oplus\) that maps a point in the local Euclidean space back onto \(\mathcal{M}\). Thus Eq. (7) on a general manifold becomes

(8)\[x^{*} = x_0 \oplus \Delta x^{*}.\]

A simple example of \(\oplus\) is the case where \(\mathcal{M}\) is a Euclidean space, in which \(\oplus\) is simply the usual plus operator.

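As an example of \(\oplus\) beyond the Euclidean case, the following sketch (illustrative only; `so2` and `oplus` are hypothetical helpers, not a library API) treats 2D rotation matrices in SO(2), where the local Euclidean space is the scalar rotation angle and \(\oplus\) applies the corresponding rotation increment.

```python
import numpy as np

def so2(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def oplus(R0, delta):
    """oplus for SO(2): map a local increment (a scalar angle) back onto the manifold."""
    return R0 @ so2(delta)

# Usage: perturb a rotation by a small local increment.
R0 = so2(0.3)
R1 = oplus(R0, 0.05)   # stays a valid rotation matrix
assert np.allclose(R1.T @ R1, np.eye(2))
```
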
To solve the linearized least squares problem in Eq. (6), we first rewrite it as

(9)\[\Delta x^{*} = \operatorname*{argmin}_{\Delta x} \parallel {J} \Delta x + b \parallel^{2},\]

where \(J\) is defined by stacking all \(J_i\) vertically, and similarly \(b\) is defined by stacking all \(h_i(x_0)\) vertically. Cholesky factorization is commonly used to solve Eq. (9). Since the solution of the linear least squares problem in Eq. (9) is given by the normal equation

(10)\[J^T J \Delta x^{*} = -J^T b,\]

we apply Cholesky factorization to symmetric \(J^T J\), and we have \(J^T J = R^T R\) where \(R\) is upper triangular. Then solving Eq. (10) is equivalent to solving both

(11)\[\begin{split}&R^T y = -J^T b\\ &R \Delta x^{*} = y\end{split}\]

in two steps, which can be solved by forward and back substitution respectively, given that \(R\) is triangular. Other than Cholesky factorization, QR and SVD factorizations can also be used to solve Eq. (9), although at significantly higher computational cost. Iterative methods like the preconditioned conjugate gradient (PCG) are also widely used to solve Eq. (10), especially when \(J^T J\) is very large.

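The two triangular solves in Eq. (11) map directly to code; the sketch below (with made-up values for \(J\) and \(b\), and scipy used only for the triangular solves) follows the sign convention of Eq. (10) and checks the result against a dense least-squares solve.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Stacked Jacobian J and residual vector b (illustrative values only).
rng = np.random.default_rng(0)
J = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# Cholesky factorization J^T J = R^T R with R upper triangular.
R = np.linalg.cholesky(J.T @ J).T

# Two triangular solves, Eq. (11): forward substitution then back substitution.
y = solve_triangular(R.T, -J.T @ b, lower=True)
dx = solve_triangular(R, y, lower=False)

# Sanity check against a dense least-squares solve of Eq. (9).
assert np.allclose(dx, np.linalg.lstsq(J, -b, rcond=None)[0])
```
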
Connection between Factor Graphs and Sparse Least Squares

In previous literature [1], researchers have shown that factor graphs have a tight connection with non-linear least squares problems. A factor graph is a probabilistic graphical model that represents a joint probability distribution as the product of all factors

(12)\[p(x) \propto \prod_i p_i(x_i),\]

where \(x_i \subseteq x\) is a subset of variables involved in factor \(i\), \(p(x)\) is the overall distribution of the factor graph, and \(p_i(x_i)\) is the distribution of each factor. The maximum a posteriori (MAP) estimate of the graph is

(13)\[x^{*} = \operatorname*{argmax}_{x} p(x) = \operatorname*{argmax}_{x} \prod_i p_i(x_i).\]

If we consider the case where each factor has a Gaussian distribution on \(f_i(x_i)\) with covariance \({\Sigma}_i\),

(14)\[p_i(x_i) \propto \mathrm{exp} \big( - \frac{1}{2} \parallel f_i(x_i) \parallel^{2}_{{\Sigma}_i} \big),\]

then MAP inference is

(15)\[\begin{split}x^{*} & = \operatorname*{argmax}_{x} \prod_i p_i(x_i) = \operatorname*{argmax}_{x} \mathrm{log} \big( \prod_i p_i(x_i) \big), \\ & = \operatorname*{argmin}_{x} \sum_i -\mathrm{log} \big( p_i(x_i) \big) = \operatorname*{argmin}_{x} \sum_i \parallel f_i(x_i) \parallel^{2}_{{\Sigma}_i}.\end{split}\]

The MAP inference problem in Eq. (15) is thus converted to the same non-linear least squares optimization problem as in Eq. (1), which can be solved following the same steps as in the previous section.

There are several advantages to using a factor graph to model the non-linear least squares problem in SLAM. Factor graphs encode the probabilistic nature of the problem, and they make the underlying sparsity of most SLAM problems easy to visualize, since for most (if not all) factors \(x_i\) is a very small subset of the variables.

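As a toy illustration of this sparsity, the sketch below (hypothetical variable and factor names, scalar variables and identity covariances for simplicity) represents a factor graph as a list of factors that each touch only a small subset of variables, and evaluates the sum-of-squares objective of Eq. (15).

```python
import numpy as np

# A tiny pose-chain toy problem with variables x0, x1, x2 (scalars for simplicity).
# Each factor only touches a small subset of variables, which is what makes
# the resulting least-squares problem sparse.
factors = [
    {"vars": ["x0"],       "error": lambda v: np.array([v["x0"] - 0.0])},             # prior on x0
    {"vars": ["x0", "x1"], "error": lambda v: np.array([v["x1"] - v["x0"] - 1.0])},   # odometry
    {"vars": ["x1", "x2"], "error": lambda v: np.array([v["x2"] - v["x1"] - 1.0])},   # odometry
]

def total_error(values):
    """Sum of squared factor errors, i.e. the objective of Eq. (15) with identity covariances."""
    return sum(float(f["error"](values) @ f["error"](values)) for f in factors)

values = {"x0": 0.0, "x1": 1.1, "x2": 1.9}
print(total_error(values))  # small but nonzero: the chain is slightly inconsistent
```
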
References

[1] Dellaert, F., & Kaess, M. (2006). Square Root SAM: Simultaneous localization and mapping via square root information smoothing. The International Journal of Robotics Research, 25(12), 1181-1203.