MATLAB: LSQNONLIN performance varies according to the structure of the input data in the residual computation function.

Tags: trust-region-reflective, bundle adjustment, levenberg marquardt, lsqnonlin, optimization, optimoptions

Hi everybody, I'm puzzled by the behaviour of lsqnonlin when using the default finite-difference Jacobian approximation. I am writing a bundle adjustment algorithm that takes as input the following parameters:
  • 3D locations of points observed by a set of cameras
  • orientation (3 euler angles) and translation of the cameras
  • feature coordinates of the observed points
I have defined the array X of parameters to be optimized in two different ways. The first is the following:
    [ x1  x2  x3  ...  xn    phi1  theta1  psi1  tx1    phi2  theta2  psi2  tx2   ... ]
X = [ y1  y2  y3  ...  yn    0     0       0     ty1    0     0       0     ty2   ... ]
    [ z1  z2  z3  ...  zn    0     0       0     tz1    0     0       0     tz2   ... ]
The second one (which should be more correct) is:
X = [ x1 y1 z1  x2 y2 z2  x3 y3 z3  ...  phi1 theta1 psi1 tx1 ty1 tz1  phi2 theta2 psi2 tx2 ty2 tz2  ... ]
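For concreteness, a minimal packing sketch for both layouts (the names pts, eul, t, N and M are my own placeholders, not from the post; note that lsqnonlin accepts either a matrix or a vector as the unknown and preserves its shape when it calls the residual function):

% Hypothetical placeholder data, only to make the sketch self-contained
N = 100;  M = 4;                        % number of 3D points and cameras
pts = rand(3, N);                       % columns [x; y; z] of the 3D points
eul = rand(3, M);                       % rows phi / theta / psi, one column per camera
t   = rand(3, M);                       % camera translations [tx; ty; tz]

% First layout: a 3-by-(N + 4M) matrix; each camera occupies 4 columns,
% [phi 0 0]', [theta 0 0]', [psi 0 0]' and [tx ty tz]'
camBlocks = zeros(3, 4*M);
camBlocks(1, 1:4:end) = eul(1, :);      % phi
camBlocks(1, 2:4:end) = eul(2, :);      % theta
camBlocks(1, 3:4:end) = eul(3, :);      % psi
camBlocks(:, 4:4:end) = t;              % tx, ty, tz
X1 = [pts, camBlocks];

% Second layout: one flat vector, points first (x1 y1 z1 x2 y2 z2 ...),
% then one 6-element pose per camera (phi theta psi tx ty tz)
X2 = [pts(:); reshape([eul; t], [], 1)];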
Now for every observation the vector of residuals is computed as
function Res = res_comp(X, PARAMS)
    Res = zeros(nObservations, 1);                 % preallocate the residual vector
    for i = 1:nObservations
        Res(i) = reprojectionError(obs, ...        % {u,v} image feature, picked from PARAMS
                                   point, ...      % 3D point {xi, yi, zi}, picked from X
                                   T);             % world-to-camera transform, built from X
    end
end
where obs is {u, v}, the coordinates of the image feature corresponding to the 3D point {xi, yi, zi}, and T is the transformation matrix from the world reference frame to the camera reference frame.
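For illustration, here is one possible shape of these two pieces, only a sketch under my own assumptions (a pinhole camera with a known intrinsic matrix K, a ZYX Euler-angle convention, and the angles/translation parametrizing the world-to-camera transform directly); none of these details are stated in the post, and K is passed as an extra argument here although in the loop above it would be picked from PARAMS just like obs.

% Sketch: build T from one camera's pose entries picked from X
% (ZYX Euler convention assumed; adapt to the convention you actually use)
function T = poseToTransform(phi, theta, psi, tx, ty, tz)
    Rz = [cos(phi) -sin(phi) 0;  sin(phi) cos(phi) 0;  0 0 1];
    Ry = [cos(theta) 0 sin(theta);  0 1 0;  -sin(theta) 0 cos(theta)];
    Rx = [1 0 0;  0 cos(psi) -sin(psi);  0 sin(psi) cos(psi)];
    T  = [Rz*Ry*Rx, [tx; ty; tz]; 0 0 0 1];   % 4-by-4 homogeneous transform
end

% Sketch: reprojection error for one observation, pinhole model assumed
function e = reprojectionError(obs, point, T, K)
    pCam = T * [point(:); 1];                 % world point into the camera frame
    pImg = K * pCam(1:3);                     % pinhole projection
    uv   = pImg(1:2) / pImg(3);               % normalize homogeneous coordinates
    e    = norm(uv - obs(:));                 % scalar residual for this observation
end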
Now the problem:
  • when solving with lsqnonlin and 'trust-region-reflective' using the first parametrization, everything goes well and the solution converges after approximately 20 iterations.
  • when solving with lsqnonlin and 'trust-region-reflective' using the second parametrization, the solution does not converge to a global minimum.
  • when solving with lsqnonlin and 'levenberg-marquardt' using the second parametrization, the solution converges after 5-6 iterations (a minimal sketch of the three calls follows this list).
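For reference, a minimal sketch of how the three runs above might be set up (X1_0 and X2_0 stand for the initial guesses in the two layouts and PARAMS for the fixed data; these names are mine, and res_comp must of course unpack X according to the layout it receives):

fun    = @(X) res_comp(X, PARAMS);                 % lsqnonlin only varies X
optTRR = optimoptions('lsqnonlin', 'Algorithm', 'trust-region-reflective', 'Display', 'iter');
optLM  = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt', 'Display', 'iter');

X_trr1 = lsqnonlin(fun, X1_0, [], [], optTRR);     % first layout,  trust-region-reflective
X_trr2 = lsqnonlin(fun, X2_0, [], [], optTRR);     % second layout, trust-region-reflective
X_lm2  = lsqnonlin(fun, X2_0, [], [], optLM);      % second layout, Levenberg-Marquardt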
I am absolutely sure that the computation of the residuals is correct in all cases. Why the hell does the performance of the optimization change so much just by changing the shape of the parameters to be optimized? I know that it implies a different computation of the Jacobian, but I don't understand why the difference would be so substantial.
Thank you.
Riccardo
——–
PhD candidate at CISAS, University of Padova, Italy

Best Answer

  • This is just a reflection of facts in the world of optimization: of a concept called a basin of attraction, and of the curse of dimensionality.
    Optimization methods are simple things really. They wander around some complex, high dimensional surface, trying to find a low point on that surface. At any point in their search, all they can do is remember what they saw before about the shape of the surface, and get a measure of what direction points downhill from the local gradient.
    On a complicated function, there will often be MANY local minima. Some may even lie at infinity, so when the optimizer starts chasing such a minimizer, it diverges. But you need to understand that given any local minimizer, there is a locus of points such that when the optimizer starts from any of those points, it converges to that given minimizer. This is called a basin of attraction. These basins need not be convex sets. In fact, they are not always even connected sets. (A tiny illustration of this idea follows at the end of this answer.)
    Different optimizers/optimization algorithms have different basins of attraction. This should be obvious. Not all solvers will converge to the same solution from a given point.
    Finally, when you work in a high number of dimensions, the search space gets very complicated. It is easy to find something when you only need to look in one dimension; it is just as easy to overlook something in 15 or 20 dimensions or more.
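    To make the basin-of-attraction idea concrete, here is a tiny toy example of my own (not related to your bundle adjustment problem, and only a sketch): the residual r(x) = sin(x) gives a sum of squares with a minimizer at every multiple of pi, and which one lsqnonlin returns depends only on where it starts.

    r    = @(x) sin(x);                             % toy residual with many minimizers
    opts = optimoptions('lsqnonlin', 'Display', 'off');
    x_a  = lsqnonlin(r, 0.5, [], [], opts);         % starts near 0    -> converges to ~0
    x_b  = lsqnonlin(r, 2.8, [], [], opts);         % starts near pi   -> converges to ~pi
    x_c  = lsqnonlin(r, 6.0, [], [], opts);         % starts near 2*pi -> converges to ~2*pi
    fprintf('%.4f  %.4f  %.4f\n', x_a, x_b, x_c)    % three different answers, same problem

    Each starting point lies in the basin of attraction of a different minimizer, even though the solver and the residual function are identical in all three calls.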