Recovering pose from essential matrix gives inconsistent sign of translation vector

Question

I have three views with point correspondences and I want to compute the pose of the camera at the second and the third view. To test this, I generate a random, noise-free dataset of points in the three views, with known rotation and translation of the camera at the second and third views with respect to the first view. I first generate random 2D points in the first view, then assign random (positive) depths to obtain the corresponding 3D points, and finally use randomly generated rotations and translations to project these 3D points into the second and third views.
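The data generation described above can be sketched roughly as follows. This is a minimal sketch, not my actual code: it assumes normalized image coordinates (calibration matrix equal to the identity) and the convention X_i = R * X + t, and it uses plain std::array instead of Eigen so that it compiles on its own. The names backProject and projectToView are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

// Back-project a normalized image point (x, y) of the first view to 3D
// by assigning it a positive depth: X = depth * (x, y, 1).
Vec3 backProject(double x, double y, double depth) {
  assert(depth > 0.0);  // only positive depths are generated
  return {depth * x, depth * y, depth};
}

// Transform X into the frame of view i (assumed convention: X_i = R * X + t)
// and return the homogeneous image point scaled so its third component is 1.
Vec3 projectToView(const Mat3& R, const Vec3& t, const Vec3& X) {
  Vec3 Xi{};
  for (int r = 0; r < 3; ++r) {
    Xi[r] = R[r][0] * X[0] + R[r][1] * X[1] + R[r][2] * X[2] + t[r];
  }
  return {Xi[0] / Xi[2], Xi[1] / Xi[2], 1.0};
}
```

With random R and t in place of fixed values, calling projectToView once per view gives the correspondences for the second and third views.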

First, I compute the trifocal tensor (see Hartley & Zisserman, Multiple View Geometry, Chapter 15). Then I follow the approach described in this answer to obtain the rotations R_i and the translations t_i of the second and third views.

The calculation always yields the correct rotation, but unfortunately the sign of the translation vectors is not always correct. t2 and t3 are correct up to a common scale, but sometimes (!), when I use a new randomly generated dataset, their sign is inverted with respect to the ground-truth translations, e.g.:

Ground truth:

R2 = [0.9942   -0.0998    0.0393
      0.1069    0.9541   -0.2798
     -0.0096    0.2823    0.9593]
t2 = [0.4267
      0.3747
      0.3544]
R3 = [0.9764   -0.0626    0.2069
      0.1358    0.9222   -0.3622
      -0.1681    0.3817    0.9089]
t3 = [0.3963
      0.0285
      0.2093]

Output of my algorithm (with translation determined up-to-scale):

R2 = [0.994229 -0.0998196  0.0393062
      0.106851   0.954105  -0.279761
     -0.00957664   0.282346   0.959265]
t2 = [-0.637428
      -0.559842
      -0.529398]
R3 = [0.976367 -0.0625748   0.206861
      0.135829    0.92217  -0.362151
      -0.168099    0.38169   0.908876]
t3 = [-0.591991
      -0.0426261
      -0.312637]

Comparing the ground truth with my output for t2 and t3, we see that they are identical up to scale (and, in this example, inverted in sign), i.e. the element-wise division of the translation vectors t2./t3 (in MATLAB notation) yields the same result for the ground truth and for my algorithm:

for ground truth:
[ 1.0768
 13.1338
  1.6933]

for my algorithm:
[ 1.0768
 13.1338
  1.6933]
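A more direct way to compare an estimated translation with the ground truth is to compute the signed scale between the two vectors and check that they are parallel. The following is a minimal sketch under my own assumptions: the helper names signedScale and sineOfAngle are hypothetical, and it again uses std::array instead of Eigen so that it compiles standalone.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Least-squares scale s that minimizes |est - s * gt|.
// A negative s means the estimate points in the opposite direction.
double signedScale(const Vec3& est, const Vec3& gt) {
  const double gg = dot(gt, gt);
  assert(gg > 0.0);  // the ground-truth translation must be non-zero
  return dot(est, gt) / gg;
}

// Sine of the angle between a and b: close to 0 when the two vectors
// are parallel or anti-parallel, i.e. equal up to a (signed) scale.
double sineOfAngle(const Vec3& a, const Vec3& b) {
  const Vec3 c = {a[1] * b[2] - a[2] * b[1],
                  a[2] * b[0] - a[0] * b[2],
                  a[0] * b[1] - a[1] * b[0]};
  return std::sqrt(dot(c, c)) / std::sqrt(dot(a, a) * dot(b, b));
}
```

For the t2 values listed above, signedScale comes out at about -1.49 and sineOfAngle is close to zero, which expresses "identical up to scale, but inverted in sign" as two numbers.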

My first question is: What could possibly be the cause of this inconsistency of the sign of the translation vectors? (Especially given the fact that the results are correct otherwise).
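For context, as far as I understand it, the essential matrix itself is only defined up to sign. Assuming the usual convention E = [t]_x R, and writing x_1, x_2 for corresponding normalized image points (my own notation here), this reads:

```latex
\[
  x_2^\top E \, x_1 = 0
  \;\Longleftrightarrow\;
  x_2^\top (-E) \, x_1 = 0,
  \qquad
  E = [t]_\times R
  \;\Longrightarrow\;
  -E = [-t]_\times R .
\]
% Candidate poses considered by my code below, with E = U \Sigma V^\top
% and u_3 the third column of U:
\[
  R \in \{\, U W V^\top,\; U W^\top V^\top \,\},
  \qquad
  t \in \{\, +u_3,\; -u_3 \,\}.
\]
```

So E and -E satisfy the same epipolar constraints, and flipping the sign of E flips the sign of t while leaving R unchanged, which is at least consistent with the symptom described above.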

My second question is: Where do these formulas given in the above linked answer, Step 4, come from? I have the book "Multiple View Geometry" by Hartley & Zisserman, but could not find the described algorithm there.

Here is a code snippet of my implementation of step 4 of the algorithm in the above link (using the Eigen library; I do not want to use OpenCV) for finding the right solution of rotation R and translation vector t from an essential matrix E, given a 3-view homogeneous 2D point correspondence p1, p2, and p3:

void getPoseFromEssentialMat(Matrix3d E)
{
  Matrix3d W, U, V;
  W << 0.0, -1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0;

  // calculate the SVD of the essential matrix
  JacobiSVD<MatrixXd> svdE(E, ComputeThinU | ComputeThinV);
  // if det(U) < 0 -> U = -U, if det(V) < 0 -> V = -V
  U = svdE.matrixU();
  if (U.determinant() < 0.0)
  {
    U *= -1.0;
  }
  V = svdE.matrixV();
  if (V.determinant() < 0.0)
  {
    V *= -1.0;
  }

  R = U * W * V.transpose();
  t = U.col(2);

  findCorrectSolution(R, t, W, U, V);
}


void findCorrectSolution(Matrix3d& R, Vector3d& t, Matrix3d W, Matrix3d U, Matrix3d V)
{
  MatrixXd P(3, 4); // P = [R | t]
  P.block(0, 0, 3, 3) = R;
  P.col(3) = t;

  Vector3d Rtpt = R.transpose() * t;
  Matrix3d M = crossProductMatrix(Rtpt);
  Vector3d X_1 = M * K_inv_ * p1; // point in 1. view

  Vector3d X_i = M * R.transpose() * K_inv_ * pi; // point in i. view

  if (X_1(2) * X_i(2) < 0.0) // depth components
  {
    R = U * W.transpose() * V.transpose();
    Rtpt = R.transpose() * t;
    M = crossProductMatrix(Rtpt);
    X_1 = M * K_inv_ * p1;
  }
  if (X_1(2) < 0.0) // depth of 1. 3D point
  {
    t = -t;
  }
}
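The helper crossProductMatrix is not shown in the snippet above. A hypothetical stand-in, assuming it builds the usual skew-symmetric matrix [a]_x with [a]_x * b = a x b, could look like the sketch below. It uses std::array instead of Eigen's Vector3d and Matrix3d so that it compiles without Eigen, and the name crossProductMatrixSketch is chosen to mark it as an illustration rather than my actual helper.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

// Skew-symmetric matrix [a]_x such that [a]_x * b equals the cross
// product a x b (assumed behavior of crossProductMatrix).
Mat3 crossProductMatrixSketch(const Vec3& a) {
  Mat3 M{};  // zero-initialized, so the diagonal stays 0
  M[0][1] = -a[2];  M[0][2] =  a[1];
  M[1][0] =  a[2];  M[1][2] = -a[0];
  M[2][0] = -a[1];  M[2][1] =  a[0];
  return M;
}

// Matrix-vector product, used here only to check the helper.
Vec3 multiply(const Mat3& M, const Vec3& v) {
  Vec3 out{};
  for (int r = 0; r < 3; ++r) {
    out[r] = M[r][0] * v[0] + M[r][1] * v[1] + M[r][2] * v[2];
  }
  return out;
}
```

For example, with a = (1, 2, 3) and b = (4, 5, 6), multiplying crossProductMatrixSketch(a) by b gives (-3, 6, -3), the cross product a x b.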
