
I have 4 coplanar points in a video (or image) representing a quad (not necessarily a square or rectangle) and I would like to be able to display a virtual cube on top of them where the corners of the cube stand exactly on the corners of the video quad.

Since the points are coplanar I can compute the homography between the corners of a unit square (i.e. [0,0] [0,1] [1,0] [1,1]) and the video coordinates of the quad.
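For reference, the homography from the unit square to the quad can be computed with the Direct Linear Transform (DLT). The numpy sketch below uses a hypothetical helper name, `homography_from_unit_square`. With exactly four correspondences the homography is unique up to scale, so the result should agree, up to scale, with what cv2.findHomography returns for the same points.

```python
import numpy as np

def homography_from_unit_square(quad):
    """Estimate the 3x3 homography H mapping the unit-square corners
    [0,0], [0,1], [1,0], [1,1] (in that order) to the 4 image points in `quad`,
    using the Direct Linear Transform (DLT)."""
    src = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    dst = np.asarray(quad, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + h33), same for v:
        # each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes H[2, 2] != 0, i.e. [0,0] maps to a finite point).
    return H / H[2, 2]
```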

From this homography I should be able to compute a correct camera pose, i.e. [R|t] where R is a 3x3 rotation matrix and t is a 3x1 translation vector so that the virtual cube lies on the video quad.

I have read many solutions (some of them on SO) and tried implementing them but they seem to work only in some "simple" cases (like when the video quad is a square) but do not work in most cases.

Here are the methods I tried (most of them are based on the same principles; only the computation of the translation differs slightly). Let K be the camera intrinsics matrix and H the homography. We compute:

$$A = K^{-1} H$$

Let a1,a2,a3 be the column vectors of A and r1,r2,r3 the column vectors of the rotation matrix R.

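$$
\begin{aligned}
r_1 &= a_1 / \lVert a_1 \rVert \\
r_2 &= a_2 / \lVert a_2 \rVert \\
r_3 &= r_1 \times r_2 \\
t &= a_3 / \sqrt{\lVert a_1 \rVert \, \lVert a_2 \rVert}
\end{aligned}
$$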
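The four formulas above translate directly into numpy. This is a sketch, not the asker's actual code: `pose_from_homography` is a hypothetical name, and the sign flip on A is an assumption added here. A homography is only defined up to a scale that may be negative. Without the flip, r1, r2 and t can all come out negated, which puts the cube behind the camera.

```python
import numpy as np

def pose_from_homography(H, K):
    """Sketch of the decomposition A = K^-1 H, r1 = a1/||a1||, r2 = a2/||a2||,
    r3 = r1 x r2, t = a3/sqrt(||a1|| ||a2||).
    Assumes H maps plane coordinates (X, Y, 1) of the Z = 0 plane to pixels."""
    A = np.linalg.inv(K) @ H
    # Assumption (not in the formulas above): A[2, 2] = lambda * t_z, so flipping A
    # when it is negative makes the scale positive for a plane in front of the camera.
    if A[2, 2] < 0:
        A = -A
    a1, a2, a3 = A[:, 0], A[:, 1], A[:, 2]
    n1, n2 = np.linalg.norm(a1), np.linalg.norm(a2)
    r1 = a1 / n1
    r2 = a2 / n2
    r3 = np.cross(r1, r2)
    t = a3 / np.sqrt(n1 * n2)
    R = np.column_stack([r1, r2, r3])
    return R, t
```

With noisy corners, r1 and r2 are not exactly orthogonal, so R is only approximately a rotation.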

The issue is that this does not work in most cases. In order to check my results, I compared R and t with those obtained by OpenCV's solvePnP method (using the following 3D points [0,0,0] [0,1,0] [1,0,0] [1,1,0]).
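Comparing poses by eye is hard, so a numeric check can help. Project the same four 3D points with each [R|t] and measure the distance to the detected corners. Below is a minimal numpy sketch using hypothetical helper names and a pinhole model without lens distortion. OpenCV's projectPoints does the same job and also accepts distortion coefficients.

```python
import numpy as np

def project_points(K, R, t, X):
    """Project Nx3 points X (object coordinates) to Nx2 pixels with x ~ K (R X + t).
    No lens distortion is modelled."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float).ravel()
    Xc = X @ R.T + t          # object -> camera coordinates
    x = Xc @ K.T              # camera -> homogeneous pixel coordinates
    return x[:, :2] / x[:, 2:3]


def reprojection_error(K, R, t, X, image_points):
    """Mean distance in pixels between the projected X and the observed corners."""
    proj = project_points(K, R, t, X)
    observed = np.asarray(image_points, dtype=float)
    return float(np.mean(np.linalg.norm(proj - observed, axis=1)))


# Cube base: the same 3D points that were passed to solvePnP.
base = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]], dtype=float)
# Top face: the base shifted by one unit along Z (see the note below about the sign).
top = base + np.array([0.0, 0.0, -1.0])
```

Which side of the plane faces the camera depends on the corner ordering. If the top face is drawn below the quad instead of above it, use +1 instead of -1.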

Since I draw the cube the same way in both cases, I can see that solvePnP gives correct results every time, while the pose obtained from the homography is mostly wrong.

In theory since my points are coplanar, it is possible to compute the pose from a homography but I couldn't find the correct way to compute the pose from H.
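The following short derivation shows why these formulas should work and where they can go wrong. It assumes the pinhole model $x \sim K[R\,|\,t]X$ without lens distortion and the object plane at $Z = 0$, as with the solvePnP points above:

$$
s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}
= K\,[\,r_1\;\; r_2\;\; r_3\;\; t\,]\begin{bmatrix}X\\ Y\\ 0\\ 1\end{bmatrix}
= K\,[\,r_1\;\; r_2\;\; t\,]\begin{bmatrix}X\\ Y\\ 1\end{bmatrix}
\quad\Longrightarrow\quad
H = \lambda\,K\,[\,r_1\;\; r_2\;\; t\,],
\qquad
A = K^{-1}H = \lambda\,[\,r_1\;\; r_2\;\; t\,]
$$

For exact data, $\lVert a_1\rVert = \lVert a_2\rVert = |\lambda|$. This has two consequences:

- If the estimated $H$ has $\lambda < 0$, the formulas return $-r_1$, $-r_2$ and $-t$, and the plane ends up behind the camera. Because $A_{33} = \lambda t_z$ and $t_z > 0$ for a visible plane, flipping the sign of $A$ whenever $A_{33} < 0$ fixes this.
- With noisy corners, $r_1$ and $r_2$ are not exactly orthogonal, so $R$ has to be re-orthonormalized.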

Any insights on what I am doing wrong?

Edit after trying @Jav_Rock's method

Hi Jav_Rock, thanks very much for your answer. I tried your approach (and many others as well), and it seems to be more or less OK. Nevertheless, I still have some issues when computing the pose from 4 coplanar points. To check the results, I compare them with those from solvePnP, which should be much more accurate thanks to its iterative minimization of the reprojection error.

Here is an example:

[Image: cubes rendered on the quad by each method]

  • Yellow cube: Solve PNP
  • Black Cube: Jav_Rock's technique
  • Cyan (and Purple) cube(s): some other techniques, giving the exact same results

As you can see, the black cube is more or less OK but doesn't seem well proportioned, although the vectors seem orthonormal.

EDIT2: I normalized v3 after computing it (in order to enforce orthonormality) and it seems to solve some problems as well.

alexburtnik
JimN
  • So OpenCV's solvePnP provides correct results while your implementation is wrong? – nav Jan 28 '12 at 16:40
  • Yes, solvePnP gives correct results, while my implementation using only homographies does not give correct rotation/translation vectors. – JimN Jan 30 '12 at 14:15
  • If you share your code, we can go through it and see how it can be fixed. One thing you might have forgotten is to enforce orthonormality of the rotation matrix. – fireant May 26 '12 at 17:52
  • I believe you have all the steps you need: 1. Obtain the camera intrinsics. 2. Define 4-point correspondences and compute H with DLT. 3. Left-multiply H by K.inv(). 4. Decompose the result as explained by @Jav_Rock. – marcos.nieto Oct 25 '13 at 17:22
  • I tried both methods, but I always get wrong results. With solvePnP, at least some parts of my projection make sense. Can you please have a look at my question and provide an answer? http://stackoverflow.com/a/29078048/663551 – Jakob Alexander Eichler Mar 17 '15 at 13:37
  • Hey. Would somebody help me solve my latest question? It is similar to this question, but I'm not really sure how to use the solution provided below. How do I call `cameraPoseFromHomography`? Which parameter is H and which is pose? **How do I draw a cube like the one in the question's image?** Please help me, because I'm clueless about how to go on! Greetings, Jonas (You can find the question here: https://stackoverflow.com/questions/51009968/how-to-draw-cube-c) –  Jun 24 '18 at 15:31

7 Answers


If you have your Homography, you can calculate the camera pose with something like this:

#include <opencv2/core.hpp>
using namespace cv;

// Assumption: H already has the intrinsics removed, i.e. H = K.inv() * H_pixels
void cameraPoseFromHomography(const Mat& H, Mat& pose)
{
    Mat Hd;
    H.convertTo(Hd, CV_64F);              // findHomography returns CV_64F; use one type throughout
    pose = Mat::eye(3, 4, CV_64F);        // 3x4 matrix, the camera pose [R|t]
    double norm1 = norm(Hd.col(0));
    double norm2 = norm(Hd.col(1));
    double tnorm = (norm1 + norm2) / 2.0; // Normalization value for t

    Mat r1 = Hd.col(0) / norm1;           // Normalized first column of H  -> r1
    Mat r2 = Hd.col(1) / norm2;           // Normalized second column of H -> r2
    Mat r3 = r1.cross(r2);                // Third column is the cross product r1 x r2
    Mat t  = Hd.col(2) / tnorm;           // vector t of [R|t]

    // copyTo writes into the existing columns of pose; assigning to a Mat header
    // (e.g. p2 = pose.col(0)) only rebinds the header and leaves pose unchanged
    r1.copyTo(pose.col(0));
    r2.copyTo(pose.col(1));
    r3.copyTo(pose.col(2));
    t.copyTo(pose.col(3));
}

This method works for me. Good luck.

Jav_Rock
  • Hi Jav_Rock, thanks very much for your answer. I tried your method and edited the post so that you can see the results I obtained. Thanks again. – JimN Jun 20 '12 at 09:16
  • I think the image is not visible. Anyway, if you want to go deeper into the theory, you can read this question on dsp.stackexchange: http://dsp.stackexchange.com/q/2736/1473 – Jav_Rock Jun 29 '12 at 13:37
  • Either I'm not getting it right (my code is 100% the same as yours) or OpenCV has changed the way it handles the Mat object since you posted this answer. Using assignments such as yours (p1, p2, ...) does NOT change the pose argument and leads to a resulting pose identical to its initialization: a 3x4 identity matrix. Using copyTo() resolves the issue. It seems that a deep copy is necessary. Check @Jacob's reply at http://stackoverflow.com/questions/6411476/opencv-matoperator-does-it-support-copy-on-write – rbaleksandar Jun 19 '14 at 14:28
  • I tried to translate the code to Java, but the results I get are bad. – Jakob Alexander Eichler Mar 17 '15 at 13:31
  • Is it recommended to estimate the camera pose with solvePnP or with a homography? – Jakob Alexander Eichler Mar 25 '15 at 22:05
  • Why is normalization needed before copying the first two columns? – Gaurav Fotedar May 12 '15 at 10:22
  • @Jav_Rock, how does your approach work without using the camera intrinsics? – alexburtnik May 16 '17 at 17:10
  • Hey guys. I see @Jav_Rock's answer is marked as correct, so I'm pretty sure it is a working solution to the question, but to be honest I cannot really see how to use @Jav_Rock's approach to end up with a 3D cube like the one mentioned in the question. What are the parameters for `cameraPoseFromHomography`? Can I calculate the cube without knowing anything more than the 4 corner points of the rectangle? Any help would be very much appreciated. Greetings –  Jun 24 '18 at 15:35

The answer proposed by Jav_Rock does not provide a valid solution for camera poses in three-dimensional space.

There are multiple approaches for estimating the three-dimensional translation and rotation induced by a homography. One of them provides closed-form formulas for decomposing the homography, but they are very complex. Also, the solution is in general not unique.

Luckily, OpenCV 3 already implements this decomposition (decomposeHomographyMat). Given a homography and a correctly scaled intrinsics matrix, the function returns a set of up to four possible rotations and translations.

Emiswelt
  • Picking the correct solution out of the last two possible solutions is very complicated. Do you know of any implementation of the paper that returns a single solution out of the final two? – Sanjeev Kumar Feb 18 '17 at 17:54
  • @YonatanSimson A homography describes the perspective transform given by four coplanar points. Your own answer below utilizes a homography matrix. What's the issue? – Emiswelt Jun 17 '20 at 08:19

Just in case anybody needs a Python port of the function written by @Jav_Rock:

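import numpy as np

def cameraPoseFromHomography(H):
    norm1 = np.linalg.norm(H[:, 0])
    norm2 = np.linalg.norm(H[:, 1])
    tnorm = (norm1 + norm2) / 2.0

    H1 = H[:, 0] / norm1      # first rotation column, normalized as in the C++ version
    H2 = H[:, 1] / norm2      # second rotation column
    H3 = np.cross(H1, H2)     # third column: H1 x H2

    T = H[:, 2] / tnorm
    return np.column_stack([H1, H2, H3, T])  # 3x4 pose [R|t] (np.mat was removed in NumPy 2.0)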

Works fine in my tasks.

Viktor Latypov
Dmytriy Voloshyn

Computing [R|T] from the homography matrix is a little more complicated than Jav_Rock's answer suggests.

In OpenCV 3.0, there is a method called cv::decomposeHomographyMat that returns up to four potential solutions, one of which is correct. However, OpenCV didn't provide a method to pick out the correct one.
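Part of the selection can be automated with a visibility test: every reference point must lie in front of the camera. The numpy sketch below is written under these assumptions:

- The normals come from cv::decomposeHomographyMat, one 3-vector per solution, expressed in the first camera's frame.
- The plane is written as n·X = d with d > 0.
- `filter_by_visible_points` is a hypothetical helper.

Newer OpenCV releases also ship cv::filterHomographyDecompByVisibleRefpoints for this step. The test typically removes two of the four candidates. The remaining two still need extra information to tell apart.

```python
import numpy as np

def filter_by_visible_points(normals, points_normalized):
    """Return the indices of candidate plane normals n for which every reference
    point m lies in front of the camera, i.e. n . m > 0.

    normals: iterable of 3-vectors (or 3x1 arrays), one per candidate solution.
    points_normalized: Nx2 points in normalized image coordinates
        (pixel points with K^-1 applied) in the first view.
    """
    m = np.asarray(points_normalized, dtype=float)
    m_h = np.column_stack([m, np.ones(len(m))])  # homogeneous (x, y, 1)
    keep = []
    for i, n in enumerate(normals):
        n = np.asarray(n, dtype=float).ravel()
        # A point X = z * m on the plane n . X = d (d > 0) has depth z = d / (n . m),
        # so a positive depth requires n . m > 0.
        if np.all(m_h @ n > 0):
            keep.append(i)
    return keep
```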

I'm working on this now and may post my code here later this month.

Yang Kui

The plane that contains your square has a vanishing line in the image. If you write the equation of this line as Ax + By + C = 0 in normalized image coordinates (i.e. after removing the camera intrinsics with K^-1), then the normal of your plane is (A, B, C)!

Let p00, p01, p10, p11 be the coordinates of the points after applying the camera's intrinsic parameters, in homogeneous form, e.g. p00 = (x00, y00, 1).

The vanishing line can be calculated as:

  • down = p00 cross p01;
  • left = p00 cross p10;
  • right = p01 cross p11;
  • up = p10 cross p11;
  • v1=left cross right;
  • v2=up cross down;
  • vanish_line = v1 cross v2;

where cross is the standard vector cross product. The plane normal is then vanish_line, up to scale and sign.
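The construction above can be written in numpy as follows. This is a sketch with assumptions:

- `plane_normal_from_square` is a hypothetical name.
- The input points must already be normalized (K^-1 applied).
- The result is only defined up to sign.

left/right and up/down are images of parallel edges, so v1 and v2 are the images of the two edge directions. Their cross product is therefore the plane normal in camera coordinates.

```python
import numpy as np

def plane_normal_from_square(p00, p01, p10, p11):
    """Unit plane normal from the image of a parallelogram (e.g. a square),
    using the vanishing-line construction. Points are homogeneous 3-vectors
    in normalized camera coordinates. The sign of the result is arbitrary."""
    p00, p01, p10, p11 = (np.asarray(p, dtype=float) for p in (p00, p01, p10, p11))
    down = np.cross(p00, p01)    # line through p00 and p01
    left = np.cross(p00, p10)    # line through p00 and p10
    right = np.cross(p01, p11)   # line through p01 and p11 (parallel to left in 3D)
    up = np.cross(p10, p11)      # line through p10 and p11 (parallel to down in 3D)
    v1 = np.cross(left, right)   # vanishing point of the left/right edge direction
    v2 = np.cross(up, down)      # vanishing point of the up/down edge direction
    vanish_line = np.cross(v1, v2)
    return vanish_line / np.linalg.norm(vanish_line)
```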

DejanM

You could use this function. Works for me.

import numpy as np

def find_pose_from_homography(H, K):
    '''
    function for pose prediction of the camera from the homography matrix, given the intrinsics

    :param H(np.array): size(3x3) homography matrix
    :param K(np.array): size(3x3) intrinsics of camera
    :Return R: size (3 x 3) matrix of the rotation of the transformation (orthogonal matrix)
    :Return t: size (3 x 1) vector of the translation of the transformation
    '''
    # remove the intrinsics: H ~ K [r1 r2 t], so K^-1 H ~ [r1 r2 t]
    H = np.linalg.inv(K) @ H

    # to disambiguate the two rotation matrices corresponding to the translations t and -t,
    # multiply H by the sign of H[2,2] (proportional to the z-component of t) to enforce the
    # constraint that points in front of the camera have positive depth
    H = H * np.sign(H[2, 2])

    h1, h2, h3 = H[:, 0].reshape(-1, 1), H[:, 1].reshape(-1, 1), H[:, 2].reshape(-1, 1)

    R_ = np.hstack((h1, h2, np.cross(h1, h2, axis=0)))

    # project R_ onto the closest rotation matrix; np.linalg.svd returns V already transposed
    U, S, Vt = np.linalg.svd(R_)
    R = U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt

    t = h3 / np.linalg.norm(h1)

    return R, t

Here's a Python version, based on the one submitted by Dmytriy Voloshyn, that normalizes the rotation matrix and transposes the result to be 3x4.

import numpy as np

def cameraPoseFromHomography(H):
    norm1 = np.linalg.norm(H[:, 0])
    norm2 = np.linalg.norm(H[:, 1])
    tnorm = (norm1 + norm2) / 2.0

    H1 = H[:, 0] / norm1
    H2 = H[:, 1] / norm2
    H3 = np.cross(H1, H2)
    T = H[:, 2] / tnorm

    return np.array([H1, H2, H3, T]).transpose()
Clay