
I am building a recognition program in C++ and to make it more robust, I need to be able to find the distance of an object in an image.

Say I have an image of an 8.5 x 11 picture that was taken from 22.3 inches away. The system correctly identifies that picture in a box with the dimensions 319 pixels by 409 pixels.
What is an effective way to relate the actual height and width (AH and AW) and the pixel height and width (PH and PW) to the distance (D)?

I am assuming that when I actually go to use the equation, PH and PW will be inversely proportional to D, and that AH and AW are constants (as the recognized object will always be an object where the user can indicate width and height).
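
For concreteness, here is a minimal C++ sketch of that inverse-proportionality assumption, using the numbers above to calibrate the constant (the variable names and the new pixel measurement are illustrative; this only holds for the same camera and zoom, with the picture squarely facing the camera):

#include <iostream>

// Minimal sketch of the assumed relation PH ~ 1/D: calibrate the constant
// k = PH * D from the known reference shot, then reuse it for new shots.
int main() {
    const double refDistance = 22.3;    // inches, known distance of the reference shot
    const double refPixelHeight = 409;  // pixels, the 11 in side of the 8.5 x 11 picture
    const double k = refPixelHeight * refDistance; // constant if PH is inversely proportional to D

    double newPixelHeight = 204.5;      // pixels measured in a new image (example value)
    double estimatedDistance = k / newPixelHeight;
    std::cout << "estimated distance: " << estimatedDistance << " inches\n"; // ~44.6
    return 0;
}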

  • There are more variables involved here. What is the angle of view (focal length) of the camera used? Number of effective pixels in the image and the aspect ratio? Was the image cropped? Is it in sharp focus? (Changing the focus affects the angle of view a little bit.) I believe that based on the angle of view, the pixel density and the distance, a formula can be worked out. Also, note that when the picture/object is close to the camera, even a small difference in distance D can make a relatively big difference in the number of pixels covered. – Raze Jun 03 '11 at 07:00
  • Notice that if the object photographed is 8 by 11 inches and flat, a camera that's 22 inches away from the center will be 23 inches away from the corner. Saying that a camera is 22.3 inches away from such an object is precise beyond reality. – MSalters Jun 03 '11 at 10:26
  • @MSalters: It seems perfectly reasonable to me to call something such as the distance along the camera axis "the distance" and to measure it as accurately as possible. – jilles de wit Jun 03 '11 at 14:12
  • The only missing variable is the angle of view. If you know that you can use my answer below to compute distance. – jilles de wit Jun 03 '11 at 14:13

3 Answers


I don't know if you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.

1) Long and complicated solution (for more general problems)

First you need to know the size of the object.

You can look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation, that is, finding the position of the object relative to the camera; from this you can find the distance. You can look at [1] and [2] (for example; you can find other articles on the subject if you are interested), or search for POSIT and SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding object (3D) point Mi for the current parameters.
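
Written out as an equation (using the notation of the paragraph above; R and t, my labels for the rotation and translation making up the pose, are the quantities being optimized):

\min_{R,\,t} \; \sum_i \left\| N_i - P(R M_i + t) \right\|^2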

From this you can extract the distance.

For this you need to calibrate your camera (roughly, find the relation between the pixel position and the viewing angle).

Now, you may not want to code all of this yourself; you can use computer vision libraries such as OpenCV or Gandalf [3].
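
For example, here is a minimal sketch using OpenCV's solvePnP for the 8.5 x 11 picture (the corner pixel coordinates and the intrinsic matrix values are made-up placeholders; real intrinsics come from camera calibration, e.g. cv::calibrateCamera):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <iostream>
#include <vector>

int main() {
    // 3D corners of the picture in its own coordinate frame (inches, Z = 0 plane).
    std::vector<cv::Point3f> objectPoints = {
        {0.0f, 0.0f, 0.0f}, {8.5f, 0.0f, 0.0f},
        {8.5f, 11.0f, 0.0f}, {0.0f, 11.0f, 0.0f}};

    // Matching pixel coordinates of those corners in the image (example values).
    std::vector<cv::Point2f> imagePoints = {
        {100.0f, 50.0f}, {419.0f, 50.0f},
        {419.0f, 459.0f}, {100.0f, 459.0f}};

    // Intrinsics: fx, fy, cx, cy below are placeholders for calibrated values.
    cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) <<
        800, 0, 320,
        0, 800, 240,
        0, 0, 1);
    cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64F); // assume no lens distortion

    cv::Mat rvec, tvec;
    if (cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec)) {
        // tvec is the object origin in camera coordinates, in the same units as
        // objectPoints (inches here), so its norm is the distance to that corner.
        std::cout << "distance: " << cv::norm(tvec) << " inches\n";
    }
    return 0;
}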

Now, you may want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with d = a D / Z (where a is a parameter of the camera related to the focal length and the number of pixels, which you can find using camera calibration).

2) Short solution (for your simple problem)

But here is the (simple, short) answer: if your picture is on a plane parallel to the "camera plane" (i.e., it is perfectly facing the camera), you can use:

PH = a AH / Z
PW = a AW / Z

where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.

For reference, the pinhole camera model relates image coordinates m = (u, v) to world coordinates M = (X, Y, Z) with:

m ~ K M

[u]   [ au  as  u0 ] [X]
[v] ~ [  0  av  v0 ] [Y]
[1]   [  0   0   1 ] [Z]

u = au (X/Z) + as (Y/Z) + u0
v = av (Y/Z) + v0

where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.

You can recover the Z parameter from either of those equations (or take the average of both). Note that the Z parameter is not the distance from the object (which varies over the different points of the object) but the depth of the object (the distance between the camera plane and the object plane). But I guess that is what you want anyway.
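
In code, a minimal sketch of this short solution, calibrating a from the single reference shot in the question (the variable names and the new pixel measurements are illustrative; a proper camera calibration would give a more reliable a):

#include <iostream>

// Sketch of the short solution: PH = a * AH / Z and PW = a * AW / Z.
int main() {
    // Reference shot: the 8.5 x 11 picture at Z = 22.3 in, seen as 319 x 409 pixels.
    const double AW = 8.5, AH = 11.0;  // actual size (inches)
    const double refZ = 22.3;          // known depth of the reference shot

    // Two estimates of a (they differ slightly due to measurement noise):
    const double aFromWidth  = 319.0 * refZ / AW;  // ~836.9
    const double aFromHeight = 409.0 * refZ / AH;  // ~829.2
    const double a = 0.5 * (aFromWidth + aFromHeight);

    // New detection: recover Z from each equation and average the two.
    double PW = 160.0, PH = 205.0;     // example pixel measurements
    double Z = 0.5 * (a * AW / PW + a * AH / PH);
    std::cout << "estimated depth Z: " << Z << " inches\n"; // ~44.5
    return 0;
}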

[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan

[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang

[3] http://gandalf-library.sourceforge.net/

ysdx

If you know the size of the real-world object and the angle of view of the camera, then, assuming the horizontal angle of view is alpha (*) and the horizontal resolution of the image is xres, the distance dw to an object in the middle of the image that is xp pixels wide in the image and xw meters wide in the real world can be derived as follows (how is your trigonometry?):

# Distance in "pixel space" relates to distance in the real world
# (we take half of xres, xw and xp because we use the half angle of view):
(xp/2)/dp = (xw/2)/dw
dw = ((xw/2)/(xp/2))*dp = (xw/xp)*dp (1)

# We know xp and xw, and we're looking for dw, so we need to calculate dp.
# We can do this because we know xres and alpha
# (remember, tangent = opposite/adjacent):
tan(alpha) = (xres/2)/dp
dp = (xres/2)/tan(alpha) (2)

# Combine (1) and (2):
dw = ((xw/xp)*(xres/2))/tan(alpha)
# Pretty print:
dw = (xw*xres)/(xp*2*tan(alpha))

(*) alpha = the angle between the camera axis and a line going through the leftmost point on the middle row of the image that is just visible.

Link to your variables: dw = D, xw = AW, xp = PW
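
Here is a small C++ sketch of that final formula (the xres and alpha values in main are made-up assumptions for illustration; use your camera's actual resolution and angle of view):

#include <cmath>
#include <iostream>

// dw = (xw * xres) / (xp * 2 * tan(alpha)), where alpha is HALF the
// horizontal angle of view, as defined in the footnote above.
double distanceToObject(double xw,    // real-world width of the object
                        double xp,    // width of the object in pixels
                        double xres,  // horizontal image resolution in pixels
                        double alpha) // half horizontal angle of view, radians
{
    return (xw * xres) / (xp * 2.0 * std::tan(alpha));
}

int main() {
    // Question's numbers: AW = 8.5 in, PW = 319 px; assume a 640-pixel-wide
    // image and a 50 degree full horizontal angle of view (alpha = 25 degrees).
    const double pi = std::acos(-1.0);
    const double alpha = 25.0 * pi / 180.0;
    std::cout << distanceToObject(8.5, 319.0, 640.0, alpha) << " inches\n";
    return 0;
}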

jilles de wit

This may not be a complete answer, but it may push you in the right direction. Have you ever seen how NASA does it in those pictures from space, with those tiny crosses all over the images? That's how they get a fair idea of the depth and size of an object, as far as I know. The solution might be to have an object whose correct size and depth you know in the picture, and then calculate the others relative to that. Time for you to do some research. If that's the way NASA does it, then it should be worth checking out.

I have got to say, this is one of the most interesting questions I have seen for a long time on Stack Overflow :D. I just noticed you have only two tags attached to this question. Adding something more in relation to images might help you better.

Harindaka