
I use 4 stationary cameras that do not move relative to each other, and I want to stitch their video streams into a single video image in real time.

For this I use OpenCV 2.4.10 and the cv::Stitcher class, like this:

// use 4 video-cameras
cv::VideoCapture cap0(0), cap1(1), cap2(2), cap3(3);

bool try_use_gpu = true;    // use GPU
cv::Stitcher stitcher = cv::Stitcher::createDefault(try_use_gpu);
stitcher.setWarper(new cv::CylindricalWarperGpu());
stitcher.setWaveCorrection(false);
stitcher.setSeamEstimationResol(0.001);
stitcher.setPanoConfidenceThresh(0.1);

//stitcher.setSeamFinder(new cv::detail::GraphCutSeamFinder(cv::detail::GraphCutSeamFinderBase::COST_COLOR_GRAD));
stitcher.setSeamFinder(new cv::detail::NoSeamFinder());
stitcher.setBlender(cv::detail::Blender::createDefault(cv::detail::Blender::NO, true));
//stitcher.setExposureCompensator(cv::detail::ExposureCompensator::createDefault(cv::detail::ExposureCompensator::NO));
stitcher.setExposureCompensator(new cv::detail::NoExposureCompensator());


std::vector<cv::Mat> images(4);
cv::Mat pano_result;
cap0 >> images[0];
cap1 >> images[1];
cap2 >> images[2];
cap3 >> images[3];

// call once!
cv::Stitcher::Status status = stitcher.estimateTransform(images);


while(true) {

    // lack of speed, even if I use old frames
    // std::vector<cv::Mat> images(4);
    //cap0 >> images[0];
    //cap1 >> images[1];
    //cap2 >> images[2];
    //cap3 >> images[3];

    cv::Stitcher::Status status = stitcher.composePanorama(images, pano_result);
}

I get only 10 FPS (frames per second), but I need 25 FPS. How can I speed this example up?

When I use stitcher.setWarper(new cv::PlaneWarperGpu()); I get a very enlarged image, which I do not need.

I need only translations.

For example, I am ready to do without:

  • perspective transformations
  • scale operations
  • and maybe even rotations

How can I do this? Or how can I get from cv::Stitcher the x, y translation parameters for each of the images?

UPDATE - profiling in MSVS 2013 on Windows 7 x64: [profiler screenshot]

  • What kind of machine are you using? Do you have tbb enabled? Besides, can you provide a few images as example to be stitched together? – Antonio Apr 01 '15 at 19:51
  • @Antonio 8 GB RAM + CPU (Intel Core i5 760 - 4 Cores) + GPU (nVidia GeForce GTX 970 - 1664 Cores). TBB disabled. OpenCV 2.4.10 compiled with CUDA 6.5 and disabled OpenMP/TBB. – Alex Apr 01 '15 at 19:55
  • TBB might help by multithreading the process... For which system did you build, and with which build tools? Also, one quick thing to try is to put the image array declaration/definition out of the while loop. (You are allocating and deallocating at each cycle). I suggest that you put some timer around your stitching function, to check that that function call is the actual bottleneck. – Antonio Apr 01 '15 at 19:58
  • @Antonio GCC 4.7.2 + CUDA 6.5 on Linux x86_64 Debian 7 (Wheezy). **Fixed:** I put the image array declaration/definition out of the while loop. – Alex Apr 01 '15 at 20:02
  • Could you test if that gives any speed improvement? Putting the timer I mentioned before would be very important, to exclude the possibility you are stuck in reading frames. Do you get 25fps if you skip completely the stitching? – Antonio Apr 01 '15 at 20:09
  • I would be also curious to know how your cameras are plugged to the machine (network cards? Usb 2? Usb 3?), what is the resolution and if you have greyscale or color images. – Antonio Apr 01 '15 at 20:15
  • @Antonio No, lack of speed, even if I use old frames on each iteration. I use 640x480 RGB frames. I added profiling information, from MSVS 2013 on Windows. – Alex Apr 01 '15 at 20:30
  • Fantastic profiling! In the meantime, [here](http://stackoverflow.com/a/18529327/2436175) there is some useful information: if you only want translation, probably stitching is not what you need. You say translation is sufficient: are your cameras all on the same axis and plane, and oriented perpendicularly to this axis? – Antonio Apr 01 '15 at 20:41
  • Another thing: in OpenCV 2.4 don't you have to use [GpuMat](http://docs.opencv.org/modules/gpu/doc/data_structures.html?highlight=gpumat#gpu-gpumat) instead of Mat? I think it might be so far you haven't been using the gpu... – Antonio Apr 01 '15 at 20:44
  • If you only need translations and the cameras don't move, why can't you precompute the position of each source video on the target and then later on just put them there and blend them? Is it because you get seams? – isarandi Apr 01 '15 at 21:00
  • This might be relevant: http://stackoverflow.com/questions/29446694/opencv-stitching-images-from-google-maps – Antonio Apr 04 '15 at 21:08

2 Answers


cv::Stitcher is fairly slow. If your cameras definitely don't move relative to one another and the transformation is as simple as you say, you should be able to overlay the images onto a blank canvas simply by chaining homographies.

The following is somewhat mathematical - if this isn't clear I can write it up properly using LaTeX, but SO doesn't support pretty maths :)

You have a set of 4 cameras, from left to right, (C_1, C_2, C_3, C_4), giving a set of 4 images (I_1, I_2, I_3, I_4).

To transform from I_1 to I_2, you have a 3x3 transformation matrix, called a homography. We'll call this H_12. Similarly for I_2 to I_3 we have H_23 and for I_3 to I_4 you'll have H_34.

You can pre-calibrate these homographies in advance using the standard method (point matching between the overlapping cameras).

You'll need to create a blank matrix, to act as the canvas. You can guess the size of this (4*image_size would suffice) or you can take the top-right corner (call this P1_tr) and transform it by the three homographies, giving a new point at the top-right of the panorama, PP_tr (the following assumes that P1_tr has been converted to a matrix):

PP_tr = H_34 * H_23 * H_12 * P1_tr'

What this is doing is taking P1_tr and transforming it first into camera 2's frame, then from C_2 to C_3, and finally from C_3 to C_4.

You'll need to create one of these canvases for combining images 1 and 2, images 1-3, and finally images 1-4; I'll refer to them as V_12, V_123 and V_1234 respectively.

Use the following to warp the image onto the canvas:

cv::warpAffine(I_2, V_12, H_12, V_12.size( ));

Then do the same with the next images:

cv::warpAffine(I_3, V_123, H_23*H_12, V_123.size( ));
cv::warpAffine(I_4, V_1234, H_34*H_23*H_12, V_1234.size( ));

Now you have four canvases, all of which are the width of the 4 combined images, and with one of the images transformed into the relevant place on each.

All that remains is to merge the transformed images onto each other. This is easily achieved using regions of interest.

Creating the ROI masks can be done in advance, before frame capture begins.

Start with a blank (zeros) image the same size as your canvases will be. Set the leftmost rectangle the size of I_1 to white. This is the mask for your first image. We'll call it M_1.

Next, to get the masks for the remaining transformed images, we do

cv::warpAffine(M_1, M_2, H_12, M_1.size( ));
cv::warpAffine(M_2, M_3, H_23*H_12, M_1.size( ));
cv::warpAffine(M_3, M_4, H_34*H_23*H_12, M_1.size( ));

To bring all the images together into one panorama, you do:

cv::Mat pano = cv::Mat::zeros(M_1.size( ), CV_8UC3);
I_1.copyTo(pano, M_1);
V_12.copyTo(pano, M_2);
V_123.copyTo(pano, M_3);
V_1234.copyTo(pano, M_4);

What you're doing here is copying the relevant area of each canvas onto the output image, pano - a fast operation.

You should be able to do all this on the GPU, substituting cv::gpu::GpuMat for cv::Mat and cv::gpu::warpAffine for its non-GPU counterpart.

  • To make this approach even more efficient, you could precalculate the affine warping maps, since they are constant, and use `cv::remap` (see [doc](http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#remap)) which is much faster than `cv::warpAffine`. – BConic Apr 02 '15 at 11:25
  • Nice, yes that's a very good idea. I've actually done that in the past and it works very well, just didn't spring to mind when I was writing out the above. – n00dle Apr 02 '15 at 14:39
  • @n00dle Thank you very much! But you said, that I must find: **3x3** transformation matrix, called a **homography**, and use: `cv::warpAffine();`, but as I know `warpAffine();` uses **affine transformation matrix (2x3)**. Does it mean that I must use `warpAffine();` after `estimateRigidTransform();`? Or do I must use `warpPerspective();` after `findHomography();`? And how can I get `map1, map2` for `ramap();` from homography which I found by using `findHomography();`? – Alex Apr 03 '15 at 13:13
  • Apologies, I mixed up the two. In this case you say you need only a translation in which case the affine matrix, estimated with estimateRigidTransform would do the trick. If you find this doesn't represent the full transformation (i.e. the overlaid images look a bit out of line in a non-uniform way) you may wish to use the homography. I'm not 100% on the remap question - I think I have the code at work, but won't be back in until Wednesday as it's the easter break here in the UK. I would advise looking at initUndistortRectifyMap()as I feel that may have something to do with it. – n00dle Apr 03 '15 at 22:24
  • @n00dle Thank you! Happy Easter! – Alex Apr 05 '15 at 18:31
  • Hi, thanks for your explanation. I'm trying something very similar. What bothers me is that whn doing homographies is that you get that big useless black area around your image. Is there any way to only obtain the usefull part of the image? (I'd rather not use for loops for performance reasons). Here is what I mean: original image: http://imgur.com/a/wUlwE , to which I then apply a hompgraphy transformation to get a birdview: http://imgur.com/a/49Ee5 (note the big unwanted black area in my window), and I'd like: http://imgur.com/a/gCoEC (I manually cut it out) without losing too much data – LandonZeKepitelOfGreytBritn Sep 02 '17 at 15:49
  • You can work out the position of the corners of your original image (lets call it `A`) in the new image space (`B`) quite easily once you've got the 3x3 homography (`H`). For some corner `A = (x,y)`, express it in homogenous coordinates `A' = (x, y, 1)` and do `B' = HA'`. Using this will give you the four corner points in the new image and so you can crop it as you choose based on those. – n00dle Sep 04 '17 at 10:18

Note: I leave this answer just as documentation of what was tried, as the method I suggested doesn't seem to work, while apparently the GPU is already in use when using cv::Mat.


Try using gpu::GpuMat:

std::vector<cv::Mat> images(4);
std::vector<gpu::GpuMat> gpuImages(4);
gpu::GpuMat pano_result_gpu;
cv::Mat pano_result; 
bool firstTime = true;

[...] 

cap0 >> images[0];
cap1 >> images[1];
cap2 >> images[2];
cap3 >> images[3];
for (int i = 0; i < 4; i++)
   gpuImages[i].upload(images[i]);
if (firstTime) {
    cv::Stitcher::Status status = stitcher.estimateTransform(gpuImages);
    firstTime = false;
}
cv::Stitcher::Status status = stitcher.composePanorama(gpuImages, pano_result_gpu);
pano_result_gpu.download(pano_result);
  • When I use `cv::gpu::GpuMat` instead of `cv::Mat` in `estimateTransform();` or in `composePanorama();` then I get an error on Windows 7x64 + OpenCV 2.4.9: *OpenCV Error: Assertion failed (func != 0) in cv::resize, file C:\opencv_2.4.9\opencv\sources\modules\imgproc\src\imgwarp.cpp, line 1980* – Alex Apr 02 '15 at 11:42
  • @Alex That at least proves the GPU was not used before :) Please update your question and post the code you find in `C:\opencv_2.4.9\opencv\sources\modules\imgproc\src\imgwarp.cpp` around line 1980 – Antonio Apr 02 '15 at 12:25
  • But MSVS 2013 result of profiling, which I added to my question show me that GPU used many times :) Code from error is here: http://pastebin.com/VVN1gMK2 – Alex Apr 02 '15 at 14:13
  • @Alex I saw I had a copy of 2.4.9, but that's difficult to debug... Who is calling the `cv::resize` with apparently a wrong interpolation attribute? Maybe after all using gpu::GpuMat is not the correct way. – Antonio Apr 02 '15 at 14:41