11

I'm looking for the fastest and most efficient method of detecting an object in a moving video. Things to note about this video: it is very grainy and low resolution, and both the background and the foreground are moving simultaneously.

Note: I'm trying to detect a moving truck on a road in a moving video.

Methods I've tried:

Training a Haar Cascade - I've attempted to train the classifier to identify the object by cropping multiple images of the desired object. This produced either many false detections or no detections at all (the desired object was never detected). I used about 100 positive images and 4000 negatives.

SIFT and SURF Keypoints - When attempting to use either of these feature-based methods, I discovered that the object I wanted to detect was too low in resolution, so there were not enough features to match for an accurate detection. (The desired object was never detected.)

Template Matching - This is probably the best method I've tried. It's the most accurate, although the most hacky of them all. I can detect the object for one specific video using a template cropped from the video. However, there is no guaranteed accuracy, because all that is known is the best match in each frame; nothing measures how well the template actually matches at that location. Basically, it only works if the object is always in the video; otherwise it produces a false detection.

So those are the big 3 methods I've tried, and all have failed. What would work best is something like template matching but with scale and rotation invariance (which led me to try SIFT/SURF), but I have no idea how to modify the template matching function.

Does anyone have any suggestions how to best accomplish this task?

endolith
monky822
  • How is the truck oriented? Does its shape/orientation change? Does the camera change position? Is this a one-off video, or a system that needs to work in many different conditions? – endolith Dec 01 '09 at 17:11
  • I agree with endolith, it is crucial you define the problem with more details. The choice of method will affect the robustness. – Ivan Dec 02 '09 at 14:32
  • The truck is viewed from the side and is moving horizontally. The shape of the vehicle does not change much, which is why the template matching works, but I still want my method to be robust. Basically the camera pans left and right, following a few different vehicles, with some other vehicles driving past in the background. Essentially, I want this to work in more situations than one (but mainly dealing with similar quality video). The least I want to accomplish is a detector of moving objects inside a moving video. – monky822 Dec 10 '09 at 00:58
  • Can you post a sample frame from your video? – Martin Thompson Oct 19 '11 at 10:48

5 Answers

5

Apply optical flow to the image and then segment it based on the flow field. Background flow is very different from "object" flow (which mainly diverges or converges depending on whether the object is moving towards or away from you, with some lateral component as well).

Here's an oldish project which worked this way:

http://users.fmrib.ox.ac.uk/~steve/asset/index.html
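As a rough illustration of segmenting on the flow field (this is not the linked project's method, just a minimal sketch): assume you already have a dense flow field as an `(H, W, 2)` array, for example from OpenCV's `cv2.calcOpticalFlowFarneback`, and assume the camera motion is close to a pure pan so the background moves with roughly one common displacement. The function name `segment_by_flow` and the threshold value are made up for this example.

```python
import numpy as np

def segment_by_flow(flow, threshold=1.5):
    """Split a dense flow field into background and independently moving pixels.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements in pixels.
    Returns (mask, background): a boolean (H, W) mask that is True where the
    motion differs from the estimated background motion, and that estimate.
    """
    flow = np.asarray(flow, dtype=float)
    # Assumption: the background covers most of the frame, so the
    # per-component median approximates the camera-induced motion.
    background = np.median(flow.reshape(-1, 2), axis=0)
    # Pixels whose flow deviates strongly from the background are "object".
    residual = np.linalg.norm(flow - background, axis=2)
    return residual > threshold, background

# Synthetic flow field: the camera pans, so the whole scene shifts left by
# 3 px, except a rectangular "truck" region that moves right by 2 px.
flow = np.zeros((40, 60, 2))
flow[..., 0] = -3.0
flow[15:25, 20:40, 0] = 2.0

mask, background = segment_by_flow(flow)
```

With real footage the background flow will not be a single constant (zoom and perspective add divergence), so a global affine or homography fit would be the more realistic background model; the median is only the simplest stand-in.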

Martin Thompson
2

This vehicle detection paper uses a Gabor filter bank for low level detection and then uses the response to create the features space where it trains an SVM classifier.

The technique seems to work well and is at least scale invariant. I am not sure about rotation though.
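To give an idea of what the Gabor stage looks like, here is a minimal NumPy sketch. It is not the paper's implementation: the kernel size, `sigma = 0.5 * wavelength`, `gamma`, and the choice of mean absolute response plus standard deviation as features are all assumptions for illustration, and the SVM step is left out (the feature vectors would be fed to whatever SVM library you use).

```python
import numpy as np

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: a Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + psi)
    kernel = envelope * carrier
    return kernel - kernel.mean()  # zero mean: flat regions give no response

def gabor_features(image, thetas, wavelengths, size=15):
    """Filter the image with each kernel and summarise each response."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    feats = []
    for lam in wavelengths:
        for theta in thetas:
            k = gabor_kernel(size, sigma=0.5 * lam, theta=theta, wavelength=lam)
            # Circular convolution via the FFT (kernel zero-padded to image size).
            resp = np.real(np.fft.ifft2(spectrum * np.fft.fft2(k, s=image.shape)))
            feats += [np.abs(resp).mean(), resp.std()]
    return np.array(feats)

# Vertical stripes with an 8 px period: the theta=0 filter should respond
# much more strongly than the theta=pi/2 filter.
xs = np.arange(64)
stripes = np.tile(np.sin(2 * np.pi * xs / 8.0), (64, 1))
feats = gabor_features(stripes, thetas=[0.0, np.pi / 2], wavelengths=[8.0])
```

The feature vector is ordered as (mean absolute response, standard deviation) per filter, so `feats[0]` belongs to the filter aligned with the stripes and `feats[2]` to the perpendicular one. Using several wavelengths is what gives the bank its tolerance to scale.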

Ivan
1

Not knowing your application, my initial impression is normalized cross-correlation, especially since I remember seeing a purely optical cross-correlator that had vehicle-tracking as the example application. (Tracking a vehicle as it passes using only optical components and an image of the side of the vehicle - I wish I could find the link.) This is similar (if not identical) to "template matching", which you say kind of works, but this won't work if the images are rotated, as you know.

However, there's a related method based on log-polar coordinates that will work regardless of rotation, scale, shear, and translation.

I imagine this would also enable tracking that the object has left the scene of the video, too, since the maximum correlation will decrease.
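The "maximum correlation will decrease" idea can be sketched with plain normalized cross-correlation and a score threshold. This is a brute-force illustration only, and it does not cover the log-polar part; the helper names `ncc` and `find_template` and the `min_score=0.8` cutoff are assumptions for the example, and the cutoff would need tuning on real footage. OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same kind of score far faster.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size arrays, in [-1, 1]."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return 0.0 if denom == 0 else float((p * t).sum() / denom)

def find_template(frame, template, min_score=0.8):
    """Return ((x, y), score) of the best match, or (None, score) if the
    best score is below min_score, i.e. the object is probably absent."""
    fh, fw = frame.shape
    th, tw = template.shape
    best_score, best_pos = -1.0, None
    # Exhaustive search over every top-left position of the template.
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            score = ncc(frame[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    if best_score < min_score:
        return None, best_score
    return best_pos, best_score

# Synthetic check: one frame contains the template, the other does not.
rng = np.random.default_rng(0)
template = rng.random((8, 12))
frame_with = rng.random((30, 40))
frame_with[10:18, 5:17] = template
frame_without = rng.random((30, 40))

pos_with, score_with = find_template(frame_with, template)
pos_without, score_without = find_template(frame_without, template)
```

Because the score is normalized, a uniform brightness or contrast change in the frame does not lower it, which is what makes a fixed cutoff usable as an "object present" test.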

endolith
0

How low resolution are we talking? Could you also elaborate on the object? Is it a specific color? Does it have a pattern? The answers affect what you should be using.

Also, I might be reading your template matching statement wrong, but it sounds like you are overtraining it (by testing on the same video you extracted the object from??).

UsAaR33
  • The resolution is 720x480, but the quality of the video is very poor. The video is very pixelated at this resolution. Regarding the template matching, I'm not training anything. I am just using an object cropped from the video and searching for it in each frame. – monky822 Dec 08 '09 at 17:06
  • Well you are training it, just on one set of data. The template will match darn well if the lighting and orientation of the object hardly changes. The moment that does, accuracy will really drop off. Again though, use all the cues you can - esp. color if it is there. – UsAaR33 Dec 10 '09 at 12:34
0

A Haar Cascade is going to require significant training data on your part, and will be poor for any adjustments in orientation.

Your best bet might be to combine template matching with an algorithm similar to CamShift in OpenCV (5.7 MB PDF), along with a probabilistic model (you'll have to figure this one out) of whether the truck is still in the image.

NGLN
BrainCore