
I am working on a project where I have to detect a known picture in a scene in "real time" in a mobile context (meaning I'm capturing frames with a smartphone camera and resizing each frame to 150x225). The picture itself can be rather complex. Right now, I'm processing each frame in 1.2 s on average (using OpenCV). I'm looking for ways to improve this processing time and the overall accuracy. My current implementation works as follows:

  1. Capture the frame
  2. Convert it to grayscale
  3. Detect the keypoints and extract their descriptors using ORB
  4. Match the descriptors (2-NN, object -> scene) and filter them with the ratio test
  5. Match the descriptors (2-NN, scene -> object) and filter them with the ratio test
  6. Remove non-symmetrical matches using the results of 4. and 5.
  7. Compute the matching confidence (% of matched keypoints against total keypoints)

My approach might not be the right one, but the results are OK, even though there's a lot of room for improvement. I've already noticed that SURF extraction is too slow, and I couldn't manage to use a homography (it might be related to ORB). All suggestions are welcome!

Ilmari Karonen
Cladouros
  • When you profile this process, how long does each listed step take? What part of the 1.2s does each listed item account for? – Brad Larson Jul 16 '12 at 19:22
  • On average, the grayscale conversion takes 15 ms, the detection and extraction phase 300 ms, and the rest (~900 ms) is spent in the matching phase. – Cladouros Jul 16 '12 at 19:33
  • I've been attempting the same process myself, only doing it entirely on the GPU. I have everything up to the keypoint detection working (using Harris corners, although I'm working on a FAST corner implementation), and am working on the rest. I was able to detect and extract keypoints for a 640x480 RGB frame in ~60 ms on an iPhone 4, although I think I caused the performance to regress a little recently with some failed optimizations. I've seen a few fast GPU-bound brute-force matchers that I'm thinking of applying here. The code for what I have so far can be found here: https://github.com/BradLarson/GPUImage – Brad Larson Jul 16 '12 at 19:52
  • Great work, I'm definitely going to take a close look at it. – Cladouros Jul 17 '12 at 09:37

2 Answers


Performance is always an issue on mobiles :)

There are a few things you can do. The question OpenCV: C++ and C performance comparison explains generic methods for improving processing time.

And some specifics for your project:

  • If you capture color images and then convert them to grayscale, that is a big waste of resources. The native camera format is YUV. It gets converted to RGB, which is costly, and then to gray, which is costly again, while the first channel of YUV (Y) already is the grayscale image. So capture YUV and extract the first channel by copying the first part of the image data (YUV on Android is planar, meaning the first w*h pixels belong to the Y channel).
  • ORB was created to be fast, and it is. But just a few weeks ago FREAK was added to OpenCV. It is a new descriptor whose authors claim it is both more accurate and faster than ORB/SIFT/SURF/etc. Give it a try. You can find it in OpenCV >= 2.4.2 (the current release at the time of writing).
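The Y-plane trick above is just a slice of the raw buffer. A minimal sketch in Python/NumPy (the frame layout is an assumption for illustration; for planar YUV and the semi-planar NV21/NV12 formats common on Android, the first width*height bytes are the luma plane):

```python
import numpy as np

def gray_from_yuv(yuv_bytes, width, height):
    """Return the grayscale (Y) plane of a planar or semi-planar YUV frame.

    For planar YUV (and NV21/NV12), the first width*height bytes are the
    full-resolution luma plane, which is exactly the grayscale image.
    """
    buf = np.frombuffer(yuv_bytes, dtype=np.uint8)
    return buf[:width * height].reshape(height, width)
```

In native code the equivalent is a single memcpy of the first w*h bytes, so the "conversion" cost is essentially zero.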

EDIT

Brad Larson's question is illuminating - if the matcher takes 900 ms to process, then that's a problem! Check this post by Andrey Kamaev, How Does OpenCV ORB Feature Detector Work?, where he explains the possible combinations of descriptors and matchers. Try the FLANN-based uchar matcher.

And also, I suppose you get an awful lot of detections - hundreds or thousands - if it takes that long to match them. Try to limit the detections, or keep only the n best keypoints.

Sam
  • Thank you for your answer, the YUV trick is a great one (would it be faster than using ARM NEON directly?). I'm looking into FREAK right now. Do you have any idea how to optimize the matching phase? – Cladouros Jul 16 '12 at 16:36
  • Taking gray from YUV is just a memcpy, so it is faster than anything else. And while I did not check the FREAK source code, they say it can be sped up with SIMD (NEON). So, check the OpenCV code. Without profiling data, I can only give you general advice. – Sam Jul 16 '12 at 17:18
  • That OpenCV integration of FREAK was fast - I was just reading their conference prepublication paper on this a month ago. I'm also experimenting with a GPU-accelerated BRISK-style feature extraction and matching, and had been thinking about modifying FREAK for use on the GPU as well. You're right about just grabbing the Y plane, which is pretty easy on iOS using a YUV planar format for the camera and something like `CVPixelBufferGetBaseAddressOfPlane(cameraFrame, 0)`. – Brad Larson Jul 16 '12 at 19:37
  • @BradLarson FREAK was committed by the guys who developed it. I really appreciate their openness and willingness to support the open-source community. – Sam Jul 17 '12 at 06:05
  • Extracting the Y channel from YUV worked out great, shaving off 15 ms of processing time. However, FREAK turned out to be way slower than ORB in this particular case (0.3 s vs 1.5 s). I tweaked the ORB parameters and was able to process higher-resolution images (352x288 instead of 150x225) and get better results in approximately 1.5 s per frame. That's more processing time than before, but the matching is better, so there are fewer frames to process and it seems faster to the end user. I couldn't manage to find any information regarding the FLANN-based uchar matcher; do you have a link about it? – Cladouros Jul 17 '12 at 09:36
  • I did not use the matcher, I just know about it and saw it in that post. It may have been added recently, in which case the docs are scarce. – Sam Jul 17 '12 at 09:46

You should try FAST to detect the object in the scene; it is faster than SURF, and you can find articles that use a pyramidal version of FAST. To improve performance on mobiles you can also optimize loops, use fixed-point arithmetic, etc. Good luck.

Mar de Romos