21

I would like to capture the number from this kind of picture.

enter image description here

I tried multi-scale matching from the following link.

http://www.pyimagesearch.com/2015/01/26/multi-scale-template-matching-using-python-opencv/

All I want to know is the red number. The problem is that the red number is too blurry for OpenCV's template matching to recognize. Is there another way to detect this red number on the black background?

en_Knight
spencerJANG
  • Multi-scale won't help you resolve the image more clearly, unfortunately. Furthermore, you'll either need to recognize multiple fonts or prioritize the fonts you will recognize (such as the number above). Check out [this related question](http://stackoverflow.com/questions/7765810/is-there-a-way-to-detect-if-an-image-is-blurry) – Aaron3468 Jun 05 '16 at 19:19
  • Here are some latest research approaches: 1) [Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks](http://research.google.com/pubs/pub42241.html); 2) [Reading Text in the Wild](http://www.robots.ox.ac.uk/~vgg/research/text/) . Deep convolutional neural network is the common building block for these approaches. – Jon Jun 14 '16 at 01:39

2 Answers

19

Classifying Digits

You clarified in comments that you've already isolated the number part of the image pre-detection, so I'll start under that assumption.

Perhaps you can approximate the perspective effects and "blurriness" of the number by treating it as a hand-written number. In this case, there is a famous dataset of handwritten numerals for classification training called MNIST.

Yann LeCun has enumerated the state of the art on this dataset on his MNIST handwritten digit database page.

At the far end of the spectrum, convolutional neural networks yield outrageously low error rates (fractions of 1% error). For a simpler solution, k-nearest neighbours using deskewing, noise removal, blurring, and a 2 pixel shift yielded about 1% error, and is significantly faster to implement. OpenCV's Python bindings include an implementation. Neural networks and support vector machines with deskewing also have some pretty impressive performance rates.

Note that convolutional networks don't have you pick your own features, so the important color-differential information here might just be used for narrowing the region-of-interest. Other approaches, where you define your feature space, might incorporate the known color difference more precisely.

Python supports a lot of machine learning techniques in the terrific package sklearn; here are examples of sklearn applied to MNIST-style digits. If you're looking for a tutorial-style explanation of machine learning in Python, sklearn's own tutorial is very thorough.

From the sklearn link: Classifying mnist

Those are the kinds of items you're trying to classify if you learn using this approach. To emphasize how easy it is to start training some of these machine learning-based classifiers, here is an abridged section from the example code in the linked sklearn package:

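from sklearn import datasets, svm

digits = datasets.load_digits()  # built in to sklearn!
n_samples = len(digits.images)
# Flatten each 8x8 image into a 64-element feature vector
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001)

# We learn the digits on the first half of the digits
# (integer division: a float slice index raises a TypeError in Python 3)
classifier.fit(data[:n_samples // 2], digits.target[:n_samples // 2])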
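
To get a feel for how well such a classifier does before pointing it at your own crops, one option is to score it on the half of the data it never saw. This is a minimal sketch that repeats the training step so it runs on its own; the variable names `half`, `predicted` and `accuracy` are mine, not from the sklearn example.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()
n_samples = len(digits.images)
# Flatten each 8x8 image into a 64-element feature vector
data = digits.images.reshape((n_samples, -1))

half = n_samples // 2
classifier = svm.SVC(gamma=0.001)
# Train on the first half of the samples only
classifier.fit(data[:half], digits.target[:half])

# Predict the held-out second half and compare with the true labels
predicted = classifier.predict(data[half:])
accuracy = accuracy_score(digits.target[half:], predicted)
print("held-out accuracy: %.3f" % accuracy)
```

A good score here only says the classifier handles this dataset; blurry display digits would still need their own test images.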

If you're wedded to OpenCV (possibly because you want to port to a real-time system in the future), OpenCV 3 for Python has a tutorial on this exact topic too! Their demo uses k-nearest-neighbor (listed on the LeCun page), but they also have SVMs and many of the other tools in sklearn. Their OCR page using SVMs uses deskewing, which might be useful with the perspective effect in your problem:

Deskewed digit
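
To make the deskewing idea concrete, here is a NumPy-only approximation of moment-based deskewing: estimate the slant from the second-order image moments, then shift each row sideways to undo it. It is a sketch of the general technique, not the code from the OpenCV tutorial; the function name `deskew` and the nearest-pixel row shifting are my own simplifications.

```python
import numpy as np

def deskew(img):
    """Shear a 2D grayscale image so that its dominant stroke direction is vertical."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    total = img.sum()
    if total == 0:
        return img.copy()
    ys, xs = np.mgrid[0:h, 0:w]
    # Intensity-weighted centroid
    cx = (xs * img).sum() / total
    cy = (ys * img).sum() / total
    # Central moments: mu11 measures how x drifts as y changes
    mu11 = ((xs - cx) * (ys - cy) * img).sum()
    mu02 = ((ys - cy) ** 2 * img).sum()
    if abs(mu02) < 1e-2:
        return img.copy()
    skew = mu11 / mu02
    out = np.zeros_like(img)
    for y in range(h):
        # Rows far from the centroid are shifted the most
        shift = int(round(skew * (y - cy)))
        src_x = np.arange(w) + shift
        valid = (src_x >= 0) & (src_x < w)
        out[y, valid] = img[y, src_x[valid]]
    return out

# A slanted one-pixel stroke: the column drifts right as the row increases
slanted = np.zeros((20, 20))
for row in range(20):
    slanted[row, 5 + row // 2] = 1.0
straightened = deskew(slanted)
```

With interpolation instead of whole-pixel shifts the result would be smoother, which matters more on small crops.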


UPDATE: I used the out-of-the-box sklearn approach outlined above on your image, heavily cropped, and it correctly classified it. A lot more testing would be required to see if this is robust in practice.

enter image description here

^^ That tiny image is the 8x8 crop of the image you embedded in your question. The digits dataset bundled with sklearn consists of 8x8 images (the original MNIST images are 28x28), which is why it trains in less than a second with default arguments in sklearn.

I converted it to the correct format by scaling it up to the range of that dataset (pixel values from 0 to 16) using

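import scipy.misc

# Note: scipy.misc.imread was removed in SciPy 1.2; on newer versions another
# image reader (for example imageio) is needed here.
number = scipy.misc.imread("cropped_image.png")  # already cropped to 8x8
datum = (number[:,:,0]*15).astype(int).reshape((64,))
classifier.predict([datum])  # returned 8 for me; recent sklearn expects a 2D array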

I didn't change anything else from the example; here, I'm only using the first channel for classification, and no smart feature computation. A factor of 15 looked about right to me; you'll need to tune it to get within the target range or (ideally) provide your own training and testing set.
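
If your crop is larger than 8x8, the downscaling and range conversion can be done in one step. The following is a rough sketch under my own assumptions: the helper name `to_digits_format` is hypothetical, it averages pixel blocks rather than using a library resize, and it stretches the brightest block to 16 instead of applying the fixed factor used above.

```python
import numpy as np

def to_digits_format(rgb, size=8):
    """Turn an HxWx3 crop into a (1, size*size) row with values in 0..16."""
    red = np.asarray(rgb, dtype=float)[:, :, 0]  # red channel only
    h, w = red.shape
    bh, bw = h // size, w // size
    if bh == 0 or bw == 0:
        raise ValueError("crop is smaller than the target size")
    # Drop leftover rows/columns so the image splits into equal blocks
    red = red[:bh * size, :bw * size]
    # Average each bh x bw block down to a single pixel
    blocks = red.reshape(size, bh, size, bw).mean(axis=(1, 3))
    peak = blocks.max()
    if peak > 0:
        # Assumption: stretch so the brightest block maps to 16
        blocks = blocks / peak * 16.0
    return blocks.reshape(1, -1)

# Synthetic crop: top half bright red, bottom half black
crop = np.zeros((16, 24, 3), dtype=np.uint8)
crop[:8, :, 0] = 200
features = to_digits_format(crop)
```

The returned row already has the 2D shape that a fitted sklearn classifier's `predict` expects.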


Object Detection

If you haven't isolated the number in the image, you'll need an object detector. The literature on this problem is gigantic and I won't start down that rabbit hole (google Viola and Jones, maybe?). This blog covers the fundamentals of a "sliding window" detector in Python. Its author, Adrian Rosebrock, looks like he's even a contributor on SO, and the site has some good tutorial-style examples of OpenCV and Python-based object detectors (you actually linked to that blog in your question; I didn't realize).

In short, classify windows across the image and pick the window of highest confidence. Narrowing down the search space with a region of interest will of course yield huge improvements in all areas of performance.
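
As a bare-bones illustration of "classify windows and keep the most confident one", here is a sketch with a pluggable scoring function. The names `sliding_windows`, `best_window` and `score_fn` are hypothetical; in practice `score_fn` would be the confidence of your trained classifier, whereas the toy example simply uses mean brightness.

```python
import numpy as np

def sliding_windows(image, win_h, win_w, step):
    """Yield (top, left, patch) for every window position on a regular grid."""
    h, w = image.shape[:2]
    for top in range(0, h - win_h + 1, step):
        for left in range(0, w - win_w + 1, step):
            yield top, left, image[top:top + win_h, left:left + win_w]

def best_window(image, win_h, win_w, step, score_fn):
    """Return (score, top, left) of the highest-scoring window, or None."""
    best = None
    for top, left, patch in sliding_windows(image, win_h, win_w, step):
        score = score_fn(patch)
        if best is None or score > best[0]:
            best = (score, top, left)
    return best

# Toy image: a single bright 4x4 patch on a dark background
image = np.zeros((20, 20))
image[8:12, 12:16] = 1.0
result = best_window(image, 4, 4, 4, lambda patch: float(patch.mean()))
```

A smaller `step` finds objects that do not sit on the grid at the cost of more classifier calls, which is where a region of interest pays off.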

en_Knight
  • Oh, thanks en_Knight. As I am very new to OpenCV, would you kindly direct me to some tutorials on how to use these classification features that Python offers? My initial attempt would be to crop the image so that only the black background and red digits appear, then run it through a classifier to identify the digit. Does that sound legit? – spencerJANG Jun 06 '16 at 08:22
  • That sounds like a great approach. If you can crop the image, the problem becomes 10,000 times more fun; in my example, where we used their built-in digits dataset to train a classifier, I cropped your image around the number and then downscaled it to be 8x8. Is that sklearn link tough to follow? I can look for some alternatives; there are certainly a lot of classification tutorials out there – en_Knight Jun 06 '16 at 13:36
  • datum = (number[:,:,0]*15).astype(int).reshape((64,)) ------- I am not sure if I understand this line. could you explain to me in detail? I presume that it resizes the cropped image to 8x8 but when I try I get a value error saying that array size must be unchanged. – spencerJANG Jun 06 '16 at 14:09
  • @spencerJANG The line `imread("cropped_image.png")` loads your image from disk as an `NxMx3` matrix. In my case, I've already cropped it to be an `8x8` image (I attached that image to the answer so you can download it and try for yourself). You'll need to supply your own additional cropping/downscaling code to do that. The next line, `number[:,:,0]`, says "extract just the red channel" because we expect a grayscale image. You can make it gray any way you want. x15 scales it to the expected range (again, experiment with that). Make sense? – en_Knight Jun 06 '16 at 14:17
  • Yup. Thanks for the comments. Just another quick question. Is there another library or tool to figure out arrows? Alphabet? Just like digit recognition ? – spencerJANG Jun 06 '16 at 20:46
  • @spencerJANG I think that's too broad as a quick question :) "classification" in general is an unapproachably broad topic for this format (it lies somewhere in the union of stats.se, math.se, SO, and idk what else) - I recommended this specific approach for digits largely because the MNIST set is so accessible. Arrows seem like you could do simple geometric analysis. Alphabet - google around for a dataset or build your own - but sure, blurry letters seem like blurry numbers to me, and I doubt any other approach will yield terrific results in that space – en_Knight Jun 06 '16 at 21:10
  • I have a bunch of up and down arrow shapes that need to be shrunk to 8x8. Then I can compare them with the arrow in the picture to see if they match. Does this sound legit to you? Btw, regarding cropping the image to 8x8, will cv2.resize(number, (8,8)) do the job? – spencerJANG Jun 07 '16 at 13:59
  • @spencerJANG sounds plausible to me, yes. Especially if the arrows come at weird perspective skews, it might make an elegant solution; if you're looking at them flat-on, I'd imagine there are easier ways since it has to fit one of two exact formats, but in a real image - sure (but surely don't take my word for it - try it and find out!) – en_Knight Jun 07 '16 at 14:20
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/114041/discussion-between-en-knight-and-spencerjang). – en_Knight Jun 07 '16 at 14:20
3

You have a couple of things you can use to your advantage:

  • The number is within the black rectangular bezel and one colour
  • The number appears to be a segmented LCD type display, if so there are only a finite number of segments which are off or on.

So I suggest you:

  • Calibrate your camera and preprocess the image to remove lens distortion
  • Rectify the display rectangle:
    • Detect the display rectangle using either the intersection of hough lines, or edge detection followed by contour detection and then pick the biggest, squarest contours
    • Use `getPerspectiveTransform` to get the transform between image coordinates and an ideal rectangle, then transform the input image using `warpPerspective`
  • Split the image into R, G and B channels and compute r - avg(g, b). This is somewhat lighting dependent, but should give something like this:

    cleaned up number image

  • Then either try pattern matching on this, or perhaps re-segment the image and attempt to find which display segments are lit, or run through an OCR package.
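
The colour step and the "which segments are lit" idea can be sketched as follows. Everything about the geometry is an assumption on my part: `SEGMENT_BOXES` holds hypothetical sampling boxes for an upright, tightly cropped seven-segment digit, and a real display would need boxes measured from its own rectified image. Only `red_dominance` follows directly from the r - avg(g, b) suggestion above.

```python
import numpy as np

def red_dominance(rgb):
    """Per-pixel r - avg(g, b), clipped to the range 0..255."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.clip(r - (g + b) / 2.0, 0, 255)

# Hypothetical boxes (row0, row1, col0, col1) as fractions of the digit crop.
# a=top, b=top-right, c=bottom-right, d=bottom, e=bottom-left, f=top-left, g=middle
SEGMENT_BOXES = {
    "a": (0.000, 0.150, 0.25, 0.75),
    "b": (0.150, 0.425, 0.75, 1.00),
    "c": (0.575, 0.850, 0.75, 1.00),
    "d": (0.850, 1.000, 0.25, 0.75),
    "e": (0.575, 0.850, 0.00, 0.25),
    "f": (0.150, 0.425, 0.00, 0.25),
    "g": (0.425, 0.575, 0.25, 0.75),
}

# Standard seven-segment patterns for the digits 0..9
DIGITS = {
    frozenset("abcdef"): 0, frozenset("bc"): 1, frozenset("abdeg"): 2,
    frozenset("abcdg"): 3, frozenset("bcfg"): 4, frozenset("acdfg"): 5,
    frozenset("acdefg"): 6, frozenset("abc"): 7, frozenset("abcdefg"): 8,
    frozenset("abcdfg"): 9,
}

def box_slices(name, h, w):
    """Convert a fractional box into pixel slices for an h x w crop."""
    r0, r1, c0, c1 = SEGMENT_BOXES[name]
    return slice(int(r0 * h), int(r1 * h)), slice(int(c0 * w), int(c1 * w))

def decode_digit(mask):
    """Return the digit whose segments are mostly lit in a boolean mask, or None."""
    h, w = mask.shape
    lit = set()
    for name in SEGMENT_BOXES:
        box = mask[box_slices(name, h, w)]
        if box.size and box.mean() > 0.5:
            lit.add(name)
    return DIGITS.get(frozenset(lit))

# Synthetic red-on-black "3": light segments a, b, c, d and g
digit_img = np.zeros((40, 20, 3), dtype=np.uint8)
for seg in "abcdg":
    rows, cols = box_slices(seg, 40, 20)
    digit_img[rows, cols] = (220, 30, 30)
mask = red_dominance(digit_img) > 100  # threshold chosen for this toy image
decoded = decode_digit(mask)
```

The threshold of 100 and the 0.5 fill fraction are arbitrary here and would need tuning against real lighting.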
Peter Wishart
  • "Calibrate your camera" how much of this will work if he doesn't have the camera parameters? – en_Knight Jun 08 '16 at 13:55
  • It will probably work without it; you can use the 'GML C++ Camera Calibration Toolbox' or similar if you don't know the camera parameters – Peter Wishart Jun 08 '16 at 14:24