
I'm trying to run OCR on a screenshot. After the screenshot is taken (of the desktop region that was clicked), it goes into a pixbuf, whose contents are passed to pytesseract. But after going through the pixbuf the image quality is bad: it's skewed (I tried saving it to a directory instead of using the pixbuf, and looked at it).

import gi
gi.require_version('Gdk', '3.0')
from gi.repository import Gdk
from PIL import Image
import pytesseract

def takeScreenshot(self, x, y, width=150, height=30):
    self.width = width
    self.height = height
    window = Gdk.get_default_root_window()
    #x, y, width, height = window.get_geometry()

    #print("The size of the root window is {} x {}".format(width, height))

    # get_from_drawable() was deprecated. See:
    # https://developer.gnome.org/gtk3/stable/ch24s02.html#id-1.6.3.4.7
    pixbufObj = Gdk.pixbuf_get_from_window(window, x, y, width, height)
    height = pixbufObj.get_height()
    width = pixbufObj.get_width()
    image = Image.frombuffer("RGB", (width, height),
                             pixbufObj.get_pixels(), 'raw', 'RGB', 0, 1)
    image = image.resize((width * 20, height * 20), Image.ANTIALIAS)
    #image.save("saved.png")
    print(pytesseract.image_to_string(image))

    print("takenScreenshot:", x, y)

When I saved an image to a directory, the quality was fine and recognition was good.
I also tried without Image.ANTIALIAS; it makes no difference.

(The purpose of scaling by 20: I had tried code that recognized an image saved in a directory, and without scaling the recognition quality was bad.)
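
One way to narrow down where the skew appears (just a debugging sketch; savev() writes the pixbuf to disk before any PIL conversion, so the output can be compared with the PIL-saved file):

    # save the raw pixbuf straight to disk, before the PIL conversion,
    # to check whether the skew is already present in the pixbuf itself
    pixbufObj.savev("pixbuf_direct.png", "png", [], [])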

[Image: the bad, skewed screenshot]

The problem is that the image is skewed.

George J
  • I was wondering if `Image.ANTIALIAS` was making the difference. That doesn't seem to be the case. If I were to make a guess, I would say that scaling the image 20x has probably given a bigger scope for decision boundaries with minimal loss of pixel accuracy. This means that when you scaled the image, it was easier for **Tesseract** to _tell_ the edges of the characters. – Quirk Dec 16 '15 at 20:08
  • Looks like your image width is wrong; that is why your image looks skewed. Double-check your image sizes! – Mailerdaimon Dec 17 '15 at 13:42
  • @Mailerdaimon can you be more specific: how can it be wrong, and where did I make it wrong? – George J Dec 17 '15 at 15:42
  • @GeorgeJ If your width is, let's say, 2 pixels too big, then in each row two pixels that belong to the next row are displayed in the current row. This makes an image look skewed. The black diagonal line on the left side looks like another hint at that problem. Finding where the problem comes from needs some debugging: print out the width and height of your image and keep checking how the image displays. – Mailerdaimon Dec 18 '15 at 07:00
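
To illustrate the point in the last comment: the skew is consistent with the pixbuf's row stride (bytes per row, which GdkPixbuf may pad) not matching width * 3. A minimal sketch of passing the actual stride through to PIL, assuming the same pixbufObj as in the question's code; this is an illustration of the idea, not a confirmed fix:

    stride = pixbufObj.get_rowstride()   # real bytes per row, including any padding
    mode = "RGBA" if pixbufObj.get_has_alpha() else "RGB"
    image = Image.frombuffer(mode, (width, height),
                             pixbufObj.get_pixels(), 'raw', mode, stride, 1)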

2 Answers


Such extreme scaling is generally bad for OCR, particularly in full color and with extra processing (antialiasing).

I would:

  • upscale less (or not at all), or use Image.NEAREST
  • convert to grayscale immediately after loading (to avoid the artifacts you're seeing):

    image = image.convert('L')
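
For example, a rough sketch of both suggestions applied together, assuming image is the PIL image built in the question's takeScreenshot (the 3x factor is an arbitrary illustration, not a tuned value):

    image = image.convert('L')                      # grayscale first
    image = image.resize((width * 3, height * 3),   # modest upscale
                         Image.NEAREST)             # no antialiasing filter
    print(pytesseract.image_to_string(image))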
    

I don't know if you're still looking for a solution, but I ran into the same problem of the image being skewed. This is some kind of padding issue with GdkPixbuf. Basically, the height and width of the image should always be divisible by 8, so this is what I do before taking the screenshot:

# pad the requested width and height up so both are divisible by 8
width = width + (8 - (width % 8))
height = height + (8 - (height % 8))

The screenshot should work after doing this.
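
Note that when a value is already divisible by 8, the lines above still add a full 8 pixels. A small variant that pads only when needed (just an illustration of the same idea):

    # pad only when the value is not already a multiple of 8
    width = width if width % 8 == 0 else width + (8 - width % 8)
    height = height if height % 8 == 0 else height + (8 - height % 8)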

You can read more about the issue here.

Shubham Vasaikar