
I'm trying to run OCR on a screenshot. After the screenshot is taken (of the desktop region that was clicked), it goes into a pixbuf, whose contents are passed to pytesseract. But after going through the pixbuf the image quality is bad: it's skewed (I tried saving it to a directory instead of using the pixbuf, and looked at it).

import gi
gi.require_version('Gdk', '3.0')
from gi.repository import Gdk
from PIL import Image
import pytesseract

def takeScreenshot(self, x, y, width=150, height=30):
    self.width = width
    self.height = height
    window = Gdk.get_default_root_window()
    #x, y, width, height = window.get_geometry()

    #print("The size of the root window is {} x {}".format(width, height))

    # get_from_drawable() was deprecated. See:
    # https://developer.gnome.org/gtk3/stable/ch24s02.html#id-1.6.3.4.7
    pixbufObj = Gdk.pixbuf_get_from_window(window, x, y, width, height)
    height = pixbufObj.get_height()
    width = pixbufObj.get_width()
    image = Image.frombuffer("RGB", (width, height),
                             pixbufObj.get_pixels(), 'raw', 'RGB', 0, 1)
    image = image.resize((width * 20, height * 20), Image.ANTIALIAS)
    #image.save("saved.png")
    print(pytesseract.image_to_string(image))

    print("takenScreenshot:", x, y)

When I saved an image to a directory, the quality was fine and recognition was good.
I also tried without Image.ANTIALIAS; it makes no difference.

(The purpose of scaling by 20: I had tried code that recognized an image saved in a directory, and without scaling the recognition quality was bad.)
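
One way to narrow down where the skew appears (just a debugging sketch; savev() writes the pixbuf to disk before any PIL conversion, so the output can be compared with the PIL-saved file):

    # save the raw pixbuf straight to disk, before the PIL conversion,
    # to check whether the skew is already present in the pixbuf itself
    pixbufObj.savev("pixbuf_direct.png", "png", [], [])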

[Image: the bad, skewed screenshot]

The problem is that the image is skewed.

George J
  • I was wondering if `Image.ANTIALIAS` was making the difference. That doesn't seem to be the case. If I were to make a guess, I would say that scaling the image 20x has probably given a bigger scope for decision boundaries with minimal loss of pixel accuracy. This means that when you scaled the image, it was easier for **Tesseract** to _tell_ the edges of the characters. – Quirk Dec 16 '15 at 20:08
  • Looks like your image width is wrong; that is why your image looks skewed. Double-check your image sizes! – Mailerdaimon Dec 17 '15 at 13:42
  • @Mailerdaimon can you be more specific: how can it be wrong, and where did I make it wrong? – George J Dec 17 '15 at 15:42
  • @GeorgeJ If your width is, let's say, 2 pixels too big, then in each row two pixels that belong to the next row are displayed in the current row. This makes an image look skewed. The black diagonal line on the left side looks like another hint at that problem. Finding where the problem comes from needs some debugging: print out the width and height of your image and keep checking how the image displays. – Mailerdaimon Dec 18 '15 at 07:00
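
To illustrate the point in the last comment: the skew is consistent with the pixbuf's row stride (bytes per row, which GdkPixbuf may pad) not matching width * 3. A minimal sketch of passing the actual stride through to PIL, assuming the same pixbufObj as in the question's code; this is an illustration of the idea, not a confirmed fix:

    stride = pixbufObj.get_rowstride()   # real bytes per row, including any padding
    mode = "RGBA" if pixbufObj.get_has_alpha() else "RGB"
    image = Image.frombuffer(mode, (width, height),
                             pixbufObj.get_pixels(), 'raw', mode, stride, 1)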

2 Answers


Such extreme scaling is generally bad for OCR, particularly in full color and with extra processing (antialiasing).

I would:

  • upscale less (or not at all), or use Image.NEAREST
  • convert to grayscale immediately after loading (to avoid the artifacts you're seeing):

    image = image.convert('L')
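
For example, a rough sketch of both suggestions applied together, assuming image is the PIL image built in the question's takeScreenshot (the 3x factor is an arbitrary illustration, not a tuned value):

    image = image.convert('L')                      # grayscale first
    image = image.resize((width * 3, height * 3),   # modest upscale
                         Image.NEAREST)             # no antialiasing filter
    print(pytesseract.image_to_string(image))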
    

I don't know if you're still looking for a solution, but I ran into the same problem of the image being skewed. This is some kind of padding issue with GdkPixbuf. Basically, the height and width of the image should always be divisible by 8, so this is what I do before taking the screenshot:

# pad the requested width and height up so both are divisible by 8
width = width + (8 - (width % 8))
height = height + (8 - (height % 8))

The screenshot should work after doing this.
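
Note that when a value is already divisible by 8, the lines above still add a full 8 pixels. A small variant that pads only when needed (just an illustration of the same idea):

    # pad only when the value is not already a multiple of 8
    width = width if width % 8 == 0 else width + (8 - width % 8)
    height = height if height % 8 == 0 else height + (8 - height % 8)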

You can read more about the issue here.

Shubham Vasaikar