How can I keep only text of a specific color from an image with OpenCV and Python?

Question

I have some invoice images in which text of different colors overlaps, which causes trouble for later processing. I only want the text in black, so I want to remove the text in other colors.

Is there any way to achieve this?

An example image is attached.

I have tried to solve it with OpenCV, but I still can't get it to work:

import numpy as np
import cv2

img = cv2.imread('11.png')

lower = np.array([150, 150, 150])
upper = np.array([200, 200, 200])

mask = cv2.inRange(img, lower, upper)
res = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite('22.png', res)

[image with multiple colors][1]

[1]: https://i.stack.imgur.com/nWQrV.png

score 5 · Accepted Answer · answered Dec 30 '18 at 15:26

The black text is darker and less saturated than the colored text. As @J.D. suggested, the HSV color space is a good choice, but his range is wrong.

In OpenCV, for 8-bit images, H ranges over [0, 179] (hue in degrees divided by 2), while S and V range over [0, 255].
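
To make the scaling concrete, here is a minimal sketch that uses only Python's standard `colorsys` module to approximate how 8-bit OpenCV stores HSV: the hue in degrees is halved so it fits in a byte, and saturation and value are scaled to 0-255. The helper name `to_opencv_hsv` is hypothetical, and `cv2.cvtColor` may round some colors slightly differently.

```python
import colorsys

def to_opencv_hsv(r, g, b):
    """Approximate the 8-bit OpenCV HSV triple for an 8-bit RGB color.

    Hypothetical helper for illustration only; cv2.cvtColor is the real
    conversion and may differ by one unit due to rounding.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns h, s, v in [0, 1]; OpenCV stores H as degrees / 2
    # (0..179) and S, V as 0..255.
    return (round(h * 180) % 180, round(s * 255), round(v * 255))

print(to_opencv_hsv(0, 0, 255))   # pure blue -> (120, 255, 255)
print(to_opencv_hsv(60, 60, 60))  # dark gray, like black text -> (0, 0, 60)
```

Dark text ends up with low S and low V whatever its hue, which is why the ranges in this answer leave H unconstrained.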

Here is a colormap I made last year; I think it's helpful.

(1) Use cv2.inRange

(2) Just threshold the V(HSV) channel:

th, threshed = cv2.threshold(v, 150, 255, cv2.THRESH_BINARY_INV)

(3) Just threshold the S(HSV) channel:

th, threshed2 = cv2.threshold(s, 30, 255, cv2.THRESH_BINARY_INV)

The result:

The demo code:

# 2018/12/30 22:21 
# 2018/12/30 23:25 

import cv2

# Load the image and convert it from BGR to HSV.
img = cv2.imread("test.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# (1) Keep pixels with low saturation and low value.
mask = cv2.inRange(hsv, (0, 0, 0), (180, 50, 130))
dst1 = cv2.bitwise_and(img, img, mask=mask)

# (2) Keep pixels whose V is at most 150.
th, threshed = cv2.threshold(v, 150, 255, cv2.THRESH_BINARY_INV)
dst2 = cv2.bitwise_and(img, img, mask=threshed)

# (3) Keep pixels whose S is at most 30.
th, threshed2 = cv2.threshold(s, 30, 255, cv2.THRESH_BINARY_INV)
dst3 = cv2.bitwise_and(img, img, mask=threshed2)

# Save the three results for comparison.
cv2.imwrite("dst1.png", dst1)
cv2.imwrite("dst2.png", dst2)
cv2.imwrite("dst3.png", dst3)

I think I should make my question more concrete and complete. The reason I want to keep only text of a specific color (black/red) is that text in several colors overlaps in the image, which makes it impossible for OCR to recognize the overlapped text. Your solution can pick out the black text, but the image quality drops a lot, which makes later OCR recognition hard. — jianhua zhou, Jan 01 '19 at 12:42

J.D. · Answer 2 · 2018-12-30T21:27:45.187

Converting to the HSV colorspace makes selecting colors easier.

The code below does what you want. Result:

import numpy as np 
import cv2

kernel = np.ones((2,2),np.uint8)
# load image
img = cv2.imread("image.png")

# Convert BGR to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# define range of black color in HSV
lower_val = np.array([0,0,0])
upper_val = np.array([179,100,130])

# Threshold the HSV image to get only black colors
mask = cv2.inRange(hsv, lower_val, upper_val)

# Bitwise-AND mask and original image
res = cv2.bitwise_and(img,img, mask= mask)
# invert the mask to get black letters on white background
res2 = cv2.bitwise_not(mask)

# display image
cv2.imshow("img", res)
cv2.imshow("img2", res2)
cv2.waitKey(0)
cv2.destroyAllWindows()

To change which shades count as black, tweak upper_val. The last component, currently 130, is the Value: higher allows lighter shades. The middle component, currently 100, is the Saturation: lower allows less color. Read more about the HSV colorspace here.

I always find the image below very helpful. The bottom 'disc' is all black. As you move up in Value, lighter pixels are also selected. Pixels with low saturation (near the center) stay shades of gray, up to white at the top, while pixels with high saturation (near the edge) become colored. That's why you tweak those values.
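
The same picture can be checked numerically. The sketch below uses the standard-library `colorsys` module rather than OpenCV, so all components are in [0, 1] instead of OpenCV's 8-bit ranges. It walks from the center of the disc to its edge at a fixed hue and value.

```python
import colorsys

hue = 0.0    # red, on colorsys's 0..1 hue scale
value = 0.5  # halfway up the cylinder

# Center of the disc (saturation 0) to the edge (saturation 1).
results = {}
for saturation in (0.0, 0.5, 1.0):
    results[saturation] = colorsys.hsv_to_rgb(hue, saturation, value)
    print(saturation, results[saturation])
# 0.0 (0.5, 0.5, 0.5)    gray
# 0.5 (0.5, 0.25, 0.25)  washed-out red
# 1.0 (0.5, 0.0, 0.0)    fully saturated dark red

# At value 0 (the bottom disc) every hue and saturation collapses to black.
bottom = colorsys.hsv_to_rgb(0.3, 1.0, 0.0)
print(bottom)  # (0.0, 0.0, 0.0)
```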

Edit: As @Silencer pointed out, my range was off. Fixed it.

I think I should make my question more concrete and complete. The reason I want to keep only text of a specific color (black/red) is that text in several colors overlaps in the image, which makes it impossible for OCR to recognize the overlapped text. Your solution can pick out the black text, but the image quality drops a lot, which makes later OCR recognition hard. — jianhua zhou, Jan 01 '19 at 12:43
If you zoom in on the letters in the top left of your image, you can see why this is difficult: there are too many light-grayish pixels. I tried increasing the contrast, but that didn't help much. Getting even, high-contrast images with more pixels will improve the quality of the result... — J.D., Jan 01 '19 at 19:58
