
I've built a neural network with Keras using the MNIST dataset, and now I'm trying to use it on photos of actual handwritten digits. Of course I don't expect the results to be perfect, but the results I currently get leave a lot of room for improvement.

For starters, I test it with some photos of individual digits written in my clearest handwriting. They are square, and they have the same dimensions and colours as the images in the MNIST dataset. They are saved in a folder called individual_test and named, for example, like this: 7(2)_digit.jpg.

The network is often terribly sure of the wrong result. Here is an example:

[image: clearly a 7]

The results I get for this picture are the following:

result:  3 . probabilities:  [1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417, 0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08, 0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943, 2.8358477788259506e-09]

So the network is 97% sure this is a 3, and this picture is far from the only such case. Out of 38 pictures, only 16 were correctly recognised. What shocks me is the fact that the network is so sure of its result although it couldn't be farther from the correct one.

EDIT
After adding a threshold to prepare_image (img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]) the performance has slightly improved. It now gets 19 out of 38 pictures right, but for some images, including the one shown above, it is still pretty sure of the wrong result. This is what I get now:

result:  3 . probabilities:  [1.0909866760000497e-11, 1.1584616004256532e-06, 0.27739930152893066, 0.7221096158027649, 1.900260038212309e-08, 6.555900711191498e-08, 4.479645940591581e-05, 6.455550760620099e-07, 0.0004443934594746679, 1.0013242457418414e-09]

So it is now only 72% sure of its result, which is better, but still ...



What can I do to improve the performance? Can I prepare my images better? Or should I add my own images to the training data? And if so, how would I do such a thing?

EDIT

This is what the picture displayed above looks like after applying prepare_image to it:

[image: my picture after treatment]

After using the threshold, this is what the same picture looks like:

[image: after threshold]

In comparison, this is one of the pictures provided by the MNIST dataset:

[image: one of the MNIST digits]

They look fairly similar to me. How can I improve this?
Here's my code (including the threshold):

# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# numpy is necessary since keras uses numpy arrays
import numpy as np

# imports for pictures
import matplotlib.pyplot as plt
import PIL
import cv2

# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = to_categorical(y_train)
        y_test = to_categorical(y_test)
        num_classes = y_test.shape[1]


        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test


    def predict_result(self, img, show = False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)

        if show:
            img = img.reshape((28, 28))
            # show the picture
            plt.imshow(img, cmap='Greys')
            plt.show()
            img = img.reshape(img.shape[0] * img.shape[1])

        num_pixels = img.shape[0]
        # predict once and reuse the probabilities for the argmax
        res_probabilities = self.model.predict(img.reshape(-1, num_pixels))
        # the actual number
        res_number = np.argmax(res_probabilities, axis=1)

        return (res_number[0], res_probabilities.tolist()[0])    # batch of one, so take the first element


    def prepare_image(self, img, show = False):
        """ prepares the partial images used in partial_img_rec by transforming them
            into numpy arrays that the network will be able to process """
        # convert to greyscale
        img = img.convert("L")
        # rescale image to 28 *28 dimension
        img = img.resize((28,28), PIL.Image.LANCZOS)  # LANCZOS replaces the deprecated ANTIALIAS
        # inverse colors since the training images have a black background
        #img =  PIL.ImageOps.invert(img)
        # transform to vector
        img = np.asarray(img, "float32")
        img = img / 255.
        img[img < 0.5] = 0.

        img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]

        if show:
            plt.imshow(img, cmap = "Greys")

        # flatten image to 28*28 = 784 vector
        num_pixels = img.shape[0] * img.shape[1]
        img = img.reshape(num_pixels)

        return img


    def partial_img_rec(self, image, upper_left, lower_right, results=None, show = False):
        """ recursively scans the image; 'partial' is a part of the image """
        # avoid the mutable default argument pitfall: fresh list per top-level call
        if results is None:
            results = []
        left_x, left_y = upper_left
        right_x, right_y = lower_right

        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results

        partial = image.crop((left_x, left_y, right_x, right_y))
        if show:
            partial.show()
        partial = self.prepare_image(partial)

        step = height // 10

        # is there a number in this part of the image? 
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is at least 50% sure
        if prop[res] >= 0.5:        
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height) 
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20 
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results = results)

    def individual_digits(self, img):
        """ uses partial_img_rec to predict individual digits in square images """
        assert isinstance(img, PIL.Image.Image)  # covers JpegImageFile and PngImageFile subclasses

        return self.partial_img_rec(img, (0,0), (img.size[0], img.size[1]), results=[])

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (shape: square) 
            saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\individual_test")

        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = PIL.Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0]  != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue 
            predicted_res = self.individual_digits(image)

            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]

            if predicted_res != correct_res:
                print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """

        #assert type(img) == myImage
        width, height = img.size
        # start with the first square part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height), results = [])
        res_str = ""
        for elem in res_list:
            res_str += str(elem)
        return res_str

    def test_multiple_digits(self):
        """ tests the function 'multiple_digits' using some images saved in the folder 'multi_test'.
            These images contain multiple handwritten digits without much whitespace surrounding them.
            The correct solutions are saved in the files' names, followed by the character '_'. """

        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\multi_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"            
            image = PIL.Image.open(".\\multi_test\\" + imageName).convert("L")

            correct_res = imageName.split("_")[0]
            predicted_res = self.multiple_digits(image)
            if correct_res == predicted_res:
                cnt_right += 1
            else:
                cnt_wrong += 1
                print("Error in multiple_digits! The network predicted ", predicted_res, " but the correct result would have been ", correct_res)

        print("The network predicted correctly ", cnt_right, " out of ", cnt_right + cnt_wrong, " pictures. That's a success rate of ", cnt_right / (cnt_right + cnt_wrong) * 100, "%.")

network = mnist_network()
# this is the image shown above
result = network.individual_digits(PIL.Image.open("./individual_test/7(2)_digit.jpg"))
Johanna
  • You should try more epochs for training, as 10 is not enough for proper learning. You can change parameters such as batch size, learning rate, optimizer, etc. Your preprocessing is proper, but as one of the answers suggests, you should do thresholding as well. After that, you should try different numbers of layers and neurons. Also, you can look at convolutional networks, which work better for images. – SajanGohil Dec 30 '19 at 19:14
  • @SajanGohil In this case, with these parameters, 10 epochs are more than enough, since it starts to overfit at epoch 5. – Geeocode Dec 31 '19 at 01:47
  • Johanna, please see my full answer below, as I promised. – Geeocode Dec 31 '19 at 22:30
  • If the trouble is usually on 7 it may be due to your use of a european 7 (with a slash) in test vs. american 7 (no slash) in train. – jeremy_rutman Jan 01 '20 at 02:23
  • @jeremy_rutman It is the case with lots of other digits as well but I chose this particular 7 because it shows the problem really well. I tried feeding it american digits but it didn't work much better. – Johanna Jan 01 '20 at 10:57
  • 2 possible explanations for what you are seeing: 1. You applied a preprocessing step when training and didn't do the same on your test set. 2. You are seeing distribution differences (domain shift), i.e. your training examples are different from your testing examples. In that case you need to look at fine-tuning (train on the MNIST dataset and check you get good accuracy on validation, then fine-tune the model on a few examples from your dataset). – Ahmad Baracat Jan 01 '20 at 21:40
  • @jeremy_rutman Though I didn't mention it in my answer, for the sake of length, I tested this scenario as well and found that the slashed version is underrepresented in MNIST but exists. Thus, after the preprocessing (see my answer below), the slashed version also became correctly classified. – Geeocode Jan 02 '20 at 13:48

3 Answers


Update:

You have three options to achieve better performance in this particular task:

  1. Use a convolutional network, as it performs better on tasks with spatial data like images and generalizes better on a problem like this one (see the sketch after this list).
  2. Create or generate more pictures of your own type and train your network with them, so it is able to learn them too.
  3. Preprocess your images to be better aligned with the original MNIST images, against which you trained your network.
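
For option 1, here is a minimal sketch of what such a convolutional model could look like. The layer sizes are illustrative, not tuned, and it assumes the same X_train / y_train as in the question, reshaped to (n, 28, 28, 1) instead of flattened:

# minimal CNN sketch for MNIST; layer sizes are illustrative, not tuned
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# cnn.fit(X_train.reshape(-1, 28, 28, 1), y_train, epochs=10, batch_size=200)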

I've just run an experiment. I checked the MNIST images with regard to each represented digit, took your images, and applied the preprocessing I proposed to you earlier:

1. I applied a threshold, but only downwards, eliminating the background noise, because the original MNIST data is only minimally thresholded, for the blank background:

image[image < 0.1] = 0.

2. Surprisingly, the size of the digit inside the image proved to be crucial, so I scaled down the digit inside the 28 x 28 image so that we have more padding around it.

3. I inverted the images, since the MNIST data from Keras is inverted as well.

image = ImageOps.invert(image)

4. Finally, I scaled the data, as we did at training time as well (the full chain is sketched after this list):

image = image / 255.
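
Putting the four steps together, here is a sketch of what such a preprocessing chain could look like. The 20 x 20 inner box and the 0.1 cutoff are my choices; the original MNIST digits were size-normalized into a 20 x 20 box inside the 28 x 28 frame:

import numpy as np
import PIL.Image
import PIL.ImageOps

def preprocess(img):
    """ img: a PIL image of a dark digit on a light background """
    img = img.convert("L")
    # step 3: invert so the digit is white on black, as in the Keras MNIST data
    img = PIL.ImageOps.invert(img)
    # step 2: shrink the digit and paste it onto a black 28 x 28 canvas
    # so that there is more padding around it
    img = img.resize((20, 20), PIL.Image.LANCZOS)
    canvas = PIL.Image.new("L", (28, 28), 0)
    canvas.paste(img, (4, 4))
    # step 4: scale to [0, 1] as at training time
    arr = np.asarray(canvas, dtype="float32") / 255.
    # step 1: threshold only downwards to remove the background noise
    arr[arr < 0.1] = 0.
    return arr.reshape(784)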

After the preprocessing I trained the model on the MNIST dataset with the parameters epochs=12, batch_size=200, and these are the results:

[images: digit 1]

Result: 1 with probability 0.6844741106033325

result:  1 . probabilities:  [2.0584749904628552e-07, 0.9875971674919128, 5.821426839247579e-06, 4.979299319529673e-07, 0.012240586802363396, 1.1566483948399764e-07, 2.382085284580171e-08, 0.00013023221981711686, 9.620113416985987e-08, 2.5273093342548236e-05]

[images: digit 6]

Result: 6 with probability 0.9221984148025513

result:  6 . probabilities:  [9.130864782491699e-05, 1.8290626258021803e-07, 0.00020504613348748535, 2.1564576968557958e-07, 0.0002401985548203811, 0.04510130733251572, 0.9221984148025513, 1.9014490248991933e-07, 0.03216308355331421, 3.323434683011328e-08]

[images: digit 7]

Result: 7 with probability 0.7105212807655334

result:  7 . probabilities:  [1.0372193770535887e-08, 7.988557626958936e-06, 0.00031014863634482026, 0.0056108818389475346, 2.434678014751057e-09, 3.2280522077599016e-07, 1.4190952857262573e-09, 0.9940618872642517, 1.612859932720312e-06, 7.102244126144797e-06]

Your number 9 was a bit tricky:

[images: the digit 9]

As I figured out, the model trained on the MNIST dataset picks up two main "features" of a 9: the upper and the lower part. An upper part with a nice round shape, as in your image, is not a 9 but mostly a 3 for your model trained against the MNIST dataset. The lower part of a 9 is mostly a straight curve as per the MNIST dataset. So basically your perfectly shaped 9 is always a 3 for your model because of the MNIST samples, unless you train the model again with a sufficient amount of samples of your shape of 9. In order to check my thoughts, I made a sub-experiment with 9s:

My 9 with a skewed upper part (mostly OK for a 9 as per MNIST) but with a slightly curly bottom (not OK for a 9 as per MNIST):

[image: the 9 with a curly bottom]

Result: 9 with probability 0.5365301370620728

My 9 with a skewed upper part (mostly OK for a 9 as per MNIST) and with a straight bottom (OK for a 9 as per MNIST):

[image: the 9 with a straight bottom]

Result: 9 with probability 0.923724353313446

Your 9 with the misinterpreted shape properties:

[image: your 9]

Result: 3 with probability 0.8158268928527832

result:  3 . probabilities:  [9.367801249027252e-05, 3.9978775021154433e-05, 0.0001467708352720365, 0.8158268928527832, 0.0005801069783046842, 0.04391581565141678, 6.44062723154093e-08, 7.099170943547506e-06, 0.09051419794559479, 0.048875387758016586]


Finally, just a proof of the importance of the image scaling (padding), which I mentioned as crucial above:

[image: the digit with little padding]

Result: 3 with probability 0.9845736622810364

[image: the same digit with more padding]

Result: 9 with probability 0.923724353313446

So we can see that the model picked up some features which it always interprets and classifies as a 3 when the shape inside the image is oversized and the padding is small.

I think we can get better performance with a CNN, but the way of sampling and preprocessing is always crucial for getting the best performance in an ML task.

I hope it helps.

Update 2:

I found another issue, which I checked as well and which proved to be true: the placement of the digit inside the image is crucial as well, which makes sense for this type of NN. A good example is the digits 7 and 9, which are placed off-centre in the MNIST dataset, near the bottom of the image; they resulted in harder or false classification when the new digit to classify was placed in the centre of the image. I checked the theory by shifting the 7s and 9s towards the bottom, leaving more space at the top of the image, and the result was almost 100% accuracy. As this is a spatial type of problem, I guess that with a CNN we could eliminate it more effectively. However, it would be better if MNIST were aligned to the centre, or we can do it programmatically to avoid the issue.
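
As a sketch, new images could be recentred programmatically the same way the original MNIST digits were normalized, by shifting the pixel centre of mass to the middle of the frame (scipy is my choice of tool here):

from scipy import ndimage

def center_by_mass(img28):
    """ img28: a (28, 28) float array, white digit on black background """
    cy, cx = ndimage.center_of_mass(img28)
    # move the centre of mass to the middle of the 28 x 28 frame
    return ndimage.shift(img28, (14 - cy, 14 - cx), mode='constant', cval=0.)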

Geeocode
  • @Johanna In this case I would train with MNIST first, then retrain the trained model with your new images and their augmented samples. – Geeocode Dec 30 '19 at 22:26
  • @Johanna Tomorrow I will look at it again, since it seems quite suspicious to me, though I would be interested in what the results would be after training the net with your images. FYI, I tried with a CNN but the result is almost the same. – Geeocode Dec 31 '19 at 01:51
  • 1
    I also did these kind of tests and I can confirm that a bit of padding around the digit can improve results. I had a 1 digit which padded by one pixel to left or right gave me completely different results. @Geeocode do you think that augmenting data inside MNIST would be a solution ? – lucians Sep 28 '20 at 14:00
  • @lucians Definitely. To be honest, when I first got the results, I didn't want to believe my own eyes. Then I repeated the tests and studied them from other aspects as well, so I think augmenting would improve the plain results anyway. – Geeocode Sep 28 '20 at 22:54

What was your test score on the MNIST dataset? One thing that comes to my mind is that your images are missing thresholding.

Thresholding is a technique where pixel values below a certain cutoff are set to zero. See the OpenCV thresholding examples; you probably need to use inverse thresholding and check your results again.
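
A minimal example of inverse binary thresholding with OpenCV; the cutoff 127 and the file name are placeholders:

import cv2

img = cv2.imread("digit.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
# pixels at or below 127 become 255 (white digit), pixels above 127 become 0
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)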

Do inform me if there is some progress.

MbeforeL
  • The test score was somewhere between 95 and 98%. As far as thresholding is concerned - how is that different from `img[img < 0.5] = 0.`? – Johanna Dec 30 '19 at 20:54
  • @Johanna the pixels higher than 0.5 will go to 255 which won't happen with `img[img<0.5]=0` – SajanGohil Dec 31 '19 at 07:10
  • 1
    Johanna , you can yourself see the difference, your images are grayscale not thresholded, the light gray part should be full white.Try to achieve that thing. – MbeforeL Dec 31 '19 at 09:07
  • I used thresholding on the images, which slightly improved the performance, but it didn't solve the problem. Please have a look at the edited post. – Johanna Dec 31 '19 at 17:32
  • @SourabhSinha The MNIST dataset is only thresholded downwards, like img[img<0.1]=0, just for the blank background. Just check it, I did. – Geeocode Dec 31 '19 at 20:54

The main problem you have is that the images you are testing on are different from the MNIST images, probably due to the preparation of the images you have done. Can you show one of the images you are testing with after you apply prepare_image to it?

hola
  • Thank you for your answer. Please have a look at my edited post where I show the picture after applying prepare_image to it. – Johanna Dec 30 '19 at 20:39
  • I think you may have one of two problems: your images are colour-inverted (the black is white and vice versa), or the pixel distribution in your test images is really different in comparison to the training images. I think both problems can be solved if you use convolutional layers instead of Dense layers. – hola Dec 30 '19 at 20:47