
I have trained a CNN in Matlab 2019b that classifies images between three classes. When this CNN was tested in Matlab it was functioning fine and only took 10-15 seconds to classify an image. I used the exportONNXNetwork function in Matlab so that I can implement my CNN in Tensorflow. This is the code I am using to use the ONNX file in python:

import onnx
from onnx_tf.backend import prepare 
import numpy as np
from PIL import Image 

onnx_model = onnx.load('trainednet.onnx')
tf_rep = prepare(onnx_model)
filepath = 'filepath.png' 

img = Image.open(filepath).resize((224,224)).convert("RGB") 
img = np.array(img).transpose((2,0,1))
img = np.expand_dims(img, 0) 
img = img.astype(np.uint8) 

probabilities = tf_rep.run(img) 
print(probabilities) 

When I use this code to classify the same test set, it seems to classify the images correctly, but it is very slow and freezes my computer, with memory usage reaching 95%+ at some points.

I also noticed in the command prompt while classifying it prints this:

2020-04-18 18:26:39.214286: W tensorflow/core/grappler/optimizers/meta_optimizer.cc:530] constant_folding failed: Deadline exceeded: constant_folding exceeded deadline., time = 486776.938ms.

Is there any way I can make this python code classify faster?

Hardit Singh
  • I recommend you first check a few things: 1. do you use your GPU with python and Matlab? 2. What takes 15sec (Matlab) or more (python), is it the classification itself or loading the model and image manipulation? 3. When is the memory full, after one image load? Also, what operating system are you using? – Yuval Harpaz May 10 '20 at 09:02
  • Have you tried running a profiler to see where your bottleneck is? See https://docs.python.org/3/library/profile.html#module-cProfile and https://toucantoco.com/en/tech-blog/tech/python-performance-optimization – Matt L. May 10 '20 at 14:17
  • "The best way to make python faster is to use less of it" – Mike May 10 '20 at 14:48
  • Matlab is a proprietary product and worth its price – cdarlint May 11 '20 at 12:43
  • Can you post the matlab code? – user2305193 May 11 '20 at 12:49
  • You should not load all images at once! Every image loaded is `int` type, but for faster learning you should scale it to `<0, 1>`, which is obviously float32 or float64, and those consume much more memory. Reduce the image load :P, you can use numpy to convert images to float16, which has less precision but also less memory consumption: `new_img = np.array(img, dtype=np.float16)` – Grzegorz Krug May 15 '20 at 20:54
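The memory point in the last comment is easy to verify; a minimal sketch assuming 224x224 RGB inputs (the image here is a made-up blank array, but real PIL images have the same shape and dtype):

```python
import numpy as np

# Hypothetical blank 224x224 RGB image; a real photo converted with
# np.array(Image.open(...)) has the same shape and uint8 dtype
img_u8 = np.zeros((224, 224, 3), dtype=np.uint8)

# Scaling to <0, 1> promotes the dtype and multiplies memory per image
img_f16 = img_u8.astype(np.float16) / 255.0  # 2 bytes per value
img_f64 = img_u8.astype(np.float64) / 255.0  # 8 bytes per value

print(img_u8.nbytes)   # 150528 bytes
print(img_f16.nbytes)  # 301056 bytes (2x)
print(img_f64.nbytes)  # 1204224 bytes (8x)
```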

4 Answers


Maybe you could try to understand what part of the code takes a long time this way:

import onnx
from onnx_tf.backend import prepare 
import numpy as np
from PIL import Image 
import datetime

now = datetime.datetime.now()
onnx_model = onnx.load('trainednet.onnx')
tf_rep = prepare(onnx_model)
filepath = 'filepath.png' 
later = datetime.datetime.now()
difference = later - now
print("Loading time : %f ms" % (difference.total_seconds() * 1000))

img = Image.open(filepath).resize((224,224)).convert("RGB") 
img = np.array(img).transpose((2,0,1))
img = np.expand_dims(img, 0) 
img = img.astype(np.uint8) 

now = datetime.datetime.now()
probabilities = tf_rep.run(img) 
later = datetime.datetime.now()
difference = later - now
print("Prediction time : %f ms" % (difference.total_seconds() * 1000))
print(probabilities) 

Let me know what the output looks like :)
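If the coarse timestamps above are not fine-grained enough, the cProfile module linked in the comments breaks the time down per function call; a minimal sketch with a stand-in workload (replace `classify` with the actual `onnx.load` / `prepare` / `tf_rep.run` calls):

```python
import cProfile
import io
import pstats

def classify():
    # stand-in workload; replace with the model loading and inference calls
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
classify()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show the five most expensive calls
print(stream.getvalue())
```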


In this case, it appears that the Grappler optimization suite has encountered some kind of infinite loop or memory leak. I would recommend filing an issue against the GitHub repo.

It's challenging to debug why constant folding is taking so long, but you may get better performance from the ONNX TensorRT backend than from the TensorFlow backend: on Nvidia GPUs it generally runs faster while also compiling typical graphs more quickly. Constant folding usually doesn't provide large speedups for well-optimized models anyway.

import onnx
import onnx_tensorrt.backend as backend
import numpy as np
from PIL import Image

model = onnx.load("trainednet.onnx")
engine = backend.prepare(model, device='CUDA:1')

filepath = 'filepath.png' 

img = Image.open(filepath).resize((224,224)).convert("RGB") 
img = np.array(img).transpose((2,0,1))
img = np.expand_dims(img, 0) 
img = img.astype(np.uint8) 
output_data = engine.run(img)[0]
print(output_data)
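Whichever backend ends up running the model, the raw result is a vector of three scores per image; a small sketch of mapping it to a label (the class names and the output array here are made up, since the question doesn't list them):

```python
import numpy as np

classes = ["class_a", "class_b", "class_c"]  # hypothetical label names
output_data = np.array([[0.1, 0.7, 0.2]])    # made-up backend output, shape (1, 3)

# argmax over the class axis gives the index of the highest-scoring class
pred_idx = int(np.argmax(output_data, axis=1)[0])
print(classes[pred_idx])  # class_b
```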
Paras Jain

You should consider a few points when working with TensorFlow in Python. A GPU speeds up processing considerably; to use one you have to install CUDA support. Apart from this, the editor you use can also matter: in my experience VSCode works better than Spyder.

I hope it helps.


Since the command prompt states that your program takes a long time to perform constant folding, it might be worthwhile to turn this off. Based on this documentation, you could try running:

import numpy as np
import timeit
import traceback
import contextlib
import onnx
from onnx_tf.backend import prepare 
from PIL import Image 
import tensorflow as tf

@contextlib.contextmanager
def options(options):
  old_opts = tf.config.optimizer.get_experimental_options()
  tf.config.optimizer.set_experimental_options(options)
  try:
    yield
  finally:
    tf.config.optimizer.set_experimental_options(old_opts)


with options({'constant_folding': False}):

  onnx_model = onnx.load('trainednet.onnx')
  tf_rep = prepare(onnx_model)
  filepath = 'filepath.png' 

  img = Image.open(filepath).resize((224,224)).convert("RGB") 
  img = np.array(img).transpose((2,0,1))
  img = np.expand_dims(img, 0) 
  img = img.astype(np.uint8) 

  probabilities = tf_rep.run(img)
  print(probabilities)

This disables the constant folding performed during TensorFlow graph optimization. It can cut both ways: on the one hand you will no longer hit the constant-folding deadline, but on the other hand disabling constant folding can significantly increase runtime. Either way it is worth trying, good luck!
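The context manager above is the standard save/set/restore pattern; a TensorFlow-free sketch of the same idea (the option store here is a plain dict standing in for `tf.config.optimizer`), showing that the original options come back when the block exits:

```python
import contextlib

_opts = {}  # stand-in for TensorFlow's global optimizer-option store

def get_experimental_options():
    return dict(_opts)

def set_experimental_options(new_opts):
    _opts.update(new_opts)

@contextlib.contextmanager
def options(new_opts):
    old = get_experimental_options()   # snapshot current options
    set_experimental_options(new_opts)
    try:
        yield
    finally:
        _opts.clear()                  # restore the snapshot, even on error
        _opts.update(old)

with options({"constant_folding": False}):
    print(get_experimental_options())  # {'constant_folding': False}
print(get_experimental_options())      # {} (options restored on exit)
```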

Bas Krahmer