
Quick summary: matplotlib's savefig is too slow when saving to PNG. I'm looking for ideas on how to speed it up, or for alternative libraries (Chaco? Cairo?).


Updated: Added some (very rough and ready) code to illustrate at the bottom.


I'm using matplotlib (Python 3.x, latest Anaconda on a quad-core MacBook) to plot a single 1024x1024 NumPy array of int16 values via imshow(). My goal is simply to produce an annotated image file on disk (no interactive display needed).

The axes is set to fill the figure completely (so no spines, ticks, etc.), and the dpi/size combination is set to match the size of the array, so there is no scaling or interpolation.

On top of that single axes, I display 3 text areas and a few (~6) rectangle patches.

...so nothing fancy and pretty much as simple as you can get from a plotting perspective.

However, when I save the figure to PNG with savefig(), it takes around 1.8 seconds (!!!). Saving as raw or JPG both come in at around 0.7 seconds.

I tried switching the backend to Agg, but that increased the savefig() time to about 2.1 seconds.


Am I wrong in thinking this is too slow? I would prefer to save as PNG rather than JPG, but I can't understand why PNG is that much slower than JPG. My goal is to deploy on AWS, so I'm concerned about speed here.

Are there any faster libraries around? (I don't want interactive UI plotting, just basic save-to-file plotting)


Some rough and ready code that approximately illustrates this is below. The output on my machine is:

current backend: MacOSX
default save: 0.4048
default save - float64: 0.3446
full size figure: 0.8105
full size figure - with text/rect: 0.9023
jpg: full size figure - with text/rect: 0.7468
current backend:  agg
AGG: full size figure - with text/rect: 1.3511
AGG: jpg: full size figure - with text/rect: 1.1689

I couldn't (even after repeated trying) get the sample code to reproduce the ~1.7 sec (process time) savefig() that I'm seeing in my app, but I think the code below still illustrates a) jpg is faster than png (or conversely, png seems slow) b) it still seems slow (imo)

So should I not be expecting anything faster than this? ...is that just the speed it is? Are there any faster backends available? When I deploy on AWS (linux) what is the best/fastest backend to use there?


import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import time

def some_text(ax):
    pm = u'\u00b1'
    string = f'blah\nblah {pm}blah\nblah blah blah'
    ax.text(10, 10, string, color='red', ha='left')
    ax.text(990, 990, string, color='green', ha='right')
    ax.text(500, 500, string, color='green', ha='center')
    ax.text(500, 500, string, color='green', ha='center', va='top', fontsize=10)
    ax.text(800, 500, string, color='green', ha='center', multialignment='center', fontsize=16)

def some_rect(ax):
    rect = Rectangle((10,10),width=100, height=100, color='red', fill=False)
    ax.add_patch(rect)
    rect = Rectangle((300,10),width=100, height=100, color='yellow', fill=False)
    ax.add_patch(rect)
    rect = Rectangle((300,600),width=50, height=50, color='yellow', fill=False)
    ax.add_patch(rect)
    rect = Rectangle((800,600),width=50, height=50, color='yellow', fill=False)
    ax.add_patch(rect)

dim = 1024
test = np.arange(dim*dim).reshape((dim, dim))
dpi = 150
inches = test.shape[1]/dpi, test.shape[0]/dpi

print('current backend:', matplotlib.get_backend())

plt.imshow(test)
c0 = time.process_time()
plt.savefig('test.png')
print(f'default save: {(time.process_time()-c0):.4f}')
plt.close()

fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(test)
c0 = time.process_time()
plt.savefig('test3.png')
print(f'full size figure: {(time.process_time()-c0):.4f}')

fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(test)
some_text(ax)
some_rect(ax)
c0 = time.process_time()
plt.savefig('test4.png')
print(f'full size figure - with text/rect: {(time.process_time()-c0):.4f}')

fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(test)
some_text(ax)
some_rect(ax)
c0 = time.process_time()
plt.savefig('test5.jpg')
print(f'jpg: full size figure - with text/rect: {(time.process_time()-c0):.4f}')

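# Switch to the non-interactive Agg backend and repeat the last two timings.
# pyplot is already imported above, so it does not need importing again.
backend = 'agg'
matplotlib.use(backend, force=True)
print('current backend: ', matplotlib.get_backend())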


fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(test)
some_text(ax)
some_rect(ax)
c0 = time.process_time()
plt.savefig('test6.png')
print(f'AGG: full size figure - with text/rect: {(time.process_time()-c0):.4f}')


fig, ax = plt.subplots(figsize=inches, dpi=dpi)
fig.subplots_adjust(left=0, right=1, top=1, bottom=0, wspace=0, hspace=0)
ax.imshow(test)
some_text(ax)
some_rect(ax)
c0 = time.process_time()
plt.savefig('test7.jpg')
print(f'AGG: jpg: full size figure - with text/rect: {(time.process_time()-c0):.4f}')
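
One plausible contributor to the PNG/JPG gap is PNG's zlib compression, so a further variant worth timing is the same save at different compression levels. This is only a sketch under assumptions: it assumes a matplotlib version whose savefig() accepts pil_kwargs and forwards it to Pillow for PNG output, and the helper name time_png_save is made up for illustration. It uses time.perf_counter() (wall-clock time) rather than process_time(), and closes each figure so that open figures do not accumulate between runs.

```python
import os
import tempfile
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np


def time_png_save(compress_level, dim=1024, dpi=150):
    """Return (seconds, file size in bytes) for one PNG savefig call.

    Hypothetical helper, not part of the original benchmark.
    """
    data = np.arange(dim * dim).reshape((dim, dim))
    fig = plt.figure(figsize=(dim / dpi, dim / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # axes fills the whole figure
    ax.set_axis_off()
    ax.imshow(data)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.png")
        start = time.perf_counter()
        # Assumption: pil_kwargs is passed through to PIL.Image.Image.save,
        # whose PNG writer accepts compress_level (0-9, Pillow default 6).
        fig.savefig(path, dpi=dpi, pil_kwargs={"compress_level": compress_level})
        elapsed = time.perf_counter() - start
        size = os.path.getsize(path)
    plt.close(fig)  # release the figure so repeated runs do not pile up
    return elapsed, size


if __name__ == "__main__":
    for level in (1, 6, 9):
        seconds, size = time_png_save(level)
        print(f"compress_level={level}: {seconds:.4f} s, {size} bytes")
```

If the low compression level turns out to be noticeably faster, the trade-off is a larger file; whether that matters depends on how the images are stored and served.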


Richard
  • Saving a jpg with matplotlib should be slower than a png. Can you show the code that produces the problem? ([mcve]). Also using Agg backend should not take longer (but compared to what?). – ImportanceOfBeingErnest Oct 04 '19 at 12:43
  • Maybe try using PIL? If you want false colour, check this question: https://stackoverflow.com/questions/10965417/how-to-convert-numpy-array-to-pil-image-applying-matplotlib-colormap – kwinkunks Oct 04 '19 at 12:55
  • I just noticed you have text and patches too... I'll leave my answer for now in case it's useful, but without seeing your image I won't try to replicate. – kwinkunks Oct 04 '19 at 13:11
  • I'll get a semi-workable demo to post - the code's part of a 2,000 line script right now, so that's why I didn't initially :D – Richard Oct 04 '19 at 15:51
  • The text and patches don't seem to add significant time (see updated post). Unfortunately I couldn't get my sample code above to reproduce my app code's ~1.7 sec save time (not sure exactly why, but I've put figuring that out to later after a couple of hours of headscratching...) – Richard Oct 04 '19 at 20:25
  • @kwinkunks I thought about moving to PIL as well, but decided not to because in some of the other images I'm needing to save, I need things like arrows, color maps etc etc that would all need implementing at a low level in PIL in order to use it – Richard Oct 04 '19 at 20:26
  • Have you tried profiling? Might help figure out which bit is slow. (I haven't ever tried to dig into a `matplotlib` profile, but maybe it can be done?) – kwinkunks Oct 04 '19 at 20:31
  • I'll try profiling in a few days! ...want to get the script working properly first lol – Richard Oct 05 '19 at 08:51
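
The profiling suggested in the comments could be sketched roughly as follows, using the standard-library cProfile and pstats modules. The figure setup mirrors the question's test script rather than the real application, and the function name profile_savefig is hypothetical. Saving to an in-memory buffer is a deliberate choice here so that disk I/O does not dominate the report.

```python
import cProfile
import io
import pstats

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np


def profile_savefig(dim=1024, dpi=150, top=15):
    """Profile a single PNG savefig call and return the report as text."""
    fig = plt.figure(figsize=(dim / dpi, dim / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # axes fills the whole figure
    ax.imshow(np.arange(dim * dim).reshape((dim, dim)))

    buffer = io.BytesIO()  # write to memory instead of disk
    profiler = cProfile.Profile()
    profiler.enable()
    fig.savefig(buffer, format="png", dpi=dpi)
    profiler.disable()
    plt.close(fig)

    # Sort by cumulative time so the most expensive call chains come first.
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(top)
    return report.getvalue()


if __name__ == "__main__":
    print(profile_savefig())
```

Reading the cumulative column should indicate whether the time goes into rendering (drawing the image, text and patches) or into encoding the PNG.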

2 Answers

1

Try making a PIL image object; for me it's more than 100 times faster than matplotlib:

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

data = np.random.random((100, 100))
cm = plt.get_cmap('viridis')
img = Image.fromarray((cm(data)[:, :, :3] * 255).astype(np.uint8))
img.save('image.png')

If you just want greyscale, you can skip the get_cmap business — just scale your array to the range 0 to 255.
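
As a minimal sketch of that greyscale scaling, assuming a simple linear min-to-max stretch is what you want (the helper name to_uint8 is made up for illustration):

```python
import numpy as np
from PIL import Image


def to_uint8(data):
    """Linearly rescale an array to the range 0..255 and convert to uint8."""
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:  # constant image: avoid dividing by zero
        return np.zeros(data.shape, dtype=np.uint8)
    return np.round((data - lo) / (hi - lo) * 255).astype(np.uint8)


if __name__ == "__main__":
    raw = np.arange(1024 * 1024, dtype=np.int32).reshape((1024, 1024))
    # A 2-D uint8 array becomes an 8-bit greyscale ("L" mode) image.
    Image.fromarray(to_uint8(raw)).save("grey.png")
```

A min-to-max stretch is sensitive to outliers; clipping to fixed bounds or percentiles before scaling is a common alternative.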

The annotations would have to be added in PIL.

One important difference from using matplotlib is that it's pixel-for-pixel. So if you want to apply some scaling, you'll have to interpolate first. You could use scipy.ndimage.zoom for that.
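
For the annotations, a rough sketch with Pillow's ImageDraw module might look like this. It only covers an axis-aligned rectangle outline and a text label in the default font; the coordinates, label and function name annotate are placeholders, and arrows or rotated shapes would need more work than shown here.

```python
import numpy as np
from PIL import Image, ImageDraw


def annotate(img):
    """Draw an unfilled rectangle and a text label on a copy of an RGB image."""
    out = img.copy()  # leave the input image untouched
    draw = ImageDraw.Draw(out)
    # Coordinates are in pixels, with the origin at the top-left corner.
    draw.rectangle([10, 10, 110, 110], outline="red", width=1)
    draw.text((10, 120), "blah +/- blah", fill="red")  # Pillow's default font
    return out


if __name__ == "__main__":
    # Placeholder data: a horizontal grey gradient, 1024x1024, as RGB.
    ramp = np.tile(np.linspace(0, 255, 1024).astype(np.uint8), (1024, 1))
    base = Image.fromarray(np.stack([ramp, ramp, ramp], axis=-1))
    annotate(base).save("annotated.png")
```

Note that ImageDraw shapes are drawn without anti-aliasing, so edges will look harder than the matplotlib equivalent.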

kwinkunks
  • Thanks @kwinkunks. See my comment above: I could potentially use PIL, but I need richer annotations than it offers by default (arrows, ellipses, rotated rectangles, etc.), so I stuck with matplotlib for that reason. Pixel-for-pixel output is probably OK at the sizes I'm working with, but the lack of anti-aliasing might show up. – Richard Oct 04 '19 at 20:28
  • Right, makes sense. – kwinkunks Oct 04 '19 at 20:31
0

Install OpenCV with something like pip install opencv-python.

cv2.imwrite is faster than both for this situation.