23

This is very close to this question, but I have added a few details specific to my question:

Matplotlib Plotting using AWS-EMR jupyter notebook

I would like to find a way to use matplotlib inside my Jupyter notebook. Here is the failing code snippet; it's fairly simple:

notebook

import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.show()

I chose this snippet because this line alone fails, as it tries to use Tkinter (which is not installed on an AWS EMR cluster):

import matplotlib.pyplot as plt

When I run the full notebook snippet, there is no runtime error, but nothing happens either (no graph is shown). My understanding is that one way to make this work is to add either of the following snippets:

pyspark magic notation

%matplotlib inline

results

unknown magic command 'matplotlib'
UnknownMagic: unknown magic command 'matplotlib'

IPython explicit magic call

from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')

results

'NoneType' object has no attribute 'run_line_magic'
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'run_line_magic'

to my notebook, which invokes a Spark magic command that inlines matplotlib plots (at least, that's my interpretation). I have tried both of these after using a bootstrap action:

EMR bootstrap

sudo pip install matplotlib
sudo pip install ipython

Even with these added, I still get an error that there is no magic for matplotlib. So my question is:

Question

How do I make matplotlib work in an AWS EMR Jupyter notebook?

(Or how do I view graphs and plot images in AWS EMR Jupyter notebook?)

Matt
  • From the image posted by @FoxanNg, I can see that the Jupyter instance is using a conda env (which could be a virtualenv created for Jupyter). Could we try installing `matplotlib` using `conda` instead of `pip` in the bootstrap and give it a try? – DaRkMaN May 26 '19 at 06:08
  • When trying to invoke conda in my bootstrap file it does not know where to find it (it gets a command not found error.) – Matt May 27 '19 at 14:45
  • I am not sure how the cluster is set up, but from the image it looks like `/opt/conda/bin/conda`. Can we use the full path to install? – DaRkMaN May 27 '19 at 14:52
  • it doesn't think conda is installed at bootstrap: `/opt/conda/bin/conda: command not found` – Matt May 28 '19 at 23:46
  • Started an EMR cluster, and found that it doesn't provide conda support by default. Could you confirm if we are not installing Conda via bootstrap? – DaRkMaN May 29 '19 at 11:15
  • We are not installing conda during bootstrap – Matt May 29 '19 at 21:58
  • The `%` commands are IPython or Jupyter magic commands. Run `%lsmagic` and check `%matplotlib` is among them. If `%matplotlib` is found, run `%matplotlib -l` to list available backends. You can explicitly require a specific backend by running `%matplotlib ` – Nizam Mohamed May 30 '19 at 17:25
  • If you can't see `%matplotlib` among the output of `%lsmagic`, try `%pylab`. It just imports `matplotlib` and `numpy`. If you want help for a particular magic command, try `%command?` – Nizam Mohamed May 30 '19 at 17:28
  • @Matt Install matplotlib like this and then try: sudo python3 -m pip install matplotlib – Aman Mundra Jun 08 '19 at 18:15

5 Answers

6

As you mentioned, matplotlib is not installed on the EMR cluster, so an error like this occurs:

[image: error]

However, it is actually available in the managed Jupyter notebook instance (the docker container). Using the %%local magic will allow you to run the cell locally:

[image: local]

Foxan Ng
  • This answer makes the first cell (where I put `%%local`) run much more quickly but adding any additional imports (such as tensorflow) fails despite being installed and working previously. Upvoting because it technically makes the code snippet run, but not accepting because it renders the notebook nearly unusable. – Matt May 23 '19 at 19:43
  • Isn't there a way to install matplotlib in that Docker container? Maybe with `conda`? – Kenry Sanchez May 30 '19 at 04:28
  • https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-install-kernels-libs.html – Kenry Sanchez May 30 '19 at 04:29
  • This answer may be applicable to other contexts but it is not working on an AWS Jupyter notebook connected to an EMR cluster running the Sparkmagic (PySpark) kernel. – Pablo Adames Sep 15 '20 at 15:38
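
To see why a `%%local` cell and a regular cell can behave so differently, it may help to check where each one actually runs. The following is a minimal sketch, assuming Python 3; `describe_environment` is a hypothetical helper name, not part of EMR, sparkmagic, or Jupyter. It reports the host, the interpreter path, and whether a few packages are importable there.

```python
import importlib.util
import socket
import sys


def describe_environment(packages=("matplotlib", "IPython")):
    """Report where this code runs and which packages are importable there.

    Hypothetical helper for illustration; not part of any AWS or Jupyter API.
    """
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "version": sys.version.split()[0],
        # find_spec returns None when a top-level package cannot be found
        "available": {
            name: importlib.util.find_spec(name) is not None
            for name in packages
        },
    }


print(describe_environment())
```

Running the same code once in a regular cell and once in a cell that starts with `%%local` should show whether the two are executed by different interpreters with different installed packages.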
5

The answer by @00schneider actually works.

import matplotlib.pyplot as plt

# plot data here
plt.show()

After `plt.show()`, re-run the magic cell that contains the line below, and you will see the plot in your AWS EMR Jupyter PySpark notebook:

%matplot plt
Madaditya
  • This is the command that shows the plot. Finally, an answer from someone that was having the same use case of the original question and mine, an AWS Sagemaker Jupyter notebook connected to a Sparkmagic (PySpark) kernel – Pablo Adames Sep 15 '20 at 13:34
  • When I run %matplot plt after the plot I get error: UsageError: Cell magic `%%matplot` not found. – Yue Y Feb 05 '21 at 00:11
  • For me that just outputs a really long string that is maybe a base64 representation of the image. – Nic Scozzaro Mar 18 '21 at 18:22
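
If the long string really is a base64-encoded PNG (an assumption; the comment above only guesses at this), it can be checked and turned back into image bytes with the standard library. This is a rough sketch, and `decode_png` is a hypothetical helper name.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_text):
    """Decode base64 text and verify that the result looks like a PNG.

    Hypothetical helper for illustration only.
    """
    # Remove line breaks and spaces that may have been added by the notebook output.
    compact = "".join(b64_text.split())
    raw = base64.b64decode(compact)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw
```

The returned bytes can then be written out with `open("plot.png", "wb")` and opened in any image viewer. If `decode_png` raises `ValueError`, the string is something other than a PNG.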
2

The following should work:

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])

Run the entire script in one cell

  • I appreciate the attempted solution. I tried to address this in the question but this fails on the line `%matplotlib inline` with this error (I'll add the error to the original question): unknown magic command 'matplotlib' UnknownMagic: unknown magic command 'matplotlib' – Matt May 22 '19 at 21:14
  • Okay, one more attempt: could you try `get_ipython().magic(u'matplotlib inline')` instead of `%matplotlib inline`? – Prachiti Prakash Prabhu May 22 '19 at 21:30
  • Thanks, but unfortunately `get_ipython()` returns `None` and thus `get_ipython().magic()` fails :( – Matt May 22 '19 at 21:33
  • Using @Matt's recommendation I get `name 'get_ipython' is not defined` – Pablo Adames Sep 15 '20 at 13:28
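
Both failures reported in these comments (`get_ipython()` returning `None`, and `get_ipython` not being defined) suggest the code is not being executed by an IPython shell at all. A guarded version of the call, sketched below, at least fails quietly. The helper names `get_shell_or_none` and `enable_inline_plots` are hypothetical; only `IPython.get_ipython` and `run_line_magic` are real IPython calls.

```python
def get_shell_or_none():
    """Return the active IPython shell, or None if there is none."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed in this interpreter.
        return None
    # get_ipython() itself returns None outside an IPython session.
    return get_ipython()


def enable_inline_plots(get_shell=get_shell_or_none):
    """Run the equivalent of '%matplotlib inline' if a shell exists.

    Returns True when the magic was invoked, False when no shell was found.
    """
    shell = get_shell()
    if shell is None:
        return False
    shell.run_line_magic("matplotlib", "inline")
    return True


print(enable_inline_plots())
```

A result of `False` means the `%matplotlib` magic cannot work in that kernel, so a different display mechanism is needed.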
2

Import matplotlib as

import matplotlib.pyplot as plt

and use the magic command %matplot plt instead as shown in the tutorial here: https://aws.amazon.com/de/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/

00schneider
-1

Try the code below. FYI, we have matplotlib 3.1.1 installed in Python 3.6 on emr-5.26.0, and I used the PySpark kernel. Make sure that `%matplotlib inline` is the first line in the cell:

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.show()
vinay
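
When no plotting magic is available in the kernel, as several comments in this thread report, one possible fallback is to keep the `agg` backend from the question and render the figure to PNG bytes instead of calling `plt.show()`. This is only a sketch, assuming matplotlib is importable where the cell runs; `render_png` is a hypothetical helper name.

```python
import io

import matplotlib

matplotlib.use("agg")  # non-interactive backend: needs no Tkinter and no display
import matplotlib.pyplot as plt


def render_png(values):
    """Plot the values off-screen and return the figure as PNG bytes."""
    fig, ax = plt.subplots()
    ax.plot(values)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return buffer.getvalue()


png_bytes = render_png([1, 2, 3, 4])
```

The bytes can be written to a file, but note that the file ends up on whichever machine executes the cell, which may be the cluster rather than the notebook instance.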