
I'm having a strange issue using Jupyter to plot some simple data. There is a lot of nuance to my specific use-case, not the least of which is a Jupyter notebook connected to our cloud-based Spark cluster with a PySpark kernel.

I can't, for the life of me, figure out why this simple code won't run without error. In reality I have to structure the code like this, because instead of `x` and `y` I'm working with a data frame sourced from a Hive query (via the `%%sql` magic), which I manipulate before plotting it.

Here's a re-creation of the code in my Jupyter notebook, split into its separate code cells. I've tried every ordering of the cells I can think of, and I can't fathom why it tells me the `x` variable is not defined.

```python
import matplotlib.pyplot as plt
```

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [20, 21, 20.5, 20.81, 21.0, 21.48, 22.0, 21.89]
```

```python
%matplotlib inline
xyplt = plt.plot(x, y, linestyle='dashed', marker='o', color='red')
```

```
NameError                                 Traceback (most recent call last)
<ipython-input-34-747daa8afe0d> in <module>()
      1 get_ipython().magic(u'pylab inline')
----> 2 xyplt = plt.plot(x, y, linestyle='dashed', marker='o', color='red')

NameError: name 'x' is not defined
```
Austin T
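A minimal way to picture this failure, assuming (as the comments below suggest but do not state outright) that the PySpark kernel runs ordinary cells in a remote Spark session while the cell containing `%matplotlib inline` is executed by the local Python process: the two are separate interpreters with separate namespaces. The toy below models each one as a plain dict; `remote_session` and `local_kernel` are hypothetical names, and this is not how sparkmagic is actually implemented.

```python
# Toy model only (not sparkmagic's implementation): two separate namespaces
# stand in for the remote Spark session and the local IPython kernel.
remote_session: dict = {}
local_kernel: dict = {}

# An ordinary cell is executed in the remote session, so x and y live there.
exec("x = [1, 2, 3]\ny = [20, 21, 20.5]", remote_session)

# A cell executed locally can only see names bound in the local namespace.
error_message = None
try:
    exec("points = list(zip(x, y))", local_kernel)
except NameError as err:
    error_message = str(err)  # "name 'x' is not defined", as in the traceback

# "Bringing the data to the local context" means copying the values across;
# after that, the same locally executed code succeeds.
local_kernel["x"] = remote_session["x"]
local_kernel["y"] = remote_session["y"]
exec("points = list(zip(x, y))", local_kernel)
```

Renaming `x` to `xx` does not help in this model, because the problem is which namespace the code runs in, not a clash with an existing name.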
  • I ran this in a notebook and it worked for me. Did you try restarting the kernel and running? Is there any other definition of `x`? Also, did you definitely run the cell with `x` and `y` defined in it? – nbryans Jun 16 '16 at 19:21
  • Yes, I confirmed the code: run in this order, it produces the error. I am still researching, but I think the integration with PySpark could be the confounding variable here. – Austin T Jun 17 '16 at 13:08
  • I guess you could also change `x` and `y` to `xx` and `yy` to see if that fixes things (i.e. in case PySpark is defining its own version of `x`). – nbryans Jun 17 '16 at 13:12
  • Just tried that now -- unfortunately no change in result. Using that `%matplotlib inline` causes Python to not recognize variables in that cell. Maddening. – Austin T Jun 17 '16 at 19:49
  • OK. I'm probably not going to be much help. I did find [this thread](http://stackoverflow.com/questions/19410042/how-to-make-ipython-notebook-matplotlib-plot-inline) where they put it at the very top of the notebook (even before the import statements). Maybe take a look there. Good luck! – nbryans Jun 17 '16 at 22:18
  • I think this is a known limitation in PySpark. As a work-around I used the `%%local` magic to do my plotting from the head node only. – Austin T Jul 19 '16 at 15:20
  • Hi, I have the same issue. How did you work around it? – Ofer Eliassaf Jan 17 '17 at 09:38
  • @OferEliassaf this is caused by using the PySpark kernel. You must use the `%%local` magic to bring your data into the local context (i.e. only on the head node of the Spark cluster); then you can use matplotlib to plot. – Austin T Jan 18 '17 at 12:40
  • I don't understand what the head node of the Spark cluster is. – Ofer Eliassaf Jan 19 '17 at 13:59
  • @AustinT I asked the developers of sparkmagic (the PySpark kernel) and got another answer: https://github.com/jupyter-incubator/sparkmagic/issues/322 – Ofer Eliassaf Jan 19 '17 at 13:59
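The `%%local` workaround from the comments can be sketched as follows, assuming the sparkmagic PySpark kernel: a cell beginning with `%%sql -o df` copies the query result into a local pandas DataFrame (`df` is a hypothetical name), and a cell beginning with `%%local` runs in the local kernel, where that DataFrame and matplotlib are available. The snippet is only the body of such a `%%local` cell. The hard-coded lists stand in for columns of the exported DataFrame, and the `Agg` backend with `savefig` (the file name `xyplot.png` is arbitrary) replaces `%matplotlib inline` so that the sketch also runs as a plain script.

```python
# Body of a hypothetical `%%local` cell: this runs in the local kernel, not on
# the Spark cluster. In the notebook, x and y would come from a DataFrame
# exported with `%%sql -o df` (e.g. df["x"], df["y"]); they are hard-coded here.
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [20, 21, 20.5, 20.81, 21.0, 21.48, 22.0, 21.89]

fig, ax = plt.subplots()
# Same styling as in the question: dashed red line with circle markers.
lines = ax.plot(x, y, linestyle="dashed", marker="o", color="red")
fig.savefig("xyplot.png")  # in a notebook with %matplotlib inline, the figure displays instead
```

The design point is that the plotting code and the data it uses both live in the local kernel, so no name has to cross between the cluster session and the notebook process.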
