
I wrote a script that builds a frequency distribution with nltk's FreqDist class and then converts it into a pandas DataFrame. The code snippet is as follows:

    .......
    import unicodedata
    str2 = unicodedata.normalize('NFKD', str1).encode('ascii','ignore')    
    words = nltk.tokenize.word_tokenize(str2)

    fdist = nltk.FreqDist(words)
    df = pd.DataFrame.from_dict(fdist, orient='index').reset_index()
    df = df.rename(columns={'index':'query_word', 0:'count'})
    df2 = df.sort_values(['count'], ascending=[False]) 

Now I am trying to plot it with plotly, using the following code:

    import plotly.plotly as py
    import plotly.graph_objs as go

    data = [go.Bar(x= df.query_word, y= df.count)]
    py.iplot(data, filename='basic-bar')

When I run this part, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-87d0c9af254b> in <module>()
----> 1 py.iplot(data, filename='basic-bar')

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot(figure_or_data, **plot_options)
    150     if 'auto_open' not in plot_options:
    151         plot_options['auto_open'] = False
--> 152     url = plot(figure_or_data, **plot_options)
    153 
    154     if isinstance(figure_or_data, dict):

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in plot(figure_or_data, validate, **plot_options)
    239 
    240     plot_options = _plot_option_logic(plot_options)
--> 241     res = _send_to_plotly(figure, **plot_options)
    242 
    243     if res['error'] == '':

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in _send_to_plotly(figure, **plot_options)
   1407     fig = tools._replace_newline(figure)  # does not mutate figure
   1408     data = json.dumps(fig['data'] if 'data' in fig else [],
-> 1409                       cls=utils.PlotlyJSONEncoder)
   1410     credentials = get_credentials()
   1411     validate_credentials(credentials)

/usr/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    249         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    250         separators=separators, encoding=encoding, default=default,
--> 251         sort_keys=sort_keys, **kw).encode(obj)
    252 
    253 

/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in encode(self, o)
    144 
    145         # this will raise errors in a normal-expected way
--> 146         encoded_o = super(PlotlyJSONEncoder, self).encode(o)
    147 
    148         # now:

/usr/lib/python2.7/json/encoder.pyc in encode(self, o)
    205         # exceptions aren't as detailed.  The list call should be roughly
    206         # equivalent to the PySequence_Fast that ''.join() would do.
--> 207         chunks = self.iterencode(o, _one_shot=True)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)

/usr/lib/python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
    268                 self.key_separator, self.item_separator, self.sort_keys,
    269                 self.skipkeys, _one_shot)
--> 270         return _iterencode(o, 0)
    271 
    272 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in default(self, obj)
    211             except NotEncodable:
    212                 pass
--> 213         return json.JSONEncoder.default(self, obj)
    214 
    215     @staticmethod

/usr/lib/python2.7/json/encoder.pyc in default(self, o)
    182 
    183         """
--> 184         raise TypeError(repr(o) + " is not JSON serializable")
    185 
    186     def encode(self, o):

TypeError: <bound method DataFrame.count of                     query_word  count
0                          1,2      1
1                         four      1
2                       prefix      1
..                      ......      ..
..                      ......      ..
3                    francesco      1

As far as I understand from other SO questions on "is not JSON serializable" and from the error message, this is a problem with encoding and not with the datatype. Is that right?

I ask because printing type(df2.query_word) gives <class 'pandas.core.series.Series'>, and the traceback doesn't show any encoding error such as in here or here. So how do I make a Series serializable?

What is an easy workaround? My main intention in posting this question is to understand whether this is a problem with the dataframe, the data, IPython, or plotly.
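One thing that may be worth checking: the last line of the traceback names a bound method (DataFrame.count) rather than a Series. The sketch below uses only the standard json module and a hypothetical `Table` class (a stand-in, not pandas) to show that passing a method object to `json.dumps` without calling it raises this kind of TypeError. The exact wording of the message differs between Python versions.

```python
import json


class Table(object):
    """Hypothetical stand-in for an object that has a count() method."""

    def __init__(self, rows):
        self.rows = rows

    def count(self):
        return len(self.rows)


table = Table([("four", 1), ("prefix", 1)])

try:
    # table.count is the method object itself; it is never called here.
    json.dumps({"y": table.count})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Prints a "... is not JSON serializable" message.
print(error_message)
```

If the same thing happens with `df.count`, the serializer would be receiving a method rather than column data, which would be a datatype issue and not an encoding one.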

kingmakerking
  • My guess is that, since `numpy.float64` is not serializable by the standard json module, the call to `json.dumps` fails. The usual workaround is to call the `to_json` method in pandas, but here it happens inside the plotly function you call. Can you try to pass the data in a different format, not as a pandas dataframe? – IanS Jan 19 '17 at 12:36
  • @IanS You mean I can simply convert using `df[column_name].tolist()`? – kingmakerking Jan 19 '17 at 12:44
  • I don't know what formats `iplot` accepts, but something like that, yes. See [this answer](http://stackoverflow.com/a/11389998/5276797) to convert to native Python types accepted by json. – IanS Jan 19 '17 at 12:56
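A minimal sketch of the `tolist()` idea from the comments, under two assumptions: the word counts are made-up sample data rather than the real FreqDist output, and plain `json.dumps` stands in for the serialization that plotly performs internally. It also shows that `df.count` resolves to the DataFrame method, because the method name shadows the column of the same name, while `df["count"]` returns the column.

```python
import json

import pandas as pd

# Made-up sample data standing in for the FreqDist-derived dataframe.
df = pd.DataFrame({
    "query_word": ["four", "prefix", "francesco"],
    "count": [1, 1, 1],
})

# Attribute access finds the DataFrame.count method, not the column.
is_method = callable(df.count)

# Bracket access returns the column; tolist() yields native Python values.
words = df["query_word"].tolist()
counts = df["count"].tolist()

# Plain lists of str and int serialize without a custom encoder.
payload = json.dumps({"x": words, "y": counts})
print(is_method)
print(payload)
```

Whether `go.Bar` needs plain lists or also accepts a Series is not confirmed here; the bracket access is the part that avoids handing over the method.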
