I wrote a script that uses NLTK's FreqDist class to count words and then converts the result into a pandas DataFrame. The code snippet is as follows:
.......
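import unicodedata
import nltk
import pandas as pd
# Strip accents so only ASCII remains (in Python 2, str1 must be a unicode string)
str2 = unicodedata.normalize('NFKD', str1).encode('ascii','ignore')
words = nltk.tokenize.word_tokenize(str2)
# Count word frequencies and turn the FreqDist into a two-column DataFrame
fdist = nltk.FreqDist(words)
df = pd.DataFrame.from_dict(fdist, orient='index').reset_index()
df = df.rename(columns={'index':'query_word', 0:'count'})
# Sort by frequency, most common words first
df2 = df.sort_values(['count'], ascending=[False])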
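For comparison, here is a minimal, dependency-light sketch of the same normalize, count, and tabulate pipeline. It is written for Python 3 and rests on two stated assumptions: `collections.Counter` stands in for `nltk.FreqDist` (which subclasses `Counter` in NLTK 3), and `str.split` stands in for `word_tokenize`, so no NLTK data is needed. The helper name `word_count_frame` is hypothetical.

```python
import unicodedata
from collections import Counter

import pandas as pd


def word_count_frame(text):
    """Hypothetical helper: build a (query_word, count) table sorted by count.

    Assumption: Counter stands in for nltk.FreqDist and str.split stands in
    for nltk.tokenize.word_tokenize.
    """
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the combining mark.
    # In Python 3 encode() returns bytes, so decode back to str.
    ascii_text = (unicodedata.normalize('NFKD', text)
                  .encode('ascii', 'ignore')
                  .decode('ascii'))
    counts = Counter(ascii_text.split())
    # most_common() yields (word, count) pairs already sorted by count,
    # so no separate sort_values step is needed.
    return pd.DataFrame(counts.most_common(), columns=['query_word', 'count'])


# 'caf\u00e9' is 'café'; it becomes 'cafe' after normalization.
df2 = word_count_frame('caf\u00e9 four prefix caf\u00e9 francesco')
print(df2)
```

Building the frame from `most_common()` with explicit column names avoids the `from_dict`/`reset_index`/`rename` round trip, at the cost of using the default integer index.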
Now I am trying to plot it with plotly, using the following snippet:
import plotly.plotly as py
import plotly.graph_objs as go
data = [go.Bar(x= df.query_word, y= df.count)]
py.iplot(data, filename='basic-bar')
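The object named in the TypeError further down is `<bound method DataFrame.count ...>`, not a column. The minimal sketch below uses only pandas and the standard `json` module (plotly is not assumed to be installed), with made-up rows modeled on the DataFrame shown in the error output. It illustrates why `df.count` resolves to the method while `df['count']` returns the column.

```python
import json

import pandas as pd

# Made-up rows modeled on the DataFrame printed in the error message.
df = pd.DataFrame({'query_word': ['1,2', 'four', 'prefix', 'francesco'],
                   'count': [1, 1, 1, 1]})

# 'count' is already a DataFrame method, so attribute access returns the
# bound method rather than the column that happens to share the name.
print(type(df.count))      # <class 'method'>

# Bracket indexing always looks up the column and returns a Series.
print(type(df['count']))   # <class 'pandas.core.series.Series'>

# Serializing the bound method fails the same way as in the traceback.
try:
    json.dumps(df.count)
    method_failed = False
except TypeError as exc:
    method_failed = True
    print('TypeError:', exc)
```

Applied to the plot, the bracket form would read `go.Bar(x=df['query_word'], y=df['count'])`. This is a sketch of the pandas side only; the plotly call itself is not exercised here.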
When I run this part, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-87d0c9af254b> in <module>()
----> 1 py.iplot(data, filename='basic-bar')
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot(figure_or_data, **plot_options)
150 if 'auto_open' not in plot_options:
151 plot_options['auto_open'] = False
--> 152 url = plot(figure_or_data, **plot_options)
153
154 if isinstance(figure_or_data, dict):
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in plot(figure_or_data, validate, **plot_options)
239
240 plot_options = _plot_option_logic(plot_options)
--> 241 res = _send_to_plotly(figure, **plot_options)
242
243 if res['error'] == '':
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in _send_to_plotly(figure, **plot_options)
1407 fig = tools._replace_newline(figure) # does not mutate figure
1408 data = json.dumps(fig['data'] if 'data' in fig else [],
-> 1409 cls=utils.PlotlyJSONEncoder)
1410 credentials = get_credentials()
1411 validate_credentials(credentials)
/usr/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
249 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
250 separators=separators, encoding=encoding, default=default,
--> 251 sort_keys=sort_keys, **kw).encode(obj)
252
253
/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in encode(self, o)
144
145 # this will raise errors in a normal-expected way
--> 146 encoded_o = super(PlotlyJSONEncoder, self).encode(o)
147
148 # now:
/usr/lib/python2.7/json/encoder.pyc in encode(self, o)
205 # exceptions aren't as detailed. The list call should be roughly
206 # equivalent to the PySequence_Fast that ''.join() would do.
--> 207 chunks = self.iterencode(o, _one_shot=True)
208 if not isinstance(chunks, (list, tuple)):
209 chunks = list(chunks)
/usr/lib/python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
268 self.key_separator, self.item_separator, self.sort_keys,
269 self.skipkeys, _one_shot)
--> 270 return _iterencode(o, 0)
271
272 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in default(self, obj)
211 except NotEncodable:
212 pass
--> 213 return json.JSONEncoder.default(self, obj)
214
215 @staticmethod
/usr/lib/python2.7/json/encoder.pyc in default(self, o)
182
183 """
--> 184 raise TypeError(repr(o) + " is not JSON serializable")
185
186 def encode(self, o):
TypeError: <bound method DataFrame.count of query_word count
0 1,2 1
1 four 1
2 prefix 1
.. ...... ..
.. ...... ..
3 francesco 1
From other SO questions on "is not JSON serializable" and from the error message, I understand this to be an encoding problem rather than a data-type problem. Is that right?
When I print type(df2.query_word), it says <class 'pandas.core.series.Series'>. So how do I make a Series serializable? The traceback doesn't show any encoding error like the ones here or here.
What is the easy workaround? My main reason for posting this question is to understand whether the problem lies with the DataFrame, the data, IPython, or plotly.
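On the narrower question of making a Series JSON-serializable: the standard-library encoder does not know about pandas objects. A Series can be converted to plain Python containers first, or pandas can encode it directly. The sketch below shows both, using a small made-up Series. Note that the object rejected in the traceback above is a bound method, not a Series, so this addresses the Series question on its own terms.

```python
import json

import pandas as pd

# Made-up word counts for illustration.
counts = pd.Series([2, 1, 1], index=['cafe', 'four', 'prefix'], name='count')

# The standard-library encoder rejects a Series outright.
try:
    json.dumps(counts)
    series_failed = False
except TypeError as exc:
    series_failed = True
    print('TypeError:', exc)

# Option 1: convert to a plain list of Python ints first (values only).
as_list = json.dumps(counts.tolist())
# Option 2: let pandas encode it; for a Series the default orient is 'index'.
as_pandas = counts.to_json()
print(as_list)
print(as_pandas)
```

`tolist()` keeps only the values, which is usually what a plotting library's `x`/`y` arguments expect, while `to_json()` also keeps the index labels as JSON object keys.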