I have an IPython notebook (Python 2) that calls pandas' get_dummies(), which converts categorical variables into dummy/indicator variables. It works on one machine but not on another. Both machines run Linux Mint and Python 2.7. See the minimal example below.

I have seen this error (ValueError: Wrong number of items passed 4, indices imply 3) in some other posts, but the workarounds there do not help, and as noted the same code works on another machine. Any idea what to do? For example, how can I compare the two installations of IPython/Jupyter and their packages?

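import pandas as pd

# Small frame with two numeric columns and one categorical column
df = pd.DataFrame({'A': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'B': 2.,
                   'C': pd.Categorical(["test", "train", "test", "train"])})
print df  # Python 2 print statement
pd.get_dummies(df)  # raises ValueError on the failing machine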

The output:

   A  B      C
0  1  2   test
1  1  2  train
2  1  2   test
3  1  2  train

[4 rows x 3 columns]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-cf3a14671e3b> in <module>()
      5                      'C' : pd.Categorical(["test","train","test","train"])})
      6 print df
----> 7 pd.get_dummies(df)

/usr/lib/python2.7/dist-packages/pandas/core/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na)
    946     """
    947     # Series avoids inconsistent NaN handling
--> 948     cat = Categorical.from_array(Series(data))
    949     levels = cat.levels
    950 

/usr/lib/python2.7/dist-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    220                                        raise_cast_failure=True)
    221 
--> 222                 data = SingleBlockManager(data, index, fastpath=True)
    223 
    224         generic.NDFrame.__init__(self, data, fastpath=True)

/usr/lib/python2.7/dist-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3591                 block = block[0]
   3592             if not isinstance(block, Block):
-> 3593                 block = make_block(block, axis, axis, ndim=1, fastpath=True)
   3594 
   3595         else:

/usr/lib/python2.7/dist-packages/pandas/core/internals.pyc in make_block(values, items, ref_items, klass, ndim, dtype, fastpath, placement)
   1991 
   1992     return klass(values, items, ref_items, ndim=ndim, fastpath=fastpath,
-> 1993                  placement=placement)
   1994 
   1995 

/usr/lib/python2.7/dist-packages/pandas/core/internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
   1356         super(ObjectBlock, self).__init__(values, items, ref_items, ndim=ndim,
   1357                                           fastpath=fastpath,
-> 1358                                           placement=placement)
   1359 
   1360     @property

/usr/lib/python2.7/dist-packages/pandas/core/internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
     62         if len(items) != len(values):
     63             raise ValueError('Wrong number of items passed %d, indices imply '
---> 64                              '%d' % (len(items), len(values)))
     65 
     66         self.set_ref_locs(placement)

ValueError: Wrong number of items passed 4, indices imply 3
Run pip list on each machine to compare the version numbers of every package. Offhand I'd guess an out-of-date pandas or NumPy. In a Jupyter notebook you can execute !pip list in a new code cell. – dartdog Mar 17 '16 at 03:26

1 Answer

In my case, the code failed on a machine with an older version of pandas (0.13.X) and ran fine on a different machine with an up-to-date pandas (0.19.1). Thank you, dartdog, for the suggestion to compare package versions with pip list.
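To make the comparison repeatable, a short script can be run on both machines and its output compared line by line. This is only a minimal sketch: the helper name package_versions is made up for illustration, and it assumes each package exposes a __version__ attribute, as pandas and NumPy do.

```python
from __future__ import print_function  # keeps the script usable on Python 2.7

import importlib
import sys


def package_versions(names):
    """Return {name: version string}, with None for packages that cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" if the module has no __version__ attribute
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Print the interpreter version first, then one line per package
print("python", sys.version.split()[0])
for name, version in sorted(package_versions(["pandas", "numpy"]).items()):
    print(name, version)
```

Running this inside the notebook itself, rather than in a terminal, reports the versions that the notebook kernel actually imports.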

If your code is packaged up with a setup.py, you can enforce the package version:

install_requires = ['pandas>=0.19.1']

Apparently Buildout honors setuptools, so when you install your package and its dependencies specified in setup.py, it will check for the correct version and update as necessary.
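For a notebook that is not packaged with a setup.py, a similar minimum could be checked by hand at the top of the notebook. The sketch below is an assumption about how one might do that, not part of setuptools: version_tuple and meets_minimum are hypothetical helpers, and "0.13.1" stands in for any 0.13.X release. It assumes purely numeric dotted versions, so a suffix such as "rc1" would make int() fail.

```python
def version_tuple(version):
    """Turn '0.19.1' into (0, 19, 1); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


MIN_PANDAS = "0.19.1"  # the version reported to work in this answer

print(meets_minimum("0.13.1", MIN_PANDAS))  # False: older than the minimum
print(meets_minimum("0.19.1", MIN_PANDAS))  # True: exactly the minimum
```

Comparing tuples of integers rather than raw strings matters here, because as strings "0.9.0" would sort after "0.19.1". In the notebook the first argument would be pd.__version__.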

If you're on a machine where you don't have permission to update the Python libraries directly, use pip's --user flag to install into your local user library:

pip install --user foo

The --upgrade flag tells pip to upgrade a package that is already installed, so if all else fails, that's another flag you can try to get the package up to the right version.
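Both flags can be combined in one command. The sketch below only builds and prints that command; pip_install_command is a hypothetical helper, not a pip API. It uses sys.executable with -m pip so that, when run from the notebook, the command targets the same Python the kernel is using, which is not guaranteed for a bare pip on the PATH.

```python
import sys


def pip_install_command(requirement, user=True, upgrade=True):
    """Build a pip command line for the interpreter running this code."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")     # install into the per-user library
    if upgrade:
        command.append("--upgrade")  # upgrade an already installed package
    command.append(requirement)
    return command


command = pip_install_command("pandas>=0.19.1")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.check_call(command)
```

After the install finishes, the notebook kernel needs a restart before the new pandas version is picked up.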
