
I am trying to append a numpy.ndarray to a dataframe as a new column, with little success. The dataframe is called user2 and the numpy.ndarray is called CallTime.

I tried:

user2["CallTime"] = CallTime.values

but I get an error message:

Traceback (most recent call last):
  File "<ipython-input-53-fa327550a3e0>", line 1, in <module>
    user2["CallTime"] = CallTime.values
AttributeError: 'numpy.ndarray' object has no attribute 'values'

Then I tried:

user2["CallTime"] = user2.assign(CallTime = CallTime.values)

but I get the same error message again.

I also tried to use the merge command, but for some reason Python did not recognize it, although I have imported pandas. In the example below, CallTime is a dataframe:

 user3 = merge(user2, CallTime)

Error message:

Traceback (most recent call last):
  File "<ipython-input-56-0ebf65759df3>", line 1, in <module>
    user3 = merge(user2, CallTime)
NameError: name 'merge' is not defined

Any ideas?

Thank you!

AlK

3 Answers


A pandas DataFrame is a 2-dimensional data structure, and each column of a DataFrame is a 1-dimensional Series. So to add one column to a DataFrame, the data you assign must be 1-dimensional, for example a Series (a 1-D ndarray or a list of the same length also works). An np.ndarray, on the other hand, can have any number of dimensions. From your code, I assume CallTime has shape n x 1 (n rows and 1 column), and it's easy to convert that to a Series. Here is an example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(5, 2), columns=['A', 'B'])

This creates a dataframe df with two columns 'A' and 'B', and 5 rows.

CallTime = np.random.rand(5,1)

Assume this is your ndarray CallTime, with shape (5, 1).

df['C'] = pd.Series(CallTime[:, 0])

This adds a new column to df. Here CallTime[:, 0] selects the first column of CallTime; to use a different column of the array, change the index.

Make sure that df and CallTime have the same number of rows.
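One caveat with pd.Series(...): a new Series gets a default 0..n-1 index, and pandas aligns on the index when assigning a column. If df's index is anything else, the new column can come out as all NaN. A minimal sketch, assuming a df with a non-default index (the labels 10..14 are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is NOT 0..4
df = pd.DataFrame(np.random.rand(5, 2), columns=['A', 'B'], index=range(10, 15))
CallTime = np.random.rand(5, 1)

# A bare Series gets index 0..4, which shares no labels with df.index,
# so index alignment fills the new column with NaN
df['C_misaligned'] = pd.Series(CallTime[:, 0])

# Passing index=df.index makes the labels match, so values land row by row
df['C'] = pd.Series(CallTime[:, 0], index=df.index)

# A plain 1-D ndarray is assigned by position, with no alignment step
df['D'] = CallTime[:, 0]

print(df)
```

Assigning the 1-D array directly sidesteps the alignment question entirely; wrapping it in a Series only matters if you want index-based matching.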

Hope this helps.

rojeeer
  • I tried what you suggested, this time with a different dataframe called user3 and an np.ndarray called labels. Both have the same number of rows: type(labels) is numpy.ndarray, labels.shape is (1405,) and user3.shape is (1405, 4). I entered user3['labels'] = pd.Series(labels[:, 0]) and received the following error message: IndexError: too many indices for array – AlK Oct 06 '16 at 06:15
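The IndexError in this comment comes from the shape: labels.shape is (1405,), so the array is already 1-dimensional and has no second axis for labels[:, 0] to index. A sketch with small made-up stand-ins for user3 and labels (only the shapes mirror the comment) that reproduces the error and shows two ways around it:

```python
import numpy as np
import pandas as pd

# Small stand-ins: labels is 1-D, like the (1405,) array in the comment
user3 = pd.DataFrame(np.zeros((4, 4)), columns=['w', 'x', 'y', 'z'])
labels = np.array([0, 1, 1, 2])

try:
    labels[:, 0]  # a 1-D array has only one axis, so this raises IndexError
except IndexError as exc:
    print("IndexError:", exc)

# A 1-D array of matching length can be assigned as-is
user3['labels'] = labels

# If the array might be (n, 1) instead, ravel() flattens either shape to (n,)
labels_2d = labels.reshape(-1, 1)
user3['labels_from_2d'] = labels_2d.ravel()

print(user3)
```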
0

Rather than only pointing to the documentation, here is a sample:

import numpy as np
import pandas as pd

data = {'A': [2010, 2011, 2012],
        'B': ['Bears', 'Bears', 'Bears'],
        'C': [11, 8, 10],
        'D': [5, 8, 6]}
user2 = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Create the array that will be added to user2 as a new column
CallTime = np.array([1, 2, 3])

# Convert the ndarray CallTime to a list. If CallTime is a matrix, you can
# convert it to a list and iterate over it, or convert it into a DataFrame
# and either add just the column you need or join the DataFrames.
user2.loc[:, 'CallTime'] = CallTime.tolist()

print(user2)

Result: user2 now has a fifth column, CallTime, containing 1, 2 and 3.

I think this will help. Also check the numpy.ndarray.tolist documentation for how the conversion to a list works, and here is an example of how to create a DataFrame from a NumPy array if you need one: https://stackoverflow.com/a/35245297/2027457
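The code comment above mentions converting CallTime into a DataFrame and joining it. Here is a minimal sketch of that route, reusing the same user2 data. It also shows why the question's bare merge(...) call failed: merge is a pandas function, so after import pandas as pd it has to be called as pd.merge (or as the method user2.merge). Joining on the row index is an assumption for illustration.

```python
import numpy as np
import pandas as pd

data = {'A': [2010, 2011, 2012],
        'B': ['Bears', 'Bears', 'Bears'],
        'C': [11, 8, 10],
        'D': [5, 8, 6]}
user2 = pd.DataFrame(data)
CallTime = np.array([1, 2, 3])

# Wrap the array in a one-column DataFrame that shares user2's index
call_df = pd.DataFrame({'CallTime': CallTime}, index=user2.index)

# Option 1: join aligns on the index by default
joined = user2.join(call_df)

# Option 2: merge must be told to match on the index of both frames.
# Calling plain merge(...) raises NameError; it lives in the pandas namespace.
merged = pd.merge(user2, call_df, left_index=True, right_index=True)

print(joined)
print(merged)
```

For adding a single array, direct column assignment is simpler; join and merge pay off when CallTime already is a DataFrame with its own index or key column.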

n1tk
  • I am still receiving an error message. Look at the main body of the question. – AlK Oct 06 '16 at 07:54
  • @AlexanderKonstantinidis, try this one: user2.loc[:,'CallTime'] = CallTime.tolist(). I edited the answer, so it should work now. – n1tk Oct 06 '16 at 08:57
  • Thank you. The column was appended, although I still got the same error message, which is kind of strange. – AlK Oct 06 '16 at 09:31
  • It will work; you can ignore the warning. Here is the description of the warning: "The warning is here to stop you modifying a copy (user2.loc[:,'CallTime'] = CallTime.tolist() is potentially a copy, and if it is then any modifications would not change the original frame. It could be that it works correctly in some cases but pandas cannot guarantee it will work in all cases... use at your own risk (consider yourself warned! ;) )." – n1tk Oct 06 '16 at 09:41
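Assuming the message discussed in these comments is pandas' SettingWithCopyWarning: it typically appears when user2 was itself produced by filtering or slicing another DataFrame, so pandas cannot tell whether writing to it is meant to affect the original (pandas versions running in Copy-on-Write mode behave differently). The question doesn't show how user2 was built, so the parent frame calls and the filter below are hypothetical. A minimal sketch of the usual fix, taking an explicit .copy():

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; how user2 was really built isn't shown
calls = pd.DataFrame({'A': [2010, 2011, 2012, 2013],
                      'C': [11, 8, 10, 4]})
CallTime = np.array([1, 2, 3])

# Boolean filtering returns a frame that pandas tracks as derived from `calls`;
# adding a column to it without .copy() is what typically triggers
# SettingWithCopyWarning. .copy() makes user2 an independent DataFrame.
user2 = calls[calls['C'] > 5].copy()
user2['CallTime'] = CallTime  # three filtered rows, three values

print(user2)
print('CallTime' in calls.columns)  # the parent frame is left unchanged
```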

Here is a simple solution.

user2["CallTime"] = CallTime

The problem is that CallTime is already a NumPy array, so it has no .values attribute: .values is what you use to get the underlying array out of a DataFrame or Series. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=['A', 'B'])
# These are all valid: DataFrame and Series both have a .values attribute
df.values
df['A'].values
df['B'].values
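Since .values exists on DataFrame and Series but not on ndarray, code that has to accept either kind of input can normalize it with np.asarray, which returns the data as an array for pandas objects and returns an ndarray unchanged. A small sketch; the helper name as_column_values is hypothetical:

```python
import numpy as np
import pandas as pd

def as_column_values(x):
    """Hypothetical helper: return a 1-D NumPy array from a Series, list, or ndarray."""
    arr = np.asarray(x)
    # Flatten an (n, 1) array to (n,) so it can be assigned as one column
    return arr.ravel() if arr.ndim == 2 and arr.shape[1] == 1 else arr

user2 = pd.DataFrame({'A': [2010, 2011, 2012]})
CallTime = np.array([1, 2, 3])

user2['from_array'] = as_column_values(CallTime)  # ndarray: no .values needed
user2['from_series'] = as_column_values(pd.Series([4, 5, 6]))
user2['from_column'] = as_column_values(CallTime.reshape(-1, 1))

# For pandas objects, the pandas docs recommend .to_numpy() over .values
print(user2['A'].to_numpy())
print(user2)
```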
Xiaojian Chen