Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

Question

I have a NumPy array built from a list of lists, representing a two-dimensional table with row labels and column names, as shown below:

data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])

I'd like the resulting DataFrame to have Row1 and Row2 as index values, and Col1 and Col2 as header values.

I can specify the index as follows:

df = pd.DataFrame(data,index=data[:,0]),

however I am unsure how to best assign column headers.

@behzad.nouri's answer is correct, but I think you should consider whether you can have the initial data in another form. As it stands, your values will be strings and not ints: the NumPy array mixes ints and strings, and because NumPy arrays have to be homogeneous, everything is cast to string. – joris, Dec 24 '13 at 15:54

behzad.nouri · Accepted Answer · 2016-03-06T12:28:40.973

368

You need to pass data, index and columns to the DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

edit: as noted in @joris's comment, you may need to change the above to np.int_(data[1:,1:]) to get the correct data type.
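A minimal sketch of the whole round trip, assuming every value cell is an integer literal (the name `values` is only for illustration). It shows that NumPy stores the mixed input as strings, and that converting the sliced value block before building the frame gives integer columns:

```python
import numpy as np
import pandas as pd

data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

# Mixing strings and ints yields a homogeneous Unicode string array.
print(data.dtype.kind)  # prints U

# Convert only the value block; the labels stay as strings.
values = data[1:, 1:].astype(int)

df = pd.DataFrame(data=values,
                  index=data[1:, 0],    # 1st column as index
                  columns=data[0, 1:])  # 1st row as the column names
print(df)
print(df.dtypes)
```

Here `astype(int)` is used as one way to do the conversion; it fails if any cell is not an integer literal.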

edited Mar 06 '16 at 12:28

answered Dec 24 '13 at 15:50

behzad.nouri

this works - but for such a common structure of input data and desired application to a `DataFrame` is there not some "shortcut"? This is basically the way that `csv`s are loaded - and can be managed by the _default_ handling for many csv readers. An analogous structure for df's would be useful. – StephenBoesch Nov 17 '18 at 20:26
I added a mini helper/convenience method for this as a supplemental answer. – StephenBoesch Nov 17 '18 at 21:03

score 131 · Answer 2 · edited Aug 07 '19 at 08:34

Here is an easy-to-understand solution:

import numpy as np
import pandas as pd

# Creating a 2 dimensional numpy array
>>> data = np.array([[5.8, 2.8], [6.0, 2.2]])
>>> data
array([[5.8, 2.8],
       [6. , 2.2]])

# Creating pandas dataframe from numpy array
>>> dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})
>>> print(dataset)
   Column1  Column2
0      5.8      2.8
1      6.0      2.2

edited Aug 07 '19 at 08:34

Jaroslav Bezděk

answered Jul 12 '18 at 14:28

Jagannath Banerjee

But you had to manually specify the `Series` names .. that's not scalable. – StephenBoesch Nov 17 '18 at 20:25
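One way around typing each name by hand, sketched under the assumption that generated names such as `Column1`, `Column2` are acceptable, is to build the list of names from the array's shape and pass it through the `columns` argument:

```python
import numpy as np
import pandas as pd

data = np.array([[5.8, 2.8], [6.0, 2.2]])

# Generate one name per column instead of listing each one manually.
names = [f'Column{i + 1}' for i in range(data.shape[1])]
dataset = pd.DataFrame(data, columns=names)
print(dataset)
```

The same comprehension works unchanged for an array with any number of columns.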

score 26 · Answer 3 · edited May 23 '17 at 12:26

I agree with Joris; it seems like you should be doing this differently, like with numpy record arrays. Modifying "option 2" from this great answer, you could do it like this:

import pandas
import numpy

dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pandas.DataFrame(values, index=index)
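To illustrate why a record (structured) array helps here, the following sketch uses made-up sample values and only two fields. Each field keeps its own dtype, so nothing is forced to string:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; each field carries its own dtype.
dtype = [('Col1', 'int32'), ('Col2', 'float32')]
values = np.array([(1, 2.5), (3, 4.5)], dtype=dtype)
index = ['Row' + str(i) for i in range(1, len(values) + 1)]

# Field names become the column names.
df = pd.DataFrame(values, index=index)
print(df)
print(df.dtypes)
```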

score 19 · Answer 4 · answered Oct 07 '18 at 12:31

This can be done simply by using the from_records method of the pandas DataFrame class:

import numpy as np
import pandas as pd
# Creating a numpy array
x = np.arange(1,10,1).reshape(-1,1)
dataframe = pd.DataFrame.from_records(x)

answered Oct 07 '18 at 12:31

Aadil Srivastava

This answer does not work with the example data provided in the question, i.e. `data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])`. – jpp Oct 07 '18 at 12:47
The simplest general solution when we have not specified the labels. – cerebrou Apr 17 '20 at 10:40
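For labelled input like the question's, `from_records` can also take a structured array together with an `index` argument naming the field to use as the index. A minimal sketch, assuming the data can be rebuilt as a structured array; the field name `label` is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical field name 'label' holds the row labels.
records = np.array([('Row1', 1, 2), ('Row2', 3, 4)],
                   dtype=[('label', 'U4'), ('Col1', 'int64'), ('Col2', 'int64')])

# index= names the field that should become the DataFrame index.
df = pd.DataFrame.from_records(records, index='label')
print(df)
```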

Rahul Verma · Answer 5 · 2019-08-08T07:48:28.010

16

    >>> import pandas as pd
    >>> import numpy as np
    >>> data.shape
    (480, 193)
    >>> type(data)
    numpy.ndarray
    >>> df = pd.DataFrame(data=data[0:, 0:],
    ...                   index=[i for i in range(data.shape[0])],
    ...                   columns=['f' + str(i) for i in range(data.shape[1])])
    >>> df.head()
    # (output was posted as an image titled "array to dataframe")

edited Aug 08 '19 at 07:48

answered Jun 27 '19 at 09:17

Rahul Verma

score 9 · Answer 6 · answered Nov 17 '18 at 21:01

Adding to @behzad.nouri's answer, we can create a helper routine to handle this common scenario:

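import numpy as np
import pandas as pd

def csvDf(dat, **kwargs):
  # First row holds the column names, first column holds the row labels.
  # Check the raw input before converting: np.array(...) never returns None.
  if dat is None or len(dat) == 0 or len(dat[0]) == 0:
    return None
  data = np.array(dat)
  return pd.DataFrame(data[1:,1:], index=data[1:,0], columns=data[0,1:], **kwargs)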

Let's try it out:

data = [['','a','b','c'],['row1','row1cola','row1colb','row1colc'],
     ['row2','row2cola','row2colb','row2colc'],['row3','row3cola','row3colb','row3colc']]
csvDf(data)

In [61]: csvDf(data)
Out[61]:
             a         b         c
row1  row1cola  row1colb  row1colc
row2  row2cola  row2colb  row2colc
row3  row3cola  row3colb  row3colc
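Because the header row and the row labels force a string array, a frame built this way holds strings even when the cells look numeric. A hedged sketch of one way to convert afterwards, assuming every column is entirely numeric (the sample data is made up), applies `pd.to_numeric` column by column:

```python
import numpy as np
import pandas as pd

data = np.array([['', 'a', 'b'],
                 ['row1', '1', '2.5'],
                 ['row2', '3', '4.5']])

df = pd.DataFrame(data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

# Every cell is a string at this point; convert each column separately.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

`pd.to_numeric` raises an error on a column containing non-numeric text, so mixed columns would need separate handling.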

score 6 · Answer 7 · answered Jun 25 '20 at 09:23

I think this is a simple and intuitive method:

data = np.array([[0, 0], [0, 1] , [1, 0] , [1, 1]])
reward = np.array([1,0,1,0])

dataset = pd.DataFrame()
dataset['StateAttributes'] = data.tolist()
dataset['reward'] = reward.tolist()

dataset

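returns a DataFrame in which each `StateAttributes` cell holds one row of `data` as a list, next to a scalar `reward` column.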

But there are performance implications detailed here:

How to set the value of a pandas column as list
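If list-valued cells are not required, one alternative is to give each state attribute its own numeric column. This is only a sketch of that idea; the column names `s0` and `s1` are hypothetical:

```python
import numpy as np
import pandas as pd

data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
reward = np.array([1, 0, 1, 0])

# Hypothetical names 's0' and 's1': one numeric column per attribute.
dataset = pd.DataFrame(data, columns=['s0', 's1'])
dataset['reward'] = reward
print(dataset)
```

Each column then keeps an integer dtype instead of the generic object dtype used for lists.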

score 3 · Answer 8 · answered Jul 06 '20 at 18:12

Here is a simple example of creating a pandas DataFrame from NumPy arrays.

import numpy as np
import pandas as pd

# create two 1-D arrays of length 20
var1 = np.arange(start=1, stop=21, step=1)
var2 = np.random.rand(20)
print(var1.shape)
print(var2.shape)

dataset = pd.DataFrame()
dataset['col1'] = var1
dataset['col2'] = var2
dataset.head()
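The same frame can be built in a single constructor call by passing a dict that maps column names to arrays. A minimal sketch with the same column names:

```python
import numpy as np
import pandas as pd

var1 = np.arange(1, 21)
var2 = np.random.rand(20)

# Each array becomes one column and keeps its own dtype.
dataset = pd.DataFrame({'col1': var1, 'col2': var2})
print(dataset.dtypes)
```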

score 1 · Answer 9 · answered Jun 25 '20 at 15:24

It's not so short, but maybe it can help you.

Creating Array

import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

>>> data
array([['col1', 'col2'],
       ['4.8', '2.8'],
       ['7.0', '1.2']], dtype='<U4')

Creating data frame

# first row supplies the headers, remaining rows supply the values
df = pd.DataFrame(data[1:], columns=data[0])
df

>>> df
  col1 col2
0  4.8  2.8
1  7.0  1.2
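Since the header row turns the whole array into strings, the values usually need converting back to numbers. A hedged sketch that maps each header to the values beneath it in `data` and converts them, assuming all value cells are valid floats:

```python
import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

# The first row supplies the headers; the rest are string-typed values.
df = pd.DataFrame(data[1:], columns=data[0]).astype(float)
print(df.dtypes)
```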
