347

I have a NumPy array built from a list of lists, representing a two-dimensional array with row labels and column names, as shown below:

data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])

I'd like the resulting DataFrame to have Row1 and Row2 as index values, and Col1 and Col2 as header values.

I can specify the index as follows:

df = pd.DataFrame(data,index=data[:,0]),

however I am unsure how to best assign column headers.

Georgy
user3132783
    @behzad.nouri's answer is correct, but you should consider whether you can have the initial data in another form. As it stands, your values will be strings and not ints: NumPy arrays have to be homogeneous, so mixing ints and strings casts everything to strings. – joris Dec 24 '13 at 15:54

9 Answers

368

You need to pass data, index and columns to the DataFrame constructor, as in:

>>> pd.DataFrame(data=data[1:,1:],    # values
...              index=data[1:,0],    # 1st column as index
...              columns=data[0,1:])  # 1st row as the column names

Edit: as @joris notes in the comments, you may need to change the above to np.int_(data[1:,1:]) to get the correct data type.
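Putting the constructor call and the dtype note together, here is a minimal sketch using the array from the question. It uses `astype(int)` instead of `np.int_`, which is my substitution rather than something from the answer, and it assumes every value cell holds an integer-like string.

```python
import numpy as np
import pandas as pd

# The array from the question; mixing str and int makes every element a string.
data = np.array([['', 'Col1', 'Col2'], ['Row1', 1, 2], ['Row2', 3, 4]])

df = pd.DataFrame(
    data=data[1:, 1:].astype(int),  # values, converted back from strings to ints
    index=data[1:, 0],              # first column as the index
    columns=data[0, 1:],            # first row as the column names
)
print(df)
print(df.dtypes)
```

Without the `astype(int)` call the frame would hold strings, so arithmetic on the columns would not behave numerically.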

behzad.nouri
    this works - but for such a common structure of input data and desired application to a `DataFrame` is there not some "shortcut"? This is basically the way that `csv`s are loaded - and can be managed by the _default_ handling for many csv readers. An analogous structure for df's would be useful. – StephenBoesch Nov 17 '18 at 20:26
  • I added a mini helper/convenience method for this as a supplemental answer. – StephenBoesch Nov 17 '18 at 21:03
131

Here is an easy-to-understand solution:

>>> import numpy as np
>>> import pandas as pd

# Creating a 2-dimensional NumPy array
>>> data = np.array([[5.8, 2.8], [6.0, 2.2]])
>>> data
array([[5.8, 2.8],
       [6. , 2.2]])

# Creating pandas dataframe from numpy array
>>> dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]})
>>> print(dataset)
   Column1  Column2
0      5.8      2.8
1      6.0      2.2
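As an alternative sketch (not part of the original answer), the same frame can probably be built without slicing each column by hand, by passing the whole array together with a `columns` list. The column names are the ones used above.

```python
import numpy as np
import pandas as pd

data = np.array([[5.8, 2.8], [6.0, 2.2]])

# Pass the 2-D array as-is and name its columns in one call.
dataset = pd.DataFrame(data, columns=['Column1', 'Column2'])
print(dataset)
```

This form scales more comfortably to arrays with many columns, since the names can come from any list of the right length.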
Jaroslav Bezděk
Jagannath Banerjee
26

I agree with Joris; it seems like you should be doing this differently, like with numpy record arrays. Modifying "option 2" from this great answer, you could do it like this:

import pandas
import numpy

dtype = [('Col1','int32'), ('Col2','float32'), ('Col3','float32')]
values = numpy.zeros(20, dtype=dtype)
index = ['Row'+str(i) for i in range(1, len(values)+1)]

df = pandas.DataFrame(values, index=index)
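To connect this to the data in the question, here is a small sketch of the same record-array idea. The two-column dtype and the row labels are assumptions chosen to mirror the question's example, not something the answer specifies.

```python
import numpy as np
import pandas as pd

# Each field carries its own name and dtype, so nothing is cast to strings.
dtype = [('Col1', 'int32'), ('Col2', 'int32')]
values = np.array([(1, 2), (3, 4)], dtype=dtype)

# Column names come from the field names; only the index is supplied separately.
df = pd.DataFrame(values, index=['Row1', 'Row2'])
print(df)
```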
Community
ryanjdillon
19

This can be done simply with the from_records method of pandas DataFrame:

import numpy as np
import pandas as pd
# Creating a numpy array
x = np.arange(1,10,1).reshape(-1,1)
dataframe = pd.DataFrame.from_records(x)
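If column labels are wanted as well, `from_records` accepts a `columns` argument. A minimal sketch, where the label `'value'` is a hypothetical name chosen only for illustration:

```python
import numpy as np
import pandas as pd

# A 9x1 array holding the values 1..9.
x = np.arange(1, 10, 1).reshape(-1, 1)

# Name the single column instead of accepting the default integer label.
dataframe = pd.DataFrame.from_records(x, columns=['value'])
print(dataframe.head())
```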
  • This answer does not work with the example data provided in the question, i.e. `data = array([['','Col1','Col2'],['Row1',1,2],['Row2',3,4]])`. – jpp Oct 07 '18 at 12:47
  • The simplest general solution when we have not specified the labels. – cerebrou Apr 17 '20 at 10:40
16
    >>> import pandas as pd
    >>> import numpy as np
    >>> data.shape
    (480, 193)
    >>> type(data)
    numpy.ndarray
    >>> df = pd.DataFrame(data=data,
    ...                   index=range(data.shape[0]),
    ...                   columns=['f' + str(i) for i in range(data.shape[1])])
    >>> df.head()

Rahul Verma
9

Adding to @behzad.nouri's answer, we can create a helper routine to handle this common scenario:

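import numpy as np
import pandas as pd

def csvDf(dat, **kwargs):
  # Check the raw input first: np.array(None) is a 0-d array with no len().
  if dat is None or len(dat) == 0 or len(dat[0]) == 0:
    return None
  data = np.array(dat)
  # First row holds the column names, first column holds the index labels.
  return pd.DataFrame(data[1:,1:], index=data[1:,0], columns=data[0,1:], **kwargs)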

Let's try it out:

data = [['','a','b','c'],['row1','row1cola','row1colb','row1colc'],
     ['row2','row2cola','row2colb','row2colc'],['row3','row3cola','row3colb','row3colc']]
csvDf(data)

In [61]: csvDf(data)
Out[61]:
             a         b         c
row1  row1cola  row1colb  row1colc
row2  row2cola  row2colb  row2colc
row3  row3cola  row3colb  row3colc
StephenBoesch
6

I think this is a simple and intuitive method:

data = np.array([[0, 0], [0, 1] , [1, 0] , [1, 1]])
reward = np.array([1,0,1,0])

dataset = pd.DataFrame()
dataset['StateAttributes'] = data.tolist()
dataset['reward'] = reward.tolist()

dataset

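This returns a DataFrame in which each StateAttributes cell holds a list (for example [0, 0]) and the reward column holds the scalar values.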

But there are performance implications detailed here:

How to set the value of a pandas column as list

blue-sky
3

Here is a simple example of creating a pandas DataFrame from NumPy arrays.

import numpy as np
import pandas as pd

# create two 1-D arrays of length 20
var1 = np.arange(start=1, stop=21, step=1)
var2 = np.random.rand(20)
print(var1.shape)
print(var2.shape)

dataset = pd.DataFrame()
dataset['col1'] = var1
dataset['col2'] = var2
dataset.head()
1

It's not so short, but maybe it can help you.

Creating Array

import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

>>> data
array([['col1', 'col2'],
       ['4.8', '2.8'],
       ['7.0', '1.2']], dtype='<U4')

Creating data frame

df = pd.DataFrame(i for i in data).transpose()
df.drop(0, axis=1, inplace=True)
df.columns = data[0]
df

>>> df
  col1 col2
0  4.8  7.0
1  2.8  1.2
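Note that in the output above each original data row ends up as a column (col1 holds 4.8 and 2.8), and the values are still strings. If the intent is for the first row to name the columns and each later row to be one record, a minimal sketch could slice the array directly. The `astype(float)` conversion is my addition and assumes every data cell is numeric.

```python
import numpy as np
import pandas as pd

data = np.array([['col1', 'col2'], [4.8, 2.8], [7.0, 1.2]])

# Rows after the first are the records; the first row supplies the column names.
df = pd.DataFrame(data[1:].astype(float), columns=data[0])
print(df)
```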
Rafa