Convert structured array with various numeric data types to regular array

Question

Suppose I have a NumPy structured array with various numeric datatypes. As a basic example,

import numpy as np
my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

How can I cast this into a regular NumPy array of floats?

From this answer, I know I could use

np.array(my_data.tolist())

but apparently it is slow since you "convert an efficiently packed NumPy array to a regular Python list".
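For reference, here is a minimal sketch of what the tolist() round trip does on the example array. It only illustrates the intermediate Python objects and the resulting dtype; it makes no claim about how slow the approach is on large inputs.

```python
import numpy as np

# Same example array as in the question.
my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')

# tolist() unpacks every record into a Python tuple of Python scalars ...
as_list = my_data.tolist()

# ... and np.array() then rebuilds a regular 2-D array from those tuples.
regular = np.array(as_list)

print(type(as_list[0]))  # <class 'tuple'>
print(regular.shape)     # (2, 2)
print(regular.dtype)     # float64
```

Note that the result is float64 rather than float32, because the values pass through Python floats on the way.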

In that previous question, all the fields were of the same type. — hpaulj, Oct 03 '14 at 17:19
I wrote [a quick script](https://gist.github.com/anonymous/74487ac64c2f69b781d5) to see which of the answers was the fastest for a 30000x3000 array, and they were pretty similar -- JohnZwinck's: 0.30s, Jaime's: 0.41s, hpaulj's: 0.46s, WarrenWeckesser: 0.47s. And they all used about 3GB of memory. — Garrett, Oct 03 '14 at 22:41

score 2 · Accepted Answer · answered Oct 03 '14 at 08:35

You can do it easily with Pandas:

>>> import pandas as pd
>>> pd.DataFrame(my_data).values
array([[  17.       ,  182.1000061],
       [  19.       ,  175.6000061]], dtype=float32)

John Zwinck

Warren Weckesser · Answer 2 · 2014-10-03T12:51:58.477

1

Here's one way (assuming my_data is a one-dimensional structured array):

In [26]: my_data
Out[26]: 
array([(17, 182.10000610351562), (19, 175.60000610351562)], 
      dtype=[('f0', '<i2'), ('f1', '<f4')])

In [27]: np.column_stack([my_data[name] for name in my_data.dtype.names])
Out[27]: 
array([[  17.       ,  182.1000061],
       [  19.       ,  175.6000061]], dtype=float32)

edited Oct 03 '14 at 12:51

answered Oct 03 '14 at 12:46

Jaime · Answer 3 · 2014-10-03T15:44:41.393

The obvious way works:

>>> my_data
array([(17, 182.10000610351562), (19, 175.60000610351562)],
      dtype=[('f0', '<i2'), ('f1', '<f4')])
>>> n = len(my_data.dtype.names)  # n == 2
>>> my_data.astype(','.join(['f4']*n))
array([(17.0, 182.10000610351562), (19.0, 175.60000610351562)],
      dtype=[('f0', '<f4'), ('f1', '<f4')])
>>> my_data.astype(','.join(['f4']*n)).view('f4')
array([  17.       ,  182.1000061,   19.       ,  175.6000061], dtype=float32)
>>> my_data.astype(','.join(['f4']*n)).view('f4').reshape(-1, n)
array([[  17.       ,  182.1000061],
       [  19.       ,  175.6000061]], dtype=float32)

hpaulj · Answer 4 · 2014-10-03T21:00:03.257

A variation on Warren's answer (which copies data by field):

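# Preallocate the float output: one row per record, one column per field.
x = np.empty((my_data.shape[0], len(my_data.dtype)), dtype='f4')
for i, n in enumerate(my_data.dtype.names):
    x[:, i] = my_data[n]  # copy one field into one column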

Or you could iterate by row. Each r is a single record (it prints as a tuple), and it has to be converted to a list in order to fill a row of x. With many rows and few fields this will be slower.

for i, r in enumerate(my_data):
    x[i, :] = list(r)

It may be instructive to try x.data=r.data and get an error: AttributeError: not enough data for array. The data of x is a buffer holding 4 floats. The data of my_data is a buffer holding 2 records, each containing an int and a float (that is, the sequence [int, float, int, float]), so my_data.itemsize==6. One way or another, my_data has to be converted to all floats and the tuple grouping removed.
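The size mismatch can be checked directly. This is just a small sketch using the question's example array; x here is a fresh 2x2 float32 array standing in for the x used above.

```python
import numpy as np

my_data = np.array([(17, 182.1), (19, 175.6)], dtype='i2,f4')
x = np.empty((2, 2), dtype='f4')

print(my_data.itemsize)  # 6: 2 bytes (i2) + 4 bytes (f4) per record
print(my_data.nbytes)    # 12: two packed records
print(x.nbytes)          # 16: four 4-byte floats

# After casting both fields to f4, the buffers have the same size.
as_floats = my_data.astype('f4,f4')
print(as_floats.nbytes)  # 16
```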

But using astype as Jaime shows does work:

x.data=my_data.astype('f4,f4').data

In quick tests using a 1000-item array with 5 fields, copying field by field is just as fast as using astype.
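A comparison along those lines could be set up roughly like this. The test array, the function names, and the repeat count are all assumptions made for this sketch, not the script used for the numbers above. The third function uses numpy.lib.recfunctions.structured_to_unstructured, which is not mentioned in this thread and needs a reasonably recent NumPy (1.16 or later); it is included only as an extra point of comparison.

```python
import timeit
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical test data: 1000 records with 5 numeric fields of mixed types.
mixed = np.zeros(1000, dtype='i2,f4,i4,f8,f4')
for k, name in enumerate(mixed.dtype.names):
    mixed[name] = np.arange(1000) + k

def by_field(arr):
    # Copy each field into one column of a preallocated float array.
    out = np.empty((arr.shape[0], len(arr.dtype.names)), dtype='f4')
    for i, name in enumerate(arr.dtype.names):
        out[:, i] = arr[name]
    return out

def by_astype(arr):
    # Cast all fields to f4, then view the buffer as a 2-D float array.
    n = len(arr.dtype.names)
    return arr.astype(','.join(['f4'] * n)).view('f4').reshape(-1, n)

def by_recfunctions(arr):
    # Not from this thread: NumPy's own helper for the same conversion.
    return rfn.structured_to_unstructured(arr, dtype=np.float32)

for fn in (by_field, by_astype, by_recfunctions):
    seconds = timeit.timeit(lambda: fn(mixed), number=200)
    print(f'{fn.__name__}: {seconds:.4f} s for 200 calls')
```

Timings will vary with the machine, the NumPy version, and the shape of the data, so treat the printed numbers as a rough guide only.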
