This is my tweak of your sample, adjusted so it runs. Note that I iterate over the columns by field name
import numpy as np

columns = ['id', 'a_age', 'a_height', 'a_shoe_size',
           'b_age', 'b_height', 'b_shoe_size']
default_age, default_height, default_shoe_size = 2, 5, 9

dt = np.dtype({"names": columns, "formats": ['i'] + ['int8'] * (len(columns) - 1)})
mat = np.zeros((10,), dtype=dt)
# fill the fields in groups of three, skipping the leading 'id' field
for i in range(1, 7, 3):
    mat[dt.names[i]] = default_age
    mat[dt.names[i + 1]] = default_height
    mat[dt.names[i + 2]] = default_shoe_size
producing:
array([(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9)],
dtype=[('id', '<i4'), ('a_age', 'i1'), ('a_height', 'i1'), ('a_shoe_size', 'i1'), ('b_age', 'i1'), ('b_height', 'i1'), ('b_shoe_size', 'i1')])
As long as the number of field names is substantially fewer than the number of rows, I think this will be as fast as, or faster than, any other way.
In my sample x=10, so the shape is (10,). Note that your mat[:,j+1] expression would also need rewriting: a structured 1d array is indexed by field name, not by column number.
A structured array is probably not the best way to go if you have very many columns (fields) (compared to the number of rows).
If all of your fields are 'int', I'd use a regular 2d array. Structured arrays are most useful when fields have differing types of elements.
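For instance, a structured dtype pays off when fields mix types; here's a minimal sketch (the field names and values are hypothetical, not from your data):

```python
import numpy as np

# hypothetical mixed-type records: integers, strings and floats
# in one array is where a structured dtype is worth the trouble
dt = np.dtype([('id', 'i4'), ('name', 'U10'), ('height', 'f4')])
people = np.zeros(3, dtype=dt)
people['id'] = [1, 2, 3]
people['name'] = ['ann', 'bob', 'cal']
people['height'] = [1.6, 1.8, 1.7]

print(people['name'][1])        # field access by name
print(people['height'].mean())  # numeric fields still support math
```

With all-'int' fields none of this buys you anything over a plain 2d array.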
Here's a way of initializing a regular 2d array with these values, and optionally casting it to a structured array:
values = np.array([2, 5, 9])
x, y = 10, 2
# repeat the 3 values y times per row, then repeat that row x times
mat1 = np.repeat(np.repeat(values[None, :], y, 0).reshape(1, 3 * y), x, 0)
# (equivalently: mat1 = np.tile(values, (x, y)))
producing:
array([[2, 5, 9, 2, 5, 9],
[2, 5, 9, 2, 5, 9],
...,
[2, 5, 9, 2, 5, 9]])
Add on the id column:
mat1 = np.concatenate([np.zeros((x, 1), int), mat1], 1)
array([[0, 2, 5, 9, 2, 5, 9],
[0, 2, 5, 9, 2, 5, 9],
...
[0, 2, 5, 9, 2, 5, 9],
[0, 2, 5, 9, 2, 5, 9]])
A new dtype - with all plain 'int':
dt1 = np.dtype({"names": columns, "formats": ['i'] + ['int'] * (len(columns) - 1)})
mat2 = np.empty((x,), dtype=dt1)
If done right, the data buffer for mat1 should be the same size and byte order as for mat2, in which case I can 'copy' it (actually just change pointers):
mat2.data = mat1.data
(Assigning to .data is deprecated in recent NumPy versions; a view, as discussed below, is the supported way to reinterpret a buffer.) mat2 now looks just like the earlier mat, except the dtype is a little different (with i4 instead of i1 fields):
array([(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9)],
dtype=[('id', '<i4'), ('a_age', '<i4'), ('a_height', '<i4'), ('a_shoe_size', '<i4'), ('b_age', '<i4'), ('b_height', '<i4'), ('b_shoe_size', '<i4')])
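Whether the sizes actually match depends on the platform: np.dtype('int') may be i8 (most 64-bit Linux/macOS builds) or i4, which is why the displayed dtype above may not match yours. A quick sanity check before reinterpreting the buffer, sketched with the names used above:

```python
import numpy as np

columns = ['id', 'a_age', 'a_height', 'a_shoe_size',
           'b_age', 'b_height', 'b_shoe_size']
x = 10
mat1 = np.concatenate([np.zeros((x, 1), int),
                       np.tile(np.array([2, 5, 9]), (x, 2))], 1)

dt1 = np.dtype({"names": columns,
                "formats": ['i'] + ['int'] * (len(columns) - 1)})
# the buffer swap is only safe when one row of mat1 and one
# dt1 record occupy the same number of bytes
if mat1[0].nbytes != dt1.itemsize:
    # fall back to a dtype built from mat1's own element type,
    # which is guaranteed to line up on any platform
    dt1 = np.dtype([(c, mat1.dtype) for c in columns])
assert mat1[0].nbytes == dt1.itemsize
```

The fallback dtype gives up the i4 id field, but the reinterpretation is then safe everywhere.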
Another way to use the mat1 values to initialize a structured array is with an intermediate list of tuples:
np.array([tuple(row) for row in mat1], dtype=dt)
array([(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9), (0, 2, 5, 9, 2, 5, 9),
(0, 2, 5, 9, 2, 5, 9)],
dtype=[('id', '<i4'), ('a_age', 'i1'), ('a_height', 'i1'), ('a_shoe_size', 'i1'), ('b_age', 'i1'), ('b_height', 'i1'), ('b_shoe_size', 'i1')])
I haven't run time tests, in part because I don't have an idea of what your x and y values are like.
See also Convert structured array with various numeric data types to regular array. And from the answer at https://stackoverflow.com/a/21818731/901925, the np.ndarray constructor can be used to create a new array from a preexisting data buffer. It still needs dt1, the all-'int' dtype:
np.ndarray((x,), dt1, mat1)
Also see ndarray to structured_array and float to int, with more on using view vs. astype for this conversion.
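To make the view-vs-copy distinction concrete: a view only reinterprets bytes, so every field must keep the source's element size, while shrinking to i1 fields is a genuine cast that needs a copy. A sketch, using the same field names as above:

```python
import numpy as np

names = ['id', 'a_age', 'a_height', 'a_shoe_size',
         'b_age', 'b_height', 'b_shoe_size']
mat1 = np.tile(np.array([0, 2, 5, 9, 2, 5, 9]), (10, 1))

# view: zero-copy, but each field must match mat1's element size;
# one 7-field record per row, so the view is (10, 1) -> ravel to (10,)
dt_same = np.dtype([(n, mat1.dtype) for n in names])
viewed = mat1.view(dt_same).ravel()   # shares mat1's buffer

# cast to narrower i1 fields: a real conversion, done field by field
dt_small = np.dtype([('id', 'i4')] + [(n, 'i1') for n in names[1:]])
cast = np.zeros(mat1.shape[0], dtype=dt_small)
for j, n in enumerate(names):
    cast[n] = mat1[:, j]
```

Because viewed shares the buffer, writing into mat1 shows up in viewed, while cast is an independent copy.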