
The problem I encounter is that using ndarray.view(np.dtype) to get a structured array from a plain ndarray seems to get the float-to-int conversion wrong.

An example speaks better:

In [12]: B
Out[12]:
array([[  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   4.43600000e+01,   0.00000000e+00],
       [  1.00000000e+00,   2.00000000e+00,   7.10000000e+00,
          1.10000000e+00,   4.43600000e+01,   1.32110000e+02],
       [  1.00000000e+00,   3.00000000e+00,   9.70000000e+00,
          2.10000000e+00,   4.43600000e+01,   2.04660000e+02],
       ...,
       [  1.28900000e+03,   1.28700000e+03,   0.00000000e+00,
          9.99999000e+05,   4.75600000e+01,   3.55374000e+03],
       [  1.28900000e+03,   1.28800000e+03,   1.29000000e+01,
          5.40000000e+00,   4.19200000e+01,   2.08400000e+02],
       [  1.28900000e+03,   1.28900000e+03,   0.00000000e+00,
          0.00000000e+00,   4.19200000e+01,   0.00000000e+00]])

In [14]: B.view(A.dtype)
Out[14]:
array([(4607182418800017408, 4607182418800017408, 0.0, 0.0, 44.36, 0.0),
       (4607182418800017408, 4611686018427387904, 7.1, 1.1, 44.36, 132.11),
       (4607182418800017408, 4613937818241073152, 9.7, 2.1, 44.36, 204.66),
       ...,
       (4653383897399164928, 4653375101306142720, 0.0, 999999.0, 47.56, 3553.74),
       (4653383897399164928, 4653379499352653824, 12.9, 5.4, 41.92, 208.4),
       (4653383897399164928, 4653383897399164928, 0.0, 0.0, 41.92, 0.0)],
      dtype=[('i', '<i8'), ('j', '<i8'), ('tnvtc', '<f8'), ('tvtc', '<f8'), ('tf', '<f8'), ('tvps', '<f8')])

The 'i' and 'j' columns are true integers:

Here are two further checks I have done; the problem seems to come from ndarray.view(np.int):

In [21]: B[:,:2]
Out[21]:
array([[  1.00000000e+00,   1.00000000e+00],
       [  1.00000000e+00,   2.00000000e+00],
       [  1.00000000e+00,   3.00000000e+00],
       ...,
       [  1.28900000e+03,   1.28700000e+03],
       [  1.28900000e+03,   1.28800000e+03],
       [  1.28900000e+03,   1.28900000e+03]])

In [22]: B[:,:2].view(np.int)
Out[22]:
array([[4607182418800017408, 4607182418800017408],
       [4607182418800017408, 4611686018427387904],
       [4607182418800017408, 4613937818241073152],
       ...,
       [4653383897399164928, 4653375101306142720],
       [4653383897399164928, 4653379499352653824],
       [4653383897399164928, 4653383897399164928]])

In [23]: B[:,:2].astype(np.int)
Out[23]:
array([[   1,    1],
       [   1,    2],
       [   1,    3],
       ...,
       [1289, 1287],
       [1289, 1288],
       [1289, 1289]])

What am I doing wrong? Can't I change the type because of how numpy allocates memory? Is there another way to do this? (fromarrays was complaining about a shape mismatch.)

Touki
  • I don't know what the bigger picture is, but you may want to look at [Pandas](http://pandas.pydata.org/). – cyborg Dec 23 '13 at 19:22
  • Thanks. Right now I'm using h5py, but I will definitely look at Pandas. – Touki Dec 24 '13 at 02:37
  • h5py is for storage, but I mentioned Pandas because you use arrays with different types in different columns. – cyborg Dec 24 '13 at 05:13

3 Answers


Actually, fromarrays works, but that doesn't explain this strange behavior.

Here is the solution I've found:

np.core.records.fromarrays(B.T, dtype=A.dtype)
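A minimal sketch of this fix, with a dtype and a couple of rows reconstructed from the question (`np.rec.fromarrays` is the documented short alias for `np.core.records.fromarrays`):

```python
import numpy as np

# Dtype and sample rows reconstructed from the question's output.
dtype = np.dtype([('i', '<i8'), ('j', '<i8'), ('tnvtc', '<f8'),
                  ('tvtc', '<f8'), ('tf', '<f8'), ('tvps', '<f8')])
B = np.array([[1.0, 1.0, 0.0, 0.0, 44.36, 0.0],
              [1.0, 2.0, 7.1, 1.1, 44.36, 132.11]])

# fromarrays expects a sequence of columns, hence the transpose B.T;
# each column is value-cast (not bit-reinterpreted) to its field type.
rec = np.rec.fromarrays(B.T, dtype=dtype)
print(rec['i'])  # integer values 1, 1 rather than reinterpreted bits
```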
Touki
  • As far as the transpose goes, it's because `fromarrays` expects a sequence of "columns". Passing in a 2d array will result in a sequence of "rows" (ndarrays iterate over their first axis). Again, for what it looks like you're doing, structured arrays probably aren't what you want. – Joe Kington Dec 23 '13 at 16:53

This is the difference between doing somearray.view(new_dtype) and calling astype.

What you're seeing is exactly the expected behavior, and it's very deliberate, but it's surprising the first time you come across it.

A view with a different dtype interprets the underlying memory buffer of the array as the given dtype. No copies are made. It's very powerful, but you have to understand what you're doing.

A key thing to remember is that calling view never alters the underlying memory buffer, just the way that it's viewed by numpy (e.g. dtype, shape, strides). Therefore, view deliberately avoids altering the data to the new type and instead just interprets the "old bits" as the new dtype.

For example:

In [1]: import numpy as np

In [2]: x = np.arange(10)

In [3]: x
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]: x.dtype
Out[4]: dtype('int64')

In [5]: x.view(np.int32)
Out[5]: array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0],
              dtype=int32)

In [6]: x.view(np.float64)
Out[6]:
array([  0.00000000e+000,   4.94065646e-324,   9.88131292e-324,
         1.48219694e-323,   1.97626258e-323,   2.47032823e-323,
         2.96439388e-323,   3.45845952e-323,   3.95252517e-323,
         4.44659081e-323])
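As a quick sanity check, the big integers from the question really are the raw IEEE 754 bit patterns of the original floats; the struct module reproduces them without numpy:

```python
import struct

# Pack 1.0 as a little-endian double, then unpack the same 8 bytes as
# a signed 64-bit integer: this is what view(np.int) did to each cell.
bits = struct.unpack('<q', struct.pack('<d', 1.0))[0]
print(bits)  # → 4607182418800017408, the first value in Out[14]
```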

If you want to make a copy of the array with a new dtype, use astype instead:

In [7]: x
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]: x.astype(np.int32)
Out[8]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

In [9]: x.astype(float)
Out[9]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

However, using astype with structured arrays will probably surprise you. Structured arrays treat each element of the input as a C-like struct, so astype converts each input element into a complete record rather than mapping rows field-by-field, and you'll run into several surprises.


Basically, you want the columns to have different dtypes. In that case, don't put them in the same array. Numpy arrays are expected to be homogeneous. Structured arrays are handy in certain cases, but they're probably not what you want if you're looking for something to handle separate columns of data. Just use each column as its own array.
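For instance, the per-column approach looks like this, using a couple of rows borrowed from the question's B:

```python
import numpy as np

# Two sample rows from the question's B.
B = np.array([[1.0, 1.0, 0.0, 0.0, 44.36, 0.0],
              [1.0, 2.0, 7.1, 1.1, 44.36, 132.11]])

# astype converts the values; view would only reinterpret the bits.
i = B[:, 0].astype(np.int64)
j = B[:, 1].astype(np.int64)
tf = B[:, 4]  # already float64, keep as-is
```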

Better yet, if you're working with tabular data, you'll probably find it's easier to use pandas than to use numpy arrays directly. pandas is oriented towards tabular data (where columns are expected to have different types), while numpy is oriented towards homogeneous arrays.
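A hypothetical illustration of the pandas route, with column names taken from the dtype in the question:

```python
import numpy as np
import pandas as pd

# Two sample rows from the question's B.
B = np.array([[1.0, 1.0, 0.0, 0.0, 44.36, 0.0],
              [1.0, 2.0, 7.1, 1.1, 44.36, 132.11]])

df = pd.DataFrame(B, columns=['i', 'j', 'tnvtc', 'tvtc', 'tf', 'tvps'])
# DataFrame columns can carry different dtypes, so 'i' and 'j' can be
# converted to real integers while the rest stay float.
df['i'] = df['i'].astype('int64')
df['j'] = df['j'].astype('int64')
```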

Joe Kington

The only solution that worked for me in a similar situation:

np.array([tuple(row) for row in B], dtype=A.dtype)
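A sketch of why this works, with a reduced dtype standing in for A.dtype from the question:

```python
import numpy as np

# Reduced stand-in for A.dtype and B from the question.
dtype = np.dtype([('i', '<i8'), ('j', '<i8'), ('tf', '<f8')])
B = np.array([[1.0, 1.0, 44.36],
              [1.0, 2.0, 44.36]])

# Each row becomes a tuple, so numpy builds one record per row and
# value-casts every field to its declared type (float 1.0 -> int 1).
out = np.array([tuple(row) for row in B], dtype=dtype)
```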
Sklavit