I'm trying to initialize a NumPy matrix of size (x,y) where y is very large.

The first column of the matrix is an ID (integer), and the rest are triplets (int8), where each member of the triplet should have a different default value.

For example, assuming the default values are [2,5,9], I'd like to initialize the following matrix:

0 2 5 9 2 5 9 2 5 9 ...
0 2 5 9 2 5 9 2 5 9 ...
0 2 5 9 2 5 9 2 5 9 ...
0 2 5 9 2 5 9 2 5 9 ...
...

The fastest way I could think of initializing the matrix is:

import numpy

defaults = [2, 5, 9]
# a plain ndarray holds a single dtype, so one integer type is used for every column
mat = numpy.zeros(shape=(x, y), dtype='i')
# fill the triplets with default values (column 0 is the ID; assumes y == 1 + 3*k)
for j in range(1, y, 3):
    mat[:, j]   = defaults[0]
    mat[:, j+1] = defaults[1]
    mat[:, j+2] = defaults[2]

What is the fastest way to initialize such a matrix?

Thanks!

NStiner
  • You should look at [numpy.tile](http://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html) – xnx May 01 '15 at 14:33

3 Answers

You can use np.tile after reshaping the array of default values into a column, for example:

>>> import numpy as np
>>> b = np.array([2, 5, 9])
>>> b = b.reshape(3, 1)
>>> new = np.tile(b, 3)
>>> new
array([[2, 2, 2],
       [5, 5, 5],
       [9, 9, 9]])

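Then you can use np.dstack(new)[0] to transpose the array, so that each row reads 2, 5, 9, and np.hstack to prepend the column of zeros (the result is float because np.zeros defaults to float64):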

>>> np.hstack((np.zeros((3,1)),np.dstack(new)[0]))
array([[ 0.,  2.,  5.,  9.],
       [ 0.,  2.,  5.,  9.],
       [ 0.,  2.,  5.,  9.]])
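
As a side note on the "rotation" step: for a 2-D array like this one, np.dstack(new)[0] appears to produce the same result as a plain transpose. The following is a minimal sketch that checks this for the example above; the name `rotated` is introduced here only for illustration.

```python
import numpy as np

b = np.array([2, 5, 9]).reshape(3, 1)
new = np.tile(b, 3)            # rows are [2 2 2], [5 5 5], [9 9 9]

# np.dstack treats `new` as a sequence of 1-D rows and stacks them along
# a third axis; taking [0] leaves a 2-D array with rows and columns swapped
rotated = np.dstack(new)[0]

# for this input the result matches the ordinary transpose
assert np.array_equal(rotated, new.T)
print(rotated)
```

If that holds for your input, `new.T` is probably the easier spelling to read.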

Or you can repeat the non-zero part again with np.tile:

>>> np.hstack((np.zeros((3,1)),np.tile(np.dstack(new)[0],4)))
array([[ 0.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.],
       [ 0.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.],
       [ 0.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.,  2.,  5.,  9.]])

EDIT:

Just for clarification, the simple one-liner is this (y is the total number of columns, including the ID column, as in the question):

defaults = [2, 5, 9]
np.hstack((np.zeros((x, 1)), np.tile(defaults, (x, (y - 1) // 3))))
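
The one-liner can be wrapped into a small function that also keeps an integer dtype, since np.zeros returns float64 by default. This is only a sketch: `init_matrix` and its parameters are hypothetical names, and it assumes y counts all columns (one ID column plus whole triplets), as in the question.

```python
import numpy as np

def init_matrix(x, y, defaults=(2, 5, 9), dtype=np.int32):
    """Hypothetical helper: x rows, y columns (1 ID column + repeated defaults)."""
    n_repeats, remainder = divmod(y - 1, len(defaults))
    if remainder:
        raise ValueError("y - 1 must be a multiple of len(defaults)")
    ids = np.zeros((x, 1), dtype=dtype)                  # ID column, all zeros
    body = np.tile(np.asarray(defaults, dtype=dtype),    # one row of defaults,
                   (x, n_repeats))                       # repeated over rows and columns
    return np.hstack((ids, body))

mat = init_matrix(4, 10)
print(mat)
```

A single dtype is used for the whole array here; a smaller one such as np.int8 could be passed if the IDs fit in it.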
kasravnd

A simpler solution using just the np.tile() function:

import numpy as np

# Create a footprint array
a = np.array([2, 5, 9])

# Use the footprint array to create the required final array:
# 4 rows, with the footprint repeated 3 times along each row
res = np.tile(a, (4, 3))

>>> res
array([[2, 5, 9, 2, 5, 9, 2, 5, 9],
       [2, 5, 9, 2, 5, 9, 2, 5, 9],
       [2, 5, 9, 2, 5, 9, 2, 5, 9],
       [2, 5, 9, 2, 5, 9, 2, 5, 9]])

For details on np.tile(), see the official NumPy documentation.
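
The result above does not yet include the leading ID column from the question. One possible way to add it, sketched under the assumption that the IDs start at zero, is to preallocate the full array and let broadcasting fill the remaining columns; `n_rows` and `n_triplets` are illustrative names.

```python
import numpy as np

a = np.array([2, 5, 9])
n_rows, n_triplets = 4, 3

# allocate everything as zeros, so column 0 (the ID) is already initialized
res = np.zeros((n_rows, 1 + a.size * n_triplets), dtype=a.dtype)

# np.tile(a, n_triplets) is one row of length 9; the assignment
# broadcasts that row across all rows of the slice
res[:, 1:] = np.tile(a, n_triplets)
print(res)
```

This avoids building a second full-size array just to concatenate it with the ID column.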

Nicola Pesavento

I'd do it like this:

np.tile([0] + [2, 5, 9] * 4, (3, 1))

Here I've used list addition and list multiplication to create the first row, then used np.tile to replicate it over three rows. np.tile automatically converts the list to an array before replicating it vertically three times. If you wanted to, you could wrap this into a function that looks something like this:

def make_array(triple, n_triple, n_row):
    return np.tile([0] + list(triple) * n_triple, (n_row, 1))

Here I've forced triple to be a list, but if you're careful to only pass a list to the triple variable when you call this function you wouldn't need that.

Good luck

farenorth
  • I think you got confused: what you wrote as n_y is actually x in my example. You also did the replication along the y axis (the columns) using list multiplication, which I think is slower than NumPy's methods. – NStiner May 01 '15 at 20:58
  • I fixed my solution to be more descriptive. IMO your use of `x, y` is confusing because `x, y` are often used for horizontal and vertical coordinates (e.g. in a plot). As far as speed goes, sure `np.tile` is faster for large values of `n_triple`, but most of the time it is inconsequential and I'd argue my code is significantly simpler and easier to read. If I'm scripting something quick and dirty I don't have to think about how the dimensions of `np.zeros`, `np.tile`, and `np.hstack` all work together. – farenorth May 01 '15 at 21:33
  • You're right about the `x, y` notation; `m, n` would probably have been better. Regarding the list multiplication: it is indeed simpler, but for large y it'll be a lot slower than the NumPy implementation. In any case, maybe a generator expression would be better, such as `(i for j in range(n_triple) for i in [2, 5, 9])` – NStiner May 02 '15 at 19:21
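
To put numbers on the speed question raised in these comments, a rough timing sketch like the following could be used. The sizes and function names are arbitrary choices made for illustration, and no particular result is claimed; timings will vary with machine and NumPy version.

```python
import timeit
import numpy as np

triple = [2, 5, 9]
n_triple, n_row = 1000, 100   # arbitrary sizes, chosen only for this sketch

def with_list_multiplication():
    # build the first row with Python list operations, then tile it vertically
    return np.tile([0] + triple * n_triple, (n_row, 1))

def with_tile_only():
    # let np.tile do both the horizontal and vertical replication
    body = np.tile(triple, (n_row, n_triple))
    return np.hstack((np.zeros((n_row, 1), dtype=int), body))

# both approaches should produce the same values
assert np.array_equal(with_list_multiplication(), with_tile_only())

for fn in (with_list_multiplication, with_tile_only):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```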