
I am trying to use the function as_strided from numpy.lib.stride_tricks to extract subseries from a larger 2D array, but I am struggling to find the right value for the strides argument.

Let's say I have a matrix m which contains 5 1D arrays of length a = 10. For each 1D array in m, I want to extract every sub-array (sliding window) of length b = 4.

import numpy
from numpy.lib.stride_tricks import as_strided

a, b = 10, 4
m = numpy.array([range(i,i+a) for i in range(5)])

# first try
sub_m = as_strided(m, shape=(m.shape[0], m.shape[1]-b+1, b))
print(sub_m.shape)  # (5,7,4), which is what I expected
print(sub_m[-1,-1,-1])  # Some unexpected strange number: 8227625857902995061

# second try with strides argument
sub_m = as_strided(m, shape=(m.shape[0], m.shape[1]-b+1, b), strides=(m.itemize,m.itemize,m.itemize))
# gives error, see below

AttributeError: 'numpy.ndarray' object has no attribute 'itemize'

As you can see, my first try gives sub_m the right shape. However, I can't figure out what to pass as strides=().
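A quick way to work out what `strides` needs is to inspect `m` itself. The attribute is spelled `itemsize` (the `itemize` in the second try is a typo, hence the AttributeError), and strides are given in bytes, one per dimension of the requested shape. Even with the spelling fixed, `(itemsize, itemsize, itemsize)` would be wrong for the first axis, because stepping to the next row skips a whole row of `a` items. Leaving `strides` out, as in the first try, appears to make NumPy treat the view as contiguous for the new (5, 7, 4) shape, so it runs past the end of the 50-element buffer, which would explain the garbage value. The numbers in the comment below assume a 64-bit default integer dtype; on other platforms they scale with `itemsize`.

```python
import numpy as np

a, b = 10, 4
m = np.array([range(i, i + a) for i in range(5)])

# Strides are byte steps per axis of m: one step along axis 0 moves to the
# next row (a items), one step along axis 1 moves to the next item.
row_step, col_step = m.strides
print(m.itemsize, m.strides)  # 8 (80, 8) when the dtype is int64

assert col_step == m.itemsize
assert row_step == a * m.itemsize
```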

For reference:

m = [[ 0  1  2  3  4  5  6  7  8  9]
 [ 1  2  3  4  5  6  7  8  9 10]
 [ 2  3  4  5  6  7  8  9 10 11]
 [ 3  4  5  6  7  8  9 10 11 12]
 [ 4  5  6  7  8  9 10 11 12 13]]

Expected output:

sub_m = [
         [[0 1 2 3] [1 2 3 4] ... [5 6 7 8] [6 7 8 9]]
         [[1 2 3 4] [2 3 4 5] ... [6 7 8 9] [7 8 9 10]]
         [[2 3 4 5] [3 4 5 6] ... [7 8 9 10] [8 9 10 11]]
         [[3 4 5 6] [4 5 6 7] ... [8 9 10 11] [9 10 11 12]]
         [[4 5 6 7] [5 6 7 8] ... [9 10 11 12] [10 11 12 13]]
        ]
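For comparison, here is a plain-loop sketch that builds the expected output by copying each window. It is not from the question, and the helper name `windows_loop` is hypothetical. It is handy as a reference for checking a strided version, but it copies every window, which is exactly the cost `as_strided` is meant to avoid on larger data.

```python
import numpy as np

def windows_loop(m, b):
    """Hypothetical helper: copy every length-b window of every row of m."""
    n_rows, n_cols = m.shape
    out = np.empty((n_rows, n_cols - b + 1, b), dtype=m.dtype)
    for i in range(n_rows):
        for j in range(n_cols - b + 1):
            out[i, j] = m[i, j:j + b]  # window starting at column j of row i
    return out

a, b = 10, 4
m = np.array([range(i, i + a) for i in range(5)])
ref = windows_loop(m, b)
print(ref.shape)               # (5, 7, 4)
print(ref[0, 0], ref[-1, -1])  # [0 1 2 3] [10 11 12 13]
```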

Edit: I have much more data in practice, which is why I want to use as_strided (for efficiency).

Divakar
Nuageux

1 Answer

Here's one approach with np.lib.stride_tricks.as_strided -

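import numpy as np

def strided_lastaxis(a, L):
    # Byte steps of the input: s0 moves one row, s1 moves one column
    s0,s1 = a.strides
    m,n = a.shape
    # View of shape (rows, windows per row, window length); no data is copied
    return np.lib.stride_tricks.as_strided(a, shape=(m,n-L+1,L), strides=(s0,s1,s1))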

A bit of explanation of the strides for as_strided:

The output is 3D, so we need three strides, all in bytes. Along the last (third) axis we step through a window one element at a time, which is the input's column stride s1. Along the second axis, each successive window starts one element further along the same row, so that step is also s1. Along the first axis we move on to the next row, which is exactly the input's first-axis stride, s0. Hence strides=(s0,s1,s1).
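To connect this to the question's data: with strides `(s0, s1, s1)`, element `[i, j, k]` of the view sits at byte offset `i*s0 + j*s1 + k*s1 = i*s0 + (j + k)*s1`, which is exactly where `m[i, j + k]` lives. The sketch below applies a copy of `strided_lastaxis` to the question's `m` and checks that identity element by element.

```python
import numpy as np

def strided_lastaxis(a, L):
    s0, s1 = a.strides
    m, n = a.shape
    return np.lib.stride_tricks.as_strided(a, shape=(m, n - L + 1, L),
                                           strides=(s0, s1, s1))

a, b = 10, 4
m = np.array([range(i, i + a) for i in range(5)])
sub_m = strided_lastaxis(m, b)

# The view's strides are (row step, column step, column step) of m.
assert sub_m.strides == (m.strides[0], m.strides[1], m.strides[1])

# Every element [i, j, k] of the view is m[i, j + k].
for i in range(sub_m.shape[0]):
    for j in range(sub_m.shape[1]):
        for k in range(sub_m.shape[2]):
            assert sub_m[i, j, k] == m[i, j + k]

print(sub_m[-1, -1, -1])  # 13, i.e. m[4, 9], instead of a garbage value
```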

Sample run -

In [46]: a
Out[46]: 
array([[0, 5, 6, 2, 3, 6, 7, 1, 4, 8],
       [2, 1, 3, 7, 0, 3, 5, 4, 0, 1]])

In [47]: strided_lastaxis(a, L=4)
Out[47]: 
array([[[0, 5, 6, 2],
        [5, 6, 2, 3],
        [6, 2, 3, 6],
        [2, 3, 6, 7],
        [3, 6, 7, 1],
        [6, 7, 1, 4],
        [7, 1, 4, 8]],

       [[2, 1, 3, 7],
        [1, 3, 7, 0],
        [3, 7, 0, 3],
        [7, 0, 3, 5],
        [0, 3, 5, 4],
        [3, 5, 4, 0],
        [5, 4, 0, 1]]])
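One caveat and one alternative. The result is a view, so no data is copied, but neighbouring windows share memory: writing into one window also changes the overlapping ones. The sketch below uses a variant of `strided_lastaxis` that passes `writeable=False` to `as_strided` to guard against that. If you are on NumPy 1.20 or newer, `np.lib.stride_tricks.sliding_window_view` builds the same windows and returns a read-only view by default; the sketch checks that both agree on the sample array above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def strided_lastaxis(a, L):
    s0, s1 = a.strides
    m, n = a.shape
    # Variant: writeable=False guards against accidental writes through
    # memory that several overlapping windows share
    return as_strided(a, shape=(m, n - L + 1, L), strides=(s0, s1, s1),
                      writeable=False)

a = np.array([[0, 5, 6, 2, 3, 6, 7, 1, 4, 8],
              [2, 1, 3, 7, 0, 3, 5, 4, 0, 1]])

w1 = strided_lastaxis(a, L=4)
w2 = sliding_window_view(a, window_shape=4, axis=-1)  # needs NumPy >= 1.20

assert w1.shape == w2.shape == (2, 7, 4)
assert np.array_equal(w1, w2)
assert np.shares_memory(w1, a)   # a view into a, not a copy
assert not w2.flags.writeable    # read-only by default
```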
Divakar
  • Thanks! It works. However, I still don't understand what I should put in the last argument. Where does `(s0,s1,s1)` come from? I would really appreciate it if you could add more details so next time I don't have to ask :) – Nuageux Jun 01 '17 at 11:29
  • @Nuageux Heartfelt apologies Sir! Added comments. – Divakar Jun 01 '17 at 11:37
  • Many thanks for your explanation, it's not the first time that I'm fighting with this function, next time I think I won't! (PS: Why the sir?) – Nuageux Jun 01 '17 at 11:43