I am just starting to learn Pandas and am struggling with a task. I am trying to augment the data stored in a dataframe by combining two successive rows with an increasing overlap between them, just like a rolling window.

I believe the question can be exemplified with this small dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], columns=['A', 'B', 'C', 'D'])

which gives:

   A   B   C   D
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12

From it, what I want to obtain (but don't know how to) is a dataframe like this one:

   A   B   C   D
0  1   2   3   4
1  2   3   4   5
2  3   4   5   6
3  4   5   6   7
4  5   6   7   8
5  6   7   8   9
6  7   8   9  10
7  8   9  10  11
8  9  10  11  12

It is as if we were using multiple rolling windows between each pair of rows of the initial dataframe. Note that I am not using this specific dataframe (the values are not really ordered like 1,2,3,4...). What I am using is a general dataframe imported from a csv.

Is this possible? Thanks in advance!


Edit

Thanks for all the responses. The answers given by anky and Shubham Sharma both work perfectly. Here are the results obtained by using both of them with my real dataframe:

Initial dataframe: (image)

After adding multiple rolling windows as my question needed: (image)

Javier TG
  • My answer below addresses the specific case of building the dataframe you are asking for. If the question is about the more general problem of taking a list and turning it into a matrix of strided views, then @anky's solution is probably a better starting point. In that case, there is probably some numpy trick available to move from a list-comprehension-based solution to a faster one based on manipulating arrays. – GuillemB Jan 12 '21 at 16:48
  • Yes, my question is for a general dataframe, sorry about the confusion. – Javier TG Jan 12 '21 at 16:54
  • NumPy 1.20 provides a sliding window function for exactly this: https://numpy.org/doc/1.20/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view – GuillemB Jan 31 '21 at 12:50
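
A minimal sketch of the `sliding_window_view` approach mentioned in the last comment, assuming NumPy 1.20 or newer is installed. The names `flat`, `windows` and `result` are only illustrative and do not come from the thread; the dataframe is the small example from the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                  columns=['A', 'B', 'C', 'D'])

# Flatten the frame row by row into a 1-D array
flat = df.to_numpy().ravel()

# Every window of width n_columns over the flattened values (a read-only view)
windows = sliding_window_view(flat, window_shape=df.shape[1])

# Copy so the new frame does not share memory with the overlapping view
result = pd.DataFrame(windows.copy(), columns=df.columns)
```

Since the windows are taken over the flattened values, this should not depend on the values being ordered, only on all columns sharing a compatible dtype.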

3 Answers

Maybe not as elegant, but try:

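def fun(dataframe,n):
    # Flatten the dataframe row by row into a single list
    l = dataframe.stack().tolist()
    # Slice n values starting at every position; the short slices at the end
    # are padded with NaN by the constructor and removed by dropna()
    return (pd.DataFrame([l[e:e+n] for e,i in enumerate(l)],
        columns=dataframe.columns).dropna().astype(dataframe.dtypes))

fun(df,df.shape[1])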

   A   B   C   D
0  1   2   3   4
1  2   3   4   5
2  3   4   5   6
3  4   5   6   7
4  5   6   7   8
5  6   7   8   9
6  7   8   9  10
7  8   9  10  11
8  9  10  11  12
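
As a rough check that the same idea holds when the values are not ordered like 1, 2, 3, 4..., here is a small variant sketched under my own assumptions: the helper name `flatten_and_window` and the sample values are hypothetical, and it only generates complete windows so that no `dropna()` step is needed.

```python
import pandas as pd

def flatten_and_window(frame, n):
    # Hypothetical helper: flatten row by row, then keep only full windows
    values = frame.to_numpy().ravel().tolist()
    windows = [values[i:i + n] for i in range(len(values) - n + 1)]
    return pd.DataFrame(windows, columns=frame.columns)

# Unordered sample data, made up for illustration
df = pd.DataFrame([[7, 1, 9, 3], [4, 8, 2, 6]], columns=['A', 'B', 'C', 'D'])
result = flatten_and_window(df, df.shape[1])
```

The first and last rows of `result` are the original rows, and the rows in between slide one value at a time from one to the other.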
anky

Let's try rolling with numpy:

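import numpy as np

def rolling(a, w=4):
    # Step one element at a time along both axes, so row i starts at a[i]
    s = a.strides[-1]
    # Overlapping view of the 1-D array: len(a)-w+1 windows, each of width w
    return np.lib.stride_tricks.as_strided(a, (len(a)-w+1, w), (s, s))

# Flatten the dataframe row by row, then window it
pd.DataFrame(rolling(df.values.reshape(-1)), columns=df.columns)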

   A   B   C   D
0  1   2   3   4
1  2   3   4   5
2  3   4   5   6
3  4   5   6   7
4  5   6   7   8
5  6   7   8   9
6  7   8   9  10
7  8   9  10  11
8  9  10  11  12
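
One thing that may be worth keeping in mind with `as_strided` is that the result is a view whose rows overlap in memory, so writing to it can change several rows at once. The sketch below is an assumption-laden variant, not part of the original answer: the helper name `rolling_readonly` and the sample values are hypothetical. It asks for a read-only view and copies it before building the dataframe.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided

def rolling_readonly(a, w):
    # Hypothetical variant of rolling(): same strides, but a read-only view
    s = a.strides[-1]
    return as_strided(a, shape=(len(a) - w + 1, w), strides=(s, s),
                      writeable=False)

# Unordered sample data, made up for illustration
df = pd.DataFrame([[7, 1, 9, 3], [4, 8, 2, 6]], columns=list('ABCD'))

# Make sure the flattened array is contiguous before striding over it
flat = np.ascontiguousarray(df.to_numpy()).reshape(-1)
view = rolling_readonly(flat, df.shape[1])

# The view shares memory with flat; copying gives the frame its own data
result = pd.DataFrame(view.copy(), columns=df.columns)
```

Passing the window width explicitly instead of relying on the default `w=4` also keeps the helper usable for frames with a different number of columns.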
Shubham Sharma
  • Thanks Shubham Sharma, but I believe this answer doesn't hold for a general dataframe imported from a csv. As commented with GuillemB I think this is because of my small example provided in the question (sorry about the confusion). – Javier TG Jan 12 '21 at 17:03
  • @JavierTG What do you mean by `general dataframe`? Can you please elaborate? – Shubham Sharma Jan 12 '21 at 17:05
  • My bad, that also works perfectly. Sorry about my previous comment; I thought this answer was like the one provided by GuillemB. Thanks again for it! – Javier TG Jan 12 '21 at 17:11

You can do all the heavy lifting with numpy and then drop the resulting matrix into a dataframe.

import numpy as np
import pandas as pd

n_columns = 4
n_rows = 9

aux = np.tile(
    np.arange(1, n_columns+1),  # base row
    (n_rows, 1)  # replicate it as many times as needed
)

# use broadcasting to add a per row offset to each row
aux = aux + np.arange(n_rows)[:, np.newaxis]

# put everything into a dataframe
pd.DataFrame(aux)
GuillemB
  • Thanks @GuillemB, but the dataframe is imported from a csv with a ton of data and the values are not ordered like 1, 2, 3, 4... as in the small example I provided (sorry about the confusion). So I believe this answer does not hold for a more general dataframe. I am going to update the question to make it clearer. – Javier TG Jan 12 '21 at 16:43