5

Let's say that I have the following dataframe:

index    K1    K2    D1    D2    D3
N1       0     1     12    4     6
N2       1     1     10    2     7
N3       0     0     3     5     8

Basically, I want to transform this dataframe into the following:

index    COL1   COL2
K1       D1     = 0*12+1*10+0*3
K1       D2     = 0*4+1*2+0*5
K1       D3     = 0*6+1*7+0*8
K2       D1     = 1*12+1*10+0*3
K2       D2     = 1*4+1*2+0*5
K2       D3     = 1*6+1*7+0*8

The content of COL2 is basically the dot product (aka the scalar product) between the vector in index and the one in COL1. For example, let's take the first line of the resulting df. Under index, we have K1 and, under COL1 we have D1. Looking at the first table, we know that K1 = [0,1,0] and D1 = [12,10,3]. The scalar product of these two "vectors" is the value inside COL2 (first line).

I'm trying to find a way of doing this without using a nested loop (because the idea is to make something efficient), however, I don't exactly know how. I tried using the pd.melt() function and, although it gets me closer to what I want, it doesn't exactly get me to where I want. Could you give me a hint?

Skyris
  • 153
  • 1
  • 9

2 Answers2

7

This is matrix multiplication:

(df[['D1','D2','D3']].T@df[['K1','K2']]).unstack().reset_index()

Output:

  level_0 level_1   0
0      K1      D1  10
1      K1      D2   2
2      K1      D3   7
3      K2      D1  22
4      K2      D2   6
5      K2      D3  13
David Erickson
  • 14,448
  • 1
  • 13
  • 30
Quang Hoang
  • 117,517
  • 10
  • 34
  • 52
  • 1
    Contrary, since matrix multiplication is central in many applications, people spend a lot of time optimizing it. Rest assure that it's among the most efficient operations :-). – Quang Hoang Oct 14 '20 at 22:16
  • Great Answer - I think I understand how to do this now. Pretty straightforward, thanks! It seems like the matrix on the left goes horizontally, while the matrix on the right goes vertically. You could also solve the problem by doing the opposite (transposing the other matrix/dataframe and swapping from right to left), but you would need to sort: `(df[['K1','K2']].T@df[['D1','D2','D3']]).unstack().reset_index()` – David Erickson Oct 14 '20 at 22:18
  • Excellent ! Thank you very much ! – Skyris Oct 14 '20 at 22:32
  • @Skyris check out my answer, which is a numpy version of QuangHoang's answer for performance. This is my first time doing some matrix multiplication so Quang Hoang, please feel free to put in some input on my answer as well. If it's faster, you can still leave this as accepted answer, since I built off Quang's answer. – David Erickson Oct 14 '20 at 22:38
1

Building off @QuangHoang's answer, you can see if the numpy .dot matrix multiplication version is a little bit more performant. Since .to_numpy() just creates a matrix of numbers that is not tied to K and D indices, you have to do some manipulation to get it back in the pandas format you want besides purely numbers:

a1 = df[['D1','D2','D3']].T.to_numpy()
a2 = df[['K1','K2']].to_numpy()
df1 = pd.DataFrame(a1.dot(a2)).unstack().reset_index() #see other options below
df1['level_0'] = 'K' + (df1['level_0'] + 1).astype(str)
df1['level_1'] = 'D' + (df1['level_1'] + 1).astype(str)
df1
Out[1]: 
  level_0 level_1   0
0      K1      D1  10
1      K1      D2   2
2      K1      D3   7
3      K2      D1  22
4      K2      D2   6
5      K2      D3  13

Other numpy array options:

df1 = pd.DataFrame(a1 @ a2).unstack().reset_index()
df1 = pd.DataFrame(np.matmul(a1, a2)).unstack().reset_index()

More information (numpy matrix vector multiplication)

David Erickson
  • 14,448
  • 1
  • 13
  • 30