
I have 2 arrays to concatenate:

X_train has shape (3072, 50000) and y_train has shape (50000,).

I'd like to concatenate them so I can shuffle the indices all in one go. I have tried the following, but neither works:

np.concatenate([X_train, np.transpose(y_train)])
np.column_stack([X_train, np.transpose(y_train)])

How can I concatenate them?

Nathan
  • Concatenate to what? You have the input dimensions; what output dimension do you want? (From an ML perspective I don't see this making sense.) – sascha Feb 05 '18 at 16:37
  • Can't you just reshape `y_train` to `(1, 50000)`? – DavidG Feb 05 '18 at 16:38
  • @DavidG Yes, thanks! By the way, why do I get (50000,) in the first place? Is that a numpy array? It seems like some kind of vector or list, I'm not sure. I'm new to numpy. – Nathan Feb 05 '18 at 16:41
  • [This post](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r) might help with the difference between the two. – DavidG Feb 05 '18 at 16:45
  • In `numpy`, 1-d arrays are just as useful as 2-d (or higher). – hpaulj Feb 05 '18 at 17:06
  • @DavidG If I could upvote your comment linking [this post](https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r) 10 times, I would. I don't think I would have found that post without your help. Perhaps I should add some tags to it to make it easier to dig up. – Nathan Feb 05 '18 at 18:59
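The `(50000,)` vs `(1, 50000)` distinction from the comments can be seen directly. A small sketch (the array `y` and its length are arbitrary stand-ins): a 1-D array has shape `(n,)`, transposing it changes nothing, which is why `np.transpose(y_train)` had no effect in the question, and `reshape` or `np.newaxis` adds the missing axis.

```python
import numpy as np

# A 1-D array has a single axis: shape (n,), neither a row nor a column.
y = np.arange(5)
print(y.shape)                 # (5,)

# Transposing a 1-D array is a no-op, so np.transpose(y_train)
# leaves the shape unchanged.
print(np.transpose(y).shape)   # (5,)

# Adding an explicit axis turns it into a 2-D row or column.
row = y.reshape(1, -1)         # same as y[np.newaxis, :]
col = y[:, np.newaxis]         # same as y.reshape(-1, 1)
print(row.shape, col.shape)    # (1, 5) (5, 1)
```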

2 Answers


A recommendation aimed at the underlying task rather than your specific problem: don't do this!

Assuming X are your samples / observations, y are your targets:

Just generate one random permutation and index both arrays with it. The originals are left unmodified (note that this kind of integer-array indexing returns copies, not views), e.g. (untested):

import numpy as np

X = np.random.random(size=(50000, 3072))
y = np.random.random(size=50000)

perm = np.random.permutation(X.shape[0])  # assuming X.shape[0] == y.shape[0]
X_perm = X[perm]  # integer-array indexing returns a copy, not a view
y_perm = y[perm]
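Integer-array ("fancy") indexing such as `X[perm]` always returns a copy in NumPy, not a view. The check below is a sketch with small random stand-ins (sizes and seed are arbitrary, and it uses the newer `np.random.default_rng` Generator instead of the legacy `np.random` functions). It confirms the copy and that each row stays paired with its target. The copy is usually harmless here, since the shuffled data is what you want anyway.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 20))   # small stand-in for the (50000, 3072) data
y = rng.random(1000)

perm = rng.permutation(X.shape[0])
X_perm = X[perm]   # integer-array indexing allocates a new array
y_perm = y[perm]

# The result does not share memory with X: it is a copy, not a view.
print(np.shares_memory(X, X_perm))   # False

# Row i of X_perm still belongs with entry i of y_perm.
print(np.array_equal(X_perm[0], X[perm[0]]), y_perm[0] == y[perm[0]])   # True True
```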

Reminder: your starting shapes are not compatible with most Python-based ML tools, where the usual interpretation is:

  • first-dim / rows: samples
  • second-dim / cols: features

Since the number of samples must match the number of target values in y, my example follows this convention, while yours needs a transpose of X.
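Transposing is cheap: `.T` returns a view, so no data is copied. A sketch in the question's features-by-samples layout (the random data and the reduced sample count are stand-ins; the question has 50000 samples):

```python
import numpy as np

# Stand-in with the question's layout: 3072 features x n samples.
n_samples = 1000
X_train = np.random.random((3072, n_samples))
y_train = np.random.random(n_samples)

# .T gives the samples-first layout (one row per sample) as a view.
X = X_train.T
print(X.shape)                        # (1000, 3072)
print(np.shares_memory(X, X_train))   # True

# Now the first axis of X lines up with y_train, as most ML tools expect.
assert X.shape[0] == y_train.shape[0]
```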

sascha

As DavidG said, the problem is that y_train has shape (50000,), so I needed to reshape it to (1, 50000) before concatenating:

np.concatenate([X_train, np.reshape(y_train, (1, 50000))])  # result shape: (3073, 50000)
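`np.vstack` can do the reshape for you, since it treats 1-D inputs of shape `(N,)` as `(1, N)` before stacking. A sketch with small random stand-ins for `X_train` and `y_train` (sizes are arbitrary):

```python
import numpy as np

X_train = np.random.random((4, 6))   # stand-in: 4 features x 6 samples
y_train = np.random.random(6)

# vstack promotes y_train from (6,) to (1, 6), so no manual reshape is needed.
combined = np.vstack([X_train, y_train])
print(combined.shape)   # (5, 6)

# Same result as reshaping first and concatenating along axis 0.
same = np.concatenate([X_train, y_train.reshape(1, -1)])
print(np.array_equal(combined, same))   # True
```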

Still, this ran very slowly in Jupyter. If there's a faster approach, I'd be grateful to hear it.
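The slowness is most likely the copy: `np.concatenate` allocates a new (3073, 50000) array and copies every element into it. That is roughly 1.23 GB if the data is float64, which is an assumption since the dtype isn't stated. A sketch that skips the combined array entirely, staying in the question's features-by-samples layout: draw one permutation and apply it to the sample axis of both arrays. The random data and reduced sample count are stand-ins.

```python
import numpy as np

# Size of the concatenated array for the question's shapes, assuming float64.
combined_bytes = 3073 * 50000 * np.dtype(np.float64).itemsize
print(combined_bytes / 1e9)   # about 1.23 (GB)

# Stand-in data in the question's layout: features x samples.
X_train = np.random.random((3072, 1000))
y_train = np.random.random(1000)

# One shared permutation of the sample axis. Samples are columns here,
# so index axis 1 of X_train and the only axis of y_train.
perm = np.random.permutation(X_train.shape[1])
X_shuffled = X_train[:, perm]
y_shuffled = y_train[perm]
```

The shuffled arrays are still copies, but no concatenated intermediate has to be built and split apart again afterwards.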

Nathan