I want to know the difference between these two lines of code:

 X_train = training_dataset.iloc[:, 1].values
 X_train = training_dataset.iloc[:, 1:2].values

My guess is that the former is a 1-D numpy array and the latter is a 2-D numpy array. For inputs to a neural network, the latter seems to be the proper form for input data. Is there a specific reason for that?

Please help!

Osama Akhtar

2 Answers

Your guess is right: the first has ndim=1 and the second has ndim=2. You can check by running this after each assignment:

X_train.ndim

The difference is that the first one has no defined second dimension: its shape is (R,), while the second has shape (R, 1). If you want to see the difference between the shapes, I suggest reading this: Difference between numpy.array shape (R, 1) and (R,)
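
As a rough illustration of the two shapes named in that link, here is a minimal sketch using plain NumPy. The array contents and the names flat and column are made up for the example; they are not from the question.

```python
import numpy as np

# Illustrative data only; any 1-D array behaves the same way.
flat = np.array([15, 25, 2, 20, 27])   # shape (R,): one axis, no second dimension
column = flat.reshape(-1, 1)           # shape (R, 1): R rows, one column

print(flat.shape, flat.ndim)           # (5,) 1
print(column.shape, column.ndim)       # (5, 1) 2

# Same numbers, different layout: flattening the column gives back the 1-D array.
print(np.array_equal(flat, column.ravel()))  # True
```

So the data is identical in both cases; only the number of axes differs.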

Bruno Mello

The difference is that iloc returns a Series when a single row or column is selected, but a DataFrame when a row or column range is selected.

Although they both refer to column 1, 1 and 1:2 are different types, with 1 representing an int and 1:2 representing a slice.
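
To see that distinction without pandas, here is a small sketch. ShowKey is a hypothetical helper written only for this illustration; it simply returns whatever the square brackets receive.

```python
class ShowKey:
    """Hypothetical helper: hands back the key passed inside the brackets."""

    def __getitem__(self, key):
        return key


show = ShowKey()

# A bare 1 arrives as an int; 1:2 arrives as a slice object.
print(show[:, 1])    # (slice(None, None, None), 1)
print(show[:, 1:2])  # (slice(None, None, None), slice(1, 2, None))
```

An indexer such as iloc receives these same objects and can choose its return type based on them.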

With,

X_train = training_dataset.iloc[:, 1].values

You specify a single column, so training_dataset.iloc[:, 1] is a Pandas Series, and .values is a 1D NumPy array.

Vs.,

X_train = training_dataset.iloc[:, 1:2].values

Although it selects only one column, 1:2 is a slice that represents a column range, so training_dataset.iloc[:, 1:2] is a Pandas DataFrame. Thus, .values is a 2D NumPy array.

Test as follows:

Create training_dataset Dataframe

import pandas as pd

data = {'Height': [1, 14, 2, 1, 5], 'Width': [15, 25, 2, 20, 27]}
training_dataset = pd.DataFrame(data)

Using .iloc[:, 1]

print(type(training_dataset.iloc[:, 1]))
print(training_dataset.iloc[:, 1].values)

# Result is:
<class 'pandas.core.series.Series'>
# .values returns a 1D NumPy array
[15 25  2 20 27]

Using iloc[:, 1:2]

print(type(training_dataset.iloc[:, 1:2]))
print(training_dataset.iloc[:, 1:2].values)
# Result is:
<class 'pandas.core.frame.DataFrame'>
# .values is a 2D NumPy array (since these are the values of a Pandas DataFrame)
[[15]
 [25]
 [ 2]
 [20]
 [27]]
# The type of .values is <class 'numpy.ndarray'>
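
On the question of why the 2D form is used for model input: many libraries (scikit-learn, for example) expect input of shape (n_samples, n_features), so a single feature still needs its own column axis. The sketch below reuses the same toy DataFrame and compares the shapes. The variable names are made up for the example, and the list form iloc[:, [1]] and reshape(-1, 1) are shown as alternatives, not as what the question's code must use.

```python
import pandas as pd

data = {'Height': [1, 14, 2, 1, 5], 'Width': [15, 25, 2, 20, 27]}
training_dataset = pd.DataFrame(data)

one_d = training_dataset.iloc[:, 1].values          # int   -> Series    -> shape (5,)
two_d_slice = training_dataset.iloc[:, 1:2].values  # slice -> DataFrame -> shape (5, 1)
two_d_list = training_dataset.iloc[:, [1]].values   # list  -> DataFrame -> shape (5, 1)

# A 1D array can be turned into a single-column 2D array after the fact.
reshaped = one_d.reshape(-1, 1)

print(one_d.shape)                      # (5,)
print(two_d_slice.shape)                # (5, 1)
print(two_d_list.shape)                 # (5, 1)
print((reshaped == two_d_slice).all())  # True
```

Any of the three 2D forms gives 5 samples with 1 feature each; the 1D form leaves it ambiguous whether that is 5 samples or 5 features.
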
DarrylG