
I have been working with the Titanic dataset from Kaggle. I am trying to use an if/else condition for some feature engineering, but I am stuck: the condition produces no error, yet it has no effect at all. What am I doing wrong, and how do I fix it? Here is my code:

if(train$titles=="Dr" && train$Sex=="male"){
  train$titles<-"Mr"
}else if(train$titles=="Dr" && train$Sex=="female"){
  train$titles<-"Mrs"
}

Here is my output which is the same before and after:

> table(train$titles)

  Mr  Mrs   Dr Miss 
 571  128    7  185 

Is this because I have left out the final else condition?

If none of the conditions in the if/else match, I don't want to change the values in the column (i.e. I want them left as they are). What do I do?

AdeeThyag
  • Actually, I tried it, but I am getting warning messages. Here they are: Warning messages: 1: In if (train$titles == "Dr" & train$Sex == "male") { : the condition has length > 1 and only the first element will be used 2: In if (train$titles == "Dr" & train$Sex == "female") { : the condition has length > 1 and only the first element will be used – AdeeThyag Aug 30 '18 at 21:25
  • Use `ifelse` instead of `if` because the former is vectorized. – DanY Aug 30 '18 at 21:27

2 Answers

Try logical indexing.

inx <- train$titles == "Dr"
train$titles[inx & train$Sex == "male"] <- "Mr"
train$titles[inx & train$Sex == "female"] <- "Mrs"

Also, like user Dan Y said in a comment to the question, repeated here because sometimes comments are deleted,

Use ifelse instead of if because the former is vectorized.

An ifelse solution, still using inx as defined above, could be

train$titles[inx] <- ifelse(train$Sex[inx] == "male", "Mr", "Mrs")

I am using inx to avoid a longer code line. You can put the definition of inx in the indices of the ifelse if you prefer.
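
To see the two approaches side by side, here is a minimal sketch on a made-up data frame. The rows are invented for illustration and are not the Kaggle data; the column names titles and Sex follow the question, and titles is assumed to be a character column rather than a factor.

```r
# Toy stand-in for the Kaggle data; these rows are invented for illustration.
train <- data.frame(
  titles = c("Dr", "Dr", "Miss", "Mr", "Mrs"),
  Sex    = c("male", "female", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Approach 1: logical indexing, one assignment per case.
by_index <- train$titles
inx <- by_index == "Dr"
by_index[inx & train$Sex == "male"]   <- "Mr"
by_index[inx & train$Sex == "female"] <- "Mrs"

# Approach 2: ifelse, applied only to the "Dr" rows.
by_ifelse <- train$titles
by_ifelse[inx] <- ifelse(train$Sex[inx] == "male", "Mr", "Mrs")

print(by_index)
print(identical(by_index, by_ifelse))
```

In both cases the rows that are not "Dr" are never touched, which is the behaviour asked for in the question.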

Rui Barradas

You should probably use ifelse, which is a vectorised form and will do what you want:

train$titles = ifelse(train$titles=="Dr" & train$Sex=="male", "Mr", "Mrs")

Also, beware of the difference between & and &&.
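
As a small illustration of that difference (the vectors is_dr and is_male are hypothetical names made up for this sketch): & compares element by element and returns one value per row, while && is intended for single TRUE/FALSE values such as the condition of an if. Depending on the R version, giving && or if a longer vector either uses only the first element, as in the warning quoted in the question, or raises an error.

```r
# Hypothetical per-row flags, invented for illustration.
is_dr   <- c(TRUE, TRUE, FALSE)
is_male <- c(TRUE, FALSE, FALSE)

# `&` is element-wise: the result has one entry per row.
elementwise <- is_dr & is_male
print(elementwise)

# `&&` combines two single values, which is what if() expects.
scalar <- is_dr[1] && is_male[1]
print(scalar)
```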

If you have multiple cases, you can nest multiple ifelse statements. You may also be interested in dplyr::case_when.

mikeck
  • Actually, I have four different categories in the variable titles: "Mr", "Mrs", "Miss" and "Doctor". Using this, all of them get changed into "Mr" and "Mrs", which I don't want. I want the "Miss" to remain as it is. – AdeeThyag Aug 30 '18 at 21:42
  • Right, so you need to nest them: `ifelse(x== "Foo", "bar", ifelse(x == "baz", "blah", ifelse(...` which obviously gets clunky pretty fast, hence the use of `case_when` or an alternative formulation of the problem using `match`. – mikeck Aug 30 '18 at 21:46
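
A minimal sketch of the nested form described in the comment above, on invented values (the vectors titles and sex are stand-ins for the data frame columns, assumed to be character vectors). The innermost "else" returns the original title, so "Miss", "Mr" and "Mrs" pass through unchanged.

```r
# Invented example values; not the Kaggle data.
titles <- c("Dr", "Dr", "Miss", "Mr", "Mrs")
sex    <- c("male", "female", "female", "male", "female")

# Recode only the "Dr" rows; every other row falls through to its
# original title. If titles were stored as a factor, converting it
# with as.character() first would be the safer choice.
recoded <- ifelse(titles == "Dr" & sex == "male", "Mr",
           ifelse(titles == "Dr" & sex == "female", "Mrs",
                  titles))

print(recoded)
```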