I have the following data frame

df <- data.frame(c(1, 2, 3, 4), c("T-A1", "T-A1-2", "T-A1-3", "T-A1-4"), c("apple", "banana", "pear", "orange"))
names(df) <- c("num", "name", "fruit")

  num   name  fruit
1   1   T-A1  apple
2   2 T-A1-2 banana
3   3 T-A1-3   pear
4   4 T-A1-4 orange

I need to change "T-A1" to "T-A1-1":

  num   name  fruit        num   name  fruit
1   1   T-A1  apple      1   1 T-A1-1  apple
2   2 T-A1-2 banana  ->  2   2 T-A1-2 banana
3   3 T-A1-3   pear      3   3 T-A1-3   pear
4   4 T-A1-4 orange      4   4 T-A1-4 orange

I have used this function:

df$name <- gsub("T-A1", "T-A1-1", df$name)

But the result I get is this one:

  num   name   fruit
1   1 T-A1-1   apple
2   2 T-A1-1-2 banana
3   3 T-A1-1-3 pear
4   4 T-A1-1-4 orange

I then tried this call:

df$name <- gsub("T-A1", "T-A1-1", df$name, fixed = TRUE)

But I still get the same results as previously mentioned.

Ideally, "T-A1" would be replaced only when the entry is exactly "T-A1", and never when it is nested inside a longer string, whether at the beginning, middle, or end.

In other words, entries such as "T-A1-word", "word-T-A1" or "wo-T-A1-rd" should not be affected, and their "T-A1" sections should remain intact. The only time I want to replace "T-A1" is when it's just "T-A1" by itself.
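
To make the requirement concrete, here is a small sketch using a hypothetical character vector `x` (not part of the original data) that holds the edge cases listed above. It only shows that an unanchored pattern touches every entry, which is the behaviour to avoid.

```r
# Hypothetical entries covering the cases described above (assumed, not from df)
x <- c("T-A1", "T-A1-word", "word-T-A1", "wo-T-A1-rd")

# Unanchored pattern: matches wherever "T-A1" occurs, so every entry changes
unanchored <- gsub("T-A1", "T-A1-1", x)

# Desired result: only the entry that is exactly "T-A1" changes
wanted <- c("T-A1-1", "T-A1-word", "word-T-A1", "wo-T-A1-rd")

print(unanchored)
identical(unanchored, wanted)  # FALSE
```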

R version 3.4.1, Windows 7 64-bit

M--
Ricardo M

1 Answer

You need to tell gsub that T-A1 is the exact string that you are looking for.

df$name <- gsub("^T-A1$", "T-A1-1", df$name)

##   num   name  fruit
## 1   1 T-A1-1  apple
## 2   2 T-A1-2 banana
## 3   3 T-A1-3   pear
## 4   4 T-A1-4 orange

This works because ^ tells gsub that the match must start at the very beginning of the string and $ tells it that the match must end at the very end, so only entries that are exactly T-A1 are replaced. Depending on your actual dataset, you may need to use a different expression.
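
As a quick check of the anchoring described above, the sketch below runs the same pattern over a hypothetical vector `x` that mixes the original names with the nested cases from the question. It also shows one possible alternative that avoids regex altogether by comparing with `==`; that variant assumes the column is a character vector rather than a factor.

```r
# Hypothetical test vector (assumed for illustration, not the asker's data)
x <- c("T-A1", "T-A1-2", "T-A1-word", "word-T-A1", "wo-T-A1-rd")

# Anchored regex: ^ and $ force the whole string to be "T-A1"
by_regex <- gsub("^T-A1$", "T-A1-1", x)

# Alternative without regex: exact comparison, then assignment
# (assumes a character vector; a factor would need as.character() first)
by_equal <- x
by_equal[by_equal == "T-A1"] <- "T-A1-1"

print(by_regex)
identical(by_regex, by_equal)  # TRUE
```

Note that fixed = TRUE only switches off regex interpretation of the pattern; it still matches "T-A1" as a substring, which is why it gave the same result as the first attempt.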

This regex-faq can give you some ideas.

M--