2

I am trying to remove the string [start section id="20107"] that appears in every cell of a column in a dataframe.

I have tried df1$Col1 <- gsub("[start section id="20107"]", "", df1$Col1) but I got an error: unexpected numeric constant in df1$Col1<- gsub("[start section id="20107". I'm not sure what else I can try; any help is appreciated, folks.

[start section id="20107"]

(11-Feb-2013 13:22 DK04)
#1 Preventive exam
#2 Mild hyperlipidemia
#3 Hyperglycemia
#4 Peripheral neuropathy
oguz ismail
  • It's hard to help you without a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please provide one. My guess is that you're dealing with factors and not with strings, but seeing some data (by running dput(head(data))) would make things easier. – Heroka Oct 22 '15 at 08:06
  • @Heroka, good point. I have added a snippet of the string below my question. Hope this works. – Bridgeport Byron Tucker Oct 22 '15 at 08:09
  • At the very least you need to escape the double quotes in your pattern. – 27 ϕ 9 Oct 22 '15 at 08:12
  • @Jay, escaping? I'm not sure how to do that. – Bridgeport Byron Tucker Oct 22 '15 at 08:14
  • `df1$Col1 – zx8754 Oct 22 '15 at 08:14
  • Should be marked as a duplicate. Asked before [here](http://stackoverflow.com/questions/9704213/r-remove-part-of-string), [here](http://stackoverflow.com/questions/11776287/remove-pattern-from-string-with-gsub), and [here](http://stackoverflow.com/questions/11936339/in-r-how-do-i-replace-text-within-a-string), just to give a few examples. – Paulo E. Cardoso Oct 22 '15 at 09:58

2 Answers

5

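gsub treats its pattern as a regular expression by default, so [ and ] are interpreted as special regex characters (they delimit a character class). Use fixed = TRUE to switch that off. Furthermore, a double quote inside a double-quoted string has to be escaped with a backslash, so write \" wherever " appears in the string.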

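# Example data: one cell that contains the unwanted tag
df1 <- data.frame(Col1 = "fdsfd [start section id=\"20107\"]", stringsAsFactors = FALSE)
# fixed = TRUE makes gsub match the pattern literally instead of as a regex
df1$Col1 <- gsub("[start section id=\"20107\"]", "", df1$Col1, fixed = TRUE)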

With fixed = TRUE, your search pattern is treated as a plain string, not as a regular expression.
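To see why fixed = TRUE matters here, the following minimal sketch compares three calls on a single made-up string (the variable names x, as_regex, as_fixed and escaped are only for illustration). The last call is an alternative that is not part of the answer above: it keeps regex matching but escapes the brackets.

```r
# Made-up sample string; single quotes avoid escaping the inner double quotes
x <- 'fdsfd [start section id="20107"]'

# Default regex matching: the brackets form a character class, so every
# single character listed inside them is deleted wherever it occurs
as_regex <- gsub('[start section id="20107"]', "", x)
print(as_regex)   # "ff[]"

# Literal matching: only the exact tag is removed
as_fixed <- gsub('[start section id="20107"]', "", x, fixed = TRUE)
print(as_fixed)   # "fdsfd "

# Alternative (assumption, not from the answer): escape the brackets
# and keep regex matching
escaped <- gsub('\\[start section id="20107"\\]', "", x)
print(escaped)    # "fdsfd "
```

Note that the unescaped regex call runs without any error, which makes this mistake easy to overlook.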

Daniel
2

The pattern in gsub is broken because of the nested double quotes: the string ends at the second ", which is why R then complains about an unexpected numeric constant. Either escape the quotes around the number or use single quotes around the pattern. Also use fixed = TRUE, as you are trying to match a fixed string rather than a regular expression.

gsub('[start section id="20107"]', "", df1$Col1, fixed = TRUE)
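As a rough sketch of how this could be applied to a whole column, the example below builds a small data frame loosely modelled on the snippet in the question (the rows and the tag variable are made up for illustration). It also checks that the single-quoted and the escaped double-quoted spellings are the same string, and uses trimws, which is an optional extra step that is not part of the answers, to drop the whitespace left behind.

```r
# Hypothetical data modelled on the snippet in the question
df1 <- data.frame(
  Col1 = c('[start section id="20107"] #1 Preventive exam',
           '[start section id="20107"] #2 Mild hyperlipidemia'),
  stringsAsFactors = FALSE
)

# Both spellings produce exactly the same string
tag <- '[start section id="20107"]'
same <- identical(tag, "[start section id=\"20107\"]")
print(same)  # TRUE

# Remove the literal tag from every cell, then trim leftover whitespace
df1$Col1 <- trimws(gsub(tag, "", df1$Col1, fixed = TRUE))
print(df1$Col1)
```

Remember to assign the result back to the column; gsub returns a new character vector and does not modify df1 in place.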
Jaap
Duf59