
I have a vector of strings that looks something like this: c("abc@40gmail.com", "xyz@50gmail.com"). For some reason, there are random digits after the @, and I'm trying to remove them. Using a regular expression, how can I tell R to remove or replace the digits that come after "@", so that I end up with c("abc@gmail.com", "xyz@gmail.com")? I don't know much about regex, so I'd really appreciate it if someone could provide not just the code but also a brief explanation of it. Thanks!

hsl
  • @Thomas how is that a dupe? From now on, is every text replacement question a dupe of `gsub("e", "", x)`? The regex in the "dupe" is an exact-match pattern, while this question is a bit more complicated. – David Arenburg May 17 '15 at 17:23

2 Answers


One option is to match `@` followed by one or more digits and replace the whole match with a single `@`:

x <- c("abc@40gmail.com", "xyz@50gmail.com")
sub("@\\d+", "@", x)
## [1] "abc@gmail.com" "xyz@gmail.com"
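To show what each piece of `"@\\d+"` does, here is a small sketch run on a few extra addresses made up for illustration (they are not from the question). In the R string, `\\d` has two backslashes because the backslash itself must be escaped; the regex engine sees `\d`, meaning "any digit", and `+` means "one or more". `sub()` replaces only the first match in each string, which is enough here because each address contains a single `@`.

```r
# Hypothetical test addresses (not from the question)
emails <- c("abc@40gmail.com",   # digits right after "@" -> removed
            "abc123@7gmail.com", # digits before "@" are not touched
            "def@gmail.com")     # no digit after "@" -> no match, unchanged

# "@\\d+" matches "@" followed by one or more digits;
# the whole match is replaced by a single "@"
cleaned <- sub("@\\d+", "@", emails)
print(cleaned)
```

Because `+` requires at least one digit, an address that is already clean simply has no match and comes back unchanged.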
David Arenburg

You could use a positive lookbehind or `\K`:

sub("(?<=@)\\d+", "", x, perl = TRUE)

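`\\d+` matches one or more digit characters (the doubled backslash is R string escaping; the regex engine sees `\d`). `(?<=@)` is a positive lookbehind: it requires the digits to be immediately preceded by `@`, but the `@` itself is not part of the match, so replacing the match with `""` leaves the `@` in place. R's default regex engine does not support lookbehinds, so you need `perl = TRUE` to switch to PCRE.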
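One way to see the difference between the two patterns is to extract what each one actually matches, using base R's `regexpr()` and `regmatches()`. This is a sketch on the question's vector; the variable names `plain` and `lookbehind` are only illustrative.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# Without a lookbehind, the "@" is part of the match,
# which is why the first answer has to put "@" back in the replacement
plain <- regmatches(x, regexpr("@\\d+", x))

# With a lookbehind, the "@" is only checked, not matched,
# so replacing the match with "" keeps the "@"
lookbehind <- regmatches(x, regexpr("(?<=@)\\d+", x, perl = TRUE))

print(plain)       # matched text includes the "@"
print(lookbehind)  # only the digits are matched
```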

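Or use `\K`, which resets the start of the reported match, so the `@` must be present but is left out of what gets replaced: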

sub("@\\K\\d+", "", x, perl = TRUE)
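As a quick sanity check, a minimal sketch comparing the three patterns from this thread on the question's vector (the variable names `a`, `b` and `k` are mine). The lookbehind and `\K` versions both need `perl = TRUE`; the plain `"@\\d+"` version works with the default engine.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

a <- sub("@\\d+", "@", x)                   # match "@" + digits, put "@" back
b <- sub("(?<=@)\\d+", "", x, perl = TRUE)  # lookbehind: match only the digits
k <- sub("@\\K\\d+", "", x, perl = TRUE)    # \K: "@" dropped from reported match

# All three give the same cleaned addresses
stopifnot(identical(a, b), identical(b, k))
print(k)
```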
Avinash Raj
  • Thanks a lot! Is there any reason why you wouldn't just use the simpler `sub("@\\d+", "@", x)`? – hsl May 17 '15 at 15:55
  • @hsl because it's already mentioned. We could write at least two answers for a single regex-based question. :-) That's the beauty of regex. – Avinash Raj May 17 '15 at 15:57