Extract characters between 2 dots in R

Question

I have a column in a data frame that looks like this:

>head(df$col, 10)

"glut.AN02737.1.1" "glut.AN02737.1.2" "glut.AN02737.1.15" "glut.AN02737.2.1" "glut.AN02737.1.1" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9"

I want to extract all characters after the first ".", and before the second ".".

So I want:

"AN02737" "AN02737" "AN17896"

How can I achieve this?

score 0 · Answer 1 · answered Dec 08 '20 at 17:59

One option is to split each string on the literal dot and take the second piece:

#Data
text <- c("glut.AN02737.1.1","glut.AN02737.1.2","glut.AN02737.1.15",
          "glut.AN02737.2.1","glut.AN02737.1.1","gad.AN17896.1.9")
#Code
vec <- unlist(lapply(strsplit(text,'.',fixed=TRUE),function(x) x[[2]]))

Output:

[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

Or, more compactly:

#Code2
vec <- sapply(strsplit(text, ".", fixed=TRUE), "[", 2)

Same output.
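To apply this to the data frame column from the question, something like the following should work. This is only a sketch: the `df` below is a made-up stand-in for the real data, the new column name `id` is my own choice, and the `"nodots"` value is added just to show what happens when a string has fewer than two pieces.

```r
# Hypothetical data frame standing in for the question's df
df <- data.frame(col = c("glut.AN02737.1.1", "gad.AN17896.1.9", "nodots"),
                 stringsAsFactors = FALSE)

# Split on a literal dot and keep the second piece of each element.
# "[" returns NA when an element has no second piece.
df$id <- sapply(strsplit(df$col, ".", fixed = TRUE), "[", 2)

df$id
# [1] "AN02737" "AN17896" NA
```

Strings without a second field come back as NA rather than raising an error, so it may be worth checking for NA afterwards.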

Onyambu · Answer 2 · 2020-12-08T18:15:20.117

Here are some options:

sub("^.*?[.](\\w+).*","\\1", string)
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

regmatches(string, regexpr("(?<=[.])\\w+", string, perl = TRUE))
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

read.table(text = string, sep='.', fill = TRUE)[,2]
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

stringr::str_extract(string, "(?<=[.])(\\w+)")
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

Data:

string <- c("glut.AN02737.1.1","glut.AN02737.1.2","glut.AN02737.1.15",
      "glut.AN02737.2.1","glut.AN02737.1.1","gad.AN17896.1.9")
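Note that `\\w+` only matches letters, digits and underscores, which is fine for the IDs shown here. If the middle field could contain other characters, a pattern built on `[^.]*` (any run of non-dot characters) captures exactly what sits between the first and second dot. A minimal sketch, where the third test string with a hyphen is invented for illustration:

```r
# The third element is a made-up example with a non-word character
string <- c("glut.AN02737.1.1", "gad.AN17896.1.9", "x.AB-12.3")

# ^[^.]*    everything before the first dot
# ([^.]*)   captured text between the first and second dot
# [.].*$    the second dot and the rest of the string
ids <- sub("^[^.]*[.]([^.]*)[.].*$", "\\1", string)

ids
# [1] "AN02737" "AN17896" "AB-12"
```

As with any `sub` call, a string that does not match the pattern (for example one with fewer than two dots) is returned unchanged.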

Another option is `word` from `stringr`: `word(string, 2, sep = "\\.")` (akrun, Dec 08 '20 at 19:36)
