I have a column in a data frame that looks like this:

> head(df$col, 10)

"glut.AN02737.1.1" "glut.AN02737.1.2" "glut.AN02737.1.15" "glut.AN02737.2.1" "glut.AN02737.1.1" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9" "gad.AN17896.1.9"

I want to extract the characters between the first "." and the second "." in each value.

So I want:

"AN02737" "AN02737" "AN17896"

How can I achieve this?

Cyrus
Workhorse

2 Answers

One option is to split each string on the literal "." and take the second piece:

# Data
text <- c("glut.AN02737.1.1","glut.AN02737.1.2","glut.AN02737.1.15",
          "glut.AN02737.2.1","glut.AN02737.1.1","gad.AN17896.1.9")
# Code: fixed = TRUE treats "." as a literal dot rather than a regex wildcard
vec <- unlist(lapply(strsplit(text, '.', fixed = TRUE), function(x) x[[2]]))

Output:

[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

Or, more concisely:

#Code2
vec <- sapply(strsplit(text, ".", fixed=TRUE), "[", 2)

Same output.
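
To apply this to the column from the question, a minimal sketch might look like the following. It assumes the data frame is called `df` with a character column `col`, as in the question; the new column name `id` and the three sample rows are purely illustrative. `vapply` is used instead of `sapply` so the result is always a character vector, with `NA` for any value that has no second field.

```r
# Illustrative data frame; `df` and `col` follow the question, `id` is a hypothetical name
df <- data.frame(
  col = c("glut.AN02737.1.1", "glut.AN02737.1.2", "gad.AN17896.1.9"),
  stringsAsFactors = FALSE
)

# Split each value on the literal "." and keep the second field
parts <- strsplit(df$col, ".", fixed = TRUE)
df$id <- vapply(parts, function(x) x[2], character(1))

# Optional: keep only the distinct identifiers
ids <- unique(df$id)
```

If `col` is stored as a factor, wrapping it in `as.character()` before calling `strsplit` should be enough.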

Duck

Here are some options:

sub("^.*?[.](\\w+).*","\\1", string)
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

regmatches(string, regexpr("(?<=[.])\\w+", string, perl = TRUE))
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

read.table(text = string, sep='.', fill = TRUE)[,2]
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

stringr::str_extract(string, "(?<=[.])(\\w+)")
[1] "AN02737" "AN02737" "AN02737" "AN02737" "AN02737" "AN17896"

Data:

string <- c("glut.AN02737.1.1","glut.AN02737.1.2","glut.AN02737.1.15",
      "glut.AN02737.2.1","glut.AN02737.1.1","gad.AN17896.1.9")
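
Note that the `\\w+` patterns above rely on the second field containing only letters, digits, or underscores. As a sketch for the more general case, assuming the field may contain other characters (the hyphenated value below is a made-up example, not from the question), the pattern can instead be anchored on the dots themselves:

```r
# Hypothetical input: the last value has a hyphen in its second field
x <- c("glut.AN02737.1.1", "gad.AN17896.1.9", "foo.AB-123.4.5")

# Capture everything between the first and second "." using negated character classes
res <- sub("^[^.]*[.]([^.]*)[.].*$", "\\1", x)
```

Strings with fewer than two dots do not match this pattern, so `sub` returns them unchanged.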
Onyambu