
I have the following string:

str <- "add2AHJJK_GLX_KLKNKMEMa13"

How can I use R to extract "GLX" from it, the word between the underscores? In the example, there are exactly two underscores, not more.

User878239
    `sub(".*_(.*)_.*", "\\1", str)` – G5W Jun 07 '20 at 23:24
  • Is the string guaranteed to have exactly two non-adjacent underscores? I am wondering about `"J_KG_LX_KL"` (two matches?), `"JK__GLX_KL"` (one match or two matches, the first empty?), `"JK_KL"` (no match or an empty match?) and `"JKL"` (no match?). – Cary Swoveland Jun 08 '20 at 00:07
  • @CarySwoveland good point, I actually just realized there is another string with multiple underscores. That makes it more complicated I guess. There are however no cases with multiple underscores in a row ("__"). – User878239 Jun 08 '20 at 00:09
  • Since answers have been posted that assume there are exactly two underscores, perhaps you should edit the question to state that this is the case, and then post a separate question if you want to generalize. By the way, this illustrates the ambiguity that often results when a question is expressed in terms of a single example. I believe questions should be expressed in words, with examples used only for illustration. That's generally more difficult and time-consuming, but it's a skill demanded by the workplace. – Cary Swoveland Jun 08 '20 at 00:16
  • By defining unambiguous rules, you are well on the way to being able to code the appropriate regex. – TonyR Jun 08 '20 at 04:04
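The questions in the comments become concrete once a rule is fixed and Cary Swoveland's example strings are run through it. Below is a minimal base R sketch, assuming the rule is "every piece that has an underscore on both sides". The helper name `between_underscores` is hypothetical and does not come from any package:

```r
# Hypothetical helper: return every piece of `x` that sits between two underscores.
# strsplit() splits on each literal "_"; dropping the first and last pieces
# leaves only the pieces that had an underscore on both sides.
between_underscores <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)
  lapply(parts, function(p) p[-c(1, length(p))])
}

# The edge cases raised in the comments
cases <- c("J_KG_LX_KL", "JK__GLX_KL", "JK_KL", "JKL")
res <- between_underscores(cases)
names(res) <- cases
res
```

Under this rule, "J_KG_LX_KL" gives two matches ("KG" and "LX"), "JK__GLX_KL" gives an empty match followed by "GLX", and "JK_KL" and "JKL" give nothing. A different rule would give different answers, which is exactly why the comments ask for one.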

3 Answers


An option with gsub: match either the characters that are not a _ ([^_]*) from the start (^) of the string up to and including the first _, or (|) everything from a _ to the end of the string (_.*), and replace both matches with blank ("")

gsub("^[^_]*_|_.*", "", str)
#[1] "GLX"
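Because gsub() only deletes the two matched pieces, strings that don't have exactly two underscores still return something instead of failing. Here is a small base R check of that behavior. The extra input strings are made up for illustration:

```r
str <- "add2AHJJK_GLX_KLKNKMEMa13"
x <- c(str, "J_KG_LX_KL", "JK_KL", "JKL")
# Delete everything up to and including the first "_" (^ only anchors at the
# real start of the string), and everything from the next "_" to the end.
out <- gsub("^[^_]*_|_.*", "", x)
out
```

With more than two underscores this returns the first middle piece ("KG"). A string with a single underscore returns its tail ("KL"), and a string with none comes back unchanged ("JKL"). If such inputs can occur, filter them first, for example with grepl("_[^_]*_", x).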

Another option is extraction with regexpr/regmatches, using PCRE lookarounds (perl = TRUE) so that the underscores themselves are not part of the match:

regmatches(str, regexpr('(?<=_)\\w+(?=_)', str, perl = TRUE))
#[1] "GLX"
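One caveat: in PCRE, \w matches letters, digits and the underscore, so \\w+ can run across an underscore. With exactly two underscores the lookahead forces it to back off to the right place, but with more underscores it over-matches. The sketch below compares it with a [^_]+ character class. The second test string is made up:

```r
x <- c("add2AHJJK_GLX_KLKNKMEMa13", "J_KG_LX_KL")
# \w includes "_", so the greedy match backs off only as far as the LAST
# position that is followed by "_"
greedy <- regmatches(x, regexpr("(?<=_)\\w+(?=_)", x, perl = TRUE))
# [^_]+ cannot cross an underscore, so it stops at the next one
strict <- regmatches(x, regexpr("(?<=_)[^_]+(?=_)", x, perl = TRUE))
# gregexpr() finds every such segment, returning one vector per input string
all_segments <- regmatches(x, gregexpr("(?<=_)[^_]+(?=_)", x, perl = TRUE))
greedy
strict
all_segments
```

Here greedy returns "KG_LX" for the second string, strict returns "KG", and all_segments returns both "KG" and "LX".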
akrun

If the string always has exactly three parts separated by "_", we can take the middle one:

library(stringr)

str_split(str, "_", simplify = TRUE)[[2]]
#[1] "GLX"
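With simplify = TRUE, str_split() returns a character matrix, and [[2]] picks its second element. That works for a single string. For a vector of strings, however, it would pick row 2 of column 1, so [, 2] is the vectorized form. A base R sketch of the same "take the middle part" idea, without stringr, is shown below. The extra test strings are made up:

```r
x <- c("add2AHJJK_GLX_KLKNKMEMa13", "ab_CD_ef", "JKL")
# Split each string on the literal "_" (fixed = TRUE skips regex parsing),
# then keep the second piece of each; p[2] is NA when there is no "_".
middle <- vapply(strsplit(x, "_", fixed = TRUE),
                 function(p) p[2],
                 character(1), USE.NAMES = FALSE)
middle
```

Using vapply() with character(1) guarantees one string per input, so a string without an underscore yields NA instead of silently changing the result's length.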
Chuck P

You can use sub to extract a word between underscores.

sub('.*_(\\w+)_.*', '\\1', str)
#[1] "GLX"
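When the pattern does not match, sub() returns the input unchanged, so a string without two underscores would come back whole. The sketch below returns NA in that case instead, using grepl() with the same pattern as a guard. The second test string is made up:

```r
x <- c("add2AHJJK_GLX_KLKNKMEMa13", "JKL")
pattern <- ".*_(\\w+)_.*"
# sub() alone would return "JKL" unchanged, so only keep results for
# strings that actually match the pattern.
extracted <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
extracted
```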

Or stringr::str_match:

stringr::str_match(str, '_(\\w+)_')[, 2]
#[1] "GLX"
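stringr::str_match() returns a matrix whose first column is the whole match and whose remaining columns are the capture groups, which is why the code uses [, 2]. Strings that don't match get NA. If stringr isn't available, the base R sketch below is roughly equivalent. It uses regexec(), which also records capture groups. The non-matching test string is made up:

```r
x <- c("add2AHJJK_GLX_KLKNKMEMa13", "JKL")
# regexec() records the full match plus each capture group; regmatches() turns
# that into c(full_match, group1) per string, or character(0) if no match.
m <- regmatches(x, regexec("_(\\w+)_", x))
group1 <- vapply(m,
                 function(v) if (length(v) >= 2) v[2] else NA_character_,
                 character(1), USE.NAMES = FALSE)
group1
```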
Ronak Shah