Hello, I have a data frame such as:

COL1  COL2
G1    [OK_+__SP2,PL_-__SP2]
G2    [IQO_-__SP1_8,PL2_-__SP2]
G2    [IRO_+__SP8]

and I would like to remove, from each element of COL2, the part up to and including the __

to get:

COL1  COL2
G1    ['SP2','SP2']
G2    ['SP1_8','SP2']
G2    ['SP8']
zx8754
chippycentra

2 Answers

An option with str_remove_all from stringr:

library(dplyr)
library(stringr)
df1 %>% 
     mutate(COL2 = str_remove_all(COL2, "[^\\[,]+__"))

Output:

#   COL1        COL2
#1   G1   [SP2,SP2]
#2   G2 [SP1_8,SP2]
#3   G2       [SP8]

Data:

df1 <- structure(list(COL1 = c("G1", "G2", "G2"), COL2 = c("[OK_+__SP2,PL_-__SP2]", 
"[IQO_-__SP1_8,PL2_-__SP2]", "[IRO_+__SP8]")), 
class = "data.frame", row.names = c(NA, 
-3L))
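
The desired output in the question looks like a list of values per row rather than a single string. If that is what is needed, a minimal base R sketch (assuming COL2 is stored as a plain character column, as in the data above) could clean the strings with the same pattern, strip the brackets, and split on commas to build a list column:

```r
df1 <- data.frame(
  COL1 = c("G1", "G2", "G2"),
  COL2 = c("[OK_+__SP2,PL_-__SP2]", "[IQO_-__SP1_8,PL2_-__SP2]", "[IRO_+__SP8]")
)

# Same pattern as above: drop everything up to and including "__" in each element
cleaned <- gsub("[^\\[,]+__", "", df1$COL2)

# Remove the enclosing brackets, then split on commas:
# this gives one character vector per row
df1$COL2 <- strsplit(gsub("^\\[|\\]$", "", cleaned), ",", fixed = TRUE)

str(df1$COL2)
```

Each row of COL2 then holds a character vector, e.g. c("SP1_8", "SP2") for the second row.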
akrun

Try this:

gsub("(?<=[\\[,])(.*?__)", "", df1$COL2, perl = TRUE)
# [1] "[SP2,SP2]"   "[SP1_8,SP2]" "[SP8]"      

How the pattern works:

  • (?<=[\\[,]) is a lookbehind: the match must be preceded by [ or ,
  • (.*?__) lazily matches everything up to and including the first __
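
To see what the two parts select, one of the strings can be inspected on its own. This is only an illustrative sketch: the variable names x, removed and result are made up here, and perl = TRUE is needed because the lookbehind is a Perl-style feature.

```r
x <- "[IQO_-__SP1_8,PL2_-__SP2]"
pattern <- "(?<=[\\[,])(.*?__)"

# regmatches() shows the substrings that the pattern selects for removal
removed <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(removed)
# [1] "IQO_-__" "PL2_-__"

# gsub() then deletes exactly those substrings
result <- gsub(pattern, "", x, perl = TRUE)
print(result)
# [1] "[SP1_8,SP2]"
```

Because the match is lazy and anchored right after [ or a comma, the single underscore inside SP1_8 is left alone.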

Note: I'm not great with regex, so there are probably better ways.

zx8754