Regex replace within list in R column

Question

Hello, I have a data frame such as:

COL1  COL2
G1    [OK_+__SP2,PL_-__SP2]
G2    [IQO_-__SP1_8,PL2_-__SP2]
G2    [IRO_+__SP8]

and I would like to remove, from each element of COL2, everything up to and including the __ separator

to get:

COL1  COL2
G1    ['SP2','SP2']
G2    ['SP1_8','SP2']
G2    ['SP8']

@WiktorStribiżew So this post is a duplicate of the Regex FAQ post? By that logic, every regex post could be closed with that FAQ post, no? Maybe provide a link to the exact duplicate target post? – zx8754, Nov 20 '20 at 08:07

score 2 · Accepted Answer · answered Nov 19 '20 at 23:21

An option with str_remove_all

library(dplyr)
library(stringr)

# Remove every run of characters other than "[" or "," that is followed by "__"
df1 %>%
  mutate(COL2 = str_remove_all(COL2, "[^\\[,]+__"))

-output

#   COL1        COL2
#1   G1   [SP2,SP2]
#2   G2 [SP1_8,SP2]
#3   G2       [SP8]

data

df1 <- structure(list(COL1 = c("G1", "G2", "G2"),
                      COL2 = c("[OK_+__SP2,PL_-__SP2]",
                               "[IQO_-__SP1_8,PL2_-__SP2]",
                               "[IRO_+__SP8]")),
                 class = "data.frame", row.names = c(NA, -3L))
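
For readers without dplyr or stringr, a minimal base R sketch of the same idea follows. It assumes that gsub() with perl = TRUE is an acceptable stand-in for str_remove_all(); this is a swapped-in call, not the answerer's code. The pattern reads as: one or more characters that are neither [ nor , followed by __.

```r
# Same sample data as above
df1 <- data.frame(
  COL1 = c("G1", "G2", "G2"),
  COL2 = c("[OK_+__SP2,PL_-__SP2]", "[IQO_-__SP1_8,PL2_-__SP2]", "[IRO_+__SP8]"),
  stringsAsFactors = FALSE
)

# [^\[,]+ : one or more characters that are neither "[" nor ","
# __      : the literal separator that ends each prefix
# perl = TRUE selects PCRE, so "\\[" inside the class is read as an escaped "["
df1$COL2 <- gsub("[^\\[,]+__", "", df1$COL2, perl = TRUE)
df1$COL2
```

Because the character class excludes the comma, a match can never extend from one element into the next.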

zx8754 · Answer 2 · 2020-11-19T21:55:52.137

score 1

Try this:

gsub("(?<=[\\[,])(.*?__)", "", df1$COL2, perl = TRUE)
# [1] "[SP2,SP2]"   "[SP1_8,SP2]" "[SP8]"

How the pattern works:

(?<=[\\[,]) is a lookbehind: the match must start right after a [ or a ,
(.*?__) lazily matches everything up to and including the next __

Note: I'm not great with regex, so there are probably better ways.
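
To see what the pattern consumes, the matches can be extracted before they are deleted. The sketch below uses base R's gregexpr() and regmatches() on the sample data. The string in edge is a hypothetical input, not from the question, whose first element has no __; it is there only to show where the lazy pattern and the character-class pattern from the accepted answer behave differently.

```r
x <- c("[OK_+__SP2,PL_-__SP2]", "[IQO_-__SP1_8,PL2_-__SP2]", "[IRO_+__SP8]")
pattern <- "(?<=[\\[,])(.*?__)"

# List the substrings that gsub() would delete, one character vector per input
removed <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
# removed[[2]] is c("IQO_-__", "PL2_-__")

cleaned <- gsub(pattern, "", x, perl = TRUE)

# Hypothetical input: the first element "AB" contains no "__"
edge <- "[AB,CD__X]"
lazy_result  <- gsub(pattern, "", edge, perl = TRUE)       # "[X]": the lazy .*? runs across the comma
class_result <- gsub("[^\\[,]+__", "", edge, perl = TRUE)  # "[AB,X]": the class cannot cross a comma
```

On the data in the question both patterns give the same result; the difference only shows up if some elements lack the separator.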

edited Nov 19 '20 at 21:55

answered Nov 19 '20 at 21:51

Thanks. How can I now transform COL2 into a list that I can iterate over within each row? – chippycentra Nov 19 '20 at 21:55
@chippycentra do you need to have those square brackets at all? – zx8754 Nov 19 '20 at 21:59
@chippycentra see [this post](https://stackoverflow.com/q/13773770/680068) regarding delimited string into new rows. – zx8754 Nov 19 '20 at 22:05
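
For the follow-up about iterating within each row, one possible approach is a list column. This is only a sketch, assuming the square brackets are not needed and that no element contains a comma; the column name COL2_list is made up for illustration.

```r
# Cleaned strings as produced by the answers above
df1 <- data.frame(
  COL1 = c("G1", "G2", "G2"),
  COL2 = c("[SP2,SP2]", "[SP1_8,SP2]", "[SP8]"),
  stringsAsFactors = FALSE
)

# Drop the surrounding brackets, then split each string on commas
no_brackets <- gsub("^\\[|\\]$", "", df1$COL2)
df1$COL2_list <- strsplit(no_brackets, ",", fixed = TRUE)

# Iterate over the elements of each row
for (i in seq_len(nrow(df1))) {
  for (el in df1$COL2_list[[i]]) {
    cat(df1$COL1[i], el, "\n")
  }
}
```

Each entry of COL2_list is a character vector, so df1$COL2_list[[2]] holds the two elements of the second row. If one row per element is preferred instead, the linked post on splitting delimited strings into new rows covers that case.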
