strsplit it by the last "number_"

Question

Hello, I have a data frame such as:

COL1
SEQ_1.1_0
SEQ2.2_2
AB_1_2.3_3
ACC.3_3

I would like to split each value at the last "number_" (that is, at the final underscore, which follows a digit)

and get:

COL1      COL2
SEQ_1.1   0
SEQ2.2    2
AB_1_2.3  3
ACC.3     3

So far I have tried:

strsplit(df$COL1, "*.[0-9]_")

Here is the code I use and need to keep using:

df$shorti = do.call(rbind, strsplit(as.character(df$COL1), "*.[0-9]_"))[,1]

I added why I use strsplit at the end of the post. – chippycentra Oct 09 '20 at 15:05

Ronak Shah · Accepted Answer · 2020-10-09T15:06:23.007

2

Using tidyr::extract:

tidyr::extract(df, COL1, c('COL1', 'COL2'), regex = '(.*)_(.*)', convert = TRUE)

#      COL1 COL2
#1  SEQ_1.1    0
#2   SEQ2.2    2
#3 AB_1_2.3    3
#4    ACC.3    3

With strsplit, using a regex (taken from a linked post) with a negative lookahead, so that only an underscore with no later underscore is matched:

result <- do.call(rbind, strsplit(df$COL1, '(_)(?!.*_)', perl = TRUE))
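To get both columns out of that split, the matrix returned by `do.call(rbind, ...)` can be assigned back column by column. This is only a minimal sketch, assuming the sample data from the question; `parts` is an illustrative name, and the capture group around `_` is dropped because it is not needed for splitting.

```r
# Sample data from the question
df <- data.frame(
  COL1 = c("SEQ_1.1_0", "SEQ2.2_2", "AB_1_2.3_3", "ACC.3_3"),
  stringsAsFactors = FALSE
)

# Split only at the underscore that has no other underscore after it
parts <- do.call(rbind, strsplit(df$COL1, "_(?!.*_)", perl = TRUE))

df$COL1 <- parts[, 1]              # everything before the last underscore
df$COL2 <- as.integer(parts[, 2])  # everything after it, as a number
```

Note that `rbind` only gives a clean two-column matrix when every string contains at least one underscore.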

edited Oct 09 '20 at 15:06

answered Oct 09 '20 at 14:51

Do you have a strsplit solution? I mean, I only need the regex part. – chippycentra Oct 09 '20 at 15:02
@chippycentra Added `strsplit` solution. – Ronak Shah Oct 09 '20 at 15:06
Ronak, could you explain the tidyr code? There is more than one underscore, so how does it pick the last one to split on? – Karthik S Oct 09 '20 at 15:11
@KarthikS Regex quantifiers are greedy by default, so the first `(.*)` consumes as much as it can and the `_` in the pattern ends up matching the last underscore. To match the first one instead, make the quantifier non-greedy with `?`, i.e. `(.*?)`. – Ronak Shah Oct 09 '20 at 15:17
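
The difference between the greedy and non-greedy forms can be checked on one of the sample strings. This is just an illustrative sketch; the variable names are made up, and `perl = TRUE` is used for the lazy quantifier to keep the behaviour unambiguous.

```r
x <- "AB_1_2.3_3"

# Greedy (.*): the first group grabs as much as possible,
# so the literal "_" is left to match the LAST underscore
greedy_left  <- sub("(.*)_(.*)", "\\1", x)                # "AB_1_2.3"
greedy_right <- sub("(.*)_(.*)", "\\2", x)                # "3"

# Lazy (.*?): the first group grabs as little as possible,
# so the literal "_" matches the FIRST underscore
lazy_left  <- sub("(.*?)_(.*)", "\\1", x, perl = TRUE)    # "AB"
lazy_right <- sub("(.*?)_(.*)", "\\2", x, perl = TRUE)    # "1_2.3_3"
```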

score 2 · Answer 2 · answered Oct 09 '20 at 15:19

Using substr (this assumes the part after the last underscore is a single character):

> dat                  
        COL1
1  SEQ_1.1_0
2   SEQ2.2_2
3 AB_1_2.3_3
4    ACC.3_3
> dat$COL2 <- substr(dat$COL1, nchar(dat$COL1), nchar(dat$COL1))
> dat$COL1 <- substr(dat$COL1, 1, nchar(dat$COL1) - 2)
> dat
      COL1 COL2
1  SEQ_1.1    0
2   SEQ2.2    2
3 AB_1_2.3    3
4    ACC.3    3
>
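
If the suffix can be longer than one character, the fixed offsets above no longer work. One possible variation, sketched here under that assumption, is to locate the last underscore with `regexpr` and cut around it. The fifth row (`XY.1_12`) is a made-up value added only to show a two-digit suffix, and `pos` is an illustrative name.

```r
dat <- data.frame(
  COL1 = c("SEQ_1.1_0", "SEQ2.2_2", "AB_1_2.3_3", "ACC.3_3",
           "XY.1_12"),  # hypothetical row, not in the original data
  stringsAsFactors = FALSE
)

# Position of the last underscore: "_" followed only by non-underscores
pos <- as.integer(regexpr("_[^_]*$", dat$COL1))

# Take the suffix first, while COL1 still holds the full string
dat$COL2 <- substr(dat$COL1, pos + 1, nchar(dat$COL1))
dat$COL1 <- substr(dat$COL1, 1, pos - 1)
```

`regexpr` returns -1 for strings without any underscore, so those would need separate handling.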

Chris Ruehlemann · Answer 3 · 2020-10-09T15:22:41.030

1

Here's a base R solution with sub:

Data:

df <- data.frame(
  COL1 = c("SEQ_1.1_0",
  "SEQ2.2_2",
  "AB_1_2.3_3",
  "ACC.3_3")
)

Solution:

df$COL2 <- sub(".*(\\d$)", "\\1", df$COL1) 
df$COL1 <- sub("_\\d$", "", df$COL1)

Result:

df
      COL1 COL2
1  SEQ_1.1    0
2   SEQ2.2    2
3 AB_1_2.3    3
4    ACC.3    3

edited Oct 09 '20 at 15:22

answered Oct 09 '20 at 15:16
