R: Split string and keep substrings to the right of the match?

Question

How can I do this with strsplit() in R? Splitting should stop once no slash-separated first names remain, and the right-hand substring should be kept as shown in the result.

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# result: 
c("tim meyer XY900 123kncjd", "tom meyer XY900 123kncjd", "sepp moser VK123 456xyz", "max moser VK123 456xyz", "peter moser VK123 456xyz")

Rich Scriven · Accepted Answer · 2016-02-05T20:47:24.477

Here is one possibility using a few different base R string functions.

## get the lengths of the output for each first name
len <- lengths(gregexpr("/", sub(" .*", "", a), fixed = TRUE)) + 1L
## extract all the first names 
## using the fact that they all end at the first space character
fn <- scan(text = a, sep = "/", what = "", comment.char = " ")
## paste them together
paste0(fn, rep(regmatches(a, regexpr(" .*", a)), len))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
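A caveat on the `len` line: `gregexpr()` returns `-1` when there is no match, and `lengths()` counts that as one element, so a string with only one first name would get a count of 2 and throw the pasting out of alignment. Below is a minimal sketch of a variant that takes the counts from `strsplit()` instead. It assumes every string contains at least one space, and the third input is a hypothetical example, not from the question:

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz",
       "ann smith AB100 789abc")  # hypothetical third entry with a single first name

## the first names are everything before the first space, split on "/"
fn <- strsplit(sub(" .*", "", a), "/", fixed = TRUE)
## the shared right-hand part, including its leading space
## (assumes every string has a space; otherwise regmatches() silently drops it)
rest <- regmatches(a, regexpr(" .*", a))
## repeat each tail once per first name and paste
res <- paste0(unlist(fn), rep(rest, lengths(fn)))
res
```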

Addition: Here is a second possibility that uses a little less code. It might be a little faster, too.

s <- strsplit(a, "\\/|( .*)")
paste0(unlist(s), rep(regmatches(a, regexpr(" .*", a)), lengths(s)))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
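The second approach can be wrapped in a small reusable function. The name `expand_first_names` is hypothetical, and so is the second test string. Because the pattern's `( .*)` branch consumes everything from the first space onward, splitting stops there, so a `/` inside the right-hand part is left alone:

```r
## hypothetical helper wrapping the strsplit() approach above
expand_first_names <- function(x) {
  # split on "/" or on the first space plus everything after it;
  # strsplit() adds no empty piece after a match at the end, so only first names remain
  s <- strsplit(x, "/|( .*)")
  # the shared tail, keeping its leading space (assumes each string has a space)
  rest <- regmatches(x, regexpr(" .*", x))
  paste0(unlist(s), rep(rest, lengths(s)))
}

res <- expand_first_names(c("tim/tom meyer XY900 123kncjd", "ann smith A/B 1"))
res
```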

Perfect! No loop and only base functions is exactly what I was after ;) · Kay, Feb 05 '16 at 15:39

score 2 · Answer 2 · answered Feb 05 '16 at 15:01

I'd do it like this (with stringi):

library("stringi")

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

stri_split_fixed(stri_match_first_regex(a, "(.+?)[ ]")[,2], "/") -> start
stri_match_first_regex(a, "[ ](.+)")[,2] -> end


## append the shared tail to every first name of the i-th string
for (i in seq_along(end)) {
    start[[i]] <- paste(start[[i]], end[i])
}

unlist(start)

## [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz" 
## [4] "max moser VK123 456xyz"   "peter moser VK123 456xyz"
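The `for` loop can also be replaced with `Map()`, which pairs each element of `start` with the matching element of `end`. To keep this sketch runnable without stringi, base `regexec()` stands in for `stri_match_first_regex()`; that substitution is an assumption of the example, not part of the answer above, and it assumes every string contains a space:

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

## capture the first-name block and the rest around the first space
m <- regmatches(a, regexec("^([^ ]+) (.*)$", a))
start <- strsplit(vapply(m, `[`, "", 2), "/", fixed = TRUE)
end   <- vapply(m, `[`, "", 3)

## Map() calls paste(start[[i]], end[i]) for each i, with no explicit loop
res <- unlist(Map(paste, start, end))
res
```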

score 2 · Answer 3 · answered Feb 05 '16 at 15:35

Here is one more approach, to show that there are many ways to an R solution: split each string on the / symbol, separate the first names from the rest of the string, then combine them with paste(). Interesting question, by the way:

unlist(sapply(strsplit(a, "/"), function(x) {
  len <- length(x)
  last <- gsub("^(\\w+).*", "\\1", x[len])  # last first name, before the space
  fill <- gsub("^\\w+ ", "", x[len])        # the rest of the string
  paste(c(x[-len], last), fill)
}))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz" 
# [4] "max moser VK123 456xyz"   "peter moser VK123 456xyz"
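One thing to watch with `sapply()`: when every string has the same number of first names, it simplifies the result to a character matrix instead of a list, so the shape of the output depends on the input. Swapping in `lapply()` always gives a list, which `unlist()` flattens to a plain vector. A sketch with the same function body pulled out as a named helper (`expand_one` is a hypothetical name, and the input is made up):

```r
## hypothetical helper: same body as the anonymous function above
expand_one <- function(x) {
  len  <- length(x)
  last <- gsub("^(\\w+).*", "\\1", x[len])  # last first name, before the space
  fill <- gsub("^\\w+ ", "", x[len])        # the rest of the string
  paste(c(x[-len], last), fill)
}

## made-up input where every string has exactly two first names
a2 <- c("ann/bob smith AB1 x", "cid/dan jones CD2 y")
dim(sapply(strsplit(a2, "/"), expand_one))  # sapply() returned a 2 x 2 matrix here
res <- unlist(lapply(strsplit(a2, "/"), expand_one))
res
```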

score 1 · Answer 4 · answered Feb 05 '16 at 15:26

Here's one approach:

a <- c('tim/tom meyer XY900 123kncjd', 'sepp/max/peter moser VK123 456xyz')
do.call(c, lapply(strsplit(a, ' '), function(w)
  apply(expand.grid(strsplit(w, '/')), 1, paste, collapse = ' ')))
## [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz"  "max moser VK123 456xyz"   "peter moser VK123 456xyz"

An advantage of this solution is that it performs the split and recombination for every word in each string, not just the first, so it returns the full Cartesian product of all word variants:

a <- c('a/b/c d/e/f g/h/i', 'j/k/l m/n/o p/q/r')
do.call(c, lapply(strsplit(a, ' '), function(w)
  apply(expand.grid(strsplit(w, '/')), 1, paste, collapse = ' ')))
## [1] "a d g" "b d g" "c d g" "a e g" "b e g" "c e g" "a f g" "b f g" "c f g" "a d h" "b d h" "c d h" "a e h" "b e h" "c e h" "a f h" "b f h" "c f h" "a d i" "b d i" "c d i" "a e i" "b e i" "c e i" "a f i" "b f i" "c f i" "j m p" "k m p" "l m p" "j n p" "k n p" "l n p" "j o p" "k o p" "l o p" "j m q" "k m q" "l m q" "j n q" "k n q" "l n q" "j o q" "k o q" "l o q" "j m r" "k m r" "l m r" "j n r" "k n r" "l n r" "j o r" "k o r" "l o r"
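To see what the one-liner does, it can be unpacked for a single string: split into words, split each word on `/`, build every combination with `expand.grid()`, and paste each row back together. In this step-by-step sketch, `stringsAsFactors = FALSE` is added only to keep the grid columns as plain strings; the one-liner works without it because `apply()` converts the grid to a character matrix anyway:

```r
x <- "tim/tom meyer XY900 123kncjd"

## 1. split into words
w <- strsplit(x, " ")[[1]]
## 2. split each word on "/": a list of variants per word
variants <- strsplit(w, "/")
## 3. one row per combination; the first column varies fastest
g <- expand.grid(variants, stringsAsFactors = FALSE)
## 4. paste each row back into a single string
res <- apply(g, 1, paste, collapse = " ")
res
```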

