Basically, this is my input:
"a ~ b c d*e !r x"
"a ~ b c"
"a ~ b c d1 !r y"
"a ~ b c D !r z"
"a~b c d*e!r z"
and this is the result I want:
"b c d*e"
"b c"
"b c d1"
"b c D"
"b c d*e"
The input represents (mixed) models that are built up of three groups, i.e. the dependent part (everything up to and including ~), the fixed part, and the random part (introduced by !r). I thought with capture groups it would be easy enough (example). The difficulty is the random part, which is not always present.
I tried different things, as you can see below, and of course it is possible to do this in two steps. However, I would like a (robust) regex one-liner; I feel that should be possible. I also used these sources for inspiration: non-capturing groups, string replacing and string removal.
library(stringr)
txt <- c("a ~ b c d*e !r x",
"a ~ b c",
"a ~ b c d1 !r y",
"a ~ b c D !r z",
"a~b c d*e!r z")
# Different tries with capture groups
str_replace(txt, "^.*~ (.*) !r.*$", "\\1")
> [1] "b c d*e" "a ~ b c" "b c d1" "b c D"
> [5] "a~b c d*e!r z"
str_replace(txt, "^(.*~ )(.*)( !r.*)$", "\\2")
> [1] "b c d*e" "a ~ b c" "b c d1" "b c D"
> [5] "a~b c d*e!r z"
str_replace(txt, "^(.*~)(.*)(!r.*|\n)$", "\\1\\2")
> [1] "a ~ b c d*e " "a ~ b c" "a ~ b c d1 " "a ~ b c D "
> [5] "a~b c d*e"
str_replace(txt, "^(.*) ~ (.*)!r.*($)", "\\2")
> [1] "b c d*e " "a ~ b c" "b c d1 " "b c D "
> [5] "a~b c d*e!r z"
str_replace(txt, "^.* ~ (.*)(!r.*|\n)$", "\\1")
> [1] "b c d*e " "a ~ b c" "b c d1 " "b c D "
> [5] "a~b c d*e!r z"
# Multiple steps
step1 <- str_replace(txt, "^.*~\\s*", "")
step2 <- str_replace(step1, "\\s*!r.*$", "")
step2
> "b c d*e" "b c" "b c d1" "b c D" "b c d*e"
EDIT: After posting I kept playing around and found something that worked for my particular case.
# My (probably non-robust) solution/monstrosity
str_replace(txt, "(^.*~\\s*(.*)\\s*!r.*$|^.*~\\s*(.*)$)", "\\2\\3")
> "b c d*e " "b c" "b c d1 " "b c D " "b c d*e"
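Note that the attempt above still leaves trailing whitespace whenever the random part is preceded by a space, because the greedy (.*) swallows the space before \\s* gets a chance to match it. A possible one-liner, offered as a sketch rather than a definitive answer, makes the capture lazy and wraps the random part in an optional non-capturing group. It assumes there is exactly one ~ per string and that !r never occurs inside the fixed part; the name `fixed` is only for illustration.

```r
library(stringr)

txt <- c("a ~ b c d*e !r x",
         "a ~ b c",
         "a ~ b c d1 !r y",
         "a ~ b c D !r z",
         "a~b c d*e!r z")

# ^[^~]*~      everything up to and including the first ~ (dependent part)
# \\s*         optional whitespace after ~
# (.*?)        fixed part, lazy so it stops as early as possible
# \\s*         optional whitespace before the random part
# (?:!r.*)?    optional random part, not captured
# $            end of string
fixed <- str_replace(txt, "^[^~]*~\\s*(.*?)\\s*(?:!r.*)?$", "\\1")
fixed
# [1] "b c d*e" "b c"     "b c d1"  "b c D"   "b c d*e"
```

The lazy quantifier is what matters here: (.*?) only grows until the remainder of the string can be matched as optional whitespace, an optional !r block, and the end of the string, so the separating space is never part of the capture.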