This is a fairly complicated regex problem. Honestly, rather than using this solution, I would recommend separating the values you want into their own variables and making the data tidy.
However, you asked this question, so here's a regex answer. I've used the stringr package because I find it easier and more readable than grep.
The regex breaks down like this:
(?<=-)
- Positive lookbehind: the match must be immediately preceded by a -, but the - itself is not part of the match.
(\\d+[\\~\\,] ?[^\\-]*)+
- Capture group 1: one or more digits, followed by either ~ or a comma, then an optional space, then zero or more characters that aren't a -. The trailing + allows this unit to repeat one or more times.
((?=, *\\d+-)|$)
- Either a positive lookahead requiring that what follows is a comma, then zero or more spaces, then one or more digits and a - (that is, the start of the next prefixed item), or the end of the string. Neither alternative consumes any characters.
replacement= "(\\1)"
- Replace the whole match with (, then the contents of the first capture group, then ).
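To check the breakdown above, it can help to look at what the pattern matches before replacing anything. This is only a small sketch: the names pattern and x are my own, and the input is one of the sample strings from this answer.

```r
library(stringr)

# Same pattern as in this answer, stored in a variable (the name is my own).
pattern <- "(?<=-)(\\d+[\\~\\,] ?[^\\-]*)+((?=, *\\d+-)|$)"

x <- "391-10~21, 391-24, 27, 29"

# Each element is one span that str_replace_all() would wrap in parentheses.
matched <- str_extract_all(x, pattern)[[1]]
matched
# "10~21"      "24, 27, 29"

# Column 2 of the match matrix holds capture group 1, i.e. what \\1 refers to.
group1 <- str_match_all(x, pattern)[[1]][, 2]
group1
# "10~21"      "24, 27, 29"
```

For this input the first group and the full match are the same, which is why "(\\1)" reproduces the whole matched span inside the parentheses.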
library(stringr)
sample_string1 <- "376-12~23, 28, 32, 35, 37,376-1"
sample_string2 <- "391-1~8, 391-22~23"
sample_string3 <- "391-10~21, 391-24, 27, 29"
# The pattern is identical for every call, so define it once.
pattern <- "(?<=-)(\\d+[\\~\\,] ?[^\\-]*)+((?=, *\\d+-)|$)"
# Wrap everything after each "prefix-" in parentheses, up to the next prefixed item.
ss1 <- str_replace_all(sample_string1, pattern, replacement = "(\\1)")
ss1
# "376-(12~23, 28, 32, 35, 37),376-1"
ss2 <- str_replace_all(sample_string2, pattern, replacement = "(\\1)")
ss2
# "391-(1~8), 391-(22~23)"
ss3 <- str_replace_all(sample_string3, pattern, replacement = "(\\1)")
ss3
# "391-(10~21), 391-(24, 27, 29)"
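For comparison, here is a rough sketch of the tidy approach I recommended at the top. The helper tidy_ids and the column names prefix and value are hypothetical names of my own. It assumes that items are comma-separated and that an item without a "number-" prefix belongs to the nearest preceding item that has one, which holds for the three sample strings but may not hold for your real data.

```r
library(stringr)

# Hypothetical helper: turn one string into one row per item.
# Assumption: an item with no "prefix-" part inherits the previous prefix.
tidy_ids <- function(x) {
  items <- str_trim(str_split(x, ",")[[1]])
  # Leading digits before the "-"; NA when the item has no prefix.
  prefix <- str_match(items, "^(\\d+)-")[, 2]
  # Whatever is left once the "prefix-" part is removed.
  value <- str_remove(items, "^\\d+-")
  # Carry the last seen prefix forward over the NA entries.
  for (i in seq_along(prefix)) {
    if (is.na(prefix[i]) && i > 1) prefix[i] <- prefix[i - 1]
  }
  data.frame(prefix = prefix, value = value, stringsAsFactors = FALSE)
}

res <- tidy_ids("391-10~21, 391-24, 27, 29")
res
```

With the data in this shape, each prefix and each value sits in its own cell, so grouping or reformatting no longer needs a lookaround-heavy regex.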