
I want to add parentheses to the strings below under one condition. Each entry has two parts, "Id-subId", and I want to wrap the subIds in parentheses when an Id has more than one subId (a range such as 12~23 counts as multiple).

sample_string1 = "376-12~23, 28, 32, 35, 37,376-1"
sample_string2 = "391-1~8, 391-22~23"
sample_string3 = "391-10~21,  391-24, 27, 29"

This is my desired output:

desire_string1 = "376-(12~23, 28, 32, 35, 37),376-1"
desire_string2 = "391-(1~8), 391-(22~23)"
desire_string3 = "391-(10~21),  391-(24, 27, 29)"

How can I do this? Thanks in advance

markus
John legend2

2 Answers


This is a fairly complicated regex problem. Honestly, I would recommend that, instead of using this solution, you separate out the variables you want and store them in a tidy format.
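As a rough illustration of what that tidy alternative might look like, here is a base R sketch. The function name split_ids and the column names are hypothetical, and it assumes every entry starts with digits followed by a -.

```r
# Hypothetical helper: split one string into one row per "Id-subIds" entry.
split_ids <- function(x) {
  # Split on a comma (plus optional spaces) only when the next thing is "digits-",
  # i.e. the start of a new Id. Commas between subIds are left alone.
  pieces <- strsplit(x, ",\\s*(?=\\d+-)", perl = TRUE)[[1]]
  sub_ids <- sub("^\\d+-", "", pieces)
  data.frame(
    id = sub("-.*$", "", pieces),           # everything before the first "-"
    sub_ids = sub_ids,                      # everything after the leading "Id-"
    has_multiple = grepl("[~,]", sub_ids),  # a range or a list means several subIds
    stringsAsFactors = FALSE
  )
}

split_ids("376-12~23, 28, 32, 35, 37,376-1")
```

With the data in this shape, the "multiple subIds" condition becomes a simple column test rather than something encoded in a regex.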

However, you asked this question, so here's a regex answer. I've used the stringr package because I find it easier and more readable than grep.

The regex breaks down like this:

(?<=-) - Positive lookbehind: the match must be preceded by a -, but the - itself is not part of the match.

(\\d+[\\~\\,] ?[^\\-]*)+ - Capture group 1: one or more digits, followed by either a ~ or a , then an optional space, then zero or more characters that aren't a -. The group may repeat one or more times. Requiring a ~ or a , right after the first number is what skips entries with a single subId, such as 376-1.

((?=, *\\d+-)|$) - Either a positive lookahead for a , followed by optional spaces, one or more digits and a - (the start of the next Id), or the end of the string. Neither alternative consumes any characters.

replacement = "(\\1)" - Replace the match with ( followed by the first capture group followed by ).

library(stringr)

sample_string1 = "376-12~23, 28, 32, 35, 37,376-1"
sample_string2 = "391-1~8, 391-22~23"
sample_string3 = "391-10~21,  391-24, 27, 29"
# The same pattern is applied to each string; \\1 in the replacement is the first capture group.

ss1 <- str_replace_all(sample_string1,
                       "(?<=-)(\\d+[\\~\\,] ?[^\\-]*)+((?=, *\\d+-)|$)",
                       replacement= "(\\1)")
ss1
# "376-(12~23, 28, 32, 35, 37),376-1"

ss2 <- str_replace_all(sample_string2,
                       "(?<=-)(\\d+[\\~\\,] ?[^\\-]*)+((?=, *\\d+-)|$)",
                       replacement= "(\\1)")
ss2
# "391-(1~8), 391-(22~23)"

ss3 <- str_replace_all(sample_string3,
                       "(?<=-)(\\d+[\\~\\,] ?[^\\-]*)+((?=, *\\d+-)|$)",
                       replacement= "(\\1)")
ss3
# "391-(10~21),  391-(24, 27, 29)"
Adam Sampson

A regex that produces the correct output is:

(?:(\d+-)((?:\d+~\d+|(?:,?\s*\d+){2,})+)(?=,\s*\d+-|\"))

Demo: https://regex101.com/r/QHDCMd/1/

  • (\d+-) match the ID and the dash
  • \d+~\d+ match a subid range or ...
  • (?:,?\s*\d+){2,} at least two subids
  • (?=,\s*\d+-|\") positive look-ahead for next ID or closing quotes
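
One way to try this pattern in R is sketched below with base gsub() and perl = TRUE. It assumes the strings are passed without surrounding quote characters, so the \" in the lookahead is swapped for $. The name add_parens is made up for illustration, and the pattern has only been checked against the three sample strings from the question.

```r
# Pattern from above, with \" replaced by $ (assumption: no literal quotes in the input).
pattern <- "(\\d+-)((?:\\d+~\\d+|(?:,?\\s*\\d+){2,})+)(?=,\\s*\\d+-|$)"

# Hypothetical wrapper: keep the "Id-" part (group 1) and wrap the subIds (group 2).
add_parens <- function(x) {
  gsub(pattern, "\\1(\\2)", x, perl = TRUE)
}

add_parens("376-12~23, 28, 32, 35, 37,376-1")
# "376-(12~23, 28, 32, 35, 37),376-1"
add_parens("391-1~8, 391-22~23")
# "391-(1~8), 391-(22~23)"
add_parens("391-10~21,  391-24, 27, 29")
# "391-(10~21),  391-(24, 27, 29)"
```

Because the comma in (?:,?\s*\d+){2,} is optional, a lone multi-digit subId (for example 376-15) appears to get wrapped as well. None of the sample strings exercise that case.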
alex-dl