
I have a single string vector containing the address data of certain locations, and would like to remove any information contained within the square brackets.

address <- "[Market Street Food Centre]50, Market Street, Golden Shoe Multi-Storey Car Park, 2nd/3rd Storey, S(048940);[Berseh Food Centre]166, Jalan Besar, S(208877); [Dunman Food Centre]271, Onan Road, S(424768)"

The solution to this is as follows:

remove_names <- gsub(pattern = "\\[[^][]*\\]", "", address)

Output:

[1] "50, Market Street, Golden Shoe Multi-Storey Car Park, 2nd/3rd Storey, S(048940);166, Jalan Besar, S(208877); 271, Onan Road, S(424768)"

However, I do not understand some portions of the regular expression. More specifically, what do [^] and []* mean?

aaaroo

1 Answer

\\[[^][]*\\]
\\[      \\]  Matching outer [ and ]
   [   ]      A character set
        *     That charset repeated 0 or more times
    ^         Invert the set that follows
     ][       Set of two characters (square brackets)

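The goal is to remove every bracketed string. The inner set [^][] matches any single character that is neither ] nor [ (a ] placed first in a set, right after the ^, is taken literally rather than closing the set), so the text between the outer brackets cannot itself contain a bracket. If brackets are nested, only the innermost pair is matched. The doubled backslashes are R string escapes; the regex engine itself sees \[ and \].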
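To see each piece in isolation, here is a minimal sketch in base R. The shortened `address` string and the variable names `only_brackets`, `cleaned`, and `nested` are made up for illustration; only `gsub()` and the pattern come from the question.

```r
# Shortened sample in the same shape as the question's data (illustrative only)
address <- "[Market Street Food Centre]50, Market Street; [Dunman Food Centre]271, Onan Road"

# The negated set on its own: delete every character that is NOT ']' or '['
only_brackets <- gsub("[^][]", "", "a[b]c")
print(only_brackets)   # "[]"

# The full pattern: a literal '[', any run of non-bracket characters, a literal ']'
cleaned <- gsub("\\[[^][]*\\]", "", address)
print(cleaned)         # "50, Market Street; 271, Onan Road"

# With nested brackets, a single pass removes only the innermost pair
nested <- gsub("\\[[^][]*\\]", "", "x[a[b]c]y")
print(nested)          # "x[ac]y"
```

If nested brackets had to be removed entirely, one option would be to repeat the substitution until the string stops changing.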

Rick James