2

I am trying to split a SQL Query script, and on a create table, I only want the contents inside the outermost parenthesis, while ignoring any parenthesis inside, which are mainly describing maximum characters for a column type.

Here is a sample of my query:

CREATE TABLE IF NOT EXISTS %SCHEMA%.business (
    id UUID NOT NULL,
    name VARCHAR(50) NOT NULL,
    DBA VARCHAR(50),
    isactive BOOL NOT NULL DEFAULT TRUE,
    isdeleted BOOL NOT NULL DEFAULT FALSE,
    createdon TIMESTAMP(6) WITHOUT TIME ZONE NOT NULL
)
WITH (OIDS = FALSE)

I want to get back everything inside that describe the columns. I have tried these

\(([^\)]*)\) 

stops at the first (50), and reads the other (50) and (6), but not anything else, and doesn't reach the end of that outside parenthesis.

\((\d+)\)

This will only read the inside parenthesis, which could be useful later, but I only get back the two (50)'s and the (6).

What regular expression will capture all of the inside of that parenthesis? Even if it does capture more, I can always remove the excess strings through code later.

Aleks Andreev
  • 6,732
  • 8
  • 27
  • 36
as.beaulieu
  • 414
  • 7
  • 22
  • 2
    This [`(?<=\()(?:[^()]|(?\()|(?\)))+(?(B)(?!))(?=\))`](http://regexstorm.net/tester?p=%28%3f%3c%3d%5c%28%29%28%3f%3a%5b%5e%28%29%5d%7c%28%3f%3cB%3e%5c%28%29%7c%28%3f%3c-B%3e%5c%29%29%29%2b%28%3f%28B%29%28%3f!%29%29%28%3f%3d%5c%29%29&i=CREATE+TABLE+IF+NOT+EXISTS+%25SCHEMA%25.business+%28%0d%0a++++id+UUID+NOT+NULL%2c%0d%0a++++name+VARCHAR%2850%29+NOT+NULL%2c%0d%0a++++DBA+VARCHAR%2850%29%2c%0d%0a++++isactive+BOOL+NOT+NULL+DEFAULT+TRUE%2c%0d%0a++++isdeleted+BOOL+NOT+NULL+DEFAULT+FALSE%2c%0d%0a++++createdon+TIMESTAMP%286%29+WITHOUT+TIME+ZONE+NOT+NULL%0d%0a%29%0d%0aWITH+%28OIDS+%3d+FALSE%29)? – ctwheels Feb 26 '18 at 17:06
  • That was fast! Code runs it! Now to be greedy. Is there a way to also omit the "Oids = False" ? Either way, post that as an Answer so I can mark it correct! – as.beaulieu Feb 26 '18 at 17:11
  • If you need something a little more robust, you might consider using ScriptDom to actually parse out the SQL and glean the information you want from it. [Here's a LINQPad example](http://share.linqpad.net/h4wpgk.linq) – StriplingWarrior Feb 26 '18 at 17:13
  • I am open to all suggestions, however can you offer a link to a page or provide the example in a way other than downloading a file? – as.beaulieu Feb 26 '18 at 17:15
  • 1
    @as.beaulieu: Sure: https://blogs.msdn.microsoft.com/arvindsh/2013/04/04/using-the-transactsql-scriptdom-parser-to-get-statement-counts/ – StriplingWarrior Feb 26 '18 at 17:24

1 Answers1

4

.NET regular expressions have what's called Balancing Groups. This ensures that a starting delimiter is followed by a closing delimiter (both of your choice). If these do not balance it stops matching at that point in the string.

This is how you'd accomplish this using regex, but other alternatives are less breakable and may be better suited for the task.

See regex in use here

(?<=\()(?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!))(?=\))
  • (?<=\() Positive lookbehind ensuring what precedes is (
  • (?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!)) The magic. Simplistically put, this matches either any character that is not ( or ), or it matches the ( character while setting B to B+1, or it matches ) while setting B to B-1. It does this one or more times. (?(B)(?!)) indicates that if B is not balanced (anything except 0), it fails.
  • (?=\)) Positive lookahead ensuring what follows is )

Results in the two following matches:

id UUID NOT NULL,
name VARCHAR(50) NOT NULL,
DBA VARCHAR(50),
isactive BOOL NOT NULL DEFAULT TRUE,
isdeleted BOOL NOT NULL DEFAULT FALSE,
createdon TIMESTAMP(6) WITHOUT TIME ZONE NOT NULL

and

OIDS = FALSE

To remove the latter result, you could use the following (ensures ( is not followed by WITH) as seen in use here.

(?<=(?<!WITH *)\()(?:[^()]|(?<B>\()|(?<-B>\)))+(?(B)(?!))(?=\))
ctwheels
  • 19,377
  • 6
  • 29
  • 60
  • Excellent! Valid answer, and I appreciate the explanations of the regex magic itself to understand it better for other applications. – as.beaulieu Feb 26 '18 at 17:19
  • 1
    @as.beaulieu you're very welcome :) I added a second regex to get rid of the second result, although it may take additional tinkering with to get it working properly with some of your other strings (I assume you have more than one) – ctwheels Feb 26 '18 at 17:22
  • 1
    Oh absolutely. This accomplishes the heavy lifting, and now I just have to worry about some rough edges. – as.beaulieu Feb 26 '18 at 17:23