1

In java I am trying to extract column names and their values using Regex and Matcher but dont know what I am doing wrong here.

    String sql = "INSERT INTO MyTable (column1, column2, column3, column4, column5 ) VALUES (1, 'Hi', 'A,B', '', null)";
    String pattern = "INSERT INTO.*((\\w)+).*\\((.*)\\).*VALUES.*\\((.*)\\)";

    Pattern r = Pattern.compile(pattern);

    Matcher m = r.matcher(sql);
    if (m.find()) {
        System.out.println("Found value: " + m.group(0));
        System.out.println("Found value: " + m.group(1));
        System.out.println("Found value: " + m.group(2));
    } else {
        System.out.println("NO MATCH");
    }

Expectation:

Group-1 = column1, column2, column3, column4, column5
Group-2 = 1, 'Hi', 'A,B', '', null
anubhava
  • 664,788
  • 59
  • 469
  • 547
amicngh
  • 7,543
  • 3
  • 26
  • 48

2 Answers2

2

Don't use greedy .* in your regex. You can use this regex:

\bINSERT\s+INTO\s+\S+\s*\(([^)]+)\)\s*VALUES\s*\(([^)]+)\)

In Java:

String regex = "\\bINSERT\\s+INTO\\s+\\S+\\s*\\(([^)]+)\\)\\s*VALUES\\s*\\(([^)]+)\\)";

This will give:

Group 1: "column1, column2, column3, column4, column5 "
Group 2: "1, 'Hi', 'A,B', '', null"

RegEx Demo

anubhava
  • 664,788
  • 59
  • 469
  • 547
  • 1
    that's an even better answer than mine. I love the replacement of the greedy operator with 'match everything except back bracket – Andrew Magerman Nov 18 '16 at 11:18
  • @anubhava Thanks! It worked. can you help me to extract the values separated by ','. I am having trouble when splitting values with ',' (comma) because one value itself has comma inside. – amicngh Nov 18 '16 at 11:49
  • 1
    Hmm splitting value on comma but ignoring commas inside the quotes is not trivial problem. You see [this](http://stackoverflow.com/questions/18893390/splitting-on-comma-outside-quotes) and [this](http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes) – anubhava Nov 18 '16 at 11:57
1

Try this

"INSERT INTO.*\\((.*)\\).*VALUES.*\\((.*)\\)"

The mistake you were doing is not escaping the brackets. Without the escaping \( Regex assumes you are starting a group.

Andrew Magerman
  • 1,332
  • 12
  • 23