0

Straight to the point,

String str ="((a=10) AND ((b=13) AND (c=15)))AND((d=23) AND (e=14))AND((f=23) AND (g=15))OR((h=23) AND (i=13))";

I want to split this string on the top-level AND operators, so that each piece is a complete group enclosed in ().

I am expecting output like this,

((a=10) AND ((b=13) AND (c=15))) ((d=23)AND (e=14)) ((f=23)AND (g=15))OR((h=23)AND (i=13))

I have tried many regexes but couldn't figure out a solution.

Thanks in advance

Dinoop paloli
  • Your String is wrong - "((a=10) AND ((b=13) AND (c=15))))" has 3 "(" but 4 ")" – TheLostMind Mar 18 '14 at 12:32
  • @WhoAmI Thank you,I have corrected. – Dinoop paloli Mar 18 '14 at 12:49
  • Assuming that parentheses can be nested to any level, [regex is not the right tool for this](http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets). To solve this problem with regex you would have to use something like `str.split("\\s*AND\\s*(?=...$)")`, where `...` stands for a regex that checks that the number of opening parentheses after each `AND` equals the number of closing parentheses up to the end of your string, because those are the only `AND`s you are looking for. See [this](http://stackoverflow.com/a/524624/1393766) answer for a solution. – Pshemo Mar 19 '14 at 00:28

4 Answers

2
  1. I would not mess with regex.
  2. Scan the string from index 0 to string.length() and find the indices where the count of "(" seen so far equals the count of ")". Separate the string into substrings at these indices.
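The counting idea above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the class and method names (`TopLevelAndSplitter`, `splitOnTopLevelAnd`) are made up, and it assumes the literal text `AND` only ever appears outside parentheses as an operator, never as part of a name.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelAndSplitter {

    /** Splits the input at every "AND" that occurs while no parenthesis is open. */
    public static List<String> splitOnTopLevelAnd(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0;   // '(' seen so far minus ')' seen so far
        int start = 0;   // index where the current part begins
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unmatched ')' at index " + i);
                }
            } else if (depth == 0 && input.startsWith("AND", i)) {
                // Counts are equal here, so this AND separates two complete groups.
                parts.add(input.substring(start, i).trim());
                start = i + 3;  // the next part begins after the keyword
                i += 2;         // the loop's i++ then moves past the final 'D'
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + depth + " left open");
        }
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        String str = "((a=10) AND ((b=13) AND (c=15)))AND((d=23) AND (e=14))"
                + "AND((f=23) AND (g=15))OR((h=23) AND (i=13))";
        for (String part : splitOnTopLevelAnd(str)) {
            System.out.println(part);
        }
    }
}
```

For the string in the question this yields three parts, and the groups joined by OR stay together in the last one. An `AND` inside a group is left alone because the depth is greater than zero there.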
TheLostMind
  • Is there any built-in API in Java for this? – Dinoop paloli Mar 18 '14 at 12:55
  • I will go with this method, incrementing and decrementing a counter. I also plan to use a well-tested tool like ANTLR for this in the future. – Dinoop paloli Mar 19 '14 at 05:30
  • Well, I've tried this and it gets a little tricky. First, at the beginning your count will be 0, so you will have to handle that condition. Next, you will need to use substring() to extract the Strings, which might also get a little tricky. Enjoy :) – TheLostMind Mar 19 '14 at 05:32
1

I think this is one of those things that a regular language (i.e. a regex) cannot express. It has been a while since I studied compiler theory, but nested comments, e.g. /* outer /* inner */ */, are a classic example of something that cannot be expressed, and your problem looks very similar. I cannot give an authoritative answer on that, but I think you should give up on regex as a tool for this problem: even if it were possible, I am convinced that a simple solution that scans the string and counts parentheses will be easier to understand and maintain.

hlovdal
  • Do you mean I should proceed as WhoAmI suggested? – Dinoop paloli Mar 18 '14 at 12:52
  • Yes, start with a parenthesis_nesting_count of zero, and then increase/decrease it as you encounter `(`/`)` characters while parsing the string character by character (possibly "cheating" by using indexOf to skip over blocks of uninteresting characters). – hlovdal Mar 18 '14 at 13:01
  • Thank you. I have heard of something called a lexical parser. Does that apply in this context? – Dinoop paloli Mar 18 '14 at 13:05
1

hlovdal is right. You cannot solve this with a regex.

What you need (and perhaps intend) is a parser.

For such a simple syntax a recursive descent parser should do the job. A parser usually consists of a lexer (or tokenizer), which divides the input into tokens (see StringTokenizer in the Java API), and a grammar. The individual tokens can usually be expressed as regexes. In your case the tokens would be numbers (\d+), brackets "(" and ")", and the keywords (AND, OR). After transforming your input into a sequence of tokens, you process them one by one and track the state of the parser, usually deciding what to do by looking at the next token.
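As a rough sketch of the lexer step only, something like the following could turn the input into tokens using java.util.regex. The class name `SimpleLexer` and its `tokenize` method are hypothetical, and the extra token kinds for names (a, b, ...) and the `=` sign are an assumption added because the sample input contains them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleLexer {

    // One alternative per token kind: number, keyword, name, single-character symbol.
    // The keyword alternative comes before the name alternative so AND/OR are not read as names.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+|(?:AND|OR)\\b|[A-Za-z_]\\w*|[()=]");

    /** Returns the tokens of the input in order; whitespace between tokens is skipped. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (true) {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            if (pos == input.length()) {
                break;
            }
            m.region(pos, input.length());
            if (!m.lookingAt()) {  // no token kind matches at this position
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("((a=10) AND (b=13))OR(c=15)"));
    }
}
```

A parser would then walk this token list, for example incrementing a counter on "(" and decrementing it on ")", instead of inspecting raw characters.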

WhoAmI's suggestion is really similar. By suggesting to count the brackets, he is describing a very simple kind of parser. At the very least, its lexer has to distinguish bracket tokens from the rest. The parser's state would be the count of open brackets.

There are also several frameworks out there that help generate parsers, like ANTLR, yacc etc. But perhaps they're too complex for your purposes.

SCI
0
    String str = "((a=10) AND ((b=13) AND (c=15)))AND((d=23) AND (e=14))AND((f=23) AND (g=15))OR((h=23) AND (i=13))";

    int depth = 0;  // number of currently open parentheses
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        out.append(ch);
        if (ch == '(') {
            depth++;
        } else if (ch == ')') {
            depth--;
            // A top-level group has just closed: start a new line,
            // unless this is the end of the input.
            if (depth == 0 && i + 1 < str.length()) {
                out.append('\n');
            }
        }
    }

    System.out.println(out);


The output will be:
((a=10) AND ((b=13) AND (c=15)))
AND((d=23) AND (e=14))
AND((f=23) AND (g=15))
OR((h=23) AND (i=13))
Nirmal