0

I wrote some Java code to split a string into an array of strings. First I split the string using the regex pattern "\\,\\,|\\,", and then using the pattern "\\,|\\,\\,". Why is the output of the first different from the output of the second?

public class Test2 {
    public static void main(String[] args){

        String regex1 = "\\,\\,|\\,";
        String regex2 = "\\,|\\,\\,"; 

        String a  = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";
        String ss[] = a.split(regex1); 

        int index = 0; 
        for(String m : ss){
            System.out.println((index++)+ ": "+m+"|"); 
        }
    }
} 

Output when using regex1:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: 3000054789|

And when using regex2:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: |
13: 3000054789|

I would like an explanation of how the regex engine handles this situation.

Unihedron
Mohammad Fajar
  • 1
    You don't have to quote `,`. – Maroun Aug 07 '14 at 10:11
  • @MarounMaroun can you give a specific answer to my question? – Mohammad Fajar Aug 07 '14 at 10:13
  • 1
    [MarounMaroun's comment](http://stackoverflow.com/questions/25179366/priority-in-regex-manipulating#comment39206033_25179366) was not intended to be an answer, just some additional info which could improve the readability of your question. In short: you don't need to write `"\\,\\,|\\,"` when you can simply write `",,|,"`. – Pshemo Aug 07 '14 at 11:57

4 Answers

4

How regex works: the state machine always reads from left to right. `,|,,` == `,`, as it will only ever match the first alternative:

[Image (source: gyazo.com)]

`,,|,` == `,,?`:

[Image (source: gyazo.com)]

However, you should use `,,?` instead so there's no backtracking:

[Image (source: gyazo.com)]
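
The equivalences above can be checked with a small sketch. The input `"a,b,,c"` is a shortened, made-up stand-in for the string in the question, and the class and method names are only illustrative:

```java
import java.util.Arrays;

public class AlternationOrderDemo {
    // Splits the input with the given regex and renders the parts as one string.
    static String splitToString(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String input = "a,b,,c"; // hypothetical input with one double comma

        // The single comma always wins, so ",|,," behaves like ",".
        System.out.println(splitToString(input, ",|,,")); // [a, b, , c]
        System.out.println(splitToString(input, ","));    // [a, b, , c]

        // The double comma is tried first (or matched greedily by ",,?").
        System.out.println(splitToString(input, ",,|,")); // [a, b, c]
        System.out.println(splitToString(input, ",,?"));  // [a, b, c]
    }
}
```

The empty element in the first two results is the text between the two commas of `,,`.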

Glorfindel
Unihedron
1

Judging from the two results, at each position the split method tries the first alternative first ("," for regex2, ",," for regex1) and only falls back to the second one if the first fails. With regex2 the single "," always matches, so the ",," alternative is never used: the two commas of ",," are treated as two separate delimiters. That's why an empty string shows up where ",," is read with regex2.

So for your regex to work as intended, you need to put the longer alternative first.
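
If the delimiters come from a list, one possible way to apply this rule is to sort them by length before joining them. This is only a sketch: `longestFirst` is a hypothetical helper, not part of the JDK, and it assumes the delimiters are literal strings (hence `Pattern.quote`):

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DelimiterRegex {
    // Hypothetical helper: builds an alternation of literal delimiters,
    // longest first, so a longer delimiter is never shadowed by a shorter one.
    static String longestFirst(String... delimiters) {
        return Arrays.stream(delimiters)
                .sorted((x, y) -> Integer.compare(y.length(), x.length()))
                .map(Pattern::quote) // treat each delimiter as literal text
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // The order of the arguments no longer matters.
        String regex = longestFirst(",", ",,");
        System.out.println(Arrays.toString("a,b,,c".split(regex))); // [a, b, c]
    }
}
```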

takeo999
1

The alternatives are evaluated from left to right. With regex1, `\\,\\,` is tried first, and `\\,` only if that fails. That's why there is no empty string in the output: the double comma is matched as a whole by `\\,\\,`. With regex2, every comma is matched by `\\,`, so the double comma is split twice, hence the empty string.
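
To see which text each pattern actually consumes, the matches can be listed with `Matcher.find()`. This is a rough sketch; the input is just the tail of the string from the question, and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Returns the half-open index range [start,end) of every match of regex in input.
    static List<String> matches(String input, String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add("[" + m.start() + "," + m.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "PURWANTO,,3000054789"; // commas at indices 8 and 9

        // regex1: one match covering both commas.
        System.out.println(matches(input, "\\,\\,|\\,")); // [[8,10)]

        // regex2: two one-character matches with nothing between them.
        System.out.println(matches(input, "\\,|\\,\\,")); // [[8,9), [9,10)]
    }
}
```

The empty string in the regex2 output is the zero-length text between the match ending at index 9 and the match starting at index 9.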

Swapnil
1

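Case 1: split by `,,`, else by `,`
The `,,` alternative matches only the double comma; everything else is split by `,`.

Case 2: split by `,`, else by `,,`
The `,` alternative matches every comma. So `word,,word` is split into `word` and `,word`.
Then `,word` is split into an empty string and `word`.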

Unihedron
vks