
I'm trying to create patterns to be used in Java for the following two strings:

CIRCLE ( (187.8562 ,-88.562 ) , 0.774 ) 

and

POLYGON ( (17.766 55.76676,77.97666 -32.866888,54.97799 54.2131,67.666777 24.9771,17.766 55.76676) )

Please note that

  1. One or more whitespace characters may appear anywhere, except between letters and between the digits of a number. [UPDATED]

  2. The words CIRCLE and POLYGON are fixed but case-insensitive. [UPDATED]

  3. For the second string, the number of points is not fixed. I've given 5 points here for simplicity.

  4. Point coordinates are decimal or integer numbers. [UPDATED]

  5. A positive number may have a leading + sign. [UPDATED]

  6. The leading zero of a decimal number is optional (e.g. .774). [UPDATED]

  7. A polygon requires at least 3 points, and the first and last points must be the same (a closed polygon). [UPDATED]

Any help or suggestion will be appreciated.

I've tried this:

(CIRCLE)(\\s+)(\\()(\\s+)(\\()(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(,)(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))(\\s+)(,)(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))
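One reason the attempt above never matches the sample: every `\\s+` demands at least one whitespace character, but the sample has none between `(` and `187.8562`, and `(CIRCLE)` is case-sensitive unless a flag is passed. The sketch below (the class and method names are hypothetical) compiles the attempt exactly as written and a copy with each `\\s+` relaxed to `\\s*`, both with `Pattern.CASE_INSENSITIVE`. It assumes the input is trimmed first, because `matches()` must consume the whole string and the sample ends in a space.

```java
import java.util.regex.Pattern;

final class AttemptDiagnosis {
    // The attempt from the question, unchanged (as a Java string literal).
    static final String ATTEMPT =
        "(CIRCLE)(\\s+)(\\()(\\s+)(\\()(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])"
        + "(\\s+)(,)(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))(\\s+)(,)"
        + "(\\s+)([+-]?\\d*\\.\\d+)(?![-+0-9\\.])(\\s+)(\\))";

    // Same pattern with every mandatory "\s+" turned into an optional "\s*".
    static final String RELAXED = ATTEMPT.replace("\\s+", "\\s*");

    // True if the whole (trimmed) input matches the regex, ignoring case.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input.trim())
                .matches();
    }
}
```

The relaxed version still requires a decimal point in every number (`\\d*\\.\\d+`), so integer coordinates such as `3` remain rejected.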

Could you please provide working regex patterns for these two strings?

ani
  • Possible duplicate of [Learning Regular Expressions](http://stackoverflow.com/questions/4736/learning-regular-expressions) – ShellFish Jun 25 '15 at 12:08
  • Be more specific about what should and shouldn't be matched. Both strings are matched by `.*`, but I highly doubt that's what you want. – ShellFish Jun 25 '15 at 12:10

3 Answers


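I suggest removing the spaces from your string before applying the regex (except the single space that separates the two coordinates of a polygon point, which the polygon pattern relies on).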

Circle:

CIRCLE\(\(-?\d+\.\d+,-?\d+\.\d+\),[-]?\d+\.\d+\)

Polygon:

POLYGON\(\((-?\d+\.\d+\s+-?\d+\.\d+,)+-?\d+\.\d+\s+-?\d+\.\d+\)\)

Circle including spaces:

CIRCLE\s*\(\s*\(\s*-?\d+\.\d+\s*,\s*-?\d+\.\d+\s*\)\s*,\s*-?\d+\.\d+\s*\)

Polygon including spaces:

POLYGON\s*\(\s*\(\s*(-?\d+\.\d+\s+-?\d+\.\d+\s*,\s*)+\s*-?\d+\.\d+\s+-?\d+\.\d+\s*\)\s*\)

Circle including spaces updated:

/CIRCLE\s*\(\s*\(\s*[+-]?\d*\.\d+\s*,\s*[+-]?\d*\.\d+\s*\)\s*,\s*[+-]?\d*\.\d+\s*\)/i

Polygon including spaces updated:

/POLYGON\s*\(\s*\(\s*([+-]?\d*\.\d+)\s+([+-]?\d*\.\d+)\s*(,\s*[+-]?\d*\.\d+\s+[+-]?\d*\.\d+)+\s*,\s*\1\s+\2\s*\)\s*\)/i
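In Java, the `/.../i` delimiters are not part of the pattern: the body goes into a string literal with every backslash doubled, and case-insensitivity comes from `Pattern.CASE_INSENSITIVE` (or an inline `(?i)`). A minimal sketch under those assumptions; the class and method names are hypothetical, and the input is trimmed because `matches()` must consume the whole string.

```java
import java.util.regex.Pattern;

final class UpdatedPatterns {
    // Updated CIRCLE regex as a Java literal (backslashes doubled);
    // Pattern.CASE_INSENSITIVE plays the role of the trailing /i.
    static final Pattern CIRCLE = Pattern.compile(
        "CIRCLE\\s*\\(\\s*\\(\\s*[+-]?\\d*\\.\\d+\\s*,\\s*[+-]?\\d*\\.\\d+\\s*\\)"
            + "\\s*,\\s*[+-]?\\d*\\.\\d+\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    // Updated POLYGON regex; \1 and \2 require the last point to repeat the
    // first point's text character for character.
    static final Pattern POLYGON = Pattern.compile(
        "POLYGON\\s*\\(\\s*\\(\\s*([+-]?\\d*\\.\\d+)\\s+([+-]?\\d*\\.\\d+)\\s*"
            + "(,\\s*[+-]?\\d*\\.\\d+\\s+[+-]?\\d*\\.\\d+)+"
            + "\\s*,\\s*\\1\\s+\\2\\s*\\)\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    // matches() must consume the whole input, so surrounding blanks are trimmed.
    static boolean isCircle(String s) {
        return CIRCLE.matcher(s.trim()).matches();
    }

    static boolean isPolygon(String s) {
        return POLYGON.matcher(s.trim()).matches();
    }
}
```

Two limitations carry over from the regexes themselves: `\d*\.\d+` requires a decimal point, so integer coordinates (point 4 of the question) are rejected, and the backreferences compare text, so a `+` on only one copy of the closing point makes a polygon fail.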
Delgan
  • Removing the spaces is not possible in this case. :( – ani Jun 25 '15 at 12:20
  • @ani I added the regex considering spaces. – Delgan Jun 25 '15 at 12:32
  • It's failing for CIRCLE((+187.8562, -88.562),.774) and POLYGON ( (17.766 55.76676,77.97666 -32.866888,+54.97799 54.2131,.666777 24.9771,17.766 55.76676) ) – ani Jun 25 '15 at 12:46
  • @ani You did not specify that positive numbers could be preceded by `+` or that numbers less than 1 could be written without the leading `0`. Your examples showed the opposite. – Delgan Jun 25 '15 at 12:49
  • Sorry about that. Could you please take this into account? Also, matching should be case-insensitive, and for a polygon the first and last points will be the same, meaning it is a closed polygon. Is that possible? – ani Jun 25 '15 at 12:54
  • @ani You can fix it easily by replacing `-?` with `[+-]?` and `\d+\.` with `\d*\.`. – Delgan Jun 25 '15 at 12:56
  • I appreciate your help. I have added a few more points to the question. Could you please look into them? I think I'm very close to the solution. – ani Jun 25 '15 at 13:17
  • @ani Updated; it should work now. Note that the `/.../i` syntax makes the match case-insensitive. – Delgan Jun 25 '15 at 13:43
  • The following is failing: POLYGON ( (+17.766 55.76676,77.97666 -32.866888,54.97799 54.2131,.666777 24.9771,17.766 55.76676 ) ) – ani Jun 25 '15 at 17:32
  • @ani This is getting out of hand. It fails because `+17.766 55.76676` is not the same text as `17.766 55.76676`, and that is hard to work around in a regex. Can't you just remove the `+` from your string before running the regex? – Delgan Jun 25 '15 at 17:42
  • I'm not producing this field; it is an input, and I'm trying to avoid parsing it, which is why I'm using strict regex validation. I hope you can now understand the situation. – ani Jun 25 '15 at 17:47
  • @ani Solving this would require conditional constructs, and unfortunately I do not know enough about regex to help you with that. – Delgan Jun 25 '15 at 18:21
  • I appreciate your help anyway. If there is no other way, I will parse the input field. Thanks. – ani Jun 25 '15 at 18:24
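For the specific failure reported in the comments, where the first point carries a `+` and the closing point does not, one possible regex-only workaround is to match the optional `+` outside capturing groups 1 and 2, so the backreferences never include it. This is a sketch under that assumption (the class and constant names are hypothetical). It still compares text, so `0.5` versus `.5`, or `17.766` versus `17.7660`, would still be rejected.

```java
import java.util.regex.Pattern;

final class PlusTolerantPolygon {
    // An optional '+' that is not followed by '-', so "+-1.5" is still rejected.
    private static final String OPT_PLUS = "(?:\\+(?!-))?";
    // A coordinate WITHOUT its '+' sign; captured so it can be backreferenced.
    private static final String COORD = "(-?\\d*\\.\\d+)";

    // Same shape as the updated POLYGON regex, except that the '+' of the
    // first and last points is matched outside groups 1 and 2.
    static final Pattern POLYGON = Pattern.compile(
        "POLYGON\\s*\\(\\s*\\(\\s*" + OPT_PLUS + COORD + "\\s+" + OPT_PLUS + COORD + "\\s*"
            + "(?:,\\s*[+-]?\\d*\\.\\d+\\s+[+-]?\\d*\\.\\d+)+"
            + "\\s*,\\s*" + OPT_PLUS + "\\1\\s+" + OPT_PLUS + "\\2\\s*\\)\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    static boolean isPolygon(String s) {
        return POLYGON.matcher(s.trim()).matches();
    }
}
```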

UPDATED ANSWER:

This matches the examples from the question and the comments:

(CIRCLE|POLYGON)([( ]+)([+ \-\.]?(\d+)?([ \.]\d+[ ,)]+))+
m.cekiera

Any help or suggestion will be appreciated.

My suggestion is to break it up into pieces. Just as you'd want to break up a large, complex function into smaller functions so that each part is easy to see and understand, you want to break up a large, complex regex pattern into smaller patterns for the same reason. For example:

import java.util.regex.Pattern;

// Hypothetical enclosing class: a top-level interface cannot be private.
class GeometryValidation {
    private interface Patterns {
        String UNSIGNED_INTEGER = "(?:0|[1-9]\\d*+)";
        String DECIMAL_PART = "(?:[.]\\d++)";
        String UNSIGNED_NUMBER_WITH_INTEGER_PART =
            "(?:" + UNSIGNED_INTEGER + DECIMAL_PART + "?+)";
        String UNSIGNED_NUMBER =
            "(?:" + UNSIGNED_NUMBER_WITH_INTEGER_PART + "|" + DECIMAL_PART + ")";
        String NUMBER = "(?:[+-]?+" + UNSIGNED_NUMBER + ")";
        String SPACE_SEPARATED_PAIR = "(?:" + NUMBER + "\\s++" + NUMBER + ")";
        String OPTIONAL_SPACE = "(?:\\s*+)";
        String LPAREN = "(?:" + OPTIONAL_SPACE + "[(]" + OPTIONAL_SPACE + ")";
        String RPAREN = "(?:" + OPTIONAL_SPACE + "[)]" + OPTIONAL_SPACE + ")";
        String COMMA = "(?:" + OPTIONAL_SPACE + "," + OPTIONAL_SPACE + ")";
        Pattern CIRCLE = Pattern.compile(
            OPTIONAL_SPACE + "CIRCLE" + OPTIONAL_SPACE + LPAREN +
                LPAREN +
                    NUMBER + COMMA + NUMBER +
                RPAREN + COMMA +
                NUMBER +
            RPAREN + OPTIONAL_SPACE,
            Pattern.CASE_INSENSITIVE);
        // One pair followed by at least three more comma-separated pairs.
        Pattern POLYGON = Pattern.compile(
            OPTIONAL_SPACE + "POLYGON" + OPTIONAL_SPACE + LPAREN +
                LPAREN +
                    SPACE_SEPARATED_PAIR +
                    "(?:" + COMMA + SPACE_SEPARATED_PAIR + "){3,}+" +
                RPAREN +
            RPAREN + OPTIONAL_SPACE,
            Pattern.CASE_INSENSITIVE);
    }
}

Notes:

  • The above is not tested. My goal was to show you how to do this maintainably, rather than to simply do it for you. (It should work as-is, though, unless I have typos or whatnot.)
  • Note the pervasive use of non-capture groups (?:...). This allows each subpattern to be a separate module; for example, something like COMMA + "+" is well-defined as meaning "one or more commas, plus optional spaces".
  • Also note the pervasive use of possessive quantifiers like ?+ and *+ and ++. It's easier to tell what is matched by a given occurrence of NUMBER when you know that NUMBER will never "stop short" before a trailing digit. (Imagine having a function whose behavior depended on the code that runs after it. That would be confusing, right? Well, the non-possessive quantifiers can change their meaning depending on what follows, which can have similarly confusing results for large, complex regexes.) This also has considerable performance benefits in the event of a near-match.
  • I made no attempt to detect the "And also first & last point set will be the same (enclosed polygon)" case. Regexes are not suited to this, since a regex is a string-description language, and "same" in this case is not a string concept but a mathematical one. (It's easy to tell that 1 +0.3 is equivalent to +1.0 .30 if you use something like BigDecimal to store the actual values; but to try to express that using a regex would be pure folly.)
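Following the BigDecimal suggestion in that last note, here is a minimal sketch (hypothetical class and method names) that leaves structure to the regex and closure to arithmetic. It assumes the string has already passed the POLYGON pattern, extracts every space-separated pair with a capturing variant of the NUMBER grammar, and compares the first and last pairs numerically.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClosedPolygonCheck {
    // Same number grammar as NUMBER in the answer, without the wrapping group.
    private static final String NUMBER =
        "[+-]?+(?:(?:0|[1-9]\\d*+)(?:[.]\\d++)?+|[.]\\d++)";
    // One space-separated "x y" pair; groups 1 and 2 capture the coordinates.
    private static final Pattern PAIR =
        Pattern.compile("(" + NUMBER + ")\\s++(" + NUMBER + ")");

    // Assumes the string has already passed the POLYGON pattern; returns true
    // if the first and last pairs are numerically equal.
    static boolean isClosed(String polygon) {
        List<BigDecimal[]> points = new ArrayList<>();
        Matcher m = PAIR.matcher(polygon);
        while (m.find()) {
            // BigDecimal(String) accepts forms such as "+17.766" and ".30".
            points.add(new BigDecimal[] {
                new BigDecimal(m.group(1)), new BigDecimal(m.group(2))});
        }
        if (points.size() < 2) {
            return false;
        }
        BigDecimal[] first = points.get(0);
        BigDecimal[] last = points.get(points.size() - 1);
        // compareTo ignores scale (0.3 vs 0.30), unlike equals().
        return first[0].compareTo(last[0]) == 0
            && first[1].compareTo(last[1]) == 0;
    }
}
```

`compareTo` is used rather than `equals` because `BigDecimal.equals` also compares scale, so `new BigDecimal("0.3").equals(new BigDecimal("0.30"))` is false.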
ruakh