6

I have a regex that matches comma-separated numbers with an optional two-digit decimal part in a given multiline text.

/(?<=\s|^)\d{1,3}(,\d{3})*(\.\d{2})?(?=\s|$)/m

It successfully matches strings like 1, 12, 12.34, 12,345.67, etc. How can I modify it to also match a number with only the decimal part, like .23?

EDIT: Just to clarify - I would like to modify the regex so that it matches 12, 12.34 and .34

I am also looking for 'stand-alone' valid numbers, i.e., number strings whose boundaries are either whitespace or the start/end of a line or of the string.

Amarghosh
  • 1
    Would be cool over here where commas are decimal delimiters and the dot is sometimes used as a thousands separator (though more commonly a space is) ^^ – Oskar Duveborn Oct 14 '09 at 12:56
  • Oskar, that's just pure evil :D though great fun for a regex :D – Mez Oct 14 '09 at 13:00
  • @Oskar I was about to ask where in the hell you are living. But apparently there are a lot of places where people use commas as the decimal separator: http://en.wikipedia.org/wiki/Decimal_separator#Countries_using_Arabic_numerals_with_decimal_comma I don't even wanna think about that part now. Even the normal notation is enough of a headache for me :) – Amarghosh Oct 14 '09 at 13:09
  • Check @Mez's answer. He covers both cases. – Amarghosh Oct 14 '09 at 15:09

4 Answers

11

This:

\d{1,3}(,\d{3})*(\.\d\d)?|\.\d\d

matches all of the following numbers:

1
12
.99
12.34 
12,345.67
999,999,999,999,999.99
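
A quick way to check that list is a minimal Python sketch (Python is only my choice for illustration; the groups are made non-capturing so `findall` returns whole matches). It also shows what the unanchored pattern does with longer tokens such as 123a and 123.123:

```python
import re

# The unanchored pattern, with non-capturing groups so findall returns whole matches.
LOOSE = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d\d)?|\.\d\d")

listed = ["1", "12", ".99", "12.34", "12,345.67", "999,999,999,999,999.99"]
for number in listed:
    # fullmatch succeeds only if the pattern covers the entire string.
    print(number, LOOSE.fullmatch(number) is not None)

# Without boundary checks the pattern also picks pieces out of longer tokens.
print(LOOSE.findall("123a"))     # ['123']
print(LOOSE.findall("123.123"))  # ['123.12', '3']
```

Every listed number is covered in full, but so are fragments of strings that are not stand-alone numbers.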

If you want to exclude numbers like 123a (street addresses for example), or 123.123 (numbers with more than 2 digits after the decimal point), try:

(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d\d)?|\.\d\d)(?=\s|$)

A little demo (I guessed you're using PHP):

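// Sample input: 666a and 666.666 are not stand-alone numbers and should be skipped.
$text = "666a 1 fd 12 dfsa .99 fds 12.34 dfs 12,345.67 er 666.666 er 999,999,999,999,999.99";
// Single quotes keep the backslashes and the $ literal; non-capturing groups limit $matches to the full matches.
$number_regex = '/(?<=\s|^)(?:\d{1,3}(?:,\d{3})*(?:\.\d\d)?|\.\d\d)(?=\s|$)/';
if (preg_match_all($number_regex, $text, $matches)) {
  print_r($matches);
}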

which will output:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 12
            [2] => .99
            [3] => 12.34
            [4] => 12,345.67
            [5] => 999,999,999,999,999.99
        )

)

Note that it ignores the strings 666a and 666.666.

Bart Kiers
  • But that also matches `14` in `14.` or `145` in `145.2` and `1,344.12` in `1,344.123` – Amarghosh Oct 14 '09 at 12:59
  • See my comment about word boundaries. – Bart Kiers Oct 14 '09 at 13:01
  • I'm using actionscript and apparently `.` is considered as word boundary and hence it still matches `14` in `asd 14. asd`. I'd already tried `\b` and found this issue, that's when I chose to use look around that I got from another SO thread. And btw, if you remember, I started this with the regex that you gave me in http://stackoverflow.com/questions/1547574/regex-for-prices/1547585#1547585 – Amarghosh Oct 14 '09 at 13:17
  • Yes, you're right. Then use what you already posted yourself: replace the word boundaries by `(?<=\s|^)` and `(?=\s|$)`. Can your data contain strings like `123,123.12,456.45` (ie two successive numbers separated by a comma)? If so, could you then please adjust your original question and add all these corner cases? – Bart Kiers Oct 14 '09 at 13:23
  • Nope. I am looking for 'stand alone' valid numbers. Boundaries would be either white space or start/end of line/string. Will add this to the question. Sorry for the misunderstanding. – Amarghosh Oct 14 '09 at 13:44
  • No problem Amarghosh. See the edit, I think that covers it now. – Bart Kiers Oct 14 '09 at 13:51
  • Your updated regex that changed the structure from `(fixed.decimal)|(.decimal)` to `(fixed.decimal|.decimal)` is working fine. What is the difference between the two? – Amarghosh Oct 14 '09 at 13:51
  • 2
    Like that, there is no difference, but when doing: `^a|b$` it matches either an `a` at the beginning of the string, OR a `b` at the end. While `^(a|b)$` means: either an `a` or `b`. – Bart Kiers Oct 14 '09 at 13:54
  • 1
    Example: `a(b|c)|(d|e)f` would match ab or df, but not abf, whereas `a((b|c)|(d|e))f` would match abf, but not df or ab – Mez Oct 14 '09 at 13:55
  • Eureka.. `(fix.dec)|(.dec)` fails on `14.` because it matches either `^` followed by a `fix.dec`, where dec is optional (which is `14`), followed by whatever (`.` here), or whatever followed by `.dec`. Thanks a ton, especially for those last two comments. – Amarghosh Oct 14 '09 at 14:36
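
The grouping difference discussed in these comments can be reproduced with a minimal Python sketch. One labeled assumption: Python's `re` rejects the lookbehind `(?<=\s|^)` because its branches differ in width, so the sketch swaps in `(?<!\S)` and `(?!\S)`, which accept the same positions. The names `FIXED`, `DEC` and `found` are made up for the illustration.

```python
import re

# Assumption: Python's re rejects the lookbehind (?<=\s|^), so this sketch
# writes the boundaries as (?<!\S) and (?!\S), which accept the same
# positions: whitespace or a line/string edge.
FIXED = r"\d{1,3}(?:,\d{3})*(?:\.\d\d)?"  # 12, 12.34, 12,345.67
DEC = r"\.\d\d"                           # .34

# Grouped alternation: both boundary checks apply to either branch.
grouped = re.compile(r"(?<!\S)(?:" + FIXED + "|" + DEC + r")(?!\S)")

# Top-level alternation: the pattern splits into
#   (?<!\S)FIXED   or   DEC(?!\S)
# so the first branch has no right-hand boundary check at all.
ungrouped = re.compile(r"(?<!\S)(?:" + FIXED + ")|(?:" + DEC + r")(?!\S)")

# Word boundaries: "." is a non-word character, so \b holds between "4" and ".".
word_bounded = re.compile(r"\b(?:" + FIXED + r")\b")


def found(pattern, text):
    """Return every substring of text matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


for sample in ["asd 14. asd", "asd 145.2 asd", "asd 1,344.123 asd"]:
    print(sample, "| grouped:", found(grouped, sample),
          "| ungrouped:", found(ungrouped, sample))

print(found(word_bounded, "asd 14. asd"))
print(found(grouped, "asd 12.34 asd .34 x"))
```

The grouped pattern finds nothing in the three problem strings, while the ungrouped one returns `14`, `145` and `1,344.12`, the same partial matches reported above. The `\b` variant also returns `14` for `asd 14. asd`.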
2
/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?|\.(\d{2}))(?=\s|$)/m

Or, taking into account countries where . is used as the thousands separator and , as the decimal separator:

/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?|\d{1,3}(\.\d{3})*(,\d{2})?|\.(\d{2})|,(\d{2}))(?=\s|$)/m

Insane Regex for Internationalisation

/((?<=\s)|(?<=^))(((\d{1,3})((,\d{3})|(\.\d{3}))*(((?<=(,\d{3}))(\.\d{2}))|((?<=(\.\d{3}))(,\d{2}))|((?<!((,\d{3})|(\.\d{3})))([\.,]\d{2}))))|([\.,]\d{2}))(?=\s|$)/m

Matches

14.23
14,23
114,114,114.23
114.114.114,23

Doesn't match

14.
114,114,114,23
114.114.144.23
,
.
<empty line>
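
To try these lists, here is a minimal Python sketch. It exercises the simpler two-convention pattern from this answer, not the 'insane' one. As a labeled assumption, the boundaries are written as `(?<!\S)` and `(?!\S)`, because Python's `re` does not accept `(?<=\s|^)`. The names `US`, `EU`, `BARE` and `NUMBER` are only for the illustration.

```python
import re

# Assumption: (?<!\S) and (?!\S) stand in for (?<=\s|^) and (?=\s|$);
# they accept the same positions (whitespace or a line/string edge).
US = r"\d{1,3}(?:,\d{3})*(?:\.\d{2})?"  # 114,114,114.23
EU = r"\d{1,3}(?:\.\d{3})*(?:,\d{2})?"  # 114.114.114,23
BARE = r"[.,]\d{2}"                     # .23 or ,23
NUMBER = re.compile(r"(?<!\S)(?:" + US + "|" + EU + "|" + BARE + r")(?!\S)")

should_match = ["14.23", "14,23", "114,114,114.23", "114.114.114,23"]
should_not_match = ["14.", "114,114,114,23", "114.114.144.23", ",", ".", ""]

# One candidate per line, as in a multiline text.
text = "\n".join(should_match + should_not_match)
print(NUMBER.findall(text))
```

Only the four entries of the first list come back; a token that mixes the two conventions inconsistently is rejected as a whole because both boundary checks have to hold.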
Mez
  • `([0-9,\.])` matches a single character, to begin with. Even if you add a `+`, it would match `,` etc. – Amarghosh Oct 14 '09 at 13:03
  • What is the difference between your 1st regex that encloses the whole thing in parenthesis like `(fixed.decimal|.decimal)` and `/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?)|(\.\d{2})(?=\s|$)/` that puts them in separate parenthesis like `(fixed.decimal)|(.decimal)`? (other than the fact that second one matches `14` in `14.`) – Amarghosh Oct 14 '09 at 13:38
  • To rephrase the last comment, why doesn't `(fixed.decimal)|(.decimal)` work? Is there any operator precedence that I am missing? – Amarghosh Oct 14 '09 at 13:41
  • I don't think there is a difference..... I'd need to see the 2 alongside each other to spot the difference. – Mez Oct 14 '09 at 13:44
  • 1
    Ah, it depends on what you've got around it. `a(fixed.decimal)|(.decimal)b` would not be the same as `a((fixed.decimal)|(.decimal))b` – Mez Oct 14 '09 at 13:47
  • `/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?)|(\.\d{2})(?=\s|$)/m` and the first regex in your answer. – Amarghosh Oct 14 '09 at 13:49
  • 1
    In this case, no. As the look behinds/aheads are non-matching. In a case where the look behind or ahead were not matching, then the brackets are there to limit the boundaries of the or. – Mez Oct 14 '09 at 13:53
  • But `/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?|\.\d{2})(?=\s|$)/m` and `/(?<=\s|^)(\d{1,3}(,\d{3})*(\.\d{2})?)|(\.\d{2})(?=\s|$)/m` are giving me different results. 1st one works fine, but the second one matches `14` in `asd 14. asd`, `145` in `asd 145.2 asd` and `1,344.12` in `asd 1,344.123 asd` – Amarghosh Oct 14 '09 at 14:00
  • Yeah. Thanks to your and @Bart's comments about `()`, now I understand the difference. I wish I could accept both answers :) – Amarghosh Oct 14 '09 at 14:39
  • I understood the second one. But I don't dare even try the 'insane' one today. I've just started learning and I believe I am not ready for such 'insanities' yet. Will come back later and try to break it down. Thanks again. Unfortunately I can accept only one answer, but I'm gonna upvote you somewhere else :) – Amarghosh Oct 14 '09 at 15:01
0

This answer treats this question more comprehensively.

Community
tchrist
-1

(@"^((([0-9]+)(\.([0-9]+))?)(\,(([0-9]+)(\.([0-9]+))?))*)$")

This matches a comma-separated list of whole or decimal numbers. The dots are escaped so that they match only a literal decimal point.

Happy scenarios:
case 1) 9,10
case 2) 10.1,11,12,15,15.2
case 3) 9.8
case 4) 9

Sad scenarios:
case 1) 2..7
case 2) 2,,7
case 3) 2.
case 4) 7,
case 5) ,
case 6) .
case 7) .2
case 8) ,2
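
The scenarios can be checked with a minimal Python sketch (Python only for illustration; the names `NUM`, `STRICT` and `BARE_DOT` are hypothetical). The sketch keeps the decimal point escaped, since a bare `.` matches any character and would let an input such as `2x7` through:

```python
import re

# One number: digits with an optional fractional part; the dot is escaped.
NUM = r"[0-9]+(?:\.[0-9]+)?"
# A comma-separated list of such numbers.
STRICT = re.compile(NUM + r"(?:," + NUM + r")*")

# Same shape with a bare dot, which matches any single character.
BARE_DOT = re.compile(r"^((([0-9]+)(.([0-9]+))?)(\,(([0-9]+)(.([0-9]+))?))*)$")

happy = ["9,10", "10.1,11,12,15,15.2", "9.8", "9"]
sad = ["2..7", "2,,7", "2.", "7,", ",", ".", ".2", ",2"]

for s in happy + sad:
    # fullmatch requires the pattern to cover the whole input.
    print(repr(s), STRICT.fullmatch(s) is not None)

# The bare dot accepts a letter between the digits; the escaped one does not.
print(BARE_DOT.match("2x7") is not None, STRICT.fullmatch("2x7") is not None)
```

Note that this pattern validates a whole string as a list of numbers. It does not treat the comma as a thousands separator, and by design it rejects `.2`.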

code code