2

I'm trying to match numbers greater than 40. The good news is that all of them have exactly 2 decimal places (for example 3.25, 5.89, 999.75) and none of them use leading zeros (except in the decimal part, which always has 2 digits). At first I tried the following pattern, but then I realized it wouldn't match numbers like 100 or 1000, even though they are greater than 40.

[4-9][0-9]\.

I don't need to match the decimal part, so don't worry about that; I just need help matching numbers greater than 40 (up to 9999 would be fine).

Thanks for your help.

Filipe Santos
  • 4
    This is the most unsuitable place for using regex. – Maroun Jan 06 '15 at 22:14
  • 3
    Why use regex? Wouldn't it be easier to convert the string to a number and use the `>` operator? – Mureinik Jan 06 '15 at 22:14
  • 1
    Welcome to SO! Please have a look at the tag-information (hover over the tag): General reference for regex: http://stackoverflow.com/q/22937618 Remember to **include a tag specifying the programming language or tool you are using.** – Stewie Griffin Jan 06 '15 at 22:14
  • 1
    @Robert: I was confused for a second, then I clicked the username link. :) – Robert P Jan 06 '15 at 22:16
  • 2
    @RobertP Are you talking to yourself? :/ #$%# I've just realized only dot separates between you guys LOL.. I have to printscreen this. – Maroun Jan 06 '15 at 22:17
  • 1
    @RobertP Hahaha =) I had to check it too to see what was going on! – Stewie Griffin Jan 06 '15 at 22:17
  • We need regex because we are using software that only provides regex matching. I know regex is not the best way to match numbers, but it's the only choice we have at the moment (this will be temporary though, because we plan to develop specific software for this task). – Filipe Santos Jan 06 '15 at 22:27
  • You could match all numbers and then add a negative lookahead which excludes negative numbers (check for -) and 0-40 numbers. Now the big question is: does your software support this? – HamZa Jan 06 '15 at 22:55
  • @MarounMaroun, why? I suppose that if you assert that so firmly, you have good reasons to do so. I actually think you are completely wrong: regexes are normally implemented as finite state automata, and in this case a single pass through the data is enough to extract all the numbers individually. I cannot think of any way to solve this problem better than with a regexp. – Luis Colorado Jan 08 '15 at 14:11
  • @Mureinik what software are you using to convert this to a number, and why do you think it is not using a finite state automaton to check syntax, which is another way of saying a regexp? – Luis Colorado Jan 08 '15 at 14:13
  • @LuisColorado That depends on the language the OP is using. There are many libraries that can do that easily without the need for regex. In most languages I try to avoid regex and use it only when necessary. – Maroun Jan 08 '15 at 14:14
  • @LuisColorado Convert the string to double, then check if it's ` > 40`. No regex or finite state automaton is involved here. – Maroun Jan 08 '15 at 14:15
  • But he has all the numbers in a string and needs to separate them and check their format... I cannot see a scenario where a regexp is not appropriate for this problem. He is trying to **match** numbers and has not said anything about converting them... perhaps he only has to surround them with double quotes and print them... in that case a regexp is not only good, I think it is the best way to solve the problem. – Luis Colorado Jan 08 '15 at 14:15
  • @LuisColorado It's much more expensive in terms of performance. Try to have a file with 1000+ numbers, iterate over them, match regexes.. Now try to convert each to a double and compare using `>`. – Maroun Jan 08 '15 at 14:19
  • And just to match numbers greater than 40, he doesn't need to convert them to numeric form; he can do it as he reads characters, answering yes or no after reading each one. He only needs a finite automaton that says when it has accepted a number greater than 40 and when it has accepted a lower one. – Luis Colorado Jan 08 '15 at 14:19
  • @LuisColorado You might be right, but the regex engine and its implementation in various languages can be expensive, and methods that match or find patterns are expensive too. – Maroun Jan 08 '15 at 14:20
  • the regexp to match a number greater than 40 (with the format he has expressed) is: [0-9]*[4-9][0-9]\.[0-9]* this compiles to an automaton that accepts it one char at a time and **beeps** when one number has been recognized. I have no knowledge of a faster algorithm than this. – Luis Colorado Jan 08 '15 at 14:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68419/discussion-between-luis-colorado-and-maroun-maroun). – Luis Colorado Jan 08 '15 at 15:05
  • @MarounMaroun: Are you sure [HTML](http://stackoverflow.com/a/1732454/2338750) is not more unsuitable? – Stewie Griffin Jan 12 '15 at 10:59

8 Answers

6

This should do the job:

([4-9][0-9]|\d{3,})\.

Check it here: http://www.regexr.com/3a5v9
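
A quick way to sanity-check this pattern is to run it over a few sample values. The sketch below uses Python's `re` module, which is only an assumption, since the question does not say which tool is in use; the sample values and the helper name `integer_part_matches` are made up for illustration. Note that `40.00` is also accepted, because the pattern takes any two-digit integer part starting with 4-9.

```python
import re

# Pattern from the answer above: a two-digit integer part starting with 4-9,
# or any integer part of three or more digits, followed by the decimal point.
PATTERN = re.compile(r"([4-9][0-9]|\d{3,})\.")


def integer_part_matches(text):
    """Return True if the pattern is found anywhere in `text`."""
    return PATTERN.search(text) is not None


# Hypothetical sample values in the format described in the question.
samples = ["3.25", "39.99", "40.00", "40.01", "41.50", "100.00", "999.75", "1000.10"]
results = {s: integer_part_matches(s) for s in samples}

for value, matched in results.items():
    print(value, matched)
```

If 40.00 must be rejected, the boundary case needs separate handling, as some of the other answers discuss.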

Pcsl
4

Don't use regular expressions for number comparison. If, for example, you're using JavaScript:

var aNumber = parseFloat("50");
if (aNumber > 40) {
    // yay!
}
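
The same idea can be sketched in Python, assuming a scripting step is available at all (the asker later says their tool only offers regex matching). The function name `is_greater_than_40` is hypothetical, and `Decimal` is used instead of a float so that values such as 40.01 are compared exactly.

```python
from decimal import Decimal


def is_greater_than_40(text):
    """Parse a decimal string such as '41.50' and compare it numerically."""
    return Decimal(text) > Decimal("40")


for sample in ["3.25", "40.00", "40.01", "1000.10"]:
    print(sample, is_greater_than_40(sample))
```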
brandonscript
1

You can use braces to specify a minimum and, if desired, a maximum number of repetitions. So,

([4-9][0-9]|[1-9][0-9]{2,})\.

matches either a two-digit number starting with 4-9, or a number of three or more digits starting with 1-9, followed by the decimal point. Presumably there's a boundary of some sort at the beginning of this, but it sounds like you have that part worked out. The OR (|) allows for the two possible groups of leading digits.

Christina
  • 1
    This regex doesnt match at all from 100 to 139 https://www.regex101.com/r/lC2mC5/1 – Andie2302 Jan 06 '15 at 23:49
  • Whoops, you are right. I focused on the multiple numbers and forgot about the 1-3. I'll update, and drink more coffee in future. – Christina Jan 07 '15 at 15:41
1
(40\.(?!0[^\d]|00)\d{1,2}|(((4[1-9](?!\d)|[5-9][0-9])(?![\d])|\d*[1-9]\d{2,})(\.\d{1,2})?))

This prevents false positives from leading 0s.

This worked for me.

  • It first tries to match 40 followed by 1 or 2 decimals that are not 00.
  • It then tries to match 4 followed by 1-9, decimal optional.
  • If it can't match that, it matches 5-9 followed by 0-9, decimal optional.
  • It then tries to match any digit, any number of times, followed by 1-9, followed by 2 or more digits, decimal optional.

If you want to require the decimal, just remove the last question mark.

Regular Jo
1

This will do it:

([4-9][0-9]+|\d{3,})

This matches any number of two or more digits whose first digit is 4 or greater, or any number with three or more digits.

As an example http://www.regexr.com/3a5v0

panagdu
1

If your regex flavour supports negative lookbehind, you can match the numbers from 41 to 9999 without the decimal part:

\b(?:[1-9][0-9]{2,3}|[5-9][0-9]|4[1-9])(?<!\.\d{1,2})\b
Andie2302
0

(Most of the other answers are perfect for me -- this is paranoia and a bad idea :)

For use with grep -Po or Perl, we could use:

'\b(\d{3,}|[4-9]\d)\.\d\d'

but this would also match 40.00 (which is not greater than 40), so instead:

'\b(\d{3,}|[5-9]\d|4[1-9])\.\d\d|\b40\.\d?[1-9]\d?'

Corresponding to:

     DDD.DD 
| [5-9]D.DD
| 4[1-9].DD 
|     40.D[1-9] 
|     40.[1-9]D
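
As a rough check of the second pattern, here is a sketch in Python rather than grep or Perl. That choice is an assumption made only for illustration, and the sample line is made up. It extracts every match from a line of space-separated values and shows that 40.00 is skipped while 40.01 and 40.10 are kept.

```python
import re

# Second pattern from the answer above: integer parts of 41 and up with two
# decimals, or 40 with a non-zero decimal part.
PATTERN = re.compile(r"\b(\d{3,}|[5-9]\d|4[1-9])\.\d\d|\b40\.\d?[1-9]\d?")

# Hypothetical input line in the format described in the question.
text = "3.25 39.99 40.00 40.01 40.10 41.50 100.00 999.75"

# finditer with group(0) returns the full match, not just the capture group.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```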
JJoao
0

With flex(1), you can use this code to parse strings and find numbers greater than 40:

pru.l:

%option noyywrap
%%
\+?(0*[4-9][0-9]|0*[1-9][0-9][0-9][0-9]*)(\.[0-9]*)?  { printf("Greater than 40: %s\n", yytext); }
\-?[0-9]*(\.[0-9]*)?                                { printf("Lesser than 40: %s\n", yytext); }
\n                                                |
.                                                 ;
%%

int main()
{ yylex(); }

Install flex and compile this file with

make pru

Then run it as:

pru <filein >fileout

or just

pru

This code constructs a deterministic finite automaton from the regular expressions listed and runs the actions listed on the right when it recognizes a value greater than 40. It allows an optional leading sign and leading zeros, and an optional fractional part composed of any number of digits. And it does this with only one assignment and one decision for each character read. You also have access to the automaton state table generated by flex (it writes the C code for you).

The regex that recognizes numbers greater than 40 (with decimals and leading sign and zeros) is:

\+?(0*[4-9][0-9]|0*[1-9][0-9][0-9][0-9]*)(\.[0-9]*)?

and can be abbreviated as:

\+?(0*[4-9][0-9]|0*[1-9][0-9]{2,})(\.[0-9]*)?

Explanation:

  • \+? matches an optional plus sign.
  • (...|...) two options:
  • 0* optional arbitrary number of leading zeros.
  • [4-9][0-9] the numbers 40 to 99
  • [1-9][0-9]{2,} the numbers 100 and up.
  • (\.[0-9]*)? optional decimal point followed by an arbitrary number of digits.
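
For readers without flex at hand, the unabbreviated expression can be tried in Python. This is only a sketch under assumptions: `re.fullmatch` stands in for the flex rule, the helper name `accepted` is hypothetical, and the sample strings are made up. Like the flex rule, it accepts 40 itself, since `[4-9][0-9]` covers 40 to 99.

```python
import re

# Unabbreviated expression from the answer above: optional plus sign,
# optional leading zeros, 40-99 or 100 and up, optional fractional part.
PATTERN = re.compile(r"\+?(0*[4-9][0-9]|0*[1-9][0-9][0-9][0-9]*)(\.[0-9]*)?")


def accepted(text):
    """Return True if the whole string is recognized by the expression."""
    return PATTERN.fullmatch(text) is not None


for sample in ["3.25", "39.99", "40.00", "+0041.5", "100", "0100.250", "-50.00"]:
    print(sample, accepted(sample))
```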
Community
Luis Colorado