
There are many expressions that match numbers, such as \d, 1[0-9][0-9], \d{4}, ... But how can I check whether an expression matches only numbers?

For example, \d matches 0 to 9, so for this expression "matches only numbers" is true.

As another example, 1[0-9a-z] matches numbers like 10, but it also matches 1s, which is not a number. So "matches only numbers" is false.

It is difficult to define what a "number" is, as mentioned here, so I would like to restrict it to integers.

I am not asking for an expression that matches integers, but for a kind of reverse check on the expression itself. Thanks for reading! :)

HayatoY
  • Are you trying to programmatically determine using one regex whether another regex could ever match any string that is not just integers? Except for trivial cases where there is only `\d` or `[0-9]` and multipliers, this is *extremely difficult*. – jonrsharpe Nov 24 '14 at 10:11
  • http://stackoverflow.com/questions/8586346/python-regex-for-integer – the_marcelo_r Nov 24 '14 at 10:11
  • I have no idea what you are talking about? Are you talking about integers and decimals? – RvdK Nov 24 '14 at 10:13
  • you can either make a new regex based off from what you know that validates a regex or test that regex by throwing some dummy data and if-else to catch result... – Craftein Nov 24 '14 at 10:14
  • @jonrsharpe yes right, just not necessary to use one regex to achieve that if it's possible. – HayatoY Nov 24 '14 at 12:03
  • @Craftein at first I tried your solution, but if you choose 100 for dummy data, and given expression is '2[0-9]', the test won't pass.. even '2[0-9]' must match number. – HayatoY Nov 24 '14 at 12:03

1 Answer


If I understand the question correctly, you want to check, given some regular expression r, whether r might match anything that is not a number. In that sense, a regex that matches nothing at all (for example (?!) in Python; note that $^ does not qualify, because it still matches the empty string) would pass the test, since it matches no string and hence no non-number. 1[0-9][0-9a-z], however, matches 10s, which is not a number, so the test fails.

That is not possible with Python's regex facilities. You would need a regex language which supports intersection (&), complement (~), and test for non-emptiness (e.g., through generation of a matching word). Then, if r is your regex, you would need to check if

`r & ~(0|[1-9][0-9]*)`

is non-empty.
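Python's `re` module cannot build this intersection, but the condition itself can be illustrated with a bounded brute-force search: enumerate short strings and look for one that r fully matches but the integer pattern does not. This is only a sketch, not a decision procedure. The function name `find_non_integer_match`, the small test alphabet, and the length bound are assumptions made for illustration, and a result of `None` only means that no counterexample exists within those limits.

```python
import re
from itertools import product

# Assumption: "integer" means a non-negative decimal without leading zeros,
# i.e. the same language as 0|[1-9][0-9]* in the expression above.
INTEGER = re.compile(r"0|[1-9][0-9]*")


def find_non_integer_match(pattern, alphabet="0123456789as-. ", max_len=4):
    """Return the first string found (shortest first, over `alphabet`, up to
    `max_len` characters) that `pattern` fully matches but that is not an
    integer. Return None if no such string exists within these bounds."""
    compiled = re.compile(pattern)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            # A counterexample is in the language of r but not in the integer language.
            if compiled.fullmatch(candidate) and not INTEGER.fullmatch(candidate):
                return candidate
    return None
```

With these defaults, `find_non_integer_match(r"1[0-9][0-9]")` returns `None`, while `find_non_integer_match(r"1[0-9a-z]")` returns `"1a"`. Note that `\d{2}` is reported as well (with `"00"`), because the integer pattern above excludes leading zeros.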

Intersection and complement are computationally expensive (complementing generally requires determinizing the automaton, which can grow exponentially), but some regex libraries support them. One example I know of, in Java, is the BRICS automaton/regex library (dk.brics.automaton).

This could be realized as follows (assuming you're adhering to the BRICS regex syntax):

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

// Checks if `re` might match a non-number and returns the shortest such string; otherwise, null is returned
public String matchesNonNumber(String re) {
  // construct regex like above; no spaces around & and ~, since whitespace is a literal character in BRICS syntax
  RegExp bricsRe = new RegExp("(" + re + ")&~(0|[1-9][0-9]*)", RegExp.INTERSECTION | RegExp.COMPLEMENT);
  Automaton a = bricsRe.toAutomaton();
  return a.getShortestExample(true); // returns shortest accepted string, or null if no string is accepted
}
```

See the JavaDoc of the RegExp and Automaton classes. This example does not match the Python tag of the question, but the problem you want to solve is not inherently language-specific either.

misberner
  • It's not often that you find yourself hoping that the question was up to the quality of the answer (and not the other way around). Thanks! – xbug Nov 24 '14 at 10:38
  • Thank you, That's what I wanted to know :) – HayatoY Nov 24 '14 at 11:50