
I'm trying to build a regex that validates an equation. The equation itself is very simple: it is written in readable English, and I will later evaluate it to true or false. An example would look like this:

((1 and 2) or 3)

In this example, I will swap out each number with either true or false. I will also replace "and" with "&&" and "or" with "||" in order to run the equation in PHP. The result will ultimately be either true or false.

An example final equation would look something like this:

((true && true) || true)
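A minimal sketch of that substitution step, written in JavaScript purely for illustration (the real substitution happens in PHP). The function name `toBooleanExpression` and the `values` map from numbers to booleans are hypothetical, and the sketch assumes the expression has already been validated.

```javascript
// Hypothetical helper: `values` maps each number in the expression to a boolean.
function toBooleanExpression(expression, values) {
  return expression
    .replace(/\d+/g, (id) => {
      if (!Object.prototype.hasOwnProperty.call(values, id)) {
        throw new Error(`No value supplied for ${id}`);
      }
      return String(Boolean(values[id]));
    })
    .replace(/\band\b/g, "&&") // whole-word "and" only
    .replace(/\bor\b/g, "||"); // whole-word "or" only
}

console.log(toBooleanExpression("((1 and 2) or 3)", { 1: true, 2: true, 3: true }));
// ((true && true) || true)
```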

Here are some more examples of what should be considered valid.

(1 or 2 or 3)

((1 and 2 and 3) or (4 and 5))

So, my question comes in two parts.

  1. Is it possible to create a regular expression that validates all possible valid equations? One big roadblock for me is understanding how I could validate that every opening "(" also has a closing ")".
  2. Is it advisable to use a regular expression to validate client-side in this circumstance? I am already able to validate the expression using AJAX and PHP, so am I just overthinking this?
mathem8tic
  • That's possible with a [recursive pattern](http://stackoverflow.com/questions/20569306/how-to-write-a-recursive-regex-that-matches-nested-parentheses), which is not an option in JavaScript though. By itself that won't help much with evaluating the expressions. Or is this *really* only about numbers and boolean operators? – mario May 28 '15 at 18:40
  • Balanced text matching is possible with PHP. Replacing the contents requires you to know what constitutes true/false. More likely just a straight replacement of the or/and into symbols is all you need. –  May 28 '15 at 18:52
  • @mario In our application the numbers represent a variable which will return true|false in PHP. Our objective is to allow the client to make edits to the conditional pattern and validate the pattern client-side with angular before the numbers become boolean. – mathem8tic May 28 '15 at 19:06
  • @mario to clarify we don't want to evaluate the expression with regex. We want to make sure the conditional pattern is valid before evaluating it in PHP. – mathem8tic May 28 '15 at 19:10
  • JavaScript really only has plain old regular expressions. PHP has Regexps. So you should just make an AJAX call to preverify server-side if they're valid. There's no need to substitute `and` and `or` btw, because those are allowed alternatives in PHP expressions as well. – mario May 28 '15 at 19:14
  • It's almost doable with a regex. The only hitch is the boundary conditions between `digits, and, or` –  May 28 '15 at 19:20
  • I've added a validation regex. Once valid, you can just do normal text substitution with `true/false/&&/||` –  May 28 '15 at 21:26

2 Answers


Using the pumping lemma, it can easily be proven that the strings you want to validate belong to a language that is not regular, so they cannot be parsed with regular expressions in the formal sense. (The proof relies on exactly the fact you mention in the first part of your question: a regular expression cannot match opening parentheses against closing ones, or even count them.) Some regex engines do provide additional features that can be used for parsing this (such as recursive patterns), but those go beyond the formal definition of regular expressions.
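To illustrate the counting part: ordinary code needs only an unbounded counter, which is what a formally regular expression lacks. A minimal sketch in JavaScript, where `parensBalanced` is a hypothetical name and only the balance of parentheses is checked, not the numbers or operators:

```javascript
// Returns true when every "(" is closed by a later ")".
function parensBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      if (depth < 0) return false; // a ")" appeared before its "("
    }
  }
  return depth === 0; // anything left open is an error
}

console.log(parensBalanced("((1 and 2) or 3)")); // true
console.log(parensBalanced("((1 and 2) or 3"));  // false
```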

You may consider parsing the parentheses yourself and validating the expressions inside them with simple regular expressions, or you can build a parse tree, similar to what compilers do.
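As a rough sketch of the parse-tree option, here is a small recursive-descent parser in JavaScript. The names `parseExpression` and `isValidExpression` are hypothetical, and the grammar is an assumption: an expression is one or more terms joined by `and`/`or`, and a term is a number or a parenthesized expression. Operators are grouped left to right with no precedence, so the tree is meant for validation only, and outer parentheses are not required.

```javascript
// Assumed grammar:
//   expression := term (("and" | "or") term)*
//   term       := NUMBER | "(" expression ")"
function parseExpression(text) {
  // Words and digits stay together, so "1and2" is one (invalid) token.
  const tokens = text.match(/\w+|\S/g) || [];
  let pos = 0;

  function parseTerm() {
    const token = tokens[pos];
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (/^\d+$/.test(token)) {
      pos += 1;
      return { type: "number", value: token };
    }
    if (token === "(") {
      pos += 1;
      const inner = parseSequence();
      if (tokens[pos] !== ")") throw new SyntaxError("Expected ')'");
      pos += 1;
      return inner;
    }
    throw new SyntaxError(`Unexpected token "${token}"`);
  }

  function parseSequence() {
    let node = parseTerm();
    while (tokens[pos] === "and" || tokens[pos] === "or") {
      const operator = tokens[pos];
      pos += 1;
      node = { type: operator, left: node, right: parseTerm() };
    }
    return node;
  }

  const tree = parseSequence();
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token "${tokens[pos]}"`);
  return tree;
}

// Validation is just "does it parse?".
function isValidExpression(text) {
  try {
    parseExpression(text);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(isValidExpression("((1 and 2 and 3) or (4 and 5))")); // true
console.log(isValidExpression("(1 and or 2)"));                   // false
```

Because JavaScript has no recursive patterns, a parser along these lines is one way to run the same check client-side before the AJAX call.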

Mohammad Jafar Mashhadi

This can be done in PHP (which uses the PCRE engine).
Below is just an example.
You could comment out the errors check, then insert boundary constructs
around the regex to make it definitively pass/fail.

The biggest problem is not the recursion, but defining the content boundary
conditions. I've pretty much boiled it down for you. These checks have to
be maintained however you do it (state, stacks, ...); it's all the same.
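To show what those boundary conditions look like outside a regex, here is a hedged JavaScript sketch that keeps them in a table of which token kind may follow which, plus a depth counter for the parentheses. The names `ALLOWED_NEXT`, `classify` and `checkBoundaries` are hypothetical, and this is not a translation of the regex: for instance, it assumes outer parentheses are optional.

```javascript
// For each kind of token, the kinds that may legally follow it.
const ALLOWED_NEXT = {
  start:    ["open", "number"],
  open:     ["open", "number"],           // "( (" and "( 1" are fine, "( )" is not
  number:   ["operator", "close", "end"],
  operator: ["open", "number"],
  close:    ["operator", "close", "end"], // ") and" and ") )" are fine, ") (" is not
};

function classify(token) {
  if (token === "(") return "open";
  if (token === ")") return "close";
  if (token === "and" || token === "or") return "operator";
  if (/^\d+$/.test(token)) return "number";
  return "unknown";
}

function checkBoundaries(text) {
  const tokens = text.match(/\w+|\S/g) || [];
  let previous = "start";
  let depth = 0;
  for (const token of tokens) {
    const kind = classify(token);
    if (!ALLOWED_NEXT[previous].includes(kind)) return false;
    if (kind === "open") depth += 1;
    if (kind === "close") {
      depth -= 1;
      if (depth < 0) return false; // closed more than was opened
    }
    previous = kind;
  }
  return depth === 0 && ALLOWED_NEXT[previous].includes("end");
}

console.log(checkBoundaries("(1 and 2) or (3)")); // true
console.log(checkBoundaries("(9) ( 4 and 5)"));   // false
```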

( This regex was constructed and tested using RegexFormat 6 )

Sample input:

 (((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7) )

Tested output:

 **  Grp 0 -  ( pos 0 , len 64 ) 
(((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7) )  
 **  Grp 1 -  ( pos 1 , len 62 ) 
((   (1 and 2 and 3) or (9) or ( ( 4 and 5)) and 5 ) and   7)   
 **  Grp 2 -  NULL 
 **  Grp 3 -  NULL 
 **  Grp 4 -  NULL 

Regex:
5/29 All Forms:

Empty form ( ) not allowed
Empty form ) ( not allowed
Form ) and ( ok
Form ) and 2 and ( ok
Form ( 1 and 2 ) ok
Form ( 1 ) ok
Form ) and 2 ) ok
Form ( 1 and ( ok
Form ( whitespace ( or ) whitespace ) ok

 # (?s)(?:\(((?!\s*\))(?&core))\)|\s*([()]))(?(DEFINE)(?<core>(?>(?&content)|\((?:(?!\s*\))(?&core))\)(?!\s*\())+)(?<content>(?>(?<=\))\s*(?:and|or)\s*(?=\()|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?:and|or)\s*(?=\()|(?<=\()\s*\d+(?:(?:\s+(?:and|or)\s+)?\d+)*\s*(?=\))|(?<=\))\s*(?:(?:and|or)\s+\d+)+\s*(?=\))|(?<=\()\s*(?:\d+\s+(?:and|or))+\s*(?=\()|\s+)))


 # //////////////////////////////////////////////////////
 # // The General Guide to 3-Part Recursive Parsing
 # // ----------------------------------------------
 # // Part 1. CONTENT
 # // Part 2. CORE
 # // Part 3. ERRORS

 (?s)                       # Dot-All modifier (used in a previous incarnation)

 (?:
      #           (                          # (1), Take off CONTENT (not used here)
      #                (?&content) 
      #           )
      #        |                           # OR

      \(                         # Open paren
      (                          # (1), parens CORE
           (?! \s* \) )               # Empty form '( )' not allowed
           (?&core) 
      )
      \)                         # Close paren
   |                           # OR
      \s* 
      (                          # (2), Unbalanced (delimiter) ERRORS
                                      # - Generally, on a whole parse, these
                                      #   are delimiter or content errors
           [()]                       
      )
 )

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>                   # (3)
           (?>
                (?&content) 
             |  
                \(                         # Open paren
                (?:
                     (?! \s* \) )               # Empty form '( )' not allowed
                     (?&core) 
                )
                \)                         # Close paren
                (?! \s* \( )               # Empty form ') (' not allowed
           )+
      )

      # content 
      (?<content>                # (4)
           (?>
                (?<= \) )                  # Form ') and ('
                \s* 
                (?: and | or )
                \s* 
                (?= \( )

             |  
                (?<= \) )                  # Form ') and 2 and ('
                \s* 
                (?:
                     (?: and | or )
                     \s+ 
                     \d+ 
                )+
                \s* 
                (?: and | or )
                \s* 
                (?= \( )

             |  
                (?<= \( )                  # Form '( 1 and 2 )'
                \s* 
                \d+ 
                (?:
                     (?:
                          \s+ 
                          (?: and | or )
                          \s+ 
                     )?
                     \d+ 
                )*
                \s* 
                (?= \) )

             |  
                (?<= \) )                  # Form ') and 2 )'
                \s* 
                (?:
                     (?: and | or )
                     \s+ 
                     \d+ 
                )+
                \s* 
                (?= \) )

             |  
                (?<= \( )                  # Form '( 1 and ('
                \s* 
                (?:

                     \d+ 
                     \s+ 
                     (?: and | or )
                )+
                \s* 
                (?= \( )

             |  
                \s+                        # Interstitial whitespace
                                           # '( here (' or ') here )'
           )
      )

 )
  • There is a bug: it also matches `((( (1 and 2 and 3) or (9) ( ( 4 and 5)) and 5 ) and 7) )`, where there is no operator between `) (` in `(9) ( ( 4` – revo May 29 '15 at 09:41
  • @revo - Fixed: Empty form ') (' not allowed. Any more? –  May 29 '15 at 14:43
  • Nice job, but it won't be accurate. I'm not a mass input tester, but why can't this `((( (1 and 2 and 3) or 9 or ( ( 4 and 5)) and 5 ) and 7) )` be matched? *`9` doesn't have surrounding parentheses.* – revo May 29 '15 at 17:12
  • @revo - Added Form `) and 2 and (` and listed all the forms in the post. Any more? –  May 29 '15 at 19:44
  • I think that covers most of the forms. I am being verbose in the forms, most of which contain redundant overlaps. When doing this kind of regex, that is by design in the beginning. Once all the parts are identified, the next step is to refactor it, but that could be a little tricky. –  May 29 '15 at 19:55