
Hello there !

I will start by giving a short, simplified definition of λ-expressions.
A λ-expression can be either :

  • A variable (here, a lower-case letter [a-z]), or any simple operation on variables (such as a*b or (a+b)*c)
  • A function (or abstraction). It has the following syntax : (λx.e) where x is a dummy variable (any lower-case letter) and e is a λ-expression (possibly containing occurrences of x). It can be read as : function λ : x -> e(x)
  • A function application. It has the following syntax : (f e) (note that I require the space and both parentheses) where f and e are both λ-expressions. It can be read as : f(e). The reduction operation basically means evaluating f(e)

Here is a link if you want to know more about Lambda Calculus


Now, I am trying to find a regex that performs one reduction on a function application. In other words, inside the abstraction, I want to replace each occurrence of the dummy variable (except the one preceding the .) by the argument expression that follows the abstraction, and return the resulting expression.

Here are some examples :
(For ease of typing, let's replace λ by \ in the string)
string => result after one reduction
((\x.x) a) => a
((\x.x) (\y.y)) => (\y.y)
(((\x.(\y.x+y)) a) b) => ((\y.a+y) b)
((\y.a+y) b) => a+b
((\x.x) (f g)) => (f g)
((\x.x) ((\y.y) a)) => ((\x.x) a) OR ((\y.y) a) (depends on what you think is easier to do. My guess would be the first one)
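
For checking candidate regexes against the examples above, a small non-regex reference can help. The following is only a sketch in Python, not the regex solution being asked for. The function names are hypothetical, and it assumes single-letter variables, the exact "(f e)" spacing described above, and naive textual substitution (no check for free or shadowed variables). When both the outer and the inner application could be reduced, it reduces the outer one.

```python
def split_top_level(s):
    """Split 'f e' at the first space that is outside any parentheses."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == " " and depth == 0:
            return s[:i], s[i + 1:]
    return None  # no top-level space: not an application


def reduce_once(expr):
    """Perform at most one reduction on expr and return the new string."""
    if not (expr.startswith("(") and expr.endswith(")")):
        return expr  # a variable or a simple operation
    parts = split_top_level(expr[1:-1])
    if parts is None:
        return expr  # an abstraction, nothing to apply
    func, arg = parts
    # func has the shape (\x.body): substitute the argument for x in body
    if func.startswith("(\\") and func[3:4] == "." and func.endswith(")"):
        var, body = func[2], func[4:-1]
        return body.replace(var, arg)  # naive: ignores shadowing
    # Otherwise try to reduce inside the function part, then the argument
    reduced = reduce_once(func)
    if reduced != func:
        return "(" + reduced + " " + arg + ")"
    return "(" + func + " " + reduce_once(arg) + ")"
```

For the last example, this sketch returns ((\y.y) a), the second of the two accepted results.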


It can be done with multiple substitutions, but I would prefer no more than 2.
The language I am using is PowerShell, so the regex must work in the .NET flavour (which means recursion is not available...)
I am pretty sure there is something to do with balancing groups, but I can't find a working regex...
Also, there certainly are better solutions than using regex, but I want to do that with regex, no code here.
I will add more examples when I think of good ones.


Edit 1 :

All I managed to do so far is matching the expression and capture each sub-expression with the following regex :

(?:[^()]|(?'o'\()|(?'c-o'\)))*(?(o)(?!))

Demo here
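
To make the role of the two named groups concrete, here is a rough Python analogue of what this regex does with its stack. It is a sketch under assumptions: the function name is hypothetical, it checks the whole string rather than searching for a match inside it, and it assumes that the 'c-o' balancing group captures the text between a '(' and its matching ')'.

```python
def balanced_groups(text):
    """Return the contents of each parenthesised group, or None if unbalanced."""
    open_positions = []  # plays the role of the 'o' stack
    captured = []        # plays the role of the 'c' captures
    for i, ch in enumerate(text):
        if ch == "(":
            open_positions.append(i)      # like (?'o'\()
        elif ch == ")":
            if not open_positions:
                return None               # popping an empty stack fails
            start = open_positions.pop()  # like (?'c-o'\))
            captured.append(text[start + 1:i])
    if open_positions:
        return None                       # like (?(o)(?!))
    return captured
```

Inner groups are captured before the groups that contain them, because a group is recorded when its closing parenthesis is reached.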


Edit 2 :

I have made some progress here, with this regex :

(?>\((?'c')\\(\w)\.|[^()]+|\)(?'-c'))+(?(c)(?!))(?=\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

Demo here
Now what I need to do is match only the second y instead of the current match.


Edit 3 :

I feel like nobody is capable of helping me here... Maybe I am asking something too hard :(
However, I almost have what I need. Here is what I came up with :

(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*)\1(?=(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))

Demo here
As you can see, I can match the variable to be replaced only where it appears a single time. When there are multiple occurrences of it, only the last one is matched (seems obvious, seeing the regex. I don't understand why it's the last and not the first that is matched though...)


Edit 4 :

Ok I'm almost done ! I just have a problem for the third line, the regex is not matching it correctly, and I can't possibly understand why. I will post the answer of this question as soon as I have figured this non-matched string out.
Here is the regex (although it's unreadable now, I'll post a commented version later)

(?:(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|[^()\n])*)\1(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!)))))|(?:\(\(\\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\\w\.(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\))

Demo here

Final edit : Never mind, I found the problem. I had misunderstood how the lookbehind is read; see the answer below.

Gawil
  • What exactly are you trying to get? Just [text inside balanced parentheses](https://stackoverflow.com/a/35271017/3832970)? – Wiktor Stribiżew May 23 '17 at 12:21
  • To put it in other words, for `((\x.x) a)`, I want to remove the `(\x.` and the `)`, and replace all occurrences of `x` by `a` (knowing that the expression can be much more complex, and that there can be many other x outside `(\x.x)` that must not be replaced) – Gawil May 23 '17 at 12:27
  • You want to write a regex to evaluate lambda terms? Are you serious? You can't even parse them using regex. – Lee May 23 '17 at 12:32
  • @Lee: I don't want to *evaluate* lambda terms, just to do a single reduction – Gawil May 23 '17 at 12:33
  • @Gawil Probably impossible, because you can't express as a replacement something like: `((\x.x+x) a) => a+x`. Using the `$` replacers you can't change the `x+x` into `a+a` – xanatos May 23 '17 at 14:20
  • @xanatos: You just have to match each `x` after the first, and replace it by the `a` (which is captured in a lookahead for example) – Gawil May 23 '17 at 14:22
  • @Gawil Lookahead don't "capture". You can't reference them in any way. – xanatos May 23 '17 at 14:22
  • @xanatos: Well... You can capture in a lookahead... `(?=(a))` does not match an `a`, but still captures it. – Gawil May 23 '17 at 14:24
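
The point made in the last comment can be checked with a short experiment. The sketch below uses Python's re module rather than .NET, on the assumption that both engines keep captures made inside a positive lookahead; the function names and the toy input format ("body, space, one-letter argument") are made up for illustration.

```python
import re


def captured_in_lookahead(text):
    """Return (whole match, group 1) for a pattern that only looks ahead."""
    m = re.search(r"(?=(a))", text)
    return (m.group(0), m.group(1)) if m else None


def substitute_from_lookahead(text):
    """Replace every x by the single word character that ends the text."""
    # The lookahead consumes nothing, but group 1 stays available
    # to the replacement string.
    return re.sub(r"x(?=.* (\w)$)", r"\1", text)
```

With these definitions, captured_in_lookahead("a") gives an empty match with "a" in group 1, and substitute_from_lookahead("x+x a") gives "a+a a": each x is matched separately and replaced by the text captured ahead of it.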

1 Answer


Ok I figured it out. It's a pretty long regex, so try to understand it at your own risk ;)
Here it goes :

(?x)  # Turns on free-spacing mode
      # The regex is an alternation of 3 groups
      # Each group corresponds to one part of the string
      # One for the function definition to replace the parameter, by the argument
      # in ((\x.(x+b)*c) a), it's (x+b)*c, with x matched and replaced by a
      # One for the beginning of the function definition (to replace it by nothing)
      # in ((\x.(x+b)*c) a), it's ((\x.
      # And the third one for the closing parenthesis and the argument
      # in ((\x.(x+b)*c) a), it's ) a)
(?:                # 1st non capturing group
  (?<=             # Positive lookbehind
    \(\\(\w)\.     # Look for the sequence '(\x.' where x is captured in group 1
    (?>            # Atomic group
                   # (No need to make it atomic here, it was just for readability)
                   # Here come the balancing groups. You can see them as counters
      \((?(c)(?'-c')|(?'o')) |  # Look for a '(' then decrease 'c' counter or increase 'o' if 'c' is already 0
      \)(?(o)(?'-o')|(?'c')) |  # Look for a ')' then decrease 'o' counter or increase 'c' if 'o' is already 0
      [^()\n]      # Look for a character that is not a new line nor a parenthesis
                   # Note that preventing \n is just for text with multiple λ-expressions, one per line
    )*             # Repeat
  )  # End of lookbehind
  \1             # Match the parameter
                 # Note that if it is a constant function, it will not be matched.
                 # However the reduction will still be done thanks to other groups
  (?=            # Positive lookahead
    (?>          # Atomic group. It's the same as the previous one
      \((?(c)(?'-c')|(?'o')) |  # All atomic groups here actually mean 'look for a legal λ-expression'
      \)(?(o)(?'-o')|(?'c')) |
      [^()\n]
    )*
    # This is where balancing groups really come into play
    # We are now going to check if number of '(' equals number of ')'
    (?(o)(?!))     # Fail if 'o' is not 0 (meaning there are more '(' than ')')
    (?(c)(?!))     # Fail if 'c' is not 0 (meaning there are more ')' than '(')
    \)\s           # Look for a ')' and a space
    (            # Capturing group 2. Here comes the argument
      (?>\((?'c')|\)(?'-c')|[^()\n])+(?(c)(?!))  # Again, look for a legal λ-expression
    )            # End of capturing group
    \)           # Look for a ')'
  )  # End of lookahead
) |  # End of 1st non-capturing group
(?:  # 2nd non-capturing group
  \(\(\\\w\.     # Match '((\x.'
  (?=            # Positive lookahead
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Look for a legal λ-expression
    \)\s         # Followed by ')' and a space
    (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Followed by a legal λ-expression
    \)           # Followed by a ')'
  )  # End of lookahead
) |  # End of 2nd non-capturing group
(?:  # 3rd non-capturing group
  (?<=           # Positive lookbehind
    \(\(\\\w\.   # Look for '((\x.'
    (?>\((?'-c')|\)(?'c')|[^()\n])*
           # Here is what caused issues for my 4th edit.
           # In .NET, the engine matches a lookbehind from right to left
           # So I had, like before :
           # (?'c') for '(' (increment)
           # (?'-c') for ')' (decrement)
           # But from right to left, we encounter ')' first, so "decrement" first
           # By "decrement", I mean pop the stack, which is still empty
           # So parentheses were not balanced anymore
           # That is why (?'c') and (?'-c') were swapped here
    (?(c)(?!))   # Check parenthesis count
  )  # End of lookbehind
  \)\s           # Match a ')' and a space
  (?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))  # Then a legal λ-expression
  \)             # And finally the last ')' of the function application
)  # End of 3rd non-capturing group

So here is the compact regex :

(?:(?<=\(\\(\w)\.(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*)\1(?=(?>\((?(c)(?'-c')|(?'o'))|\)(?(o)(?'-o')|(?'c'))|[^()\n])*(?(o)(?!))(?(c)(?!))\)\s((?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)))|(?:(?<!\(\(\\\w\..*)\(\(\\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\\w\.(?>\((?'-c')|\)(?'c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)(?!.*\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))

The compact regex is not exactly the same as the detailed one. I just added two negative look-arounds to ensure only one reduction is done for each line. Multiple reductions could be a problem in large expressions, as in certain cases they can overlap...

You need to replace each match by $2. Capturing group 2 is set only in the first alternative, so $2 is either empty (which deletes the match) or the argument of the function application.
Demo here

There are still many things that need improvement, or correction, so I may update it as I am working on it.

Edit :

Ok I found the problem. This should be the last edit.
I don't think I can handle the function definition with a single counter (as I called it in code comments), because stacks can't have a negative size, so the counter can't be negative. I have to use 2 stacks, one for (, one for ), and then test if they have the same size. Check the code if you want to know more.
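
The two-stack idea can be sketched outside the regex. The Python below is an illustration under assumptions, not a translation of the full pattern: the function names are hypothetical, the counters o and c stand for the sizes of the 'o' and 'c' stacks, and the text before the parameter is scanned right to left (as the lookbehind does) before the text after it is scanned left to right.

```python
def scan(chars, o=0, c=0):
    """Update both counters the way the two conditional alternatives do."""
    for ch in chars:
        if ch == "(":
            if c > 0:
                c -= 1  # (?(c)(?'-c') ...
            else:
                o += 1  # ... |(?'o'))
        elif ch == ")":
            if o > 0:
                o -= 1  # (?(o)(?'-o') ...
            else:
                c += 1  # ... |(?'c'))
    return o, c


def balanced_around(body, index):
    """Check the parentheses of body around the character at index."""
    # Lookbehind part: the text before the parameter, read right to left
    o, c = scan(reversed(body[:index]))
    # Lookahead part: the text after the parameter, read left to right
    o, c = scan(body[index + 1:], o, c)
    return o == 0 and c == 0  # both (?(o)(?!)) and (?(c)(?!)) pass
```

Neither counter ever goes below zero, which is what a single stack could not guarantee when a ')' is met first. Note that in this sketch the sequence ")(" also leaves both counters at zero, so equal counts do not by themselves prove correct nesting.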

BE CAREFUL : This regex should work for most λ-expressions but does not test whether variables are free or not. I did not find any λ-expression that is not handled by this regex, though it does not mean there are none. And I won't try proving that this regex works for every λ-expression ;)

Gawil
  • I didn't think it was possible :-) – xanatos May 24 '17 at 10:39
  • @xanatos: I was almost sure it was. Though my main problem at the beginning was matching dummy variables inside any number of nested parenthesis. Once done, everything came by itself, little by little :) – Gawil May 24 '17 at 11:57
  • @xanatos: However the very first set of balancing groups still bother me, I'm pretty sure I can find a matched string that should not be matched... I'll dig into it ! – Gawil May 24 '17 at 12:02
  • Found it... strings like `(\x.(a+b)*x)` are not matched correctly... It's because of the well-formed `(...)` before the second x. I'll update my answer as soon as possible. – Gawil May 24 '17 at 12:07