Hello there!
I will start by giving a short, simplified definition of λ-expressions.
A λ-expression can be one of the following:
- A variable (here, let's say it is a lower-case letter `[a-z]`), or any simple operation on variables (like `a*b` or `(a+b)*c`).
- A function (or abstraction). It has the following syntax: `(λx.e)`, where x is a dummy variable (any lower-case letter) and e is a λ-expression (possibly containing occurrences of `x`). It can be read as: function λ : x -> e(x).
- A function application. It has the following syntax: `(f e)` (note that I want the space and both parentheses), where f and e are both λ-expressions. It can be read as: `f(e)`. The reduction operation basically means evaluating `f(e)`.
Here is a link if you want to know more about Lambda Calculus
Now, I am trying to find a regex that performs one reduction on a function application. In other words, inside the abstraction, I want to replace each occurrence of the dummy variable (except the one preceding the `.`) with the expression that follows the abstraction, and get the resulting expression.
Here are some examples (for typing purposes, let's replace λ with `\` in the string), written as string => result after one reduction:
- `((\x.x) a)` => `a`
- `((\x.x) (\y.y))` => `(\y.y)`
- `(((\x.(\y.x+y)) a) b)` => `((\y.a+y) b)`
- `((\y.a+y) b)` => `a+b`
- `((\x.x) (f g))` => `(f g)`
- `((\x.x) ((\y.y) a))` => `((\x.x) a)` OR `((\y.y) a)` (depends on which one you think is easier to do; my guess would be the first one)
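To make the expected outputs above easy to check, here is a minimal reference sketch of one reduction step in Python. It is not the regex I am asking for (Python's `re` has no balancing groups, so it uses a plain depth counter instead), and the names `_closing` and `reduce_once` are hypothetical. It assumes single-letter variables and does a naive textual substitution with no handling of variable shadowing. It always reduces the leftmost redex, so for the last example it returns the second of the two accepted results.

```python
def _closing(text, start):
    """Index of the ')' that closes the group whose content starts at `start`, or -1."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            if depth == 0:
                return i
            depth -= 1
    return -1


def reduce_once(text):
    """Reduce the leftmost ((\\v.BODY) ARG) once; return text unchanged if none is found."""
    i = text.find("((\\")
    while i != -1:
        var = text[i + 3:i + 4]
        if var.islower() and text[i + 4:i + 5] == ".":
            body_start = i + 5
            body_end = _closing(text, body_start)
            # The abstraction must be followed by exactly one space, then the argument.
            if body_end != -1 and text[body_end + 1:body_end + 2] == " ":
                arg_start = body_end + 2
                arg_end = _closing(text, arg_start)
                if arg_end != -1:
                    body = text[body_start:body_end]
                    arg = text[arg_start:arg_end]
                    # Naive substitution: every occurrence of the variable letter.
                    return text[:i] + body.replace(var, arg) + text[arg_end + 1:]
        i = text.find("((\\", i + 1)
    return text


if __name__ == "__main__":
    for s in [r"((\x.x) a)", r"(((\x.(\y.x+y)) a) b)", r"((\y.a+y) b)"]:
        print(s, "=>", reduce_once(s))
```

Any regex-based answer should agree with this sketch on the examples listed above.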
It can be done with multiple substitutions, but I would prefer no more than 2.
The language I am using is PowerShell, so the regex must work in the .NET flavour (which means that recursion is not available...).
I am pretty sure this can be done with balancing groups, but I can't find a working regex...
Also, there are certainly better solutions than regex, but I want to do this with regex only, no code here.
I will add more examples when I think of good ones.
Edit 1:
All I have managed to do so far is to match the expression and capture each sub-expression with the following regex:
(?:[^()]|(?'o'\()|(?'c-o'\)))*(?(o)(?!))
Demo here
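For readers who are not familiar with balancing groups, here is a rough Python equivalent of the bookkeeping that the `o` and `c` groups do in the regex above. It is only an illustrative sketch under my own assumptions: the function name `balanced_captures` is hypothetical, and where the regex would simply fail to match (or match a shorter span), this version raises an error.

```python
def balanced_captures(text):
    """Return the content of every balanced (...) pair, in the order the pairs close."""
    stack = []      # plays the role of group 'o': one entry per unmatched "("
    captures = []   # plays the role of group 'c': text between matching parentheses
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i + 1)  # remember where the content of this group starts
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at index %d" % i)
            captures.append(text[stack.pop():i])
    if stack:
        # corresponds to the final (?(o)(?!)) check: no "(" may remain open
        raise ValueError("unclosed '('")
    return captures


if __name__ == "__main__":
    print(balanced_captures(r"((\x.x) a)"))
```

Each `)` pops the most recent `(`, which is why inner sub-expressions are captured before the ones that enclose them.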
Edit 2 :
I have made some progress here, with this regex :
(?>\((?'c')\\(\w)\.|[^()]+|\)(?'-c'))+(?(c)(?!))(?=\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))
Demo here
Now what I need to do is match only the second y instead of the current match.
Edit 3 :
I feel like nobody is capable of helping me here... Maybe I am asking something too hard :(
However, I almost have what I need. Here is what I came up with :
(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*)\1(?=(?>\((?'c')|\)(?'-c')|(?>(?!\(|\)|\1).)*)*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!))))
Demo here
As you can see, I can match the variable to be replaced only when it appears a single time. When there are multiple occurrences of it, only the last one is matched (that seems obvious from the regex, although I don't understand why it is the last one and not the first one that is matched).
Edit 4 :
OK, I'm almost done! I just have a problem with the third line: the regex does not match it correctly, and I can't understand why. I will post the answer to this question as soon as I have figured out this non-matching string.
Here is the regex (it is unreadable for now; I'll post a commented version later):
(?:(?<=\(\\(\w)\.(?>\((?'c')|\)(?'-c')|[^()\n])*)\1(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s((?>\((?'c')|[^()]+|\)(?'-c'))*(?(c)(?!)))))|(?:\(\(\\\w\.(?=(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\)))|(?:(?<=\(\(\\\w\.(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!)))\)\s(?>\((?'c')|\)(?'-c')|[^()\n])*(?(c)(?!))\))
Demo here
Final edit: Never mind, I found the problem. It was just a misunderstanding of how the lookbehind is read; see the answer below.