56

I've seen Ruby and Perl programmers do some complicated code challenges entirely with regexes. The lookahead and lookbehind capabilities in Perl regexes make them more powerful than the regex implementations in most other languages. I was wondering how powerful they really are.

Is there an easy way to either prove or disprove that Perl regexes are Turing complete?

Community
Peter Olson
  • 1
    See also http://cstheory.stackexchange.com/questions/1047/where-do-most-regex-implementations-fall-on-the-complexity-scale – kennytm Nov 02 '11 at 15:48
  • To those voting to close as off-topic: are [these](http://stackoverflow.com/questions/3136686/is-the-c99-preprocessor-turing-complete) [questions](http://stackoverflow.com/questions/3480950/are-makefiles-turing-complete) [also](http://stackoverflow.com/questions/2497146/is-css-turing-complete) off-topic? – Peter Olson Nov 02 '11 at 16:10
  • @PeterOlson: Yes, but they were asked before there was programmers.se – Daenyth Nov 02 '11 at 20:22
  • 1
    @daxim - What are the rules here? You can easily write a pattern for Rule 110 (for example, start with `'010'`, and pattern `s/(?:|(?<=0)(0)(?=0)|(?<=0)0(?=(1))|...|(?<=1)1(?=1)(?=1*(0))|^(?=(0))|(?<=(0))$)/$1/g` - it needs some more thinking, I guess), but I think you need to use it in a loop to be of any use. Is that legitimate? Maybe you have a template of the program you're after? – Kobi Nov 09 '11 at 21:19
  • @Kobi If it's turing complete, it's legit. I don't really understand rule 110, so I don't know if that works or not. – Peter Olson Nov 09 '11 at 21:31
  • Kobi, add it as an elaborate answer; comments are not eligible for bounties. – daxim Nov 10 '11 at 09:24

3 Answers

29

Excluding any kind of embedded code, such as `(?{ })`, they probably don't cover all context-free languages, much less everything a Turing machine can compute. They might, but to my knowledge, nobody has actually proven it one way or another. Given that people have been trying to solve certain context-free problems with Perl regexes for a while and haven't come up with a solution yet, it's likely that they fall short of covering every context-free language.
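For reference, one classic context-free language, balanced parentheses, can be matched with the recursive subpatterns that Perl 5.10 added. This is a minimal sketch, and the names `$balanced` and `is_balanced` are illustrative assumptions. It covers this single language only and does not settle whether every context-free language is reachable this way.

```perl
use strict;
use warnings;
use 5.010;    # recursive subpatterns such as (?1) need Perl 5.10 or later

# Hypothetical pattern: a string made only of properly nested parentheses.
my $balanced = qr/
    ^
    (            # group 1: one balanced pair
        \(
        (?1)*    # recurse into group 1 for any nested pairs
        \)
    )*           # zero or more pairs side by side
    $
/x;

# Returns 1 if the string is balanced, 0 otherwise.
sub is_balanced {
    my ($s) = @_;
    return $s =~ $balanced ? 1 : 0;
}
```

No embedded Perl code is involved here; the recursion is a feature of the pattern syntax itself.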

There is an interesting discussion to be had about which features are merely convenient, and which actually add power. For instance, matching 0^n 1 0^n (that's notation for "any number of zeros, followed by a one, followed by the same number of zeros as before") is not something that can be done with pure regular expressions. You can prove this with the pumping lemma, but the simple, informal argument is that the regex would have to count an arbitrary number of zeros, and a pure regular expression can't do unbounded counting.

However, backreferences can match that with:

/^(0*) 1 \1$/x;

So that means backreferences give you more power, and are not a mere convenience. What else might give us more power, I wonder?
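A minimal sketch of that check, with anchors so the whole string has to have the form 0^n 1 0^n instead of merely containing it. The sub name `is_mirrored_zeros` is an illustrative assumption.

```perl
use strict;
use warnings;

# Returns 1 if the string is n zeros, a single one, then the same n zeros.
sub is_mirrored_zeros {
    my ($s) = @_;
    # (0*) captures the leading zeros; \1 demands the identical run again.
    return $s =~ /^ (0*) 1 \1 $/x ? 1 : 0;
}

for my $candidate ('00100', '1', '0010', '0001') {
    printf "%-6s %s\n", $candidate,
        is_mirrored_zeros($candidate) ? 'match' : 'no match';
}
```

Without the `^` and `$` anchors, the capture group can match the empty string, so any input containing a `1` would match.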

Also, Perl 6 "patterns" (they're not even pretending they're regexes anymore) are designed to look kinda like Perl 5 regexes (so you don't need to relearn much), but they have enough features added to handle context-free grammars in full. They're actually designed so you can use them to alter the way the language is parsed within a lexical scope.

frezik
  • 3
    Apparently regexes are at least more powerful than context free grammars https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html – resgh Jan 19 '16 at 05:38
18

There are at least two discussions, "Turing completeness and regular expressions" and "Are Perl patterns universal?", with further references.

The consensus (to my untrained eye) seems to be that the answer is "no", but I am not sure if I understand everything correctly.

Sinan Ünür
6

For regexes in Perl there are two cases:

  1. With embedded code: They are of course Turing-complete.
  2. Without embedded code: They always halt so they are not general Turing machines.
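To illustrate the first case: a `(?{ })` block runs ordinary Perl every time the engine passes over it, so the pattern can do anything Perl can. This is a minimal sketch, and the sub name `count_leading_a` and the package variable `$count` are illustrative assumptions.

```perl
use strict;
use warnings;

# Counts the leading 'a' characters using code embedded in the pattern.
sub count_leading_a {
    my ($input) = @_;
    our $count = 0;    # package variable, visible inside the code block
    # The (?{ ... }) block executes once per 'a' the engine consumes.
    # Caveat: such blocks also run on paths the engine later backtracks
    # out of, so a plain counter is only reliable for simple patterns.
    $input =~ /^ (?: a (?{ $count++ }) )*/x;
    return $count;
}

print count_leading_a('aaab'), "\n";
```

The counting happens as a side effect of matching, which is exactly what a pattern without embedded code cannot do.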

Every regular language can be accepted by a finite automaton. Its input must be a finite string.

[...] a deterministic finite automaton (DFA)—also known as deterministic finite state machine—is a finite state machine that accepts/rejects finite strings of symbols [...].

The same goes for Turing machines: The formal definition does not even have input. It must be encoded in the finite number of states.

Alternative (equivalent) definitions include input, but it must be finite.
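The halting argument for the purely regular case can be sketched as a table-driven DFA: it consumes one symbol per step and stops when the finite input runs out. The transition table (for `(ab)*`) and the names `%delta`, `%accepting` and `dfa_accepts` are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical DFA for (ab)*: two live states, any other move is a dead end.
my %delta = (
    start  => { a => 'seen_a' },
    seen_a => { b => 'start' },
);
my %accepting = ( start => 1 );

# Returns 1 if the DFA accepts the (finite) input string, 0 otherwise.
sub dfa_accepts {
    my ($input) = @_;
    my $state = 'start';
    for my $symbol (split //, $input) {    # exactly one step per symbol
        $state = $delta{$state}{$symbol};
        return 0 unless defined $state;    # no transition: reject
    }
    return $accepting{$state} ? 1 : 0;
}
```

The loop body runs at most once per input character, so the number of steps is bounded by the input length.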

Jeffrey Bosboom
usr
  • 2
    Will they always halt on arbitrary input, though? They won't always halt on an infinitely long string, for example. – Miles Rout Jun 12 '14 at 15:11
  • 1
    @MilesRout I don't think infinite input is allowed or makes sense. Almost nothing halts with infinite input. Even `(ab)*` does not halt because there can always be a `c` in the future. The text book answer is that regular languages are clearly not Turing complete. I think infinite input is just not part of the definition. – usr Jun 12 '14 at 15:24
  • 2
    `(ab)*` halts on infinite input like `abcabcabcabcabcabc...` – Miles Rout Jun 13 '14 at 03:45
  • Yeah, bad example. `a*b` does not halt for input `aaaaa...`. Anyway, I don't think infinite inputs are allowed or make sense. That would invalidate many important and clearly true results. – usr Jun 13 '14 at 10:58
  • Infinite inputs are definitely allowed. Turing machines have a theoretically infinite tape, otherwise they are not capable of computing everything. – William Shipley Aug 02 '14 at 06:01
  • 2
    @WilliamShipley infinite tape does not mean infinite input. Note again, that *any* machine of the Chomsky hierarchy does not halt if you feed infinite input to it. Therefore, it must be impossible to provide infinite input to make the existing definitions work. – usr Aug 02 '14 at 09:35
  • Do they always halt if recursive? I think not. – Tuntable Aug 08 '16 at 02:41
  • You realize `(?R)` does not halt (and co.). (error is thrown but point still stands....) – Downgoat Feb 09 '17 at 01:48
  • I did not realize that. This answer is wrong, then. @Downgoat – usr Feb 09 '17 at 15:01
  • 1
    Perl regexes are not regular expressions in the CS meaning of the term, they can parse languages that aren't even context free. – saolof Sep 19 '18 at 04:20