9

As the name suggests we may think that regular expressions can match regular languages only. But regular expressions we use in practice contain stuff that I am not sure it's possible to implement with their theoretical counterparts. How for example would you simulate a back-reference? So the question arises: what is the theoretical power of the regular expressions we use in practice? Can you think of a way to match {(a^n)(b^n)|n>=0}? What about {(a^n)(b^n)(c^n)|n>=0}?

ire_and_curses
  • 64,177
  • 22
  • 110
  • 139
Artium
  • 4,474
  • 7
  • 33
  • 53
  • You need to further define "regular expressions we use in practice". Different regex engines vary in power. – sepp2k Sep 23 '10 at 13:53
  • There are expressions that match your examples for the .NET Regex engine using balanced capturing groups. `"^(a(?))*(b(?))*(?(a)(?!))$"` for the first one, e.g. – Jens Sep 23 '10 at 14:03
  • 3
    This seems more like a question for computational theorists than for a programming site. – Ben Doom Sep 23 '10 at 15:20
  • Terrence Parr or Jeffrey Ullman would be the right person to ask. You can search for forums as well. – A_Var Sep 23 '10 at 17:11
  • http://stackoverflow.com/questions/3644266/how-can-we-match-an-bn-with-java-regex – Kobi Sep 28 '10 at 20:06
  • @Ben, I would disagree... if we know in theory how to classify the regex language we're using, we can know some very useful (practical) things about the capabilities and limits of that language that we would not have known without theory. Granted not all of theory is 100% applicable. But try telling me that it's not practically useful for a programmer to know whether an algorithm is O(2^n) or O(log n)? – LarsH Sep 28 '10 at 20:31
  • I'm not arguing that knowing theory is not good. I'm just saying that StackOverflow is, in my opinion, more for practical application and that theoretical questions belong more on programmers.stackexchange.com. – Ben Doom Sep 28 '10 at 21:04

2 Answers2

6

The answer to your question is, "regular expression" languages that allow back-references are neither regular nor context-free. (In other words, as you pointed out, you cannot simulate back-reference with a regular language, nor with a CFL.) In fact, Wikipedia says many of the "regular expression" languages we use in practice are NP-Complete:

Pattern matching with an unbounded number of back references, as supported by numerous modern tools, is NP-complete (see,[11] Theorem 6.2).

As others have suggested, the regular expression languages commonly supported in computer languages and libraries are a different animal from regular expressions in formal language theory. Larry Wall wrote in regard to Perl "regexes,"

'Regular expressions' [...] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes"

You asked,

Can you think of a way to match {(a^n)(b^n)|n>=0}? What about {(a^n)(b^n)(c^n)|n>=0}?

I'm not sure here if you're trying to test whether theoretical regular expression languages can match the "language of squares", or whether you're looking for an implementation in a (practical) regex language. Here's the proof why the former is not possible; and here's a long explanation and implementation of the latter for java regexes.

Community
  • 1
  • 1
LarsH
  • 25,732
  • 8
  • 77
  • 136
4

The basic difficulty with regular expressions that you are hinting at is the fact that regular expressions don't have a "memory" to them. In the purest form, no real regular expression should be able to recognize either of these languages. Any regular expression that could parse these sorts of languages would be, by definition, not regular. I think what you mean by "regular expressions we use is practice" is extended regular expressions, which are not technically regular expressions.

The problem with your question is that you are asking to apply a specifically contrived theoretical scenario to a practical situation, which almost always ends in disaster.

So my answer is sort of a non-answer, in that I am saying you would have to rephrase the question to ask about extended regular expressions for it to have an answer.

A couple of resources that might help in this matter:

Helpful wikipedia article

Similar StackOverflow question

Good book with a chapter on this topic

I'm also making my answer a community wiki for anyone else who wants to contribute to this line of thought.

Community
  • 1
  • 1
Daisy Sophia Hollman
  • 5,148
  • 6
  • 21
  • 32