To start with, I have to point to what I believe is the most famous SO post ever:
RegEx match open tags except XHTML self-contained tags
Now, is this even a question for Stack Overflow? I don't know, but I'll try...
I'll speak from a personal point of view. While I've never had to do that, I know that the day I have to parse HTML, I will certainly not go with regexes; I'll try and find an HTML parsing library. Fine.
But I don't know why.
At one point, I decided to do CSS validation in Java. I knew in my gut that regexes wouldn't cut it, so I used Parboiled.
And I don't know why.
The "why" troubles me. I am no newbie with regexes. I just can't draw a clear line between what regex engines can and cannot do.
My question is the following: what is this clear line? What fundamental characteristic must an input have for it to be mathematically provable that no regex engine can reliably determine success or failure?
Can you give a simple, theoretical input that would defeat a regex engine's ability to give a reliable "match/no match" answer? If so, what is the defining characteristic of such an input?
EDIT: For the sake of this discussion, I'll add a task suggested by a post on SO (I can't find the link at the moment, sorry) that is simpler than HTML, but for which I still wouldn't use regexes: shell command line parsing.
As far as the shell is concerned, the following are equivalent:
alias ll="ls -l"
alias ll=ls\ -l
alias l"l"=ls' -'l
"alia"s l"l= "ls\ -l
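To make the quoting concrete, here is a minimal sketch that runs the four lines through a shell-like tokenizer. It uses Python's standard `shlex` module rather than Java, purely for brevity, and `shlex` only approximates POSIX word splitting; the `tokenize` helper and the `LINES` name are mine, for illustration only.

```python
import shlex

# The four alias lines from above, kept verbatim in a raw string so that
# backslashes and quotes reach the tokenizer untouched.
LINES = r'''
alias ll="ls -l"
alias ll=ls\ -l
alias l"l"=ls' -'l
"alia"s l"l= "ls\ -l
'''.strip().splitlines()


def tokenize(line):
    """Split one command line into words using shlex's POSIX-style rules.

    Assumption: shlex is close enough to a real shell for these inputs;
    it does not perform expansions, globbing, or alias handling.
    """
    return shlex.split(line)


if __name__ == "__main__":
    for line in LINES:
        print(f"{line!r} -> {tokenize(line)}")
```

Under these rules the first three lines come out as the same two words, `alias` and `ll=ls -l`. The fourth comes out as `alias` and `ll= ls -l`, so the alias value carries a leading space; I would not expect that to change what the alias does, but I have not verified it against every shell.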
Shell quoting mechanisms are so numerous that I would just write a Parboiled grammar in this case... But again, that is a gut decision, probably because I find it easier. It doesn't prove that the task is infeasible with regexes.