3

I assume Σ = {a, b}. I want to find the RE that denotes Σ* (Σ* means the set of all possible strings over the alphabet Σ).

I came up with the two REs (regular expressions) below:

(a+b)*
(a*b*)*

However, I can't tell whether one of them is correct, both are, or neither is. Please tell me the correct answer.

Rurou2
  • The latter is correct. The former requires at least one `a` to precede every `b`, so the string `b` isn't matched for example. – kaya3 Feb 07 '20 at 04:13
  • If this is in the context of computer science - formal languages and automata - ignore paxdiablo's answer and use Welbog's instead. – Patrick87 Feb 07 '20 at 19:05

2 Answers

4

In normal regular expression grammar, (a+b)* means zero or more repetitions of a sequence that consists of one or more a followed by exactly one b. This rules out strings like baa (it doesn't start with a), abba (the second b has no a before it), and a (each group of a must be followed by exactly one b), so it is not correct.

(a*b*)* means zero or more repetitions of a sequence that contains zero or more a followed by zero or more b. This is correct, since it allows either starting character and any order and quantity of characters. It also allows the empty string, which I'm pretty certain should be allowed by Σ* (but I'll leave that up to you).

However, it may be better to opt for the much simpler [ab]* (or [ab]+ in the unlikely event you consider an empty string invalid). This is basically zero or more (one or more for the + variant) characters drawn from the class [ab].
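As a quick sanity check of the regex-grammar reading above, here is a minimal sketch in Python. It assumes Python's `re` module stands in for "normal regular expression grammar"; the sample strings and the helper name `accepted` are only illustrative.

```python
import re

# Patterns read as ordinary regex syntax, where + means "one or more".
patterns = {
    "(a+b)*": re.compile(r"(a+b)*"),
    "(a*b*)*": re.compile(r"(a*b*)*"),
    "[ab]*": re.compile(r"[ab]*"),
}

# A few hand-picked strings over the alphabet {a, b}.
samples = ["", "a", "b", "ab", "aab", "baa", "abba"]

def accepted(pattern_text):
    """Return the sample strings that the pattern matches in full."""
    regex = patterns[pattern_text]
    return [s for s in samples if regex.fullmatch(s) is not None]

for text in patterns:
    print(f"{text:8} accepts {accepted(text)}")
```

With this reading, (a+b)* should accept only the empty string, ab and aab from the samples, while the other two patterns accept every sample.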


However, it's possible, since you're using Σ, that you may be discussing formal language theory (where Σ is common) rather than regex grammar (where it tends not to be).

If that is the case, then you should understand that some notations used in formal language theory write the a | b expression (effectively [ab] in regex grammar) as a ∪ b, a ∨ b or a + b instead, with each of those operator symbols representing union ("or").

That would mean that (a+b)* is actually correct for what you need (it is equivalent to the [ab]* regex I gave above), since it basically means any character from the set {a, b}, repeated zero or more times.

Additionally, that's also covered by your (a*b*)* option but it's almost always better to choose the simplest one that does the job :-)

paxdiablo
3

The + operator is typically used to indicate union (|, "or") in academic regular expressions, not "one or more" as it typically means in non-academic settings (such as most regex implementations).

So, a+b means [ab] or a|b, thus (a+b)* means any string of length 0 or more, containing any number of `a`s and `b`s in any order.

Likewise, (a*b*)* also means any string of length 0 or more, containing any number of `a`s and `b`s in any order.

The two expressions are different ways of expressing the same language.
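One way to gain some confidence in this equivalence is a brute-force check over short strings. The sketch below is in Python and assumes the academic a+b is written as a|b in Python's regex syntax; the length bound `MAX_LEN` and the helper names are arbitrary choices for illustration. It is evidence up to that bound, not a proof.

```python
import re
from itertools import product

ALPHABET = "ab"
MAX_LEN = 6  # arbitrary bound chosen for this sketch

# The academic a+b (union) is written a|b in Python's regex syntax.
union_star = re.compile(r"(a|b)*")
nested_star = re.compile(r"(a*b*)*")

def strings_up_to(max_len):
    """Yield every string over ALPHABET with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)

def rejected_by(regex, max_len=MAX_LEN):
    """Return the strings up to max_len that the regex fails to match in full."""
    return [s for s in strings_up_to(max_len) if regex.fullmatch(s) is None]

print(rejected_by(union_star))
print(rejected_by(nested_star))
```

Both lists come out empty, meaning each expression matches every string over {a, b} up to the chosen length.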

Welbog
  • The descriptions "car" and "auto-kinetic mobility device" (automobile, and the Greek αυτοκίνητο (pr. 'aftokinito'), are less extreme versions) can also refer to the same thing but I think I prefer the former. Not dissing your answer, just suggesting it may be better to prefer `(a+b)*` over the other one :-) – paxdiablo Feb 08 '20 at 03:04