How to iterate every string possible from a given pattern?

Question

/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/

This is a regular expression that validates the hostname of a URL, taken from Domain name validation with RegEx.

How can we generate, and loop through, every possible string that matches it? Each generated string will then be passed to a test.

Please show your work so far, and explain where you are stuck on this problem. You may also want to consider the scale of it - the number of possible valid URLs is very large, and any attempt to loop through them will not get very far into the set of all possibilities, or touch a meaningful fraction of real hostnames. — Neil Slater, Aug 12 '14 at 22:41
[HOP](http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf#18) covers this topic quite well. Just keep in mind you're looking at an absolutely enormous number of possibilities--that second character class on its own will match something in the area of 6.3x10^19 strings. — Slade, Aug 12 '14 at 23:00
@Slade The second character class will actually match `2.2e+109` possible strings. Add the fact that the last character class is unbounded, and this is definitely an odd goal. — Miller, Aug 12 '14 at 23:20
Re "This is a regular expression that validates the hostname of a url": no, it doesn't. It doesn't match `www.stackoverflow.com`, for example. — ikegami, Aug 13 '14 at 02:04
@Stevie G, And it fails to match the hostname `www.stackoverflow.com`. — ikegami, Aug 13 '14 at 13:27
@ikegami, don't want to get into a whole thing here but there is clearly some confusion regarding what that part of a url should be called and I've googled and it is not clear. If I can point to: http://rhwwebsites.com/wp-content/uploads/2012/10/domains_map_english.jpg, it would be the `name` part which as I explained is the `hostname` — stephen, Aug 13 '14 at 14:32
Sounds like you want the company's top-level domain, but it fails at that too. e.g. The BBC's is `bbc.co.uk`. — ikegami, Aug 13 '14 at 14:40
@ikegami incorrect, I just want the `bbc` and it succeeds at that. I would prefer not to harp on this minor confusion you are having. — stephen, Aug 13 '14 at 16:08
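
The sizes quoted in the comments above can be checked with exact integer arithmetic. The following is a minimal sketch using the core `Math::BigInt` module. The variable names are illustrative, and limiting the TLD to exactly two letters is an assumption made only so that the total is finite, since `[a-zA-Z]{2,}` has no upper bound.

```perl
use strict;
use warnings;
use feature qw( say );
use Math::BigInt;

# Sizes of the character classes used by the pattern.
my $alnum      = 62;   # [a-zA-Z0-9]
my $alnum_dash = 63;   # [a-zA-Z0-9-]
my $alpha      = 52;   # [a-zA-Z]

# Strings matched by [a-zA-Z0-9-]{1,61}: 63^1 + 63^2 + ... + 63^61.
my $middle = Math::BigInt->new(0);
$middle += Math::BigInt->new($alnum_dash)->bpow($_) for 1 .. 61;

# Whole pattern: first character, middle part, last character, then a TLD
# of exactly two letters (an assumption, the real pattern is unbounded).
my $total = $middle * $alnum * $alnum * $alpha ** 2;

say "middle part:   ", length($middle->bstr), " digits";
say "whole pattern: ", length($total->bstr),  " digits";
```

Under these assumptions the middle part alone comes to roughly 5.8 × 10^109 strings, the same order of magnitude as the larger figure quoted above, and the whole pattern to roughly 6 × 10^116.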

ikegami · Answer 1 · 2014-08-13T03:17:56.340

This will generate on the order of 10¹¹⁶ strings, and it only covers two-letter TLDs. That should keep you going for a while, considering the planet only came into existence about 10¹⁷ seconds ago (almost yesterday!)

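use strict;
use warnings;
use feature qw( say );

use Algorithm::Loops qw( NestedLoops );

# [a-zA-Z0-9]: first and last character of the name.
my @char_set1 = ('a'..'z', 'A'..'Z', '0'..'9' );
# [a-zA-Z0-9-]: the one mandatory middle character.
my @char_set2 = ('a'..'z', 'A'..'Z', '0'..'9', '-');
# The same class plus undef, which stands for "no character here".
my @char_set3 = (undef, 'a'..'z', 'A'..'Z', '0'..'9', '-');
# [a-zA-Z]: letters of the TLD.
my @char_set4 = ('a'..'z', 'A'..'Z');

# The sets are listed in reverse order; each result is reversed when printed.
# Only two-letter TLDs are generated, although {2,} has no upper bound.
# An undef can land in any of the 60 optional slots, so the same string
# can be printed more than once.
my $iter = NestedLoops([
   (\@char_set4) x 2,
    ['.'],
    \@char_set1,           # Last character before the dot.
   (\@char_set3) x 60,     # Up to 60 optional middle characters.
    \@char_set2,
    \@char_set1,
]);

while (my @chars = $iter->()) {
   # Drop the undef placeholders and restore left-to-right order.
   say join '', reverse grep defined, @chars;
}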

This is not a general approach, just one that works well in this situation.
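
The `undef` placeholder trick above can emit the same string more than once, because an empty slot may sit at any of the 60 optional positions. One way around that, sketched here under stated assumptions and not taken from the answer, is to run one fixed-width product per allowed length of the middle part. The sketch uses core Perl only. `product_iter` and `pattern_iter` are hypothetical helper names, and the tiny alphabets stand in for the real character classes so the run actually finishes.

```perl
use strict;
use warnings;
use feature qw( say );

# Iterator over every string formed by taking one character from each set,
# in order. Works like an odometer: the last position varies fastest.
sub product_iter {
    my @sets = @_;
    my @idx  = (0) x @sets;
    my $done = grep { !@$_ } @sets;    # An empty set yields no strings.
    return sub {
        return if $done;
        my $str = join '', map { $sets[$_][ $idx[$_] ] } 0 .. $#sets;
        my $pos = $#sets;
        # Advance the rightmost wheel; on wrap-around, reset it and carry left.
        while ($pos >= 0 && ++$idx[$pos] == @{ $sets[$pos] }) {
            $idx[$pos--] = 0;
        }
        $done = 1 if $pos < 0;         # Every wheel wrapped: finished.
        return $str;
    };
}

# Iterator for edge, mid{min,max}, edge, '.', tld{2}: one product per length,
# so no string is produced twice.
sub pattern_iter {
    my ($edge, $mid, $tld, $min_mid, $max_mid) = @_;
    my $len = $min_mid;
    my $it;
    return sub {
        while ($len <= $max_mid) {
            $it ||= product_iter($edge, ($mid) x $len, $edge, ['.'], ($tld) x 2);
            my $str = $it->();
            return $str if defined $str;
            $it = undef;               # This length is exhausted.
            $len++;
        }
        return;
    };
}

# Scaled-down stand-in for the pattern: [ab][ab-]{1,2}[ab]\.[xy]{2}
my $next = pattern_iter([qw( a b )], [qw( a b - )], [qw( x y )], 1, 2);
my @all;
while (defined(my $host = $next->())) {
    push @all, $host;
}
say scalar(@all), " strings";          # 192 strings
```

With the real character classes and lengths 1 to 61 the same structure applies, but the loop would never finish in practice, for the reasons given above.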

edited Aug 13 '14 at 03:17

answered Aug 13 '14 at 02:14


Also, there are only about 10^80 atoms in the observable universe. – ikegami Aug 13 '14 at 02:34
I reversed the order of the sets then reversed the output to produce a more natural order. It would go faster if you didn't do this, but you seem to have lots of time. – ikegami Aug 13 '14 at 02:42
