/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/

This is a regular expression that validates the hostname of a URL, taken from "Domain name validation with RegEx".

How can we generate and loop through every possible string that it matches? Each generated match will then be passed to a further test.

stephen
  • 1
    Please show your work so far, and explain where you are stuck on this problem. You may also want to consider the scale of it - the number of possible valid URLs is very large, and any attempt to loop through them will not get very far into the set of all possibilities, or touch a meaningful fraction of real hostnames. – Neil Slater Aug 12 '14 at 22:41
  • 1
    [HOP](http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf#18) covers this topic quite well. Just keep in mind you're looking at an absolutely enormous number of possibilities--that second character class on its own will match something in the area of 6.3x10^19 strings. – Slade Aug 12 '14 at 23:00
  • @Slade The second character class will actually match `2.2e+109` possible strings. Add the fact that the last character class is unbounded, and this is definitely an odd goal. – Miller Aug 12 '14 at 23:20
  • What's a few dozen orders of magnitude between friends? – friedo Aug 12 '14 at 23:45
  • 1
    Re "This is the a regular expression that validates the hostname of a url", No it doesn't. It doesn't match `www.stackoverflow.com`, for example. – ikegami Aug 13 '14 at 02:04
  • @ikegami, looking for the `hostname` part only – stephen Aug 13 '14 at 07:44
  • @Stevie G, And it fails to match the hostname `www.stackoverflow.com`. – ikegami Aug 13 '14 at 13:27
  • @ikegami, don't want to get into a whole thing here but there is clearly some confusion regarding what that part of a url should be called and I've googled and it is not clear. If I can point to: http://rhwwebsites.com/wp-content/uploads/2012/10/domains_map_english.jpg, it would be the `name` part which as I explained is the `hostname` – stephen Aug 13 '14 at 14:32
  • Sounds like you want the company's top-level domain, but it fails at that too. e.g. The BBC's is `bbc.co.uk`. – ikegami Aug 13 '14 at 14:40
  • @ikegami incorrect, I just want the `bbc` and it succeeds at that. I would prefer not to harp on this minor confusion you are having. – stephen Aug 13 '14 at 16:08
  • Well, it doesn't match `bbc` either. – ikegami Aug 13 '14 at 16:28
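To put a number on the scale discussed in these comments, here is a rough counting sketch using the core `Math::BigInt` module. The pattern's final `{2,}` is unbounded, so the count is only finite if the suffix length is capped; the `$tld_len` parameter and the `count_matches` name are assumptions made for illustration, not part of the question.

```perl
use strict;
use warnings;
use feature qw( say );
use Math::BigInt;

# Count the strings matched by
#   /^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/
# when the suffix is capped at exactly $tld_len letters (an assumption).
sub count_matches {
    my ($tld_len) = @_;

    # [a-zA-Z0-9-]{1,61}: 63 choices per position, lengths 1 through 61.
    my $middle = Math::BigInt->new(0);
    $middle->badd( Math::BigInt->new(63)->bpow($_) ) for 1 .. 61;

    return Math::BigInt->new(62)                    # leading [a-zA-Z0-9]
         * $middle
         * 62                                       # trailing [a-zA-Z0-9]
         * Math::BigInt->new(52)->bpow($tld_len);   # [a-zA-Z] suffix
}

my $total = count_matches(2);
say "about 10^", length($total->bstr) - 1;   # prints: about 10^116
```

Each extra suffix letter multiplies the total by 52, so the cap chosen here only sets a lower bound.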

1 Answer


This will find roughly 10^116 possibilities. That should keep you going for a while, considering the planet only came into existence 10^17 seconds ago (almost yesterday!)

use strict;
use warnings;
use feature qw( say );

use Algorithm::Loops qw( NestedLoops );

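# Character classes from the pattern. In the sets that start with undef,
# undef stands for "no character here" and is dropped before printing.
my @char_set1 = ('a'..'z', 'A'..'Z', '0'..'9' );              # [a-zA-Z0-9]
my @char_set2 = ('a'..'z', 'A'..'Z', '0'..'9', '-');          # [a-zA-Z0-9-]
my @char_set3 = (undef, 'a'..'z', 'A'..'Z', '0'..'9', '-');   # optional [a-zA-Z0-9-]
my @char_set4 = ('a'..'z', 'A'..'Z');                         # [a-zA-Z]
my @char_set5 = (undef, 'a'..'z', 'A'..'Z');                  # optional [a-zA-Z] (unused below)

# Positions are listed last-to-first; each result is reversed when printed.
my $iter = NestedLoops([
   (\@char_set4) x 2,      # suffix, limited to two letters
    ['.'],
    \@char_set1,           # the [a-zA-Z0-9] that must precede the dot
   (\@char_set3) x 60,     # up to 60 further middle characters
    \@char_set2,           # the one required middle character
    \@char_set1,           # leading [a-zA-Z0-9]
]);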

while (my @chars = $iter->()) {
   say join '', reverse grep defined, @chars;
}

This is not a general approach, just one that works well in this situation.
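Because optional positions are padded with `undef`, the loop above can emit the same string more than once (the padding can sit in different slots). A minimal sketch of one way around that, using only core Perl, is to enumerate one middle length at a time with a plain odometer. The names `product_iter` and `hostname_iter`, the fixed suffix length, and the toy alphabets are all assumptions made for illustration; the alphabets are kept tiny so the run actually finishes.

```perl
use strict;
use warnings;
use feature qw( say );

# Odometer over a list of array refs, one per output position.
# The rightmost position advances fastest.
sub product_iter {
    my @sets = @_;
    my @idx  = (0) x @sets;
    my $done = 0;
    return sub {
        return if $done;
        my $str = join '', map { $sets[$_][ $idx[$_] ] } 0 .. $#sets;
        my $pos = $#sets;
        # Advance the odometer, carrying leftwards on overflow.
        while ($pos >= 0 && ++$idx[$pos] == @{ $sets[$pos] }) {
            $idx[$pos--] = 0;
        }
        $done = 1 if $pos < 0;
        return $str;
    };
}

# Hypothetical helper: edge char, 1..$max_mid middle chars, edge char,
# a dot, then exactly $tld_len suffix chars. Shorter middles come first.
sub hostname_iter {
    my ($edge, $mid, $tld, $max_mid, $tld_len) = @_;
    my $k    = 1;   # current length of the middle run
    my $make = sub {
        product_iter($edge, ($mid) x $k, $edge, ['.'], ($tld) x $tld_len);
    };
    my $it = $make->();
    return sub {
        while (1) {
            my $str = $it->();
            return $str if defined $str;
            return if ++$k > $max_mid;   # every length has been exhausted
            $it = $make->();
        }
    };
}

# Toy alphabets: 2 edge chars, 3 middle chars, 2 suffix chars.
my $hosts = hostname_iter(['a', '1'], ['a', '1', '-'], ['x', 'y'], 2, 2);
my $count = 0;
while (defined(my $host = $hosts->())) {
    say $host if $count < 3;
    $count++;
}
say "$count strings";   # 192 with these toy alphabets
```

Passing the full character classes with `$max_mid = 61` would cover the same strings as the pattern with a two-letter suffix, with the same caveat about running time.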

ikegami
  • Also, there are only about 10^80 atoms in the observable universe. – ikegami Aug 13 '14 at 02:34
  • I reversed the order of the sets, then reversed the output, to produce a more natural order. It would go faster if you didn't do this, but you seem to have lots of time. – ikegami Aug 13 '14 at 02:42