Unless the HTML file is extraordinarily simple (a list of links), you should probably avoid parsing it yourself, as has been mentioned. In this answer I'll suggest that you can "cheat" and install something from CPAN to help :-)
For example, you could use Mojolicious, specifically the mojo tool that is included with that module:
mojo get https://www.svenskaspel.se a attr href
which in "long form" is something like:
perl -Mojo -E 'my $ua = Mojo::UserAgent->new;
say $ua->get("https://www.svenskaspel.se")
->res->dom->find("a[href]")->map(attr => "href")->join("\n");'
The longer one-liner outputs:
/
/
/spela
/mina-spel
/bomben
#
/stryktipset/tipssm
/triss
/grasroten
/spelkoll
/kundservice
/om-cookies
which includes blank lines because some of the href attributes have no content (href="").
You can control the selector using the matching syntax documented under SELECTORS in Mojo::DOM::CSS. That way, much as with CSS selectors in the browser DOM, something like ...->dom->find("a[href^=/]") would match only href attributes whose values begin with "/".
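
To try selectors without hitting the network, here is a minimal sketch that runs Mojo::DOM against an inline HTML string. The sample markup and the hrefs helper are made up for illustration and are not part of Mojolicious; compact is the Mojo::Collection method that drops undefined and empty-string elements, which should take care of the blank lines from empty href attributes.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical sample markup standing in for a fetched page.
my $html = <<'HTML';
<a href="/">Home</a>
<a href="">Empty</a>
<a href="#">Anchor</a>
<a href="/spela">Spela</a>
<a href="https://example.com/">External</a>
<a name="top">No href at all</a>
HTML

# Hypothetical helper: return the non-empty href values of the
# elements matched by a CSS selector.
sub hrefs {
    my ($markup, $selector) = @_;
    my $links = Mojo::DOM->new($markup)
        ->find($selector)          # Mojo::Collection of matching elements
        ->map(attr => 'href')      # call ->attr('href') on each element
        ->compact;                 # drop undef and empty-string values
    return @{ $links->to_array };
}

# Only links whose href begins with "/".
say for hrefs($html, 'a[href^="/"]');

# Every link that has a non-empty href.
say for hrefs($html, 'a[href]');
```

The same chain should work on the result of ->res->dom from the one-liner above, so adding ->compact before ->join("\n") there is one way to suppress the blank lines.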