
I am trying to understand the following regular expression, which gets the domain name out of a URL:

$host = "www.php.net";

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";

How does $matches[0] come out as php.net?

I am stuck on the pattern [^.]. ^ means complement and . means any character, so what does [^.] mean? The complement of any character? Please help.

Kiran

3 Answers

2

You've misunderstood it. When a dot (.) is INSIDE brackets, it means a literal DOT, not any character. So [^.] matches any single character that is not a dot.

barbarity
1

In a character class ([]) a dot means just that: a literal dot (.).

So [^.] means any character except a dot/period.
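To see the difference for yourself, here is a minimal sketch (the test strings 'abc' and 'a.c' and the variable names are made up for illustration). It compares an unescaped dot outside a class with a dot inside one, then shows what [^.]+ picks up on its own.

```php
<?php
// Outside a character class, an unescaped dot matches any character
// except a newline, so it happily matches the "b" in "abc".
$dotOutsideClass = preg_match('/^a.c$/', 'abc');

// Inside a character class, the dot is just a literal dot.
$dotInsideClassVsB   = preg_match('/^a[.]c$/', 'abc');
$dotInsideClassVsDot = preg_match('/^a[.]c$/', 'a.c');

// [^.]+ therefore means "one or more characters that are not a dot".
// With no anchor, the first such run in the string is returned.
preg_match('/[^.]+/', 'www.php.net', $firstRun);

var_dump($dotOutsideClass, $dotInsideClassVsB, $dotInsideClassVsDot);
echo $firstRun[0], "\n";
```

The first run of non-dot characters stops at the first dot, which is why the unanchored pattern only gets as far as the first segment.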

knittl
1

It can be tricky if you're new to it.

. normally means any single character except a newline. In a character class ([]), however, it reverts to its literal meaning, i.e. a full stop (period, if you're American.)

^ normally means "anchor to the start of the string." In a character class, however, when it's the first character in that class, it flips the logic, so rather than the class representing the allowed characters, it represents the DISallowed characters.
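A quick sketch of the three roles ^ can play, assuming some throwaway test strings ('a.b' and 'a.b^c' are invented for the demo): as an anchor outside brackets, as a negation when it comes first inside them, and as a plain caret anywhere else inside them.

```php
<?php
// Outside brackets: ^ anchors the match to the start of the string.
$anchoredHit  = preg_match('/^php/', 'php.net');
$anchoredMiss = preg_match('/^php/', 'www.php.net');

// First inside brackets: ^ negates, so [^.] matches every non-dot character.
preg_match_all('/[^.]/', 'a.b', $negated);

// Not first inside brackets: ^ is a literal caret, so [.^] matches
// either a dot or a caret.
preg_match_all('/[.^]/', 'a.b^c', $literalCaret);

var_dump($anchoredHit, $anchoredMiss);
print_r($negated[0]);
print_r($literalCaret[0]);
```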

So the pattern '/[^.]+\.[^.]+$/' says:

  1. Match a sequence of one or more characters that are not periods (the domain label, here "php")
  2. Match a literal period thereafter (the \. is an escaped dot)
  3. Match another sequence of one or more characters that are not periods (the suffix, here "net")
  4. The $ anchors this to the end of the string, so steps 1-3 must be in sequence right up to the last character of the string. Since neither sequence can contain a period, the match can only ever be the last two segments.
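The effect of the anchor in step 4 is easiest to see by running the pattern with and without it. This is just a sketch using the host from the question; the variable names are my own.

```php
<?php
$host = 'www.php.net';

// With $: the match has to end at the end of the string, so the engine
// keeps moving its starting point right until the last two segments fit.
preg_match('/[^.]+\.[^.]+$/', $host, $anchored);

// Without $: the engine takes the leftmost match it can find, which is
// the first two segments.
preg_match('/[^.]+\.[^.]+/', $host, $unanchored);

echo "anchored:   {$anchored[0]}\n";
echo "unanchored: {$unanchored[0]}\n";
```

So the regex engine still scans from left to right in both cases; the $ simply rejects every candidate that doesn't finish at the end of the string.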

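Incidentally, it's not exactly a water-tight pattern for domain matching. It always returns the last two segments, whatever they are, so for country-specific domains with more than one period in the suffix (e.g. ".co.uk") it returns "co.uk" rather than the registered domain, and a host with no period at all won't match, to name two points.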
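Here's a rough illustration of those limits. The hosts 'www.example.co.uk' and 'localhost' are hypothetical inputs chosen for the demo, not anything from the question.

```php
<?php
$pattern = '/[^.]+\.[^.]+$/';

// A multi-part suffix: the pattern blindly takes the last two segments.
preg_match($pattern, 'www.example.co.uk', $ukMatch);

// A host with no period: there is nothing for \. to match, so
// preg_match() returns 0 and the matches array stays empty.
$localhostFound = preg_match($pattern, 'localhost', $localMatch);

echo "co.uk host gives: {$ukMatch[0]}\n";
var_dump($localhostFound, $localMatch);
```

Getting the registered domain right in general needs a list of known suffixes rather than a regex alone, so treat the pattern as fine for simple hosts like www.php.net and nothing more.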

Mitya
  • Minor clarification: `^` only has special meaning inside a range when it's the _first_ character. – Barmar Apr 18 '14 at 21:37
  • Quite right. My laziness. Edited. – Mitya Apr 18 '14 at 21:40
  • Thanks. I tried to understand your answer with the following code (host is www.php.net): preg_match('/([^.]+)\.[^.]+$/', $host, $matches); echo "domain name is: {$matches[1]}\n"; The output is php. But shouldn't it be www, because we are saying anything but a dot? Or is the $ char forcing it to match from the end? – Kiran Apr 18 '14 at 21:40
  • thanks, it is clear now. if i remove $, it is coming as www. so $ does the trick of forcing it from the end – Kiran Apr 18 '14 at 21:41