
I'm trying to match a hostname from a URL with a regex and capture groups. I wrote this test to simulate the acceptable inputs.

Why does this code fail?

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {

    public static void main(String[] args)
    {
        Pattern HostnamePattern = Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

        String[] inputs = new String[]{

                "http://stackoverflow.com",
                "http://stackoverflow.com/",
                "http://stackoverflow.com/path",
                "http://stackoverflow.com/path/path2",
                "http://stackoverflow.com/path/path2/",
                "http://stackoverflow.com/path/path2/?qs1=1",

                "https://stackoverflow.com/path",
                "https://stackoverflow.com/path/path2",
                "https://stackoverflow.com/path/path2/",
                "https://stackoverflow.com/path/path2/?qs1=1",
        };

        for(String input : inputs)
        {
            Matcher matcher = HostnamePattern.matcher(input);
            if(!matcher.matches() || !"stackoverflow.com".equals(matcher.group(1)))
            {
                throw new Error(input+" fails!");
            }
        }

    }

}
```
  • IMHO `^https?` matches `^http` and `^https`. That is, the `?` applies only to the `s`. – Arnaud Denoyelle Apr 28 '14 at 14:19
  • @EladYosifon: Welcome to Stack Overflow! Please consider bookmarking our [Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496) for future reference. You may find these two answers interesting: [matching urls](http://stackoverflow.com/a/190405/2736496), [matching host/port combinations](http://stackoverflow.com/a/22697740/578411). Also be sure to check out "The differences between functions in `java.util.regex.Matcher`" (under "Flavor-Specific Information > Java"), and the list of online testers at the bottom, where you can try things out yourself. – aliteralmind Apr 28 '14 at 14:23
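A minimal check of the first comment's point (the class and method names are illustrative only): because `?` applies only to the token before it, `https?` accepts `http` and `https` and nothing else, so the scheme part of the pattern is not what breaks the test.

```java
import java.util.regex.Pattern;

public class OptionalSDemo {
    // "?" makes only the preceding token (the 's') optional,
    // so "https?" accepts exactly "http" or "https".
    static boolean isScheme(String s) {
        // Pattern.matches requires the whole string to match the regex.
        return Pattern.matches("https?", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http", "https", "htt", "httpss"}) {
            System.out.println(s + " -> " + isScheme(s));
        }
    }
}
```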

2 Answers


It is because of your regex `^https?://([^/]+)/?` combined with your call to `Matcher#matches`, which requires the pattern to match the entire input.

You need to use `matcher.find()` instead of `matcher.matches()`.

Otherwise your regex will match only the first two input strings: `http://stackoverflow.com` and `http://stackoverflow.com/`.
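A small sketch comparing the two calls on some of the question's inputs, reusing the question's pattern unchanged (`FindVsMatches`, `fullMatch` and `findHost` are hypothetical names used only for illustration). `matches()` fails as soon as anything follows the optional slash, while `find()` is satisfied by the anchored prefix:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {
    // Same pattern as in the question.
    static final Pattern HOSTNAME =
            Pattern.compile("^https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the pattern consumes the whole input.
    static boolean fullMatch(String input) {
        return HOSTNAME.matcher(input).matches();
    }

    // find() succeeds if the pattern matches somewhere; the caret pins it to index 0.
    // Returns the captured host, or null if there is no match.
    static String findHost(String input) {
        Matcher m = HOSTNAME.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://stackoverflow.com",
            "http://stackoverflow.com/",
            "http://stackoverflow.com/path",
            "https://stackoverflow.com/path/path2/?qs1=1",
        };
        for (String input : inputs) {
            System.out.println(input + "  matches()=" + fullMatch(input)
                    + "  find()->" + findHost(input));
        }
    }
}
```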

anubhava
  • Note that `Matcher#group(..)` won't even work without `Matcher#find()`. It'll throw exceptions. – Sotirios Delimanolis Apr 28 '14 at 14:22
  • Yes, that is true: a `Matcher#group(..)` call must come after a successful `Matcher#matches` or `Matcher#find`. – anubhava Apr 28 '14 at 14:23
  • I was matching before looking for the group: `if(!matcher.matches() || !"stackoverflow.com".equals(matcher.group(1)))`. The answer is that `Matcher#matches()` requires the pattern to match the entire input, while `Matcher#find()` only needs a match, which the caret anchors at the start. Thanks! – Elad Yosifon Apr 28 '14 at 14:40
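In the question's condition, `||` short-circuits, so `matcher.group(1)` is only evaluated after `matches()` has returned `true`. The exception mentioned in the comments appears only when `group` is called with no match attempted, or after a failed attempt. A minimal sketch (the class and the `groupThrows` helper are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNeedsMatch {
    static final Pattern HOSTNAME = Pattern.compile("^https?://([^/]+)/?");

    // Returns true if calling group(1) throws IllegalStateException.
    // When runMatches is true, a matches() attempt is made first.
    static boolean groupThrows(String input, boolean runMatches) {
        Matcher m = HOSTNAME.matcher(input);
        if (runMatches) {
            m.matches(); // result deliberately ignored
        }
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            // Thrown when no match was attempted or the last attempt failed.
            return true;
        }
    }

    public static void main(String[] args) {
        String url = "http://stackoverflow.com/path";
        System.out.println(groupThrows(url, false)); // no match attempted
        System.out.println(groupThrows(url, true));  // matches() attempted and failed
        System.out.println(groupThrows("http://stackoverflow.com/", true)); // matches() succeeded
    }
}
```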

Take a look at `http://stackoverflow.com/path`. How should your pattern match it? Nothing in the pattern consumes the `path` part, so `matches()` cannot succeed.
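If you want to keep `matches()`, one option (a sketch, not the only possible pattern) is to let the regex consume everything after the host explicitly, with an optional `(?:/.*)?` group. `WholeUrlPattern` and `host` are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeUrlPattern {
    // Variant of the question's pattern: the optional group (?:/.*)?
    // consumes "/", "/path", "/path/path2/?qs1=1", etc.,
    // so matches() can cover the whole input.
    static final Pattern HOSTNAME =
            Pattern.compile("^https?://([^/]+)(?:/.*)?", Pattern.CASE_INSENSITIVE);

    // Returns the host, or null if the input does not match.
    static String host(String input) {
        Matcher m = HOSTNAME.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://stackoverflow.com",
            "http://stackoverflow.com/",
            "http://stackoverflow.com/path/path2/?qs1=1",
            "https://stackoverflow.com/path",
            "https://stackoverflow.com/path/path2/",
        };
        for (String input : inputs) {
            if (!"stackoverflow.com".equals(host(input))) {
                throw new Error(input + " fails!");
            }
        }
        System.out.println("all inputs pass");
    }
}
```

Since `matches()` already requires the entire input to match, the leading `^` is redundant here. One caveat of this sketch: an input with a query string but no slash after the host, such as `http://stackoverflow.com?qs1=1`, would put `stackoverflow.com?qs1=1` into group 1.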

nils
  • The caret (`^`) at the beginning of the regex means the match must start at the beginning of the input; it does not mean the match must cover the whole input. – Elad Yosifon Apr 28 '14 at 14:35
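The caret only pins the match to the start of the input; the requirement to consume the whole input comes from `matches()` itself. `Matcher#lookingAt()` gives the start-anchored, prefix-only behavior directly, so the caret becomes unnecessary. A sketch with illustrative names, contrasting it with `find()` on a pattern that has no caret:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtDemo {
    // The question's pattern without the leading caret.
    static final Pattern NO_CARET =
            Pattern.compile("https?://([^/]+)/?", Pattern.CASE_INSENSITIVE);

    // lookingAt() must match starting at index 0 (like an implicit '^'),
    // but unlike matches() it does not need to reach the end of the input.
    static String hostAtStart(String input) {
        Matcher m = NO_CARET.matcher(input);
        return m.lookingAt() ? m.group(1) : null;
    }

    // find() without a caret accepts a match anywhere in the input.
    static String hostAnywhere(String input) {
        Matcher m = NO_CARET.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String prefixed = "see http://stackoverflow.com/path";
        System.out.println(hostAtStart("http://stackoverflow.com/path")); // stackoverflow.com
        System.out.println(hostAtStart(prefixed));                        // null
        System.out.println(hostAnywhere(prefixed));                       // stackoverflow.com
    }
}
```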