I'm looking for a suitable regex for hyperlinks.

I found a link to this website: https://mathiasbynens.be/demo/url-regex, which has several choices. (There are too many to list here.)

However, these do not seem to be compatible with .NET's Regex syntax. I tried the "cowboy" pattern against several hyperlinks, e.g. http://www.cnn.com, but got no matches.

What syntax are these, and how do I get these to work with .NET? If these require manual tweaking, then just showing how to get the "cowboy" pattern to run will suffice.

bright
  • Did you just copy and paste the whole pattern? It looks like the cowboy one uses tildes in the way that JavaScript or Ruby would use slashes `/`. That means you should only copy the pattern between the tildes and probably use `RegexOptions.IgnoreCase`. – kamilk Feb 13 '16 at 13:00
  • The [@cowboy regex works fine with .NET](http://regexstorm.net/tester?p=(%3fi)%5cb((%3f%3a%5ba-z%5d%5b%5cw-%5d%2b%3a(%3f%3a%2f%7b1%2c3%7d%7c%5ba-z0-9%25%5d)%7cwww%5cd%7b0%2c3%7d%5b.%5d%7c%5ba-z0-9.%5c-%5d%2b%5b.%5d%5ba-z%5d%7b2%2c4%7d%2f)(%3f%3a%5b%5e%5cs()%3c%3e%5d%2b%7c%5c((%5b%5e%5cs()%3c%3e%5d%2b%7c(%5c(%5b%5e%5cs()%3c%3e%5d%2b%5c)))*%5c))%2b(%3f%3a%5c((%5b%5e%5cs()%3c%3e%5d%2b%7c(%5c(%5b%5e%5cs()%3c%3e%5d%2b%5c)))*%5c)%7c%5b%5e%5cs%60!()%5c%5b%5c%5d%7b%7d%3b%3a%27%22.%2c%3c%3e%3f%c2%ab%c2%bb%e2%80%9c%e2%80%9d%e2%80%98%e2%80%99%5d))&i=http%3a%2f%2fwww.cnn.com). Did you use `'~` and `~iS`? – Wiktor Stribiżew Feb 13 '16 at 13:06
  • @kamilk: The `(?i)` is the case-insensitive inline modifier. There is no need for `RegexOptions.IgnoreCase`. – Wiktor Stribiżew Feb 13 '16 at 13:08
  • I reconsidered and this is what I think: I am not against helping to convert some code from a language to language, but it is not an effort to just copy/paste some code, see it does not work, and go ask for help on SO. No, that is still an invalid question. Please describe the issue you have had, what you did to fix that, and what exactly failed. SO is not a free code writing service. – Wiktor Stribiżew Feb 13 '16 at 15:51

1 Answer

You need to copy only the pattern between the tildes.

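    // Requires `using System;` and `using System.Text.RegularExpressions;`,
    // and must sit inside a class to compile.
    static void Main(string[] args)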
    {
        string pattern = "(?:\\b[a-z\\d.-]+://[^<>\\s]+|\\b(?:(?:(?:[^\\s!@#$%^&*()_=+[\\]{}\\|;:'\\\",.<>/?]+)\\.)+(?:ac|ad|aero|ae|af|ag|ai|al|am|an|ao|aq|arpa|ar|asia|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|biz|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|cat|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|coop|com|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|info|int|in|io|iq|ir|is|it|je|jm|jobs|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mil|mk|ml|mm|mn|mobi|mo|mp|mq|mr|ms|mt|museum|mu|mv|mw|mx|my|mz|name|na|nc|net|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pro|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tel|tf|tg|th|tj|tk|tl|tm|tn|to|tp|travel|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|xn--0zwm56d|xn--11b5bs3a9aj6g|xn--80akhbyknj4f|xn--9t4b11yi5a|xn--deba0ad|xn--g6w251d|xn--hgbk6aj7f53bba|xn--hlcj6aya9esc7a|xn--jxalpdlp|xn--kgbechtv|xn--zckzah|ye|yt|yu|za|zm|zw)|(?:(?:[0-9]|[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5])\\.){3}(?:[0-9]|[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5]))(?:[;/][^#?<>\\s]*)?(?:\\?[^#<>\\s]*)?(?:#[^<>\\s]*)?(?!\\w))";
        string url = "http://www.stackoverflow.com";
        // RegexOptions.IgnoreCase stands in for the `i` flag that followed the closing tilde.
        Console.WriteLine(Regex.IsMatch(url, pattern, RegexOptions.IgnoreCase));
    }
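
As the comments point out, a verbatim string literal avoids doubling every backslash. Here is a minimal sketch of the "cowboy" pattern written that way, with the pattern text decoded from the regexstorm link in the comments. The class and method names (`CowboyUrlRegex`, `ContainsUrl`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class CowboyUrlRegex
{
    // The "cowboy" pattern without its ~ delimiters and trailing iS flags.
    // In a verbatim string (@"...") backslashes are literal; only the double
    // quote inside the last character class has to be doubled ("").
    // The leading (?i) already makes the match case-insensitive.
    private const string Pattern =
        @"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))";

    // Hypothetical helper: true when the text contains something the pattern treats as a URL.
    public static bool ContainsUrl(string text)
    {
        return Regex.IsMatch(text, Pattern);
    }
}
```

Calling `CowboyUrlRegex.ContainsUrl("http://www.cnn.com")` from `Main` should report a match for the URL in the question.
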
kamilk
  • I already said that it works - why post answers? See [WorksForMe](http://meta.stackexchange.com/questions/118992/are-works-for-me-answers-valid). – Wiktor Stribiżew Feb 13 '16 at 13:11
  • @WiktorStribiżew If it's an answer, it should be posted as an answer, not a comment. And there's a reasonable chance that kamilk started typing up an answer before you even posted your comment. From the question: "If these require manual tweaking, then just showing how to get the "cowboy" pattern to run will suffice." –  Feb 13 '16 at 13:15
  • There is no issue in the question, and answers "it works" are not valid. – Wiktor Stribiżew Feb 13 '16 at 13:16
  • @hvd: You are wrong assuming I want to answer an unclear, invalid question. – Wiktor Stribiżew Feb 13 '16 at 13:17
  • @WiktorStribiżew It's a bad question, but this isn't an "it works" answer, this answer has applied the minimal changes needed to *make* it work. –  Feb 13 '16 at 13:19
  • @WiktorStribiżew I rephrased the answer, alright? The OP was clearly making a mistake copying the whole pattern, which I pointed out in the comments even before you did, but I figured a complete example would be more useful. Why downvote an answer that I believe should be useful to the OP? – kamilk Feb 13 '16 at 13:20
  • The cowboy regex works without changes. Just use the verbatim string literal. – Wiktor Stribiżew Feb 13 '16 at 13:20
  • @WiktorStribiżew that's great that you know it, the OP did not. – kamilk Feb 13 '16 at 13:21
  • You do not know what OP does not know, because OP showed no attempt. – Wiktor Stribiżew Feb 13 '16 at 13:22
  • @WiktorStribiżew No, it doesn't work without changes. You know that already, because you commented saying so yourself. It requires at least taking out the `~` and `~iS`, since .NET doesn't use that syntax. Why would you then start claiming here that it does work without changes? –  Feb 13 '16 at 13:24
  • `~` is the regex delimiter and not the regex pattern. *Regex pattern* works without changes and the regex delimiters do not exist in .NET regex. – Wiktor Stribiżew Feb 13 '16 at 13:28
  • @WiktorStribiżew That's different from your earlier claim, but yes, I agree with that. –  Feb 13 '16 at 13:33
  • @kamilk - appreciate the answer. You correctly deduced I was tripped by the tildes. Obviously the source pattern does *not* work as-is; not everybody knows these delimiters. Thanks for applying good common sense and deduction. – bright Feb 13 '16 at 14:13
  • *You need to only copy the pattern in between the tildes* is not a complete explanation of the issue. The thing is that the regex declaration differs from language to language. Some languages require *regex delimiters* that enclose the *regex pattern*. .NET does not require them because it has other means to achieve what delimiters do in other languages. Next, when you declare a regex pattern in .NET, in most cases it is preferable to use *verbatim string literals* in order to avoid backslash hell. That is why I consider my downvote reasonable. Please fix the answer, and I will retract it. – Wiktor Stribiżew Feb 13 '16 at 15:54
  • This is how you [could have described **`regex delimiters`**](http://stackoverflow.com/questions/31560080/removing-all-non-word-characters-with-regex-regex-delimiters-in-c-sharp-regular/31560118#31560118). Note how I explain it (getting 1 upvote) and you (with a vague 1-liner) getting 3 upvotes. I think you must expand your answer. – Wiktor Stribiżew Feb 13 '16 at 15:56
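
To make the delimiter point from the comments concrete, here is a rough sketch of splitting a delimited pattern such as `~pattern~iS` into what .NET expects: the bare pattern plus `RegexOptions`. It assumes the flags follow PCRE-style conventions, and `DelimitedPattern.Parse` is a hypothetical helper, not a .NET API. It handles only a single repeated delimiter character, not bracket pairs.

```csharp
using System;
using System.Text.RegularExpressions;

static class DelimitedPattern
{
    // Hypothetical helper: turns "~pattern~flags" into a .NET Regex.
    public static Regex Parse(string delimited)
    {
        if (string.IsNullOrEmpty(delimited))
            throw new FormatException("Empty pattern.");

        char delimiter = delimited[0];               // e.g. '~' or '/'
        int end = delimited.LastIndexOf(delimiter);  // position of the closing delimiter
        if (end <= 0)
            throw new FormatException("No closing delimiter found.");

        string body = delimited.Substring(1, end - 1);  // text between the delimiters
        string flags = delimited.Substring(end + 1);    // letters after the closing delimiter

        RegexOptions options = RegexOptions.None;
        foreach (char flag in flags)
        {
            switch (flag)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 's': options |= RegexOptions.Singleline; break;
                case 'x': options |= RegexOptions.IgnorePatternWhitespace; break;
                // Anything else (such as an upper-case S) is assumed to have
                // no .NET counterpart and is ignored.
            }
        }
        return new Regex(body, options);
    }
}
```

For example, `DelimitedPattern.Parse("~ab+c~iS")` yields a `Regex` for `ab+c` with `RegexOptions.IgnoreCase` set. In practice it is usually simpler to strip the delimiters by hand and paste the pattern into a verbatim string.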