Can someone explain what the following regex means?

Question

/(.*?)((http:\/\/|https:\/\/)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\/[a-zA-Z0-9\-\.]+)*){1}(.*?)/g

I could only make some assumptions about the regex above, but most of it is cryptic to me.

(http:\/\/|https:\/\/) - Matches either the http:// or the https:// protocol prefix.

[a-zA-Z]{2,6} - Matches between 2 and 6 lowercase or uppercase letters.

/g - Search globally, i.e. find every match in the string instead of stopping at the first one.

But I wasn't able to put all of the pieces together.
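The three pieces identified above can be checked in isolation. This is a small sketch assuming the pattern is used from JavaScript (the /.../g literal suggests so, but the question doesn't say). The sample strings are made up, and the TLD piece is anchored with ^ and $ only so it can be tested by itself.

```javascript
// Assumption: the pattern is used from JavaScript; sample strings are made up.

// The protocol group from the question: "http://" or "https://".
const protocol = /(http:\/\/|https:\/\/)/;

// The TLD piece, anchored here (^...$) only so it can be tested on its own.
const tld = /^[a-zA-Z]{2,6}$/;

console.log(protocol.test("https://example.com")); // true
console.log(protocol.test("ftp://example.com"));   // false
console.log(tld.test("com"), tld.test("c"), tld.test("museums")); // true false false

// The g flag: without it, match() stops at the first match;
// with it, match() returns every match in the string.
const text = "a1 b2 c3";
const first = text.match(/[a-z]\d/); // only "a1" (plus index/input properties)
const all = text.match(/[a-z]\d/g);  // every match
console.log(first[0], all);
```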

Check it out on [regex 101](https://regex101.com/r/Hgte75/1) — Thomas Smyth, Jan 12 '18 at 00:08

score 2 · Accepted Answer · answered Jan 12 '18 at 00:13

This looks like it's trying to match full URLs.

(http:\/\/|https:\/\/)?, as you mentioned, looks for an optional protocol prefix.
(.*?) at the beginning lazily matches whatever text comes before the URL. The (.*?) at the end is also lazy, and since nothing follows it in the pattern, it always matches the empty string.
[a-zA-Z0-9\-\.]+ is likely attempting to match domain names and sub-domains (e.g. test.us.domain).
\.[a-zA-Z]{2,6} matches top-level domains (e.g. .com, .us, .ninja).
(\/[a-zA-Z0-9\-\.]+)* looks for paths (e.g. /about, /files/my-file001.txt).
{1} means the URL group must occur exactly once, which is already the default, so it is redundant.
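A quick way to see how the pieces fit together is to run the unmodified pattern over some sample text and print each capture group. This sketch assumes a JavaScript environment with String.prototype.matchAll (Node 12 or later); the input string and the field names (before, url, protocol, lastPath, after) are made up for illustration.

```javascript
// The regex from the question, unchanged.
const urlRe = /(.*?)((http:\/\/|https:\/\/)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\/[a-zA-Z0-9\-\.]+)*){1}(.*?)/g;

// Hypothetical sample text, invented for this demo.
const input = "Docs at https://test.us.domain/files/my-file001.txt and example.com/about";

// matchAll requires the g flag and yields one match object per URL found.
const results = [...input.matchAll(urlRe)].map((m) => ({
  before: m[1],   // leading lazy (.*?): text since the previous match (or the start)
  url: m[2],      // the whole URL group
  protocol: m[3], // "http://" or "https://", undefined when absent
  lastPath: m[4], // a repeated group keeps only its LAST repetition
  after: m[5],    // trailing lazy (.*?): always "" because nothing follows it
}));

console.log(results);
```

Two details are easy to miss when reading the pattern: group 4 keeps only the last path segment (/my-file001.txt, not /files/my-file001.txt), and because the trailing (.*?) is always empty, the text after a URL ends up in the next match's leading group instead.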

This regex has its faults for this purpose. For example, the segments that allow . characters (e.g. [a-zA-Z0-9\-\.]+) accept them several times in a row (e.g. a...c...d). Generally speaking, though, it should match URLs as long as the surrounding text doesn't look too much like a URL.
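Both faults can be reproduced with the same pattern. In this sketch (JavaScript assumed again, sample strings invented), the hypothetical helper urls returns only the URL capture group of each match.

```javascript
// The regex from the question, unchanged.
const urlRe = /(.*?)((http:\/\/|https:\/\/)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,6}(\/[a-zA-Z0-9\-\.]+)*){1}(.*?)/g;

// Hypothetical helper: return just the URL group (group 2) of every match.
const urls = (text) => [...text.matchAll(urlRe)].map((m) => m[2]);

// Runs of dots pass because "." is inside the character class.
console.log(urls("go to a...c...com now"));  // [ 'a...c...com' ]

// Anything shaped like name.letters is picked up too, e.g. a method call in code.
console.log(urls("Call obj.method() here")); // [ 'obj.method' ]
```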

Thank you for the reply. Between your explanation and spending a bit more time on the regex, it makes sense now. — Sushanth --, Jan 12 '18 at 18:48
