
I have the following regex to extract the YouTube video id:

var regExp = /^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=)([^#\&\?]*).*/;

For example, this would match:

http://www.youtube.com/watch?v=9bZkp7q19f0&desc=gangnam

However, sometimes the v parameter does not come first, and as a result the URL below does not match:

http://www.youtube.com/watch?desc=gangnam&v=9bZkp7q19f0

How would I include an "or" clause in the regex to account for the v parameter being preceded by either an & or a ?.

I tried the following, but it did not work:

var regExp = /^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\[?$]v=)([^#\&\?]*).*/;
user784637

2 Answers


Basically, the video id is preceded by v=, and followed by either the end of the string, or &. So the regex you're looking for is simply this:

var expr = /(?:v=)([^&]+)/;
console.log('http://www.youtube.com/watch?v=9bZkp7q19f0&desc=gangnam'.match(expr)[1]);
//logs "9bZkp7q19f0"
console.log('http://www.youtube.com/watch?desc=gangnam&v=9bZkp7q19f0'.match(expr)[1]);
//logs "9bZkp7q19f0"

You could (but it's not really required) make sure to only match the pattern above in the query string (the bit that follows a ? in the full string):

var expr = /\?.*(?:v=)([^&]+)/;
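A quick sketch to check that variant against the two URLs from the question. The variable names and the third, "?"-less test string are made up for illustration only:

```javascript
// Variant anchored on the query string: a "?" must appear somewhere before "v=".
var queryExpr = /\?.*(?:v=)([^&]+)/;

var first = 'http://www.youtube.com/watch?v=9bZkp7q19f0&desc=gangnam'.match(queryExpr);
var second = 'http://www.youtube.com/watch?desc=gangnam&v=9bZkp7q19f0'.match(queryExpr);
// Made-up string without a "?": the pattern cannot match, so match() returns null.
var none = 'http://www.youtube.com/v=9bZkp7q19f0'.match(queryExpr);

console.log(first[1]);  // "9bZkp7q19f0"
console.log(second[1]); // "9bZkp7q19f0"
console.log(none);      // null
```

Because `match()` returns null when nothing matches, real code should check the result before indexing into it.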

The pattern you've tried has several faults, and fixing them all is more work than starting over, so I'd just forget about it. For example:

/youtu.be/

This matches a literal youtu, followed by one instance of any character that isn't a line terminator (.), followed by a literal be. Thus it matches youtu2be, youtu#be, youtu.be, and even youtu be.
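To see the difference for yourself, here is a small sketch comparing the unescaped and the escaped dot. The names `loose`, `strict` and `samples` are only there for the demonstration:

```javascript
// Unescaped dot: "." stands for any single character except a line terminator.
var loose = /youtu.be/;
// Escaped dot: "\." only matches a literal full stop.
var strict = /youtu\.be/;

var samples = ['youtu.be', 'youtu2be', 'youtu#be', 'youtu be'];
var results = samples.map(function (s) {
    return { input: s, loose: loose.test(s), strict: strict.test(s) };
});

// loose is true for all four samples, strict only for "youtu.be"
console.log(results);
```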

In response to your comment:

expr = /(youtu\.be\/|[?&]v=)([^&]+)/;
console.log('http://www.youtu.be/9bZkp7q19f0'.match(expr)[2]);
//logs "9bZkp7q19f0"
console.log('http://www.youtube.com/watch?desc=gangnam&v=9bZkp7q19f0'.match(expr)[2]);
//logs "9bZkp7q19f0"
console.log('http://youtu.be/9bZkp7q19f0'.match(/(youtu\.be\/|v=)([^&]+)/)[2]);
//logs "9bZkp7q19f0"
console.log(' youtube.com/watch?argv=xyz&v=u8nQa1cJyX8'.match(/(youtu\.be\/|[?&]v=)([^&]+)/)[2]);
//logs "u8nQa1cJyX8"

That's all. The [?&] in front of v= only makes sure the parameter is really named v, and not something that merely ends in v (like argv).

How does it work:

  • (youtu\.be\/|[?&]v=): matches either a literal youtu.be/, or v= preceded by a ? or an & (i.e. ?v= or &v=)
  • ([^&]+): matches (and captures) everything that follows the previous match, up to but not including the next &

That means that in youtu.be/<this will match>&<this will not match> and youtube.com/foo/bar/watch?some=params&v=<this will match>&<this won't>, only the marked part is captured. It doesn't matter whether the v= bit comes directly after the ? or after an ampersand: all this regex is interested in is finding that v= and matching everything that follows, up until the first & after it. If it can't find the v=, but youtu.be/ is found, the regex will capture everything after the forward slash (i.e. the video id).
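If you want to reuse this, it could be wrapped in a small function along these lines. This is only a sketch: the name `extractVideoId` and the choice to return null when there is no match are assumptions, not anything YouTube or the question prescribes:

```javascript
// Hypothetical helper (name and null-on-failure behaviour are assumptions).
function extractVideoId(url) {
    var match = /(youtu\.be\/|[?&]v=)([^&]+)/.exec(url);
    // exec() returns null when nothing matches, so guard before indexing.
    return match ? match[2] : null;
}

console.log(extractVideoId('http://youtu.be/9bZkp7q19f0'));
// logs "9bZkp7q19f0"
console.log(extractVideoId('http://www.youtube.com/watch?desc=gangnam&v=9bZkp7q19f0'));
// logs "9bZkp7q19f0"
console.log(extractVideoId('youtube.com/watch?argv=xyz&v=u8nQa1cJyX8'));
// logs "u8nQa1cJyX8"
console.log(extractVideoId('http://www.youtube.com/'));
// logs null
```

The null check matters because calling `.match(expr)[2]` directly throws a TypeError on a URL that contains no video id.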

Elias Van Ootegem
  • This doesn't work for a url like this `http://youtu.be/9bZkp7q19f0`, can you please just suggest how I change the existing part of the regex `?v=` to account for the cases when `v` is prepended with an ampersand `&`? – user784637 Aug 20 '13 at 14:07
  • I asked a very specific question on what the correct syntax is to change the portion of the regex `?v=` to account for ampersands. I'm sure it's about 2 characters more to make that change. Do you know what the correct syntax is? – user784637 Aug 20 '13 at 14:12
  • @user784637: Yes, but I'm saying you're doing _too much work_. I've added an alternative, way shorter regex that does the same thing... why not use that? But anyhow, instead of `watch\?` try `watch\?.*(?:v=)([^&]+)`, that deals with leading amps and question marks alike – Elias Van Ootegem Aug 20 '13 at 14:14
  • Yours doesn't work for this `m.youtube.com/watch?feature=related&v=DMua4iJ_VzM&rl=yes&client=mv-google&gl=MX&guid=&hl=en` – user784637 Aug 20 '13 at 14:17
  • @user784637: Sure it does: `console.log('m.youtube.com/watch?feature=related&v=DMua4iJ_VzM&rl=yes&client=mv-google&gl=MX&guid=&hl=en'.match(/(youtu\.be\/|v=)([^&]+)/)[2]);` logs `DMua4iJ_VzM` in console for me – Elias Van Ootegem Aug 20 '13 at 14:19
  • My bad, I had a client side check denying it. I'm going to push this later tonight and let you know how it goes – user784637 Aug 20 '13 at 14:21
  • @user784637: What check where denied what? Who? Where? When? (stumped) – Elias Van Ootegem Aug 20 '13 at 14:27
  • What if the url is http://www.youtube.com/watch?argv=xyz&v=u8nQa1cJyX8? How do you differentiate between an argument whose name ends with v and an argument whose name is v? From: http://stackoverflow.com/a/3452617/784637 – user784637 Aug 20 '13 at 14:41
  • @user784637: I've updated my answer, the only change I had to make was to `|v=)`, which is now `|[?&]v=)`, the full regex is `/(youtu\.be\/|[?&]v=)([^&]+)/` – Elias Van Ootegem Aug 20 '13 at 14:52
  • Why did this syntax fail when I applied it to the original regex? I did the same `v=[?$]` – user784637 Aug 20 '13 at 15:00
  • @user784637: You didn't do the same, you escaped the `[`: copy-pasted from your question: `\[?$]v=`, that makes the opening `[` a literal, not a char-class delimiter – Elias Van Ootegem Aug 20 '13 at 15:06
  • there is a little problem with urls like this: https://www.youtube.com/watch?v=ABCDE123456#t=298. Anyway the workaround is considering the ID length = 11, so the regex is: /(youtu\.be\/|v=)([^&]{11})/ – Randomize Dec 28 '13 at 12:03
  • Thank you for the simple regex :) I would add '?' to the end, like: /(youtu\.be\/|[?&]v=)([^?&]+)/ so urls like https://youtu.be/4i1fbyc0lpw?t=10s (with the time param) will return only the video id. – Lucas D'Avila Jul 18 '15 at 19:20
  • @LucasD'Avila: Check the third possible regex I've listed: `expr = /(youtu\.be\/|[?&]v=)([^&]+)/;` – Elias Van Ootegem Jul 20 '15 at 06:48
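
The two later comments above (the #t=298 fragment and the ?t=10s time parameter) can be handled together by also excluding ? and # from the captured id. This is a sketch that combines both suggestions, not a pattern taken from the answer itself, and `idExpr` is a made-up name:

```javascript
// Assumption: the id ends at the first "?", "&" or "#" that follows it.
var idExpr = /(youtu\.be\/|[?&]v=)([^?&#]+)/;

var withFragment = 'https://www.youtube.com/watch?v=ABCDE123456#t=298'.match(idExpr);
var withTime = 'https://youtu.be/4i1fbyc0lpw?t=10s'.match(idExpr);

console.log(withFragment[2]); // "ABCDE123456"
console.log(withTime[2]);     // "4i1fbyc0lpw"
```

Unlike the {11} workaround, this does not assume a fixed id length.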

You can include a positive lookahead to ensure there is a v= somewhere after the watch?:

^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?(v=|.*(?=v=)))([^#\&\?]*).*

Edit: Looking further into your regex, you have also formatted it wrong, and thus it will match on embed/ alone. You need to group your alternatives with brackets when using or (|) statements; otherwise the previous parts of the expression are not included and each alternative matches on its own. You also need to escape special characters like '.', as an unescaped dot is treated as any character.

I have cleaned it up a little bit:

/^.*youtu(\.)?be(\.com)?(\/|v\/|u\/\w\/)(embed\/|watch\?(v=|.*(?=v=)))([^#\&\?]+)/
Elias Van Ootegem
Martyn