Regex to exclude URLs

Question

I am trying to create a JavaScript program that replaces certain patterns of text with links. However, some of the patterns also occur inside URLs on the page, and replacing them there breaks the URL links.

I specifically want to exclude the pattern when it is contained within a URL. For example, here is my current regex code:

$els.replaceText(/(\bX00[A-Z0-9]{7}\b)/gi, '<span class="context context_ident">$1<\/span>');

Some Example Text:

item :X00132BhJk

www.domain.com/X00132BhJk

www.domainsearch.com/search?ident=X00132BhJk

X00132BhJk

X00132BhJk

The italic references should be selected and replaced; however, the references contained within a URL should not. The issue I have been having is when the reference.

Initially I tried \sX00[A-Z0-9]{7}\s, but when the reference appears at the far left of the page (the first word in a sentence) it is not selected. Likewise, it is not selected if a full stop follows it or a colon precedes it.

Is there a way to specifically exclude URLs by preventing /, ? and = from being the immediately preceding character, while still selecting the reference in all other cases?
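
The failure described above can be reproduced on plain strings with `String.prototype.match`. This is only a sketch; the sample strings are made up for illustration and are not taken from a real page.

```javascript
// Whitespace-delimited attempt from the question, with the g and i flags.
const spaced = /\sX00[A-Z0-9]{7}\s/gi;

// Illustrative inputs (assumptions, not real page content).
const samples = [
  'see X00132BhJk here',   // whitespace on both sides
  'X00132BhJk is first',   // reference at the very start of the text
  'item :X00132BhJk',      // colon before, end of string after
  'ends with X00132BhJk.', // full stop after
];

// Count how many references each sample yields.
const results = samples.map((text) => (text.match(spaced) || []).length);
console.log(results); // [ 1, 0, 0, 0 ]
```

Only the first sample matches, because \s requires an actual whitespace character on each side of the reference.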

The problem is that `X00132BhJk` is a perfectly valid URL within an intranet, for example (specifying a host by that name within the firewall). It's extremely hard to write a regexp for validating URLs. The best you can do is find some invalid cases, such as URLs which include invalid characters, or are malformed in obvious ways. – Sep 06 '14 at 13:37

score 1 · Accepted Answer · edited May 23 '17 at 10:26

Capture either the start of the string (^) or, using a negated character class [^/?=], any single character other than the ones that must not appear immediately before the reference:

/(^|[^\/?=])(\bX00[A-Z0-9]{7}\b)/gi

And replace with: '$1<span class="context context_ident">$2</span>'
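
A minimal sketch of this pattern on plain strings, using `String.prototype.replace` rather than the `replaceText` call from the question. The helper name `wrapIdents` is hypothetical and only serves the illustration.

```javascript
// Group 1 is either the start of the string or one character that is not
// '/', '?' or '='. It is written back via $1 so that it is not lost.
const identPattern = /(^|[^\/?=])(\bX00[A-Z0-9]{7}\b)/gi;
const replacement = '$1<span class="context context_ident">$2</span>';

// Hypothetical helper, not part of the original answer.
function wrapIdents(text) {
  return text.replace(identPattern, replacement);
}

console.log(wrapIdents('item :X00132BhJk'));
// item :<span class="context context_ident">X00132BhJk</span>
console.log(wrapIdents('www.domain.com/X00132BhJk'));
// www.domain.com/X00132BhJk
console.log(wrapIdents('www.domainsearch.com/search?ident=X00132BhJk'));
// www.domainsearch.com/search?ident=X00132BhJk
```

Note that only the single character immediately before the reference is checked, so a reference deeper inside a URL but preceded by some other character (for example a hyphen) would still be wrapped.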

Also see the fiddle and the SO Regex FAQ.

answered Sep 06 '14 at 10:18

Jonny 5

Perfect, this has fixed the issue. One slight edit: /(^|[^/?=]) changed to /(^|[^\/?=]), as the unescaped / was ending the regex declaration too early. I have tested this and it worked perfectly. – Bobstefano Sep 09 '14 at 15:13
@Bobstefano Great, works for you :) Updated answer accordingly. – Jonny 5 Sep 09 '14 at 16:51

score 0 · Answer 2 · answered Sep 06 '14 at 10:02

(?!^www.*?X00[A-Z0-9]{7}.*$)^(.*?)(X00[A-Z0-9]{7})(.*)$

Try this, and replace with:

$1<span class="context context_ident">$2</span>$3

See demo.

http://regex101.com/r/oC3nN4/7

I also added the m flag for multiline matching, since I have used anchors.
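
A rough sketch of how this line-based pattern behaves in JavaScript. The `i` flag and the use of `$1`, `$2` and `$3` for the three capture groups are assumptions made here: the sample references contain lowercase letters that `[A-Z0-9]` alone would not match, and JavaScript replacement strings refer to groups with `$n`.

```javascript
// With the m flag, ^ and $ anchor at every line; the negative lookahead
// rejects any line that starts with "www" and contains a reference.
const linePattern = /(?!^www.*?X00[A-Z0-9]{7}.*$)^(.*?)(X00[A-Z0-9]{7})(.*)$/gim;

const input = [
  'item :X00132BhJk',
  'www.domain.com/X00132BhJk',
  'X00132BhJk',
].join('\n');

// Groups: 1 = text before the reference, 2 = the reference, 3 = the rest.
const output = input.replace(
  linePattern,
  '$1<span class="context context_ident">$2</span>$3'
);
console.log(output);
```

The first and third lines are wrapped and the `www` line is left alone. Two limitations follow from the pattern itself: a URL that does not start its line with `www` (for instance one beginning with `http://`) is not excluded, and only the first reference on each line is wrapped.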

vks

score 0 · Answer 3 · answered Sep 06 '14 at 10:05

You can try with non-capturing parentheses (?:), in your case (?:[^/?=]|^)

replace(/(?:[^/?=]|^)(\bX00[A-Z0-9]{7}\b)/gi, '<span class="context context_ident">$1<\/span>');

Example

Volune

This looks like it will eat the `/`, `?` or `=` from the URL; just because it's non-capturing doesn't mean it's not part of the match being replaced – Paul S. Sep 06 '14 at 12:51
I first thought the same, but the fiddle shows the opposite. – Volune Sep 06 '14 at 12:54
Sorry, I got the negation the wrong way round in my head; it matches a character which is not one of those. Notice how the `:` disappears; http://jsfiddle.net/jqcwmu0j/1/ – Paul S. Sep 06 '14 at 13:01
True, better use Jonny 5's answer – Volune Sep 06 '14 at 13:31
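
The effect discussed in these comments can be checked with a short sketch. The `/` inside the character class is escaped here, which is an assumption that differs from the answer as posted; the sample string comes from the question.

```javascript
// The non-capturing group still consumes one character before the
// reference, and that character is not written back by the replacement.
const nonCapturing = /(?:[^\/?=]|^)(\bX00[A-Z0-9]{7}\b)/gi;

const wrapped = 'item :X00132BhJk'.replace(
  nonCapturing,
  '<span class="context context_ident">$1</span>'
);
console.log(wrapped);
// item <span class="context context_ident">X00132BhJk</span>
```

The `:` before the reference is part of the overall match, so it is dropped from the result. Capturing it and re-emitting it in the replacement avoids the loss.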

score 0 · Answer 4 · edited Sep 06 '14 at 10:27

You don't need to escape the forward slash in the closing span tag in the replacement string.

Regex:

^((?:(?![\/?]).)*)(X00[A-Z0-9a-z]{7})(.*)$

Replacement string:

$1<span class="context context_ident">$2</span>$3
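
A small sketch of this regex applied line by line in JavaScript. The `g` and `m` flags are assumptions, since the answer does not state which flags it uses; the sample lines come from the question.

```javascript
// Group 1 may only contain characters other than '/' and '?', so a
// reference that comes after either character on its line cannot match.
const tempered = /^((?:(?![\/?]).)*)(X00[A-Z0-9a-z]{7})(.*)$/gm;

const lines = [
  'item :X00132BhJk',
  'www.domain.com/X00132BhJk',
  'www.domainsearch.com/search?ident=X00132BhJk',
  'X00132BhJk',
].join('\n');

const result = lines.replace(
  tempered,
  '$1<span class="context context_ident">$2</span>$3'
);
console.log(result);
```

Only the first and last lines are wrapped. Because group 1 has to span from the start of the line to the reference, a reference that follows a URL on the same line would also be skipped.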

DEMO

edited Sep 06 '14 at 10:27

answered Sep 06 '14 at 10:06

Avinash Raj
