
I am trying to create a JavaScript program that replaces certain patterns of text with links. However, some of the patterns also appear within URLs on the page, and replacing them there breaks the URL links.

Specifically, I want to exclude the pattern when it is contained within a URL. Here is my current regex code:

$els.replaceText(/(\bX00[A-Z0-9]{7}\b)/gi, '<span class="context context_ident">$1<\/span>');

Some Example Text:

item :X00132BhJk

www.domain.com/X00132BhJk

www.domainsearch.com/search?ident=X00132BhJk

X00132BhJk

X00132BhJk
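To make the problem concrete, here is a minimal sketch that runs the current pattern over this sample text. It assumes plain `String.prototype.replace` (runnable on its own, e.g. in Node.js) in place of the `$els.replaceText` plugin call, so it only approximates what the plugin does to the page. Every occurrence gets wrapped, including the two inside URLs:

```javascript
// Sample text from the question, one reference per line.
const sample = [
  "item :X00132BhJk",
  "www.domain.com/X00132BhJk",
  "www.domainsearch.com/search?ident=X00132BhJk",
  "X00132BhJk",
  "X00132BhJk",
].join("\n");

// The question's current pattern, applied with String.prototype.replace
// instead of the $els.replaceText plugin.
const current = /(\bX00[A-Z0-9]{7}\b)/gi;
const wrapped = sample.replace(
  current,
  '<span class="context context_ident">$1</span>'
);

// \b is satisfied after "/", "=" and ":", so all five references match.
const spanCount = (wrapped.match(/<span /g) || []).length;
console.log(spanCount); // 5
console.log(wrapped);
```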

The references that are not part of a URL should be selected and replaced, but the references contained within a URL should not. The issue I have been having is when the reference…

Initially I tried `\sX00[A-Z0-9]{7}\s`, but when the reference appears at the far left of the page (the first word in a sentence), it doesn't get selected, because there is no whitespace before it. Likewise, it isn't selected when a full stop follows it or a colon precedes it.
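A quick check of the whitespace-delimited attempt shows why those cases are missed: `\s` needs an actual whitespace character on each side, and neither the start of the text, a colon nor a full stop counts. The inputs below are hypothetical, one per case described above:

```javascript
// The first attempt from the question: whitespace required on both sides.
const spaced = /\sX00[A-Z0-9]{7}\s/i;

// Hypothetical inputs, one per failing case, plus one that works.
const cases = {
  firstWord: "X00132BhJk is the first word",
  colonBefore: "item :X00132BhJk here",
  fullStopAfter: "See X00132BhJk.",
  surroundedBySpaces: "See X00132BhJk here",
};

// Test each input; only the last one has whitespace on both sides.
const results = Object.fromEntries(
  Object.entries(cases).map(([name, text]) => [name, spaced.test(text)])
);
// firstWord, colonBefore and fullStopAfter are false; surroundedBySpaces is true.
console.log(results);
```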

Is there a way to exclude URLs specifically, by preventing `/`, `?` and `=` from being the immediately preceding character, while still selecting the reference in all other cases?
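For reference, engines that support ES2018 lookbehind assertions (current Node.js and modern browsers, but not the JavaScript of 2014 when this was asked) can express "not immediately preceded by `/`, `?` or `=`" directly. This is an alternative sketch, not one of the answers in this thread, and it again uses `String.prototype.replace` on the sample text rather than the `replaceText` plugin:

```javascript
// Sample text from the question, one reference per line.
const sample = [
  "item :X00132BhJk",
  "www.domain.com/X00132BhJk",
  "www.domainsearch.com/search?ident=X00132BhJk",
  "X00132BhJk",
  "X00132BhJk",
].join("\n");

// Assumes ES2018 lookbehind support. (?<![\/?=]) checks that the character
// before the reference is not /, ? or = without consuming it; at the very
// start of the string there is no previous character, so the check passes.
const notInUrl = /(?<![\/?=])\bX00[A-Z0-9]{7}\b/gi;

// $& inserts the whole match, which here is just the reference.
const result = sample.replace(
  notInUrl,
  '<span class="context context_ident">$&</span>'
);
console.log(result);
```

Because the lookbehind consumes nothing, there is no preceding character to write back, unlike the capture-group approach used in the answers.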

Bobstefano
  • The problem is that `X00132BhJk` is a perfectly valid URL within an intranet, for example (specifying a host by that name within the firewall). It's extremely hard to write a regexp for validating URLs; the best you can do is find some invalid cases, such as URLs that include invalid characters or are malformed in obvious ways. – Sep 06 '14 at 13:37

4 Answers


Capture either the start (`^`) OR one character from the negated character class `[^/?=]`, which lists the characters that must not appear before the reference:

/(^|[^\/?=])(\bX00[A-Z0-9]{7}\b)/gi

And replace with: '$1<span class="context context_ident">$2</span>'
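A minimal sketch of this answer applied to the sample text from the question, again with `String.prototype.replace` standing in for the `replaceText` plugin. Because group 1 is written back with `$1`, the colon before the first reference and the line breaks before the last two survive:

```javascript
// Sample text from the question, one reference per line.
const sample = [
  "item :X00132BhJk",
  "www.domain.com/X00132BhJk",
  "www.domainsearch.com/search?ident=X00132BhJk",
  "X00132BhJk",
  "X00132BhJk",
].join("\n");

// Group 1 is either the start of the string (^) or one character that is
// not /, ? or =; group 2 is the reference itself. Without the m flag, ^ only
// matches at the very start, so later lines match through the preceding
// "\n", which is in the negated class.
const pattern = /(^|[^\/?=])(\bX00[A-Z0-9]{7}\b)/gi;

// $1 writes the preceding character back, so it is not lost in the replace.
const result = sample.replace(
  pattern,
  '$1<span class="context context_ident">$2</span>'
);
console.log(result);
```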

Also see the fiddle and the SO Regex FAQ.

Jonny 5
  • Perfect, this has fixed the issue. One slight edit: I changed `/(^|[^/?=])` to `/(^|[^\/?=])`, as the `/` was closing the regex declaration too early. I have tested this and it works perfectly. – Bobstefano Sep 09 '14 at 15:13
  • @Bobstefano Great, works for you :) Updated answer accordingly. – Jonny 5 Sep 09 '14 at 16:51
(?!^www.*?X00[A-Z0-9]{7}.*$)^(.*?)(X00[A-Z0-9]{7})(.*)$

Try this.

Replace with:

$1<span class="context context_ident">$2</span>$3

See demo.

http://regex101.com/r/oC3nN4/7

I added the `m` flag as well, for multiline matching, since I have used anchors.
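A sketch of this answer run against the question's sample, assuming the `g`, `i` and `m` flags (`i` is needed because `[A-Z0-9]` must also match the lowercase `h` and `k` in `X00132BhJk`) and the replacement `$1<span class="context context_ident">$2</span>$3` for groups 1, 2 and 3. As written, the lookahead only skips lines that begin with `www`, so a URL starting with `http://` would still be wrapped:

```javascript
// Sample text from the question, one reference per line.
const sample = [
  "item :X00132BhJk",
  "www.domain.com/X00132BhJk",
  "www.domainsearch.com/search?ident=X00132BhJk",
  "X00132BhJk",
  "X00132BhJk",
].join("\n");

// The negative lookahead rejects any line that starts with "www" and contains
// a reference; m makes ^ and $ match at each line, i handles lowercase letters.
const pattern = /(?!^www.*?X00[A-Z0-9]{7}.*$)^(.*?)(X00[A-Z0-9]{7})(.*)$/gim;

// Groups: 1 = text before the reference, 2 = reference, 3 = text after it.
const result = sample.replace(
  pattern,
  '$1<span class="context context_ident">$2</span>$3'
);
console.log(result);
```

Since the whole line is consumed, this wraps at most one reference per line.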

vks

You can try non-capturing parentheses `(?:)`; in your case, `(?:[^/?=]|^)`:

replace(/(?:[^/?=]|^)(\bX00[A-Z0-9]{7}\b)/gi, '<span class="context context_ident">$1<\/span>');

Example

Volune
  • This looks like it will eat the `/`, `?` or `=` from the URL; just because it's non-capturing doesn't mean it's not part of the match being replaced. – Paul S. Sep 06 '14 at 12:51
  • I first thought the same, but the fiddle shows the opposite. – Volune Sep 06 '14 at 12:54
  • Sorry, I got the negation the wrong way round in my head; it matches a character that is not one of those characters. Notice how the `:` disappears: http://jsfiddle.net/jqcwmu0j/1/ – Paul S. Sep 06 '14 at 13:01
  • True, better to use Jonny 5's answer. – Volune Sep 06 '14 at 13:31
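Paul S.'s point can be reproduced with a short sketch: the non-capturing group is still part of the overall match, so whatever character it matched is replaced too. Here the colon in `item :X00132BhJk` is lost:

```javascript
// The non-capturing group holds the preceding character, but that
// character is still part of the overall match being replaced.
const nonCapturing = /(?:[^/?=]|^)(\bX00[A-Z0-9]{7}\b)/gi;

// Only the captured reference ($1) is written back, so the matched ":"
// before it disappears from the output.
const result = "item :X00132BhJk".replace(
  nonCapturing,
  '<span class="context context_ident">$1</span>'
);
console.log(result); // item <span class="context context_ident">X00132BhJk</span>
```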

You don't need to escape the forward slash in the closing span tag in the replacement string.

Regex:

^((?:(?![\/?]).)*)(X00[A-Z0-9a-z]{7})(.*)$

Replacement string:

$1<span class="context context_ident">$2</span>$3
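A sketch of this pattern on the question's sample. The answer does not state the flags, so `g` and `m` are assumed here (`m` so that `^` and `$` match at each line). The tempered dot `(?:(?![\/?]).)*` refuses to cross a `/` or `?`, so any line with one of those characters before the reference is left alone. `=` is not in the set, but in the sample the `=` follows a `/` on the same line anyway:

```javascript
// Sample text from the question, one reference per line.
const sample = [
  "item :X00132BhJk",
  "www.domain.com/X00132BhJk",
  "www.domainsearch.com/search?ident=X00132BhJk",
  "X00132BhJk",
  "X00132BhJk",
].join("\n");

// Group 1 consumes characters from the line start as long as none is / or ?;
// group 2 is the reference ([A-Z0-9a-z] covers case, so no i flag);
// group 3 is the rest of the line. The g and m flags are assumptions.
const pattern = /^((?:(?![\/?]).)*)(X00[A-Z0-9a-z]{7})(.*)$/gm;

const result = sample.replace(
  pattern,
  '$1<span class="context context_ident">$2</span>$3'
);
console.log(result);
```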

DEMO

Avinash Raj