
I have the following regexp:

/^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/

Which you can think about like this:

^
  (?:
    (?:>|<)?
    [a-zA-Z]+
    (?:(?:\+|-)\d*\.?\d*(?:em)?)?
  )
  (?:
    <
    [a-zA-Z]+
    (?:(?:\+|-)\d*\.?\d*(?:em)?)?
  )?
$

It is effectively the same pattern repeated once or twice, with a small difference. The core of each instance is one or more letters [a-zA-Z], optionally followed by a plus or minus sign and a numeric value, which may itself be followed by em. The first instance can start with either < or >, and the second instance can only start with <.
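To make the "same pattern repeated" structure explicit, here is a minimal sketch that builds the regexp from one shared fragment. The names `unit`, `composed` and `original` are only for illustration and are not part of the original code.

```javascript
// Shared fragment: one or more letters, then an optional signed number
// that may be followed by "em".
const unit = String.raw`[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?`;

// The first instance may start with < or >; the optional second one must start with <.
const composed = new RegExp(`^(?:(?:>|<)?${unit})(?:<${unit})?$`);

// The regexp exactly as written above, for comparison.
const original = /^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/;

// Compare the two pattern sources.
console.log(composed.source === original.source);
```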

So the following are all valid:

  `alpha`,
  `alphaBravo`,
  `alphaBravoCharlie`,
  `>alpha`,
  `<alpha`,
  `>alpha+10`,
  `<alpha+10`,
  `>alpha+1.5`,
  `<alpha+1.5`,
  `>alpha-10`,
  `<alpha-10`,
  `>alpha-1.5`,
  `<alpha-1.5`,
  `>alpha+10em`,
  `<alpha+10em`,
  `>alpha+1.5em`,
  `<alpha+1.5em`,
  `>alpha-1.5em`,
  `<alpha-1.5em`,
  `alpha-50em<delta-100em`,
  `alpha-50em<delta+100em`,
  `>alpha-50em<delta+100em`,

My problem is that if the first instance starts with a < then the second instance shouldn't be allowed, so the following should be invalid:

<alpha<bravo
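A quick way to reproduce this, sketched in JavaScript (the variable names are only illustrative): the current regexp accepts the valid samples, but it also accepts the string that should be rejected.

```javascript
const pattern = /^(?:(?:>|<)?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/;

// A few of the strings listed as valid.
const validSamples = ["alpha", ">alpha+1.5em", "alpha-50em<delta-100em", ">alpha-50em<delta+100em"];
for (const sample of validSamples) {
  console.log(sample, pattern.test(sample));
}

// The problem case: this should be invalid, yet the pattern still matches it.
const unwanted = "<alpha<bravo";
console.log(unwanted, pattern.test(unwanted));
```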

Is it possible to add this restriction to the regexp?

The two approaches I can think of are:

  1. Check the first character and make the second instance invalid if it is <
  2. Check if < has already occurred in the string (or if < occurs again in the string) and if so, make the second instance invalid.

However I'm not sure how to implement either of these approaches here.

revo
Undistraction

2 Answers


You could use a negative lookahead placed right after the caret ^:

(?!<[^<\s]*<)

Live demo

You also don't need alternation to match a single character at a time: (?:>|<) can be [<>] and (?:\+|-) can be [+-].

Extended mode:

^
  (?!<[^<\s]*<) # We have this extra one
  (?:
    [<>]?
    [a-zA-Z]+
    (?:[-+]\d+(?:\.\d+)?(?:em)?)?
  )
  (?:
    <
    [a-zA-Z]+
    (?:[-+]\d+(?:\.\d+)?(?:em)?)?
  )?
$

In a line:

^(?!<[^<\s]*<)(?:[<>]?[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)(?:<[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)?$
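As a rough check of the one-line pattern, here is a small JavaScript sketch (the name `guarded` is only illustrative). Note that the numeric part in this version is stricter than in the question: a sign must now be followed by at least one digit, so a string such as `alpha+` no longer matches.

```javascript
const guarded = /^(?!<[^<\s]*<)(?:[<>]?[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)(?:<[a-zA-Z]+(?:[-+]\d+(?:\.\d+)?(?:em)?)?)?$/;

// Rejected: the lookahead sees a leading < followed later by another <.
console.log(guarded.test("<alpha<bravo"));

// Still accepted: a single leading <, or a second instance after > or no prefix.
console.log(guarded.test("<alpha"));
console.log(guarded.test(">alpha<bravo"));
console.log(guarded.test("alpha-50em<delta+100em"));

// Side effect of the tightened numeric part: a bare sign is rejected.
console.log(guarded.test("alpha+"));
```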
revo
  • The second approach does not seem to work in JS engines due to the back-reference: https://regex101.com/r/NkR12Z/3. – Jeffrey Westerkamp Jun 03 '18 at 16:42
  • Good point. I removed the second approach. It seems a non-captured capturing group in JS is initialized as a zero-length string in its back-reference. – revo Jun 03 '18 at 16:58
  • This definitely solves the original problem but now allows: `alpha+100px` which the original doesn't. – Undistraction Jun 03 '18 at 18:00
  • You can use the second approach, if you capture the first character (whatever it is) in a lookahead. https://regex101.com/r/NkR12Z/4 . This way the capture is never empty. – Casimir et Hippolyte Jun 03 '18 at 18:02
  • @Undistraction In your own regex second ` – revo Jun 03 '18 at 18:05
  • @CasimiretHippolyte Interesting. Is this behavior documented somewhere? I don't remember having run into it. – revo Jun 03 '18 at 18:11
  • @revo: not really, `(?!\1)` when the first capture group is empty (or in an unused branch) gives `(?!)` that is an always failing subpattern. – Casimir et Hippolyte Jun 03 '18 at 18:16
  • @CasimiretHippolyte Right. A mistake I'm still in is thinking about an optional ` – revo Jun 03 '18 at 18:30
  • @revo For a bonus round: The above works perfectly, but if I need to add a second character that should disallow the second instance (so if either `]?[a-zA-Z]+(?:(?:[+-])\d*\.?\d*(?:em)?)?)(?: – Undistraction Jun 04 '18 at 09:29
  • @Undistraction Yes, you only need one negative lookahead `(?![ – revo Jun 04 '18 at 09:32
  • @revo Thank you. I think I finally understand negative lookaheads. – Undistraction Jun 04 '18 at 09:35
  • @Undistraction You're welcome. You may also want to have a look at [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075) for a better understanding of regular expressions. – revo Jun 04 '18 at 09:37

Just replace (?:(?:>|<)? with (?:(?:>|<(?!.*<))? to get the desired result.

Test it here.
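For reference, a small JavaScript sketch of the question's pattern with that single replacement applied (the name `noSecondAfterLt` is only illustrative). A leading < is now accepted only when no further < follows it anywhere in the string.

```javascript
const noSecondAfterLt = /^(?:(?:>|<(?!.*<))?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:<[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/;

// Rejected: a leading < with another < later in the string.
console.log(noSecondAfterLt.test("<alpha<bravo"));

// Still accepted.
console.log(noSecondAfterLt.test("<alpha+1.5em"));
console.log(noSecondAfterLt.test(">alpha-50em<delta+100em"));
console.log(noSecondAfterLt.test("alpha-50em<delta-100em"));
```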


If you want to extend this restriction from the < character to the > character as well, you can replace the same part of the pattern, (?:(?:>|<)?, with (?:(?:([<>])(?!.*\1))? and replace the leading < with [<>] in the second part of your pattern.

Test it here.
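A sketch of how the extended variant might look in JavaScript, assuming the outer (?: of the first group is kept and the leading < of the second part becomes [<>] (the name `noRepeat` is only illustrative). The back-reference \1 is only evaluated right after group 1 has captured, so the unset-group behaviour mentioned in the comments on the other answer should not come into play here.

```javascript
const noRepeat = /^(?:(?:([<>])(?!.*\1))?[a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)(?:[<>][a-zA-Z]+(?:(?:\+|-)\d*\.?\d*(?:em)?)?)?$/;

// Rejected: the leading symbol occurs again later in the string.
console.log(noRepeat.test("<alpha<bravo"));
console.log(noRepeat.test(">alpha>bravo"));

// Accepted: the two symbols differ, or there is no leading symbol.
console.log(noRepeat.test("<alpha>bravo"));
console.log(noRepeat.test(">alpha<bravo"));
console.log(noRepeat.test("alpha>bravo"));
```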

Ωmega