1

I'm trying to match a char within a subset of chars, where either side of the matching char could be anything.

Here's an example:

{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}

Against the above, I want to match anything between {{ and }} that has a dash "-" in it.

My regex pattern thus far is:

(?<={{)(.*?-.*?)(?=}})

But this creates a single match spanning almost the whole test string, returning:

SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS

Is anyone able to see what I'm missing? I understand why my regex doesn't work as expected but not how to fix it.
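
For reference, the behaviour described above can be reproduced with a short sketch. It assumes Python's re module (the question does not name a regex flavour), and the variable names are only illustrative:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'

# Pattern from the question. The first lazy .*? is free to run past "}}"
# until it reaches the first hyphen, which is the one in "remote-as".
pattern = re.compile(r'(?<={{)(.*?-.*?)(?=}})')

match = pattern.search(s)
overmatched = match.group(1).strip()
print(overmatched)
```

The lazy quantifier only limits how far each .*? expands; it does not stop the match from crossing a closing }}.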

Thanks

Nazim Kerimbekov
AlexW

3 Answers

3

You may use this regex with a negative lookahead and a capture group:

({{(?:(?!{{|}})[^-])*)-(.*?}})

RegEx Details:

  • (: Start capture group
    • {{: Match {{
    • (?:: Start non-capture group
      • (?!{{|}}): Negative lookahead asserting that neither {{ nor }} starts at the next position
      • [^-]: Match any character except hyphen
    • )*: End non-capture group. * matches 0+ instances of this group
  • ): End capture group
  • -: Match literal hyphen
  • (.*?}}): Match remaining string up to }} and then match }} and capture this in 2nd capture group
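
As a rough illustration of how the two capture groups split the match, here is a minimal sketch. It assumes Python's re module, which this answer does not specify, and the variable names are hypothetical:

```python
import re

s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'

# Group 1: "{{" plus everything before the hyphen (never crossing {{ or }}).
# Group 2: everything after the hyphen, up to and including the closing "}}".
pattern = re.compile(r'({{(?:(?!{{|}})[^-])*)-(.*?}})')

m = pattern.search(s)
whole = m.group(0)    # the full {{ ... }} placeholder containing a hyphen
before = m.group(1)   # text up to the hyphen
after = m.group(2)    # text following the hyphen
print(repr(whole), repr(before), repr(after))
```

The first placeholder is skipped because the tempered token stops at its }} without having found a hyphen.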
anubhava
3

Use

import re
s = '{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'
print([x.strip() for x in re.findall(r'{{(.*?)}}', s) if '-' in x])
# -> ['BGP-AS']

Details

  • Extract all matches between {{...}} with a mere {{(.*?)}} regex (note that re.findall will only return the captured substring, i.e. the value matched with (.*?))
  • Only keep the matches with - in them using a condition inside list comprehension (if '-' in x)
  • Remove trailing/leading whitespace with .strip()

A single regex approach (note it might turn out less efficient):

re.findall(r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}', s)

Details

  • {{ - {{
  • \s* - 0+ whitespaces
  • ((?:(?!{{|}})[^-])*-.*?) - Capturing group 1 (what will be returned by re.findall):
    • (?:(?!{{|}})[^-])* - a tempered greedy token matching any non-hyphen char, 0+ times, that does not start a {{ or }} substring
    • - - a hyphen
    • .*? - any 0+ chars (other than an LF), as few as possible
  • \s* - 0+ whitespaces
  • }} - }}.
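
The single-regex approach can be wrapped in a small helper, sketched below. The function name find_hyphenated and the second sample string are hypothetical and only serve to exercise the pattern:

```python
import re

# Same pattern as above, compiled once for reuse.
HYPHENATED = re.compile(r'{{\s*((?:(?!{{|}})[^-])*-.*?)\s*}}')

def find_hyphenated(text):
    """Return the trimmed contents of every {{...}} that contains a hyphen."""
    return HYPHENATED.findall(text)

first = find_hyphenated('{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}')
# Hypothetical extra input: hyphens outside the braces should be ignored.
second = find_hyphenated('{{ A-B }} x-y {{ C_D }} {{ E-F-G }}')
print(first, second)
```

Because \s* sits outside the capturing group, the returned values need no separate .strip() call.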

Wiktor Stribiżew
0

You can use this pattern: {{(.*?)}}.

  • .*? matches any sequence of characters non-greedily.

  • (...) creates a capturing group so re.findall yields the inside of the brackets.

To check whether the match contains a '-', it might be simpler to then use the in operator.

Code

import re

def tokenize(s):
    return [w.strip() for w in re.findall(r'{{(.*?)}}', s) if '-' in w]

print(tokenize('{{ SITE_AGGREGATE_SUBNET }}.3 remote-as {{ BGP-AS }}'))

Output

['BGP-AS']
Olivier Melançon