I have a weird list of items and lists like this, with | as the delimiter and {{ }} as parentheses. It looks like this:

| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item14

I want to match the items in lists called Ulist* (items 4-8) using regex and replace them with Uitem*. The result should look like this:

| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item14

Update:

I tried a solution based on this question, but the answer there doesn't work if there is a list inside a Ulist. I'm using Python 2.7; specifically, my code is:

#!/usr/bin/python
# -*- coding: utf-8  -*-
import regex
def repl(m):
    return "".join([x.replace("item", "Uitem") if x.startswith("{{Ulist") else x for x in regex.split(r'\{{2}(?=(\blist\d*))[^\}]*(?:}(?!})[^\}]*)*}}', m.group(0))])
text = "| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 | {{list4 | item15 | item16 }} | item17 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item14"
rex = r'(\{\{(?=(Ulist\d*))(?>[^}{]|}(?!})|\{(?!\{)|(?1))*}})'
text = regex.sub(rex, repl, text)
print(text)
aleskva
  • You might find it easier to use something other than regex... it doesn't typically do well with arbitrarily nested structures. – glibdud Jan 21 '16 at 13:29
  • I think so too, but I don't know how. Somebody already answered this question with some links showing how to do it programmatically without regex, but they deleted the answer :/ – aleskva Jan 21 '16 at 13:32
  • @AlexKupil Since I can see the deleted answer, [here's the link](http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets) you're talking about, in case it helps. – zessx Jan 21 '16 at 13:34
  • I still need a little bit of regex for replacing item with Uitem (because this is just a simplified case of what I really want), but the main part of my question (finding Ulists and processing the items inside them) is still open. – aleskva Jan 21 '16 at 13:36
  • That's the link, thank you. – aleskva Jan 21 '16 at 13:38
  • Still, I'd be glad if somebody could help me with that. I'll try to draft some code, but I don't think it will work. – aleskva Jan 21 '16 at 13:45
  • There we go [again](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454). These sorts of problems are not suited for regex. Regex is only suited for regular grammars (hence the name), NOT for context-sensitive grammars like the one you have. – RickyA Jan 21 '16 at 14:04
  • With [Python regex package](https://pypi.python.org/pypi/regex) you could probably [match nested](https://regex101.com/r/dS6wD0/1): `{{Ulist\d+(?>[^}{]+(?0)?)*}}` and [use a callback](http://stackoverflow.com/a/2095012/5527985) on `group(0)` with sub [like this](https://regex101.com/r/wN7qT4/1): `\w[^}{|]+(?<=\w)` – bobble bubble Jan 21 '16 at 14:10
  • Have you tried the new answer? – SIslam Jan 21 '16 at 21:01
  • Finally I tried @glibdud's answer and it worked without any problem. Your regex still can't handle a list inside a Ulist. – aleskva Jan 21 '16 at 21:35

2 Answers

Maybe this can get you started:

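def parse(data):
    # Split on the "|" delimiter; surrounding whitespace is discarded.
    items = [i.strip() for i in data.split('|')]
    newitems = []
    # Stack of flags: True while the innermost open list is a Ulist.
    nest = [False]
    for item in items:
        if item.startswith('{{'):
            # Opening a list: remember whether it is a Ulist.
            if item.startswith('{{Ulist'):
                nest.append(True)
            else:
                nest.append(False)
            newitems.append(item)
        else:
            # Rename items only when the innermost open list is a Ulist.
            if item.startswith('item') and nest[-1]:
                newitems.append('U' + item)
            else:
                newitems.append(item)
        if item.endswith('}}'):
            # Closing a list: return to the enclosing list's state.
            nest.pop()
    return ' | '.join(newitems)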

Basically it splits the data on the delimiters (|) and does a single loop over them, converting where appropriate and keeping state in a stack called nest to determine when it should be converting. It assumes that whitespace surrounding delimiters isn't significant.
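A variant of the same stack idea, sketched here as an assumption rather than as part of the answer above: splitting with `re.split` and a capturing group keeps the delimiters and the whitespace around them, so joining the tokens reproduces the input spacing exactly, and every `}}` token pops the stack even when it is not at the end of a `|`-separated field. The function name `parse_tokens` is hypothetical, and the input is assumed to be well formed (every `{{` closed by a matching `}}`).

```python
import re

# Split on "{{", "}}" and "|" but keep them (capturing group), so the
# original spacing survives when the tokens are joined back together.
TOKEN = re.compile(r'(\{\{|\}\}|\|)')


def parse_tokens(data):
    out = []
    nest = [False]       # True while the innermost open list is a Ulist
    expect_name = False  # the token right after "{{" is the list name
    for tok in TOKEN.split(data):
        if tok == '{{':
            expect_name = True
        elif tok == '}}':
            nest.pop()  # assumes well-formed input (balanced braces)
        elif expect_name:
            nest.append(tok.startswith('Ulist'))
            expect_name = False
        elif nest[-1] and tok.strip().startswith('item'):
            tok = tok.replace('item', 'Uitem', 1)
        out.append(tok)
    return ''.join(out)


print(parse_tokens('| item1 | {{Ulist1 | item4 | {{list4 | item15 }} | item17 }} | item9'))
```

Unlike the split-and-strip version, this never needs to guess where a list opens or closes from the start or end of a field, at the cost of tracking one extra flag for the list name.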

glibdud

In fact, regex is the wrong tool for matching nested parentheses/braces/brackets; see here.

Anyway, you can use regex in this case as follows:

import regex

s='| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | item4 | item5 | {{Ulist2 | item6 | item7 }} | item8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1'

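def itm_changer(txt):
    # txt is the match object for one Ulist block; joining its captures()
    # gives the matched text, in which every "item" becomes "Uitem".
    return ''.join(txt.captures()).replace('item', 'Uitem')

def changer(data):
    # Match {{Ulist...}} blocks; (?0) recurses into the whole pattern, so only
    # nested {{Ulist...}} blocks (not plain {{list...}} blocks) can appear inside.
    return regex.sub(r'\{\{Ulist\d*(?>[^\}\{]+|(?0))+\}}', itm_changer, data)

print(changer(s))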

N.B. You can modify this as needed.

Output:

| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1| item1 | item2 | item3 | {{Ulist1 | Uitem4 | Uitem5 | {{Ulist2 | Uitem6 | Uitem7 }} | Uitem8 }} | item9 | {{list3 | item10 | item11 | item12 }} | item13 | item1

To see details of the regex used here, see the live demo.
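The recursive pattern above only recurses into nested `{{Ulist…}}` blocks, so, as the asker notes in the comments, it cannot match a Ulist that contains a plain list. One way around this, sketched here with only the standard `re` module and not taken from either answer, is to rewrite the innermost `{{…}}` blocks first and hide each processed block behind a placeholder, repeating until no `{{…}}` block is left, then expand the placeholders again. The sketch assumes, like glibdud's answer, that items of a plain list nested inside a Ulist stay unchanged, that the input never contains the NUL character used in the placeholders, and that renaming words beginning with `item` stands in for the real transformation; `mark_ulist_items` is a hypothetical name.

```python
import re

# An innermost block: "{{", then no braces at all, then "}}".
INNER = re.compile(r'\{\{[^{}]*\}\}')
# Placeholder for an already-processed block (assumes "\x00" never occurs in the input).
PLACEHOLDER = re.compile(r'\x00(\d+)\x00')


def mark_ulist_items(text):
    saved = []  # processed blocks, indexed by placeholder number

    def stash(m):
        block = m.group(0)
        if block.startswith('{{Ulist'):
            # Nested blocks are already placeholders at this point,
            # so only this Ulist's own items are renamed.
            block = re.sub(r'\bitem', 'Uitem', block)
        saved.append(block)
        return '\x00%d\x00' % (len(saved) - 1)

    # Collapse innermost blocks, one nesting level per pass, until none are left.
    while True:
        text, count = INNER.subn(stash, text)
        if count == 0:
            break

    # Expand placeholders; repeat because saved blocks contain placeholders too.
    while PLACEHOLDER.search(text):
        text = PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)
    return text


print(mark_ulist_items('| item1 | {{Ulist1 | item4 | {{list4 | item15 }} | item17 }} | item9'))
```

Each pass only ever sees brace-free block bodies, which is why a plain (non-recursive) regex is enough here; the nesting is handled by the loop rather than by the pattern.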

SIslam