Python code:

import re

def updateRule(rule):
    # Rule format: "RULE:<pattern>/<replacement>/..."; anything after the
    # second "/" is ignored here
    tokens = rule.split('/')
    # Strip the "RULE:" prefix and turn "$1"-style references into "\1"
    return [tokens[0][len('RULE:'):], tokens[1].replace('$', '\\')]

def getPX(inputStr, rule):
    reg_match = updateRule(rule)
    match = re.compile(reg_match[0])
    return re.sub(match, reg_match[1], inputStr)

def main():
    inputStr = "XZ=Rep.com,PX=TE-ST-,PX=Zen,PX=TAG,M=Dana,I=JAR"
    rule = 'RULE:^XZ=[^,]+,(PX=.+),M=Dana,I=JAR$/$1/,DEFAULT'
    print(getPX(inputStr, rule))

if __name__ == "__main__":
    main()
Input strings and expected outputs:
Case 1:
inputStr = "XZ=Rep.com,PX=TE-ST-,PX=Zen,PX=TAG,M=Dana,I=JAR"
Desired output = "PX=TE-ST-,PX=Zen,PX=TAG"
Case 2:
inputStr = "PX=$#XN,I=JAR,M=Dana,PX=Faber,PX=Module,OU=gif,XZ=dana-fa.com,PX=GAN%"
Desired output = "PX=$#XN,PX=Faber,PX=Module,PX=GAN%"
As you can see, the final output should contain only the PX= entries and their values.
Case 1 gives the desired output, but case 2 returns the whole input string, including fields other than PX=.
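The reason can be checked directly: the original pattern is anchored to one fixed layout, so on case 2 it never matches, and re.sub() then hands back its input untouched. A quick sketch reusing the pattern that updateRule() extracts from the rule:

```python
import re

# The original rule's pattern, as produced by updateRule()
pattern = r'^XZ=[^,]+,(PX=.+),M=Dana,I=JAR$'
case2 = "PX=$#XN,I=JAR,M=Dana,PX=Faber,PX=Module,OU=gif,XZ=dana-fa.com,PX=GAN%"

# "^XZ=" requires the string to START with XZ=, and ",M=Dana,I=JAR$" requires
# it to END with those two fields. Case 2 satisfies neither, so nothing matches.
print(re.search(pattern, case2))  # None

# When there is no match, re.sub() returns its input unchanged,
# so every field comes back, not only the PX= ones.
print(re.sub(pattern, r'\1', case2) == case2)  # True
```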
I don't want to use the findall() method; I would rather change the regex rule in the code so that only the PX= entries appear in the final output.
How can I modify the following rule in the code to do this?
rule= 'RULE:^XZ=[^,]+,(PX=.+),M=Dana,I=JAR$/$1/,DEFAULT'
After a lot of research (capturing and non-capturing groups, etc.), this is the new regex rule I came up with:
"[A-Za-z_]+,((?:PX=[A-Za-z$-_ !]+,)+(?:PX=[A-Za-z$-_ !]+,)*).+"
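One detail in this character class may be worth checking: inside [...], "$-_" is read as a range from "$" (0x24) to "_" (0x5F), not as three separate characters. That range also includes the comma, so PX=[A-Za-z$-_ !]+ can run across field boundaries. Note too that "#" (0x23), which appears in case 2's PX=$#XN, falls just outside the range. A small check, where literal_hyphen is a hypothetical alternative with the hyphen moved to the end:

```python
import re

# Inside [...], "$-_" is a RANGE from "$" (0x24) to "_" (0x5F), not the three
# characters "$", "-" and "_". That range also includes ",", "=", digits and "%".
as_written = r'[A-Za-z$-_ !]'
# Placing "-" last (or escaping it as "\-") makes it a literal hyphen.
literal_hyphen = r'[A-Za-z$_ !-]'

for ch in [',', '-', '#']:
    print(repr(ch),
          bool(re.fullmatch(as_written, ch)),
          bool(re.fullmatch(literal_hyphen, ch)))
# ',' True False
# '-' True True
# '#' False False
```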
Case 1 works, with the following output (note the trailing comma):
PX=TEST,PX=Zen,PX=TAG,
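The trailing comma comes from the shape of the repetition: (?:PX=...,)+ ends every repetition with a comma. A common idiom writes the list as "first item, then (comma + item) repeated", so no separator is left dangling. A small comparison on case 1, assuming values never contain commas (hence [^,]* for the value):

```python
import re

case1 = "XZ=Rep.com,PX=TE-ST-,PX=Zen,PX=TAG,M=Dana,I=JAR"

# "(item,)+" ends every repetition with a comma, so the match ends with one too.
with_comma = re.search(r'(?:PX=[^,]*,)+', case1).group()
# "item(,item)*" puts the comma BEFORE each extra item instead.
without_comma = re.search(r'PX=[^,]*(?:,PX=[^,]*)*', case1).group()

print(with_comma)     # PX=TE-ST-,PX=Zen,PX=TAG,
print(without_comma)  # PX=TE-ST-,PX=Zen,PX=TAG
```

This only fixes the comma: the pattern still matches a single contiguous run of PX= fields, so on case 2 it would stop after PX=$#XN.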
I also got it working with special characters, but case 2 still fails because the PX= entries can appear in any order: at the beginning, in the middle, or at the end of the string. So two things remain to fix in the regex rule: matching PX= entries regardless of their position, and getting rid of the trailing comma. Any suggestions?
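One possible direction, offered as a sketch rather than a definitive answer: instead of capturing the PX= entries, make the rule delete every field that is not a PX= entry, using the negative lookahead (?!PX=), and leave the replacement part of the rule empty. The rule below is hypothetical and assumes that fields are comma-separated NAME=value pairs, that values never contain commas, that the string contains at least one PX= entry, and that the rule text contains no "/" (since updateRule() splits on it).

```python
import re

def updateRule(rule):
    # Same parsing as in the question: "RULE:<pattern>/<replacement>/..."
    tokens = rule.split('/')
    return [tokens[0][len('RULE:'):], tokens[1].replace('$', '\\')]

def getPX(inputStr, rule):
    pattern, replacement = updateRule(rule)
    return re.sub(pattern, replacement, inputStr)

# Hypothetical rule that deletes every field NOT starting with "PX=":
#   ^(?:(?!PX=)[^,]*,)+  -> one or more non-PX fields at the very start,
#                           each together with its trailing comma
#   ,(?!PX=)[^,]*        -> any later non-PX field, with its leading comma
# The replacement (between the two slashes) is empty, so matches are removed.
rule = 'RULE:^(?:(?!PX=)[^,]*,)+|,(?!PX=)[^,]*//,DEFAULT'

case1 = "XZ=Rep.com,PX=TE-ST-,PX=Zen,PX=TAG,M=Dana,I=JAR"
case2 = "PX=$#XN,I=JAR,M=Dana,PX=Faber,PX=Module,OU=gif,XZ=dana-fa.com,PX=GAN%"

print(getPX(case1, rule))  # PX=TE-ST-,PX=Zen,PX=TAG
print(getPX(case2, rule))  # PX=$#XN,PX=Faber,PX=Module,PX=GAN%
```

Because this works by deletion, the PX= entries keep their original order wherever they appear, and no trailing comma is left behind.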