I am writing a regular expression in python to capture the contents inside an SSI tag.
I want to parse the tag:
<!--#include file="/var/www/localhost/index.html" set="one" -->
into the following components:
- Tag Function (ex:
include
,echo
orset
) - Name of attribute, found before the
=
sign - Value of attribute, found in between the
"
's
The problem is that I am at a loss on how to grab these repeating groups, as name/value pairs may occur one or more times in a tag. I have spent hours on this.
Here is my current regex string:
^\<\!\-\-\#([a-z]+?)\s([a-z]*\=\".*\")+? \-\-\>$
It captures the include
in the first group and file="/var/www/localhost/index.html" set="one"
in the second group, but what I am after is this:
group 1: "include"
group 2: "file"
group 3: "/var/www/localhost/index.html"
group 4 (optional): "set"
group 5 (optional): "one"
(continue for every other name="value" pair)