
I am using Haskell and can't find a regex package that supports named groups, so I have to implement them myself. Basically, a user of my API would write a regex with named groups and get the captured groups back in a map, so

/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj on /foo/hhhh/bar/jjj 

would give

[("name","foo"),("surname","bar")]

I am writing a trivial implementation of a specification, with relatively small strings, so performance is not a main concern for now.

To solve this, I thought I'd write a meta-regex that is applied to the user's regex

/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj 

to extract the group names and strip them out, giving

0 -> name
1 -> surname

and the regex becomes

/([a-z]*)/hhhh/([a-z]*)/jjj 

then apply it to the string and use the indexes to pair the group names with the matched text.
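A minimal sketch of that rewriting step, done with a plain recursive scan rather than a meta-regex and using only `base`. The names `stripNamedGroups` and `pairWithCaptures` are hypothetical helpers, not part of any regex package, and the scan assumes `(?P<` never appears inside a character class:

```haskell
-- Strip Python-style (?P<name>...) markers from a pattern and collect the
-- group names in order of appearance. Hypothetical helper, base only.
stripNamedGroups :: String -> (String, [String])
stripNamedGroups = go
  where
    -- Keep escaped characters verbatim so that \( is never read as a group.
    go ('\\' : c : rest) =
      let (p, ns) = go rest in ('\\' : c : p, ns)
    go ('(' : '?' : 'P' : '<' : rest) =
      case break (== '>') rest of
        (name, '>' : rest') ->
          let (p, ns) = go rest' in ('(' : p, name : ns)
        -- No closing '>': leave the malformed text untouched.
        _ -> let (p, ns) = go rest in ("(?P<" ++ p, ns)
    go (c : rest) = let (p, ns) = go rest in (c : p, ns)
    go [] = ([], [])

-- Pair names with the captures returned by whichever regex library runs the
-- stripped pattern. Returns Nothing when the counts differ.
pairWithCaptures :: [String] -> [String] -> Maybe [(String, String)]
pairWithCaptures names caps
  | length names == length caps = Just (zip names caps)
  | otherwise = Nothing
```

The stripped pattern would then be handed to an ordinary regex package, and its list of submatches passed to `pairWithCaptures`.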

Two questions:

  1. Does this seem like a good idea?
  2. What meta-regex do I need to capture and replace the named-group syntax?

For those unfamiliar with named groups: http://www.regular-expressions.info/named.html

Note: all I need from named groups is for the user to be able to name the matches, so a subset of the named-group syntax that gives me only this is fine.

Tim Pietzcker
Sadache

2 Answers


The more generally you want to apply your solution, the more complex your problem becomes. For instance, in your approach, you want to remove the named groups and use the indexes to match. This seems like a good start, but you have to consider a few things:

  1. If you replace (?P<name>blah) with (blah), then you also have to replace any backreference to name with \1 or \2 or whatever index applies.
  2. What happens if the user includes unnamed groups as well? For example: ([a-z]{3})/(?P<name>[a-z]*)/hhhh/(?P<surname>[a-z]*)/jjj on /foo/hhhh/bar/jjj. In this case your numbering will not work, because group 1 is the user-defined unnamed group.
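One way to handle the second point is to number every capturing group while scanning and record the number next to each name. A rough sketch in Haskell using only `base`, where `groupIndices` and `namedCaptures` are hypothetical helpers, and the scan assumes Python-style `(?P<name>...)` syntax, treats every other `(?...)` construct as non-capturing, and ignores parentheses inside character classes:

```haskell
-- Map each group name to its 1-based capturing-group number, counting
-- unnamed capturing groups as well. Hypothetical helper, base only.
groupIndices :: String -> [(String, Int)]
groupIndices = go 1
  where
    go n ('\\' : _ : rest) = go n rest           -- skip escaped characters
    go n ('(' : '?' : 'P' : '<' : rest) =
      case break (== '>') rest of
        (name, '>' : rest') -> (name, n) : go (n + 1) rest'
        _                   -> go n rest         -- malformed, ignore
    go n ('(' : '?' : rest) = go n rest          -- e.g. (?:...), no number used
    go n ('(' : rest) = go (n + 1) rest          -- unnamed capturing group
    go n (_ : rest) = go n rest
    go _ [] = []

-- Pick the named captures out of the full list of submatches; names whose
-- index is out of range are dropped.
namedCaptures :: [(String, Int)] -> [String] -> [(String, String)]
namedCaptures idx caps =
  [ (name, c) | (name, i) <- idx, c <- take 1 (drop (i - 1) caps) ]
```

With this, unnamed groups no longer shift the mapping: the names simply point at whichever group numbers they occupy in the stripped pattern.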

See this post for some inspiration, as it seems others have successfully tried the same (albeit in Java):

Regex Named Groups in Java

Java Drinker
  • Yes. Actually, I was planning to raise an error if the number of group names != the number of captured groups – Sadache Jun 18 '10 at 14:56

Perhaps you should use parser combinators. This looks sufficiently complicated that it would be cleaner and more maintainable to step out and use Parsec or Attoparsec, instead of trying to push regexes further towards parsing.
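As a rough illustration of the combinator style, here is the question's example URL pattern written directly as a parser. To stay dependency-free this sketch swaps in `Text.ParserCombinators.ReadP` from `base` for Parsec or Attoparsec; the names `urlPattern` and `matchUrl` are made up for the example:

```haskell
import Data.Char (isAsciiLower)
import Text.ParserCombinators.ReadP

-- Parser for the example pattern /<name>/hhhh/<surname>/jjj, where both
-- named parts are runs of lowercase ASCII letters (like [a-z]*).
urlPattern :: ReadP [(String, String)]
urlPattern = do
  _ <- char '/'
  name <- munch isAsciiLower
  _ <- string "/hhhh/"
  surname <- munch isAsciiLower
  _ <- string "/jjj"
  eof
  return [("name", name), ("surname", surname)]

-- Run the parser and accept only a single parse that consumes all input.
matchUrl :: String -> Maybe [(String, String)]
matchUrl input =
  case readP_to_S urlPattern input of
    [(result, "")] -> Just result
    _              -> Nothing
```

Here the captured names live in the parser itself, so no group numbering is needed; the trade-off is that patterns are written as Haskell code rather than as regex strings supplied by the user.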

Don Stewart
  • Hi Don. Actually, the named-group regex in my example is the language the user will use to specify URL patterns. So Parsec doesn't help, unless you are suggesting that I implement the named-groups regex spec using Parsec, or write my own DSL instead of using named groups – Sadache Jun 18 '10 at 15:21