I'm having trouble with a Regex statement that I want to use in R to extract full matches of a pattern from a data frame.
I have 11 sentence patterns and I want to select only the records from my data frame that fully match these patterns, using one Regex (I've been able to get this to work with multiple Regexes, but it's a real hassle). Any help on how to simplify this would be much appreciated.
These are my sentences:
- A change to headings 0101 through 0106 from any other chapter.
- A change to subheadings 0712.20 through 0712.39 from any other chapter.
- A change to heading 0903 from any other chapter.
- A change to subheading 1806.20 from any other heading.
- A change to subheading 1207.99 from any other chapter.
- A change to heading 4302 from any other heading.
- A change to subheading 4105.10 from heading 4102 or any other chapter.
- A change to subheading 4105.30 from heading 4102, subheading 4105.10 or any other chapter.
- A change to subheading 4106.21 from subheading 4103.10 or any other chapter.
- A change to subheading 4106.22 from subheadings 4103.10 or 4106.21 or any other chapter.
- A change to tariff item 7304.41.30 from subheading 7304.49 or any other chapter.
This is the Regex I have now, which selects full matches and partial matches (where I'm stuck) - so I end up getting records I don't want from my data frame in addition to these sentences (I know this is messy, just an example).
^A change to (?:headings|heading|subheadings|subheading|tariff item) (?:\d+\S\d+\S\d+|\d+\S\d+) (?:through \d+\S\d+ from any other chapter.|from any other chapter.|from any other heading.|)|from heading \d+\S\d+ or any other chapter.|from (?:heading|subheading|subheadings) \d+\S\d+|, subheading \d+\S\d+ or any other chapter| or any other chapter.| or \d+\S\d+
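One likely cause of the partial matches is that `|` has the lowest precedence in a regex. At the top level, the `^` therefore binds only to the first alternative, and the other alternatives can match anywhere in the string. There is also no `$` at the end. A minimal sketch with toy strings (not from the real data) shows the difference:

```r
# Toy strings, chosen only for illustration
x <- c("abc", "xx def xx", "abc and more")

# "^" applies only to the first alternative, so "def" can match anywhere,
# and nothing forces the match to reach the end of the string
loose <- grepl("^abc|def", x, perl = TRUE)

# Grouping the alternatives and anchoring both ends forces a full match
strict <- grepl("^(?:abc|def)$", x, perl = TRUE)

print(loose)   # TRUE TRUE TRUE
print(strict)  # TRUE FALSE FALSE
```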
This is as far as I can get with a Regex that still matches all 11 sentences. I'm having a problem continuing to group cleanly after this:
^A change to (?:tariff item|headings|heading|subheading|subheadings) (?:\d+\S\d+|\d+\S\d+\S\d+|\d+\S\d+) (?:from|through)
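One way to continue from here might be to build the pattern from small named pieces and anchor it at both ends with `^` and `$`. The sketch below is only that, a sketch. The names `code`, `kind`, `item`, `pattern` and `sentences` are made up for this example. It assumes that every code is four digits optionally followed by one or two two-digit groups (as in `0101`, `0712.20`, `7304.41.30`), which is inferred from the 11 sentences and may not hold for the rest of the data.

```r
# Building blocks (hypothetical names; the code format is an assumption
# inferred from the 11 example sentences)
code <- "\\d{4}(?:\\.\\d{2}){0,2}"                 # 0101, 0712.20, 7304.41.30
kind <- "(?:tariff item|subheadings?|headings?)"
item <- paste0("(?:", kind, " )?", code)           # "heading 4102" or bare "4106.21"

pattern <- paste0(
  "^A change to ", kind, " ", code,
  "(?: through ", code, ")?",                      # optional range
  " from ",
  "(?:", item, "(?:(?:, | or )", item, ")* or )?", # optional list of sources
  "any other (?:chapter|heading)\\.$"              # "$" rules out partial matches
)

sentences <- c(
  "A change to headings 0101 through 0106 from any other chapter.",
  "A change to subheadings 0712.20 through 0712.39 from any other chapter.",
  "A change to heading 0903 from any other chapter.",
  "A change to subheading 1806.20 from any other heading.",
  "A change to subheading 1207.99 from any other chapter.",
  "A change to heading 4302 from any other heading.",
  "A change to subheading 4105.10 from heading 4102 or any other chapter.",
  "A change to subheading 4105.30 from heading 4102, subheading 4105.10 or any other chapter.",
  "A change to subheading 4106.21 from subheading 4103.10 or any other chapter.",
  "A change to subheading 4106.22 from subheadings 4103.10 or 4106.21 or any other chapter.",
  "A change to tariff item 7304.41.30 from subheading 7304.49 or any other chapter."
)

# TRUE only where the whole string matches the pattern
full_match <- grepl(pattern, sentences, perl = TRUE)

# A sentence with extra trailing text should be rejected
partial <- grepl(pattern,
                 "A change to heading 0903 from any other chapter, provided that",
                 perl = TRUE)
```

For a data frame, the same logical vector could be used to subset rows, for example `df[grepl(pattern, df$text, perl = TRUE), , drop = FALSE]`, where `df` and `text` are placeholder names for the data frame and its text column.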