I am working on processing HTML that has unique subheadings.

In their current state, the subheadings are formatted like this:

<p>Example Text.</p>

What distinguishes these subheadings from the rest of the paragraph tags is the period (.) that precedes the closing tag. I would like to convert the code above so that it looks like what's shown below.

<subheading>Example Text</subheading>

Note that the period that preceded the closing paragraph tag was also removed.

Is this doable using regex, and could you please provide an example?

Many thanks!

cmill02s
  • Try a DOM parser, not regular expressions. HTML is not a regular language, and unintended consequences will occur. – Sam Apr 10 '14 at 16:34
  • But for the sake of exercise, this should work (and the second match group contains your heading): [`(.*?)[.]\1>`](http://regex101.com/r/fO6aX1) – Sam Apr 10 '14 at 16:37
  • From the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496): *[Do not use regular expressions to parse HTML](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not)...[it's for your own safety](http://stackoverflow.com/a/1732454/2736496)* (listed under "Common Validation Tasks") – aliteralmind Apr 10 '14 at 16:42
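As a rough illustration of the parser route suggested in the comments, here is a minimal Python sketch. It assumes the input is a well-formed XML/XHTML fragment with a single root element, which real-world HTML often is not; for messy HTML a dedicated HTML parser would be needed. The function name `convert_with_parser` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def convert_with_parser(xhtml: str) -> str:
    """Rename <p> elements whose text ends in a period to <subheading>.

    Assumption: `xhtml` is well-formed XML with one root element.
    """
    root = ET.fromstring(xhtml)
    # Materialize the list first so renaming tags cannot disturb the walk.
    for p in list(root.iter("p")):
        text = p.text or ""
        # Only touch paragraphs with no child elements and a trailing period.
        if len(p) == 0 and text.endswith("."):
            p.tag = "subheading"
            p.text = text[:-1]  # drop the trailing period
    return ET.tostring(root, encoding="unicode")


sample = "<div><p>Example Text.</p><p>An ordinary paragraph</p></div>"
print(convert_with_parser(sample))
```

Because the check runs on parsed elements rather than raw text, paragraphs that contain nested tags are left untouched instead of being partially matched.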

1 Answer

The response provided by Sam was enough to get me on the right track.

Thank you!
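For anyone landing here later, a regex-based conversion in the spirit of Sam's comment might look like the Python sketch below. This is not the exact pattern from the comment; the pattern, the function name `convert_with_regex`, and the sample input are my own assumptions, and it only handles plain `<p>` tags with no attributes and no nested markup.

```python
import re

# Assumed pattern: a bare <p> whose tag-free text ends in a period
# immediately before </p>. [^<]* keeps the match inside one paragraph.
SUBHEADING = re.compile(r"<p>([^<]*?)\.</p>")


def convert_with_regex(html: str) -> str:
    """Rewrite <p>Text.</p> as <subheading>Text</subheading>."""
    return SUBHEADING.sub(r"<subheading>\1</subheading>", html)


sample = "<p>Example Text.</p>\n<p>An ordinary paragraph</p>"
print(convert_with_regex(sample))
```

Note that any ordinary paragraph that happens to end in a period would be converted too, so this only works if the trailing period really is unique to subheadings, as described in the question.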

cmill02s