Regex to remove only span tags but preserve content found within them

Question

I have a span tag like this:

<span id="item.2.2">3 October.--As I must do something or go mad, I write this diary.</span>

I'd like to remove the opening and closing span tags but leave the text within them. The id in the opening tag varies (it could be item.10.2 or item.100.5), so the pattern needs to account for that.

**Edit:** The file(s) I'd want to replace this in also have span tags that do not include the id specifier, and I do NOT want to remove them or their closing </span> tags. Sorry, I should have said that earlier.

Parsing HTML with regex is [not exactly recommended](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). — Biffen, Apr 04 '14 at 14:38
Understood and I was waiting for that thread to come up. If regex is not the way to go, then what is? — Sharkus, Apr 10 '14 at 18:46
An (X|HT)ML library of some kind. You haven't mentioned what language you're using, but there're probably a few available for it. — Biffen, Apr 11 '14 at 06:52
The idea was a search and replace using TextWrangler, or perhaps just diving into bash and using sed. — Sharkus, Apr 11 '14 at 10:48
If you're able to review everything yourself then you can discard my warning. Whatever method is the quickest should do. Using regex to parse HTML in runtime and/or production on the other hand is a bad idea. — Biffen, Apr 11 '14 at 10:55
[Oh yes you can](http://stackoverflow.com/a/4234491/471272). — tchrist, Jun 06 '14 at 22:43

score 14 · Accepted Answer · answered Apr 04 '14 at 13:01

Use a regex that replaces `</?span[^>]*>` with an empty string.
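
As a rough illustration, here is how that replacement might look in Python. The choice of Python is an assumption, since the question only mentions TextWrangler and sed. The first pattern is the one from this answer and removes every span tag. The second pattern, `ID_SPAN`, is a hypothetical narrower variant, not part of the answer, aimed at the edit in the question: it only touches spans whose sole attribute is an id of the form `item.N.N`, and it assumes those spans contain no nested span elements.

```python
import re

# Pattern from the answer: matches any opening or closing span tag,
# including whatever attributes the opening tag carries.
ALL_SPAN_TAGS = re.compile(r"</?span[^>]*>")

# Hypothetical narrower pattern (an assumption, not from the answer):
# an opening span whose only attribute is id="item.N.N", its content,
# and the nearest closing tag. Breaks if such a span nests another span.
ID_SPAN = re.compile(r'<span id="item\.\d+\.\d+">(.*?)</span>', re.DOTALL)


def strip_all_spans(html: str) -> str:
    """Remove every span tag, keeping the text between the tags."""
    return ALL_SPAN_TAGS.sub("", html)


def strip_id_spans(html: str) -> str:
    """Unwrap only the id="item.N.N" spans; leave other spans alone."""
    return ID_SPAN.sub(r"\1", html)


sample = (
    '<span id="item.2.2">3 October.--As I must do something or go mad, '
    'I write this diary.</span> <span class="note">kept</span>'
)

print(strip_all_spans(sample))
print(strip_id_spans(sample))
```

With the sample above, `strip_all_spans` drops both pairs of tags, while `strip_id_spans` leaves the `class="note"` span in place. For documents where spans can nest or carry extra attributes, an HTML parser is likely the safer route, as the comments on the question suggest.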

Allan S. Hansen


Works a treat. Just removes all `span` tags and any classes or other text within the tag. – Foliovision Feb 10 '19 at 19:40
Perfect! Removes attributes as well, which was what I needed. I processed 124k instances of auto-added span tags. – MiB Feb 22 '20 at 17:43
