
I have a span tag like this:

<span id="item.2.2">3 October.--As I must do something or go mad, I write this diary.</span>

I'd like to remove the opening and closing span tags but leave the text within them. The id in the opening tag varies, so it could be item.10.2 or item.100.5, and the solution needs to take that into account.

** edit ** The file(s) I want to process also contain span tags that do not include the id attribute, and I do NOT want to remove those or their closing tags. Sorry, I should have said that earlier.

Sharkus
  • Parsing HTML with regex is [not exactly recommended](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Biffen Apr 04 '14 at 14:38
  • Understood and I was waiting for that thread to come up. If regex is not the way to go, then what is? – Sharkus Apr 10 '14 at 18:46
  • An (X|HT)ML library of some kind. You haven't mentioned what language you're using, but there're probably a few available for it. – Biffen Apr 11 '14 at 06:52
  • The idea was a search and replace using TextWrangler, or perhaps just diving into bash and using sed. – Sharkus Apr 11 '14 at 10:48
  • If you're able to review everything yourself then you can discard my warning. Whichever method is quickest should do. Using regex to parse HTML at runtime and/or in production, on the other hand, is a bad idea. – Biffen Apr 11 '14 at 10:55
  • [Oh yes you can](http://stackoverflow.com/a/4234491/471272). – tchrist Jun 06 '14 at 22:43

1 Answer

Use a regex search and replace: replace every match of `</?span[^>]*>` with an empty string.
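As a rough illustration, here is how that replacement might look in Python. The language and the helper name `strip_all_spans` are assumptions made for this sketch; the question itself only mentions TextWrangler and sed. Note that the pattern removes every span tag, with or without an id.

```python
import re

def strip_all_spans(html: str) -> str:
    """Remove every opening and closing span tag, keeping the enclosed text."""
    # </?span  matches "<span" or "</span"
    # [^>]*    swallows any attributes up to the closing ">"
    return re.sub(r"</?span[^>]*>", "", html)

sample = '<span id="item.2.2">3 October.--As I must do something or go mad, I write this diary.</span>'
print(strip_all_spans(sample))
```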

Allan S. Hansen
  • Works a treat. Just removes all `span` tags and any classes or other text within the tag. – Foliovision Feb 10 '19 at 19:40
  • Perfect! Removes attributes as well, which was what I needed. I processed 124k instances of auto-added span tags. – MiB Feb 22 '20 at 17:43
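
As the comments note, the pattern in the answer strips all span tags. The question's edit asks to keep spans that have no id, so the following is one possible sketch of that narrower case, not the answerer's method. It assumes Python, assumes ids always look like `item.N.N` in double quotes, and uses a hypothetical helper name `strip_item_spans`. A small stack tracks which opening tags were dropped so that only their matching closing tags are dropped too, even when spans are nested.

```python
import re

# Any opening or closing span tag.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
# Assumed id format from the question: item.<digits>.<digits>
ITEM_ID = re.compile(r'\sid="item\.\d+\.\d+"')

def strip_item_spans(html: str) -> str:
    """Remove span tags carrying an item.N.N id, plus their matching closers."""
    stack = []  # one bool per open span: True if that span is being removed

    def replace(match: re.Match) -> str:
        tag = match.group(0)
        if tag.startswith("</"):
            # Closing tag: drop it only if its opening tag was dropped.
            remove = stack.pop() if stack else False
            return "" if remove else tag
        remove = ITEM_ID.search(tag) is not None
        stack.append(remove)
        return "" if remove else tag

    return SPAN_TAG.sub(replace, html)

sample = '<span id="item.10.2">As I must do <span class="x">something</span></span> <span>keep</span>'
print(strip_item_spans(sample))
```

This still treats HTML as plain text, so it is only suitable for a one-off cleanup where the result can be reviewed, as discussed in the comments on the question.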