-1

I have a string that contains some HTML tags. I want to find the opening and closing tags and add any that are missing. For example, let's say I have this string:

Hello <strong>from <em>StackOverflow</strong> Hows it going</em>.

I want to be able to close the first <em> before the closing </strong> and add an opening <em> right after the closing </strong>. I will know which tags are included, but the string I am provided could contain any variation of mixed-up opening/closing tags.

How could I handle this without help from DOMParser or innerHTML?

Basically, I want to iterate over the string: when I come across <, I want to start an opening tag and end it at >. Likewise, I want to start a closing tag when I come across </ and end it at >.

The problem is that, when checking one character at a time, I will always start an opening tag, because I see the < before I see the /. How can I build these tags out of my string as I iterate?
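One way around the one-character-at-a-time problem is to peek at the character after each < before deciding which kind of tag has started. Below is a minimal sketch of that idea. The function name `tokenize` and the token shape are made up for illustration, and it assumes simple tags without attributes or a literal > inside them.

```javascript
// Hypothetical helper: split a string into text, open-tag and close-tag tokens.
// Assumption: tags look like <name> or </name>, with no attributes.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input[i] === '<') {
      const end = input.indexOf('>', i);
      if (end === -1) {
        // No closing '>' anywhere: treat the remainder as plain text.
        tokens.push({ type: 'text', value: input.slice(i) });
        break;
      }
      // Peek one character ahead to tell </name> apart from <name>.
      const isClosing = input[i + 1] === '/';
      const name = input.slice(i + (isClosing ? 2 : 1), end).trim().toLowerCase();
      tokens.push({ type: isClosing ? 'close' : 'open', name });
      i = end + 1;
    } else {
      // Collect text up to the next '<' (or the end of the string).
      let next = input.indexOf('<', i);
      if (next === -1) next = input.length;
      tokens.push({ type: 'text', value: input.slice(i, next) });
      i = next;
    }
  }
  return tokens;
}
```

Calling `tokenize('Hi <em>x</em>')` yields a text token, an open token for `em`, another text token, and a close token for `em`, in that order.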

johnny_mac
  • Please read the [canonical/obligatory answer](http://stackoverflow.com/a/1732454/1715579) before going down this path. – p.s.w.g Jan 30 '17 at 23:25
  • Got it, updating the question now – johnny_mac Jan 30 '17 at 23:28
  • 2
    Possible duplicate of [Javascript : Close open HTML tags in a string](http://stackoverflow.com/questions/14749078/javascript-close-open-html-tags-in-a-string) – Vlad Gincher Jan 30 '17 at 23:38
  • This is not a dupe of that question. That question's answer uses innerHTML. I want to do this without the help of DOMParser or the browser. Before anyone mentions that JavaScript requires a browser, please consider it server-side JavaScript for the sake of this argument. – johnny_mac Jan 30 '17 at 23:42
  • Use a library such as htmltidy. –  Jan 30 '17 at 23:59
  • Server-side JS can use DOM libraries. –  Jan 31 '17 at 00:01
  • @torazaburo I know it can use DOM libraries, I am trying to write this myself, without the help of libraries. – johnny_mac Jan 31 '17 at 00:17
  • Downvoter of the question, can you please explain why you feel this way? It is not a duplicate. – johnny_mac Jan 31 '17 at 00:19
  • It's a one-year project to write something which reliably tidies up HTML. –  Jan 31 '17 at 00:39

1 Answer

0

You could find where both the opening and closing tags are in JavaScript with a regex like `(<.+?>)`. However, since JavaScript regexes did not support lookbehind at the time of writing, what you are asking is probably impossible with a regex alone: you'll never be able to work out which end tag correlates to which start tag.
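The tag-finding step on its own might look something like the sketch below. It only locates tags and says nothing about which end tag pairs with which start tag. The helper name `findTags` and the tightened pattern are assumptions for illustration (not the regex quoted above), and tags with attributes are deliberately not matched.

```javascript
// Hypothetical helper: list every simple tag with its name, kind and position.
// Assumption: only <name> and </name> forms are matched, no attributes.
function findTags(input) {
  // Group 1 captures the optional slash, group 2 the tag name.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\s*>/g;
  const found = [];
  let match;
  while ((match = tagPattern.exec(input)) !== null) {
    found.push({
      name: match[2].toLowerCase(),
      closing: match[1] === '/',
      index: match.index,
    });
  }
  return found;
}
```

The captured slash is enough to tell a closing tag from an opening one, so no lookbehind is needed for this part.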

Hope this helps!

Obsidian Age
  • Well, my thinking was to push each tag onto a stack. Then I can check the tag type, `strong` etc. If I am pushing a closing tag onto its corresponding opening tag, then I can just pop the opening tag and move on, as that is not incorrect HTML. If, however, I am pushing a closing `</strong>` onto an open `<em>`, then I know I need to add a closing `</em>` to the string at that index. Is that a crazy thought? – johnny_mac Jan 30 '17 at 23:47
  • 1
    The problem with that is that nested tags can have the **same** type. For example, you can have a div within a div. You may find [the comments here](http://stackoverflow.com/questions/37823200/final-solution-for-using-regex-to-remove-html-nested-tags-of-the-same-type) helpful in that regard. – Obsidian Age Jan 31 '17 at 00:44
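For reference, the stack idea from the comments could be sketched roughly as follows. This is not a general HTML tidier, just an illustration under stated assumptions: tags carry no attributes, a closing tag is paired with the innermost open tag of the same name (which is how a div within a div gets handled here), and a closing tag with no open counterpart is simply dropped. The name `repairTags` is hypothetical.

```javascript
// Hypothetical sketch: rebalance mis-nested simple tags using a stack.
// Assumptions: no attributes; stray closing tags are dropped, not reopened.
function repairTags(input) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\s*>/g;
  const open = []; // names of currently open tags, innermost last
  let out = '';
  let last = 0;
  let match;
  while ((match = tagPattern.exec(input)) !== null) {
    out += input.slice(last, match.index); // copy text before this tag
    last = tagPattern.lastIndex;
    const name = match[2].toLowerCase();
    if (match[1] !== '/') {
      open.push(name);
      out += `<${name}>`;
      continue;
    }
    // Pair the closing tag with the innermost open tag of the same name.
    const at = open.lastIndexOf(name);
    if (at === -1) continue; // stray closing tag: dropped (assumption)
    const reopened = open.splice(at + 1); // tags opened after the matched one
    open.pop(); // remove the matched tag itself
    // Close the inner tags first, then the matched tag, then reopen the inner ones.
    for (let i = reopened.length - 1; i >= 0; i--) out += `</${reopened[i]}>`;
    out += `</${name}>`;
    for (const tag of reopened) {
      open.push(tag);
      out += `<${tag}>`;
    }
  }
  out += input.slice(last);
  // Close anything still open at the end of the string.
  while (open.length > 0) out += `</${open.pop()}>`;
  return out;
}
```

With the string from the question, this closes the `<em>` before `</strong>` and reopens it right after, giving `Hello <strong>from <em>StackOverflow</em></strong><em> Hows it going</em>.`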