Parse contents of script tags inside string

Question

Let's say I have the following string:

var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>"

I would like to use split to get an array with the contents of the script tags. e.g. I want my output to be:

["console.log('hello')", "console.log('world')"]

I tried doing myString.split(/[<script></script>]/), but did not get the expected output.

Any help is appreciated.

You might want to check out [this post](http://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression) — wahwahwah, May 04 '15 at 15:03
I would suggest staying away from ["manual" parsing](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) and rather creating an actual DOM element from this string. You will then be able to directly access the different tags and their content. See [Creating a new DOM element from an HTML string using built-in DOM methods or prototype](http://stackoverflow.com/questions/494143/creating-a-new-dom-element-from-an-html-string-using-built-in-dom-methods-or-pro) for some more info and techniques. — Lix, May 04 '15 at 15:04
@i_trope do you need to do it using split or could you do it otherwise? — alainlompo, May 04 '15 at 15:14

score 14 · Accepted Answer · edited May 23 '17 at 10:27

You can't parse (X)HTML with regex.

Instead, you can parse it using innerHTML.

var element = document.createElement('div');
element.innerHTML = myString; // Parse HTML properly (but unsafely)

However, this is not safe. Even if innerHTML doesn't run the JS inside script elements, malicious strings can still run arbitrary JS, e.g. with <img src="//" onerror="alert()">.

To avoid that problem, you can use DOMImplementation.createHTMLDocument to create a new document, which can be used as a sandbox.

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly

Alternatively, new browsers support DOMParser:

var doc = new DOMParser().parseFromString(myString, 'text/html');

Once the HTML string has been parsed to the DOM, you can use DOM methods like getElementsByTagName or querySelectorAll to get all the script elements.

var scriptElements = doc.getElementsByTagName('script');

Finally, [].map can be used to obtain an array with the textContent of each script element.

var arrayScriptContents = [].map.call(scriptElements, function(el) {
    return el.textContent;
});

The full code would be

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
[].map.call(doc.getElementsByTagName('script'), function(el) {
    return el.textContent;
});
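
For reuse, the steps above can be wrapped in a function that returns the array, so the result can be assigned to a variable. This is only a minimal sketch, assuming a browser environment where `DOMParser` is available; the function name `extractScriptContents` is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (the name is illustrative): returns the text of every
// <script> element found in an HTML string.
function extractScriptContents(htmlString) {
  // Scripts inside a document created by DOMParser are not executed.
  var doc = new DOMParser().parseFromString(htmlString, 'text/html');
  return Array.from(doc.querySelectorAll('script'), function (el) {
    return el.textContent;
  });
}

// Usage, guarded so the snippet does nothing where DOMParser is unavailable (e.g. Node.js).
if (typeof DOMParser !== 'undefined') {
  var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";
  var arrayScriptContents = extractScriptContents(myString);
  console.log(arrayScriptContents);
}
```

Using `querySelectorAll` on the whole document, rather than on `doc.body` only, also picks up script elements that the parser places in the head.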

edited May 23 '17 at 10:27 by Community
answered May 04 '15 at 15:04 by Oriol

Would fragment not be enough? And would some browsers execute the script upon adding? PS: Where would the map end up as in what var? – mplungjan May 04 '15 at 15:04
I love when I see that epic link. – kemicofa ghost May 04 '15 at 15:08
@mplungjan JS in `script` elements never runs when they are created with `innerHTML`. However, other JS code may run, e.g. `<img src="//" onerror="alert()">`. Therefore, I used a "[sandbox](https://developer.mozilla.org/en-US/docs/Web/API/DOMImplementation/createHTMLDocument)". – Oriol May 04 '15 at 15:08
So where are the strings then? You map and have an array [] but where is the var that holds them? – mplungjan May 04 '15 at 15:09
@mplungjan The call to `[].map` returns the desired array. It can be assigned to some variable: `var arr = [].map.call(/* ... */)`. – Oriol May 04 '15 at 15:10
That is what I assumed but guessed OP would not grasp – mplungjan May 04 '15 at 15:14

score 2 · Answer 2 · answered May 04 '15 at 17:12

JavaScript code:

function myFunction() {
    var str = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";

    // With the g flag, match() returns the full matches, including the surrounding <script> tags
    console.log(str.match(/<script\b[^>]*>(.*?)<\/script>/gm));
}
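
Because `match` with the `g` flag returns whole matches rather than capture groups, getting only the tag contents needs one more step. The following is a rough sketch using `String.prototype.matchAll` (ES2020); the helper name `scriptContentsByRegex` is hypothetical, and `.` has been swapped for `[\s\S]` on the assumption that script bodies may span several lines. It remains a regex approach, so it can still be fooled by unusual markup, for example a `</script>` inside a string literal.

```javascript
// Hypothetical helper: collects capture group 1 (the tag contents) from each match.
function scriptContentsByRegex(html) {
  // [\s\S]*? also matches newlines; the i flag tolerates <SCRIPT> in upper case.
  var pattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  return Array.from(html.matchAll(pattern), function (match) {
    return match[1];
  });
}

var str = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";
console.log(scriptContentsByRegex(str));
// logs [ "console.log('hello')", "console.log('world')" ]
```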

answered May 04 '15 at 17:12 by Ritesh Karwa

score 1 · Answer 3 · answered May 04 '15 at 15:17

You have to escape the forward slash like so: \/.

 myString.split(/(<script>|<\/script>)/)
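
Note that a capturing group in the separator makes `split` include the matched tags in the result, and the text outside the script elements is returned as well. A possible sketch of how to keep only the script contents follows; it assumes plain, well-formed `<script>` tags with no attributes, and the variable names `pieces` and `scriptContents` are illustrative.

```javascript
var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>";

// Without a capturing group, the delimiters themselves are dropped from the result.
var pieces = myString.split(/<script>|<\/script>/);

// pieces alternates between text outside and text inside the script tags,
// so the script contents sit at the odd indexes.
var scriptContents = pieces.filter(function (_, index) {
  return index % 2 === 1;
});

console.log(scriptContents);
// logs [ "console.log('hello')", "console.log('world')" ]
```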

answered May 04 '15 at 15:17 by kaz
