
I have a large string of HTML (and JavaScript). I need to get the text that is inside document.write().

<script>
  $('.navigation').html();

  window.jQuery || document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $('.link').attr('href',url)  \x3C/script>")

$('.button').html();

</script>

Currently I am finding the index of document.write then deleting any text before it.

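var strIndex = scriptHtml.indexOf('document.write(');
// indexOf returns -1 when nothing is found, so only truncate on a real match
if (strIndex !== -1) scriptHtml = scriptHtml.slice(strIndex);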

This leaves me with a string like this:

document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $('.link').attr('href',url)  \x3C/script>")

$('.button').html();

</script>

I need to find the first opening bracket in this new string and then work out where its matching closing bracket is, so that I can get the string inside it.

I have tried some regex but cannot make one that works.

\(([^)]+)\)

The above regex does not work, as it matches only:

 ("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $('.link')

It just searches for the first closing bracket after an opening one, without considering how many have been opened in between.
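To reproduce the problem, here is a minimal sketch. The variable name `truncated` and the shortened script URL are stand-ins for illustration, not the real input:

```javascript
// Hypothetical, shortened stand-in for the truncated string (URL abbreviated).
const truncated = "document.write(\"<script src='jquery.min.js'> $('.link').attr('href',url)  \\x3C/script>\")";

// [^)]+ stops at the first ')' it meets, which is the one closing $('.link'),
// not the one closing document.write(.
const match = truncated.match(/\(([^)]+)\)/);
console.log(match[0]);
```

The match ends right after `$('.link')`, so everything from `.attr(` onwards is lost.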

Does anyone have an idea of how I can get the text I want, or a better way to get the text inside document.write?

Thanks

Jay Povey

1 Answer


Regular expressions, at least as implemented in JavaScript, are simply not the right tool for matching parentheses that can nest, as they lack the mechanisms that would allow you to do this properly (in this case, recursion). See this answer for more information.
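For illustration only, the bookkeeping that a regex cannot do here can be done with a small hand-written scanner that counts nesting depth and skips over string literals. This is a rough sketch, not a full JavaScript parser: `extractParenContents` is a hypothetical helper name, the sample input is shortened, and template literal interpolation, regex literals and comments are not handled.

```javascript
// Hypothetical helper: returns the text between the first '(' at or after
// `start` and its matching ')', or null if the parentheses never balance.
function extractParenContents(source, start) {
  const open = source.indexOf('(', start);
  if (open === -1) return null;

  let depth = 0;
  let quote = null; // quote character of the string literal we are inside, if any

  for (let i = open; i < source.length; i++) {
    const ch = source[i];

    if (quote !== null) {
      if (ch === '\\') {
        i++; // skip the character after a backslash
      } else if (ch === quote) {
        quote = null; // string literal closed
      }
      continue; // brackets inside a string literal do not count
    }

    if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) return source.slice(open + 1, i);
    }
  }
  return null; // unbalanced input
}

// Shortened, hypothetical version of the input from the question.
const code = "window.jQuery || document.write(\"<script src='jquery.min.js'> $('.link').attr('href',url)  \\x3C/script>\")\n$('.button').html();";
const at = code.indexOf('document.write(');
console.log(extractParenContents(code, at));
```

The returned text still includes the surrounding double quotes of the argument. Because the input is arbitrary third-party code, a scanner like this will still miss cases that a real JavaScript parser would handle.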

That said, in the example code you posted, simply matching the string document.write along with its quote marks will work (assuming you put the whole code into a variable named str):

console.log(str.match(/document\.write\("([^"]*)"\)/)[1]);

However, I strongly advise against this: there are many possible cases in which parsing it this way will fail. Accounting for all of them is very complex, and how far you need to go depends on how much you know about (or control) the possible inputs.
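Two such cases, as a quick sketch (the input strings are made up for illustration):

```javascript
const pattern = /document\.write\("([^"]*)"\)/;

// Works: the argument contains no double quotes.
const simple = 'document.write("<b>hi</b>")';
console.log(pattern.exec(simple)[1]); // <b>hi</b>

// Fails: the escaped double quote stops [^"]* early and no ')' follows it.
const escaped = 'document.write("<a href=\\"x\\">link</a>")';
console.log(pattern.exec(escaped)); // null

// Fails: the argument is single-quoted, which the pattern does not allow for.
const singleQuoted = "document.write('<b>hi</b>')";
console.log(pattern.exec(singleQuoted)); // null
```

String concatenation, template literals, or a variable passed as the argument would break it in similar ways.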

fstanis
  • Thanks for the advice and the link. I will follow the advice in the link that you posted, as I have no control over the input. The input can be literally anything: I will be extracting this data from code given to my programme by 3rd parties. – Jay Povey Sep 23 '15 at 11:52