I'm trying to parse a specific JSON object that can be found within a webpage's HTML code. I understand I have to use HtmlAgilityPack to essentially "trim out" the JSON from the rest of the HTML and I've used HtmlAgilityPack before, so I had an idea of what I needed to do.

My problem is that the specific JSON I'm looking for is in a script tag that contains several other JSON objects.

Here's a simplified version of what I get from the website:

```html
<script>
    window.website = {};
    window.website.STATE_FROM_SERVER = { big json here };
    window.digitalData = { big json here };
    window.extraState = {};
</script>
```

The JSON object I specifically want to extract and parse is the one assigned to `window.website.STATE_FROM_SERVER`. I tried a snippet of code that I found in this question, but it still returns the JSON from all of those `window` assignments.

Here's the snippet that I used:

```csharp
var WebJson = WebpageHtml.DocumentNode.SelectSingleNode("//script[contains(.,'window.website.STATE_FROM_SERVER')]");
```

This snippet would work if the script contained only one `window` assignment, but there are four here, and as I mentioned, it returns all of the JSON objects because the selected node is the entire `<script>` element.

My question is: how can I retrieve only the JSON object assigned to the specific `window` property I'm looking for?

TGF
  • You can't parse JavaScript with HtmlAgilityPack. You can use it to extract the embedded script but will need to use a JavaScript parser like [tag:jint] or [tag:jurassic] or `IActiveScriptParse32` to parse the embedded script. Or I suppose a regex would also be an option. See: [Parsing HTML to get script variable value](https://stackoverflow.com/q/18156795), [Parsing javascript HTML using HTMLAgilityPack](https://stackoverflow.com/q/15296613), [parse and execute JS by C#](https://stackoverflow.com/q/4744105) or [Embedding JavaScript engine into .NET](https://stackoverflow.com/q/172753). – dbc Feb 15 '21 at 14:42
  • In fact this might be a duplicate of those questions, agree? – dbc Feb 15 '21 at 14:47
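dbc's comment mentions string processing (a regex) as one option. Because the objects are nested, a single regex cannot reliably tell where the literal ends, so the sketch below (not from the thread) swaps in a small brace-counting scanner instead: it finds `target = {`, walks to the matching `}` while ignoring braces inside quoted strings, and hands that slice to `System.Text.Json`. It assumes you already have the script text (for example, the `InnerText` of the node selected by the XPath above) and that the assigned object is strict JSON. `ScriptJsonExtractor`, `ExtractAssignedObject` and `ParseAssignedObject` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper (not from the thread): slices the object literal that a
// script assigns to one property, e.g. window.website.STATE_FROM_SERVER.
// Assumption: the literal is strict JSON. JS-only syntax (unquoted keys,
// comments, template literals, trailing commas) is not handled.
public static class ScriptJsonExtractor
{
    // Returns the "{...}" text assigned to `target`, or null if there is no
    // "target = {" assignment or the braces never balance.
    public static string? ExtractAssignedObject(string script, string target)
    {
        int searchFrom = 0;
        while (true)
        {
            int pos = script.IndexOf(target, searchFrom, StringComparison.Ordinal);
            if (pos < 0) return null;

            // Require a plain assignment: optional whitespace, a single '=',
            // optional whitespace, then '{'. This skips longer names such as
            // target + "_OTHER" and comparisons such as "==".
            int i = SkipWhitespace(script, pos + target.Length);
            if (i < script.Length && script[i] == '='
                && (i + 1 >= script.Length || script[i + 1] != '='))
            {
                int start = SkipWhitespace(script, i + 1);
                if (start < script.Length && script[start] == '{')
                    return MatchBraces(script, start);
            }
            searchFrom = pos + target.Length;
        }
    }

    // Extracts the literal and parses it; JsonDocument.Parse throws a
    // JsonException if the slice is not valid JSON. Dispose the result.
    public static JsonDocument? ParseAssignedObject(string script, string target)
    {
        string? literal = ExtractAssignedObject(script, target);
        return literal == null ? null : JsonDocument.Parse(literal);
    }

    // Walks from the opening '{' to its matching '}', ignoring braces inside
    // single- or double-quoted strings and skipping backslash escapes.
    private static string? MatchBraces(string s, int start)
    {
        int depth = 0;
        char quote = '\0'; // '\0' means "not inside a string literal"
        for (int i = start; i < s.Length; i++)
        {
            char c = s[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;                // skip the escaped character
                else if (c == quote) quote = '\0'; // string literal ends
            }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return s.Substring(start, i - start + 1);
        }
        return null; // ran off the end with unbalanced braces
    }

    private static int SkipWhitespace(string s, int i)
    {
        while (i < s.Length && char.IsWhiteSpace(s[i])) i++;
        return i;
    }
}
```

With the node from the question, something like `ScriptJsonExtractor.ParseAssignedObject(WebJson.InnerText, "window.website.STATE_FROM_SERVER")` would yield a `JsonDocument` for just that object, leaving `window.digitalData` and the others out. If the page's literal turns out not to be strict JSON, `JsonDocument.Parse` will throw, and one of the JavaScript engines linked in the comment is the sturdier route.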