Edit: I am aware that similar questions have been asked, but none of them resolved this problem. I may be using the wrong search terms, as this is a new topic for me. Any help (or even a link to a solution) would be greatly appreciated.

I have an HTML file (output from MS Word as Filtered HTML) and I want to get the inner text of the 'MsoTitle' class. From everything I have read, the code should work but I consistently receive a NullReferenceException and am not sure why.

HTML Snippet:

<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 15 (filtered)">
</head>
<body lang=EN-US link="#0563C1" vlink="#954F72">
<div class=WordSection1>
<p class=MsoNormal align=center style='text-align:center'><img width=435
height=102 id="Picture 2" src="FUND00_files/image001.png"></p>
<p class=MsoTitle>My Title</p>
...

My code:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml("C:\\Temp\\Output\\FUND00.htm");
    string text = doc.DocumentNode.SelectSingleNode("//p[@class='MsoSubtitle']").InnerText;

Looking at some suggestions here, I tried placing the @ symbol before the quotes, but that did not do anything to remedy the situation.

Is there something I am doing wrong to get the innertext of this HTML class?

Bill
  • Could you please point to the answer? I searched a lot (and I mean a lot) and tried many examples but could not find one that exactly matches this problem. – Bill Nov 05 '15 at 01:08
  • Weird, can't answer the question... Anyway, doc.LoadHtml is being used incorrectly. LoadHtml requires a string containing the HTML, not a link to the webpage. – SILENT Nov 05 '15 at 02:00
  • So far as I can tell, the document is loading with my current approach. I just cannot get to the content I am trying to read. – Bill Nov 05 '15 at 02:19
  • It would load the literal string "C:\\Temp\\Output\\FUND00.htm" as if it were the HTML. It needs to load the actual markup ("....") for the parser to do anything. Use doc.LoadHtml(File.ReadAllText("C:\\Temp\\Output\\FUND00.htm")) – SILENT Nov 05 '15 at 02:24
  • Answered! Thank you very much! I have been searching for a resolution to this problem for the last two days. It looks like my code to query the document was accurate, but the code to read the file was missing what I now know is the all-important File.ReadAllText call. – Bill Nov 05 '15 at 02:32
  • Anytime. Not sure why it's considered a duplicate question, since NullReferenceException is extremely generic. – SILENT Nov 05 '15 at 02:36
  • Same but I appreciate your help. – Bill Nov 05 '15 at 16:58
  • @SILENT If you read the duplicate, you will see that it contains a tutorial on how to debug a NRE. – DavidG Nov 06 '15 at 00:50
  • @DavidG The NRE was caused by semantics (misinterpreting the html argument as a path to the file). He did provide the correct type, just incorrect content. Even with breakpoints, unless you can step into the HtmlAgilityPack DLL, it won't be easy to debug. Then again, if he has Reflector, that would be a different story. – SILENT Nov 06 '15 at 01:05
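
The fix from the comments above can be sketched as a small console program. This is a minimal sketch, not a definitive implementation: it assumes the HtmlAgilityPack package is referenced and that the file exists at the path given in the question. The XPath targets 'MsoTitle' because that is the class shown in the HTML snippet; the question's code queries 'MsoSubtitle', which may sit in the elided part of the file. The null check and the "(no match)" fallback text are additions for illustration.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Path taken from the question.
        string path = "C:\\Temp\\Output\\FUND00.htm";

        var doc = new HtmlDocument();

        // LoadHtml parses the string it is given as markup, so it must
        // receive the file's contents rather than the file's path.
        doc.LoadHtml(File.ReadAllText(path));

        // SelectSingleNode returns null when nothing matches, so check
        // before reading InnerText to avoid a NullReferenceException.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//p[@class='MsoTitle']");

        // "(no match)" is a placeholder message added for this sketch.
        Console.WriteLine(node != null ? node.InnerText : "(no match)");
    }
}
```

Passing the path straight to LoadHtml does not throw, which is why the document appeared to load: the path is parsed as plain text, the resulting document contains no p element, SelectSingleNode returns null, and the exception only surfaces at .InnerText. Calling doc.Load(path) instead of doc.LoadHtml(File.ReadAllText(path)) should work as well, since Load reads from a file path.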
