Hello developers. I have been stuck on this issue for a long time without any useful result, so I am asking for help here on Stack Overflow. I am using HtmlAgilityPack in a C# console application for web scraping. As shown in the attached image, I want to parse the markup from the outer div down to the h3 tag that contains the hyperlink. How can I do this with HtmlAgilityPack?

I have tried multiple approaches to parse it, but none of them worked.

Attached image: [1]

Code is here:

        static string url = "https://www.rozee.pk/job/jsearch/q/all/fc/1184/fin/1/";

        // Download and parse the page.
        HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load(url);

        // Select the job-title link inside the h3 and read its text.
        var nodes = doc.DocumentNode.SelectSingleNode("//div[@class='job-listing opages npages']/div[@class='j-area']/div[@class='jlist float-left']/div[@class='job']/div[@class='jcont']/div[@class='jhead']/div[@class='jobt float-left']/h3[@title]/a[@href]").InnerText;
        Console.WriteLine(nodes);

It throws:

NullReferenceException was unhandled. Use the new keyword to create the instance of Object.

barny
  • Possible duplicate of [What is a NullReferenceException, and how do I fix it?](https://stackoverflow.com/questions/4660142/what-is-a-nullreferenceexception-and-how-do-i-fix-it) – SᴇM Jun 06 '19 at 13:42
  • If I understand correctly, you are trying to get each job link? If so, this XPath should help: `"//div[@id='app']/div/div[2]/div[2]/div[*]/div[1]/div[1]/div/h3/a"`. If you then want to parse the title or other stuff, you can go up the tree using `ParentNode` and other properties. – Anton Kahwaji Jun 06 '19 at 13:43
  • Why is it throwing an exception on the nodes object? Any suggestion will be appreciated. – Mashood Siddiquie Jun 06 '19 at 13:46
  • What do the 2 and 1 refer to? @AntonKahwaji – Mashood Siddiquie Jun 06 '19 at 13:47
  • 2 and 1 mean second and first (grab the second or first div at that level). But that is not the problem here. I think the jobs on the website are loaded by client-side JavaScript requests (I'm not sure of the correct name), and HtmlWeb doesn't execute them because it isn't a browser. Selenium might help, but I'm not sure. Try looking at the requests the website makes; maybe you can find the source of the data and fetch it directly. – Anton Kahwaji Jun 06 '19 at 14:15

1 Answer

    string htmlText = doc.ParsedText;

This gives you the HTML of the page exactly as it was downloaded. Search that text for the tag you are trying to get, for example `<bdi>Wordpress Developer</bdi>`.

You will not find this tag in that HTML.

Reason: HtmlAgilityPack cannot load dynamic content. It does not act like your browser; it only parses the HTML text it is given so that you can navigate and traverse it.
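To make this concrete, here is a minimal sketch that runs the same kind of query against two inline HTML strings instead of the live site. Both strings are made up for illustration: one stands in for what a browser shows after scripts have run, the other for a raw server response with an empty container. The names `StaticHtmlCheck` and `TryGetFirstJobTitle` and the shortened XPath are assumptions, not part of the question's code. The point is that `SelectSingleNode` returns null when nothing matches, so reading `.InnerText` directly on the result is what raises the NullReferenceException.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticHtmlCheck
{
    // Returns the trimmed link text, or null when the XPath matches nothing.
    public static string TryGetFirstJobTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode does not throw on a miss; it returns null.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//div[@class='job']//h3/a");

        // The null-conditional operator avoids dereferencing a missing node.
        return link?.InnerText.Trim();
    }

    public static void Main()
    {
        // Hypothetical markup as a browser might show it after scripts have run.
        string rendered = "<div class='job'><h3 title='t'><a href='/job/1'>Wordpress Developer</a></h3></div>";

        // Hypothetical raw response: the job container has not been filled in.
        string raw = "<div id='app'></div>";

        Console.WriteLine(TryGetFirstJobTitle(rendered) ?? "(no match)");
        Console.WriteLine(TryGetFirstJobTitle(raw) ?? "(no match)");
    }
}
```

Checking for null before using the node turns the crash into a visible "no match", which is a quick hint that the markup you see in the browser is not in the downloaded HTML.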

Read this SO article and you will understand.

Hint: if you look carefully at `doc.ParsedText`, you can find a script that contains the data you are looking for.
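One way to follow the hint is to list the `<script>` elements and keep the one whose text contains a string you expect, such as a job title you can see in the browser. The sketch below assumes a made-up page and a made-up `jobs` variable; the real page's script layout is not shown here, so `ScriptFinder`, `FindScriptContaining`, and the sample HTML are all hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ScriptFinder
{
    // Returns the text of the first <script> containing the marker, or null.
    public static string FindScriptContaining(string html, string marker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts == null)
        {
            return null;
        }

        return scripts
            .Select(script => script.InnerText)
            .FirstOrDefault(text => text.Contains(marker));
    }

    public static void Main()
    {
        // Hypothetical raw response: an empty container plus a script holding the data.
        string raw = "<html><body><div id='app'></div>" +
                     "<script>var jobs = [{\"title\":\"Wordpress Developer\"}];</script>" +
                     "</body></html>";

        string script = FindScriptContaining(raw, "Wordpress Developer");
        Console.WriteLine(script ?? "(marker not found in any script)");
    }
}
```

Once the right script is located, how to extract the records depends on the format the site actually uses inside it, so that step is left out here.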

cdev