Unable to get html element by using X-Path in HtmlAgilityPack C#

Question

I am trying to get an element using its full XPath (built from the element tree), but SelectSingleNode returns null. This style of XPath works for me on other sites; it fails on only about 2% of them. I also tried the XPath copied from Chrome, but on the sites where my own XPath fails, the Chrome XPath fails too.

public static void Main()
{
    string url = "http://www.ndrf.gov.in/tender";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(url);
    // The style of XPath I want to use: returns null (not working)
    var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("/html[1]/body[1]/section[2]/div[1]/div[1]/div[1]/div[1]/div[2]/table[1]");
    // Copied from Google Chrome: returns null (not working)
    //var nodetest2 = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"content\"]/div/div[1]/div[2]/table");
    // Selecting by ID works, but I don't want this style
    //var nodetest3 = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"content\"]");
    Console.WriteLine(nodetest1.InnerText); // fails: nodetest1 is null
    //Console.WriteLine(nodetest2.InnerText); // fails: nodetest2 is null
    //Console.WriteLine(nodetest3.InnerText); // prints correctly, but I don't want this style
}

It is not clear what you are after. Did you intend to write out entire table or one row? Or something else? — QHarr, Aug 10 '19 at 06:42
I also suggest using anglesharp instead of agility pack. Agility pack does not seem to be maintained any more. — Jochen Kühner, Aug 10 '19 at 06:50

score 1 · Answer 1 · answered Aug 10 '19 at 07:09

The answer that @QHarr suggested works perfectly. However, the reason you get null with a seemingly correct XPath is that a JavaScript file in the header of the site adds a wrapper div around the table. HtmlAgilityPack does not appear to load or execute JavaScript, so the XPath returns null.

What you observe after that JavaScript runs is:

<div class="view-content">
      <div class="guide-text">
          ...
      </div>
      <div class="scroll-table1">
          <!-- Your table is here -->
      </div>
</div>

But what you actually get without that JavaScript is:

<div class="view-content">
    <!-- Your table is here -->
</div>

Thus your XPath should be:

var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("/html[1]/body[1]/section[2]/div[1]/div[1]/div[1]/div[1]/table[1]");
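The mismatch can be reproduced without hitting the network. The following is a minimal sketch, assuming two simplified, hypothetical HTML strings that stand in for the server response and the script-modified DOM (they are not the real markup of the site, and the class name WrapperDivDemo is made up for illustration). It checks which XPath matches which document:

```csharp
using System;
using HtmlAgilityPack;

public static class WrapperDivDemo
{
    // Hypothetical markup: what the server sends (no JavaScript has run).
    public const string StaticHtml =
        "<html><body><div class=\"view-content\">" +
        "<table><tr><td>Tender 1</td></tr></table>" +
        "</div></body></html>";

    // Hypothetical markup: the DOM after a script has wrapped the table.
    public const string RenderedHtml =
        "<html><body><div class=\"view-content\">" +
        "<div class=\"guide-text\">...</div>" +
        "<div class=\"scroll-table1\"><table><tr><td>Tender 1</td></tr></table></div>" +
        "</div></body></html>";

    // Returns true when the XPath matches at least one node in the given HTML.
    public static bool Matches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode(xpath) != null;
    }

    public static void Main()
    {
        // XPath as a browser would report it (includes the wrapper div).
        string browserXPath = "/html[1]/body[1]/div[1]/div[2]/table[1]";
        // XPath that fits the static response (no wrapper div).
        string staticXPath = "/html[1]/body[1]/div[1]/table[1]";

        Console.WriteLine("browser XPath on rendered DOM: " + Matches(RenderedHtml, browserXPath));
        Console.WriteLine("browser XPath on static HTML:  " + Matches(StaticHtml, browserXPath));
        Console.WriteLine("static XPath on static HTML:   " + Matches(StaticHtml, staticXPath));
    }
}
```

In this sketch the browser-style XPath should only match the rendered markup, which mirrors the null result seen when the same path is applied to what HtmlWeb downloads.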

Ali Bordbar, thanks for answering. Your XPath works perfectly. But my project will handle multiple sites, so how can I tell whether a site adds a wrapper div or not? I am generating the full HTML XPath from the element tree. – Saalim Bhoraniya, Aug 10 '19 at 09:10
Well, one way to make sure something like this does not happen is to browse your target web page without JavaScript. Disable JavaScript as explained in [this answer](https://stackoverflow.com/a/13405409/5493209), refresh the page, and then use the XPath to find the correct element. – Ali Bordbar, Aug 10 '19 at 09:36

QHarr · Answer 2 · 2019-08-10T06:55:06.093

0

Your XPath, when used in the browser, selects the entire table. You can shorten it and use it as follows (fiddle):

using System;
using HtmlAgilityPack;

public class Program
{

    public static void Main()
    {
        string url = "http://www.ndrf.gov.in/tender";
        HtmlWeb web = new HtmlWeb();
        var htmlDoc = web.Load(url);
        var nodetest1 = htmlDoc.DocumentNode.SelectSingleNode("//table");  
        Console.WriteLine(nodetest1.InnerText); 
    }
}

edited Aug 10 '19 at 06:55

answered Aug 10 '19 at 06:45


QHarr, thanks for the reply. If there are multiple tables, what should I do? The important thing is that in my project the user selects an element by hovering the mouse over it in a WebBrowser control, and my code generates the XPath when the element is selected. – Saalim Bhoraniya Aug 10 '19 at 08:37
The website the user loads in the WebBrowser control can contain any tag (table, div, span, etc.), and the user selects one by mouse click, so there is no fixed //table element or any other fixed element. – Saalim Bhoraniya Aug 10 '19 at 08:47
There is no magic method, and certainly not using XPath. There are ways of dynamically traversing until a certain value is found, for example. If there are multiple tables, then selecting multiple nodes and looping is one way. It sounds like you are expecting an answer that will do the same as right-click > Copy XPath in the browser, which is beyond the scope of this site. – QHarr Aug 10 '19 at 10:45
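The "select multiple nodes and loop" idea from the comment above could look roughly like this. It is only a sketch: the markup is a hypothetical two-table page and the class name TableLister is invented for illustration. Note that SelectNodes returns null rather than an empty collection when nothing matches, so the result needs a null check before looping.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableLister
{
    // Returns the XPath of every <table> in the document,
    // or an empty list when the document contains no table.
    public static List<string> TableXPaths(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null)
        {
            return result;
        }

        foreach (var table in tables)
        {
            result.Add(table.XPath); // XPath as HtmlAgilityPack sees the node
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical markup with two tables at different depths.
        string html =
            "<html><body><table><tr><td>A</td></tr></table>" +
            "<div><table><tr><td>B</td></tr></table></div></body></html>";

        foreach (var xpath in TableXPaths(html))
        {
            Console.WriteLine(xpath);
        }
    }
}
```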

score 0 · Answer 3 · answered Aug 10 '19 at 06:55


Use Fizzler.Systems.HtmlAgilityPack (details here: https://www.nuget.org/packages/Fizzler.Systems.HtmlAgilityPack/). This library adds extension methods called QuerySelector and QuerySelectorAll, which take a CSS selector rather than an XPath.
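As a rough illustration of that suggestion, here is a small sketch that assumes the Fizzler.Systems.HtmlAgilityPack package is installed. The markup and the selector "div.view-content table" are hypothetical and only modelled on the class names shown in this thread; CssSelectorDemo is an invented name.

```csharp
using System;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // provides QuerySelector / QuerySelectorAll

public static class CssSelectorDemo
{
    // Returns the text of the first table inside div.view-content, or null if none is found.
    public static string FirstTableText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Descendant combinator: matches the table whether or not
        // an extra wrapper div sits between it and div.view-content.
        HtmlNode table = doc.DocumentNode.QuerySelector("div.view-content table");
        return table == null ? null : table.InnerText;
    }

    public static void Main()
    {
        // Hypothetical markup without and with a wrapper div.
        string withoutWrapper =
            "<html><body><div class=\"view-content\">" +
            "<table><tr><td>Tender 1</td></tr></table></div></body></html>";
        string withWrapper =
            "<html><body><div class=\"view-content\"><div class=\"scroll-table1\">" +
            "<table><tr><td>Tender 1</td></tr></table></div></div></body></html>";

        Console.WriteLine(FirstTableText(withoutWrapper));
        Console.WriteLine(FirstTableText(withWrapper));
    }
}
```

A descendant selector like this does not depend on the exact nesting depth, which is why it may tolerate a wrapper div that a positional XPath does not.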


Pranav


Saalim Bhoraniya · Accepted Answer · 2019-08-16T05:07:47.557

Ali Bordbar identified the cause correctly. When I navigate to this URL in a WebBrowser control, all of the JavaScript files are loaded and a script adds a wrapper div. When I load the URL using HtmlWeb, none of the JavaScript is loaded: HtmlWeb retrieves the static HTML response that the server sends and does not execute any JavaScript, whereas a WebBrowser does. So an XPath built from the WebBrowser control's DOM does not always match the DOM that HtmlWeb produces.

The code below works well for this situation:

// Requires: using HtmlAgilityPack; using System.Text.RegularExpressions;
// savexpath and saveInnerText are defined elsewhere: the XPath and inner text
// captured from the WebBrowser control's DOM for the element the user selected.
HtmlWeb web = new HtmlWeb();
web.AutoDetectEncoding = true;
HtmlAgilityPack.HtmlDocument theDoc1 = web.Load("http://www.ndrf.gov.in/tender");
var HtmlDoc = new HtmlAgilityPack.HtmlDocument();
var htmltag = theDoc1.DocumentNode.SelectSingleNode("//html");
HtmlDoc.LoadHtml(htmltag.OuterHtml);
// First try the XPath built from the WebBrowser DOM; this works for most URLs.
var xpathHtmldata = HtmlDoc.DocumentNode.SelectSingleNode(savexpath);
if (xpathHtmldata == null)
{
    // Take the last tag name from the original XPath, without its index.
    string mainele = savexpath.Substring(savexpath.LastIndexOf("/") + 1);
    if (mainele.Contains("[")) { mainele = mainele.Remove(mainele.IndexOf("[")); }
    // Collect every element with the tag name stored in mainele.
    var taglist = HtmlDoc.DocumentNode.SelectNodes("//" + mainele);
    if (taglist != null) // SelectNodes returns null when nothing matches
    {
        foreach (var ele in taglist) // check the elements one by one
        {
            string htmltext1 = ele.InnerText;
            htmltext1 = Regex.Replace(htmltext1, @"\s", "");
            htmltext1 = htmltext1.Replace("&amp;", "&").Trim();
            htmltext1 = htmltext1.Replace("&nbsp;", "").Trim();

            string htmltext2 = saveInnerText; // text of the element found via the WebBrowser DOM
            htmltext2 = Regex.Replace(htmltext2, @"\s", "");

            if (htmltext1 == htmltext2) // same text as before: this element gives the new XPath
            {
                savexpath = ele.XPath;
                break;
            }
        }
    }
}
