I have seen several threads on StackOverflow concerning this topic, however none of them seem to provide an answer.

I have a button that, when clicked, opens up an invisible web page, navigates to a URL, enters information into a box, presses a button, and then scrapes the screen for information.

The bones of my code, inside the click handler, are basically:

    WebBrowser wb = new WebBrowser();
    wb.Visibility = System.Windows.Visibility.Hidden;
    wb.Navigate("http://somepage.com");

And this is where it gets tricky.

I am looking for a way to ensure that the page is loaded before trying to enter data or scrape the screen. I have seen several threads that talk about Navigated, IsLoaded and LoadCompleted, as well as BackgroundWorker approaches, but I cannot get any of these to work.

Which is the best option to use to determine that the page has fully loaded? How would you get the chosen method to work?

I also cannot get the data from the screen as WPF does not use the same GetElementByID.

Edit:

Per the comment below, here are the errors I run into:

  • Navigated fires as soon as the page has been navigated to and does not necessarily wait until all objects are loaded. This works as intended, but cannot be used for my purposes.
  • IsLoaded never returns true

    private void GetData_Click(object sender, RoutedEventArgs e)
    {
      int x=0;
      HTMLDocument doc;
    
      wb = new WebBrowser();
      wb.Visibility = System.Windows.Visibility.Visible;
      wb.Navigate("somesite.com");
    
      doc = wb.Document as mshtml.HTMLDocument;
    
      while(!wb.IsLoaded)
      {
        //Wait
      }
    
      doc.getElementById("txt_One").innerText = "It Worked";
    
    }
    

This puts it in an infinite loop, as wb never seems to load.

  • This is the version with LoadCompleted

The event 'System.Windows.Controls.WebBrowser.LoadCompleted' can only appear on the left hand side of += or -=

    private void GetData_Click(object sender, RoutedEventArgs e)
    {
      int x=0;
      HTMLDocument doc;

      wb = new WebBrowser();
      wb.Visibility = System.Windows.Visibility.Visible;
      wb.Navigate("somesite.com");

      doc = wb.Document as mshtml.HTMLDocument;

      wb.LoadCompleted += wb_LoadCompleted;

      doc.getElementById("txt_One").innerText = "It Worked";

    }

    void wb_LoadCompleted(object sender, NavigationEventArgs e)
    {

    }

Produces the error

An unhandled exception of type 'System.NullReferenceException' occurred in {filename}

Additional information: Object reference not set to an instance of an object.

  • Please elaborate on "cannot get any of these to work". Show your code, describe the problem with each attempt in much greater detail. We cannot tell you why it doesn't work without seeing any code. – tnw Jan 13 '14 at 16:15
  • Thanks for your edit... but please **show your code**. – tnw Jan 13 '14 at 16:40

1 Answer

The WebBrowser control has a load event (which you already found): LoadCompleted, which fires when the document has finished loading.

Subscribe to the event, and get the document inside the event handler rather than immediately after calling Navigate.

    //root is a grid element identified in the XAML
    public WebBrowser webb;

    public MainWindow()
    {
        InitializeComponent();

        webb = new WebBrowser();
        webb.Visibility = System.Windows.Visibility.Hidden;
        root.Children.Add(webb);
        webb.LoadCompleted += webb_LoadCompleted;
        webb.Navigate("http://www.google.com");
    }

    void webb_LoadCompleted(object sender, NavigationEventArgs e)
    {
        MessageBox.Show("Completed loading the page");

        mshtml.HTMLDocument doc = webb.Document as mshtml.HTMLDocument;
        mshtml.HTMLInputElement obj = doc.getElementById("gs_taif0") as mshtml.HTMLInputElement;
        mshtml.HTMLFormElement form = doc.forms.item(Type.Missing, 0) as mshtml.HTMLFormElement;

        webb.LoadCompleted -= webb_LoadCompleted; //REMOVE THE OLD EVENT METHOD BINDING
        webb.LoadCompleted += webb_LoadCompleted2; //BIND TO A NEW METHOD FOR THE EVENT
        obj.value = "test search";
        form.submit(); //PERFORM THE POST ON THE FORM OR SEARCH
    }

    //SECOND EVENT TO FIRE AFTER YOU POST INFORMATION
    void webb_LoadCompleted2(object sender, NavigationEventArgs e)
    {
        MessageBox.Show("Completed loading the page second time after post"); 
    }

You need to do doc = wb.Document as mshtml.HTMLDocument; inside the LoadCompleted handler, because the document is not available until the load is complete.
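
If rebinding handlers for every step becomes unwieldy, one alternative is to wrap the next LoadCompleted in a Task and await it. The following is only a sketch under assumptions: NextLoadCompletedAsync is a hypothetical helper name (not part of WPF), it requires .NET 4.5 or later, and the returned task never completes if the navigation fails or LoadCompleted is not raised.

```csharp
using System.Threading.Tasks;
using System.Windows.Controls;
using System.Windows.Navigation;

public static class WebBrowserExtensions
{
    // Hypothetical helper: returns a task that completes the next time
    // LoadCompleted is raised on the given browser, then unsubscribes itself.
    public static Task NextLoadCompletedAsync(this WebBrowser browser)
    {
        var tcs = new TaskCompletionSource<bool>();
        LoadCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Unsubscribe first so this handler runs for one load only.
            browser.LoadCompleted -= handler;
            tcs.TrySetResult(true);
        };
        browser.LoadCompleted += handler;
        return tcs.Task;
    }
}
```

A possible usage inside an async method, with webb already added to the window as above: call var loaded = webb.NextLoadCompletedAsync(); before webb.Navigate(...) or form.submit(), then await loaded; before reading webb.Document. Subscribing before triggering the navigation avoids missing the event.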

  • webb_Navigated gets fired as indicated. However, nothing in webb_LoadCompleted ever gets called. I removed everything but the message boxes from webb_LoadCompleted and nothing happens. – user3175176 Jan 13 '14 at 20:29
  • I ran a quick test and it looks like if the control is never placed on the window it will never fire. Try something like this: webb.Visibility = System.Windows.Visibility.Hidden; root.Children.Add(webb); Where root is the name of a grid element. –  Jan 13 '14 at 20:33
  • I did what you said and was able to get it to work. Is there a way to do this recursively? For instance, when I get to the original page, I enter search criteria and click enter. I tried adding new Navigated and LoadCompleted handlers, but neither of them ever fires from within webb_LoadCompleted. – user3175176 Jan 13 '14 at 21:11
  • You should be able to continue navigating with the same WebBrowser and not have to rebind the events; they should always fire as long as you keep using the same WebBrowser instance. If the URL you are using relies on something like AJAX to post the search criteria, the loaded event might never fire. I would have to see the website to provide more information as to how you could accomplish it. –  Jan 13 '14 at 21:19
  • It's not AJAX and it actually navigates to a separate page when you click submit. However, as in your example above, I am setting the text field in the LoadCompleted event, which then loads the second page. If I continue to use the same LoadCompleted handler, it is going to keep searching for a button that isn't there. A comparable example would be navigating to Google, waiting for it to load, entering search criteria once it loads, clicking search, waiting for the search results, and scraping the first five links. – user3175176 Jan 13 '14 at 21:29
  • I have updated the answer to show how you can post information and make sure the handler does not fire twice: first unbind the old handler inside the load event, then bind a new method to fire after the post. –  Jan 13 '14 at 21:33