My program opens a YouTube video link and tries to reach the comment box. I know how to locate it, but when I try to reach the div that contains it, I get the loading placeholder div instead, so I assume the page is not fully loaded. I tried the following solutions and none of them worked:

while(pagina.getFirstByXPath("//div[@id='comment-section-renderer']/div")
                           .toString().contains("loading")) {
    synchronized(pagina) {
        pagina.wait(2000);
    }
}

and the other way:

 cliente.waitForBackgroundJavaScript(100000);

The video page is loaded after signing in through Gmail, and I checked that the user is successfully logged in by the time the video page loads.

Here is the code of the method:

public HtmlPage comentaVideo(String correo, String pass, String video, 
                             String comentario) throws ... {

    String url= "https://www.youtube.com"+video;
    HtmlPage pagina;
    HtmlDivision division;
    HtmlButton boton;
    HtmlTextInput input;

    pagina = cliente.getPage("https://www.youtube.com/watch?v=E2b9PiqobWg");

    boton = pagina.getFirstByXPath("//div[@id='yt-masthead-signin']/div/button"); 
    //press sign in button
    pagina = boton.click();

    pagina=iniciaSesion(correo,pass,pagina); //Login gmail (working)        

    System.out.println(pagina.getUrl().toString()); //just for debug

    //Trying to get the comment box div
    division = pagina.getFirstByXPath("//div[@id='comment-section-renderer']/div"); 

    //verifying that the div is correct
    System.out.println(division.toString()); 

    //some tests...
    pagina=division.click();

    boton= pagina.getFirstByXPath("//div[@id='comment-simplebox']/div/button[2]");
    pagina=boton.click();

    return pagina;

}

Now that I have identified the problem, this is the updated method, which still does not work:

public HtmlPage comentaVideo(String correo, String pass, String video, String comentario) throws FailingHttpStatusCodeException, MalformedURLException, IOException, ErrorSesionNoIniciada, InterruptedException{

        String url= "https://www.youtube.com"+video;
        HtmlPage pagina;
        HtmlDivision division;
        HtmlButton boton;
        HtmlTextInput input;

        pagina = cliente.getPage("https://www.youtube.com/watch?v=E2b9PiqobWg");

        boton = pagina.getFirstByXPath("//div[@id='yt-masthead-signin']/div/button");
        pagina = boton.click();

        pagina=iniciaSesion(correo,pass,pagina);        

        System.out.println(pagina.getUrl().toString());


        //Non-working part

        division = pagina.getFirstByXPath("//div[@id='comment-section-renderer']/div"); 


        boton = division.getFirstByXPath("//div[@id='comment-section-renderer']/div[2]/button"); //best comments button

    while(boton == null){ //while this button is not loaded
        ScriptResult sr=pagina.executeJavaScript("window.scrollBy(0,60000)");
        cliente.waitForBackgroundJavaScript(1000);
        pagina=(HtmlPage)sr.getNewPage();
        boton = division.getFirstByXPath("//div[@id='comment-section-renderer']/div[2]/button"); 
    }
    System.out.println(boton.toString());



        //just for testing
        division = pagina.getFirstByXPath("//div[@id='comment-section-renderer']/div"); 

        System.out.println(division.toString());
        pagina=division.click();


        boton= pagina.getFirstByXPath("//div[@id='comment-simplebox']/div/button[2]");
        pagina=boton.click();

        return pagina;
}

I also tried setting the inner page height to the maximum size. (The code has unused variables and throws clauses because it is only for testing; I will update it with the final version once I have a solution.)

EDIT 1: CHANGED THE WHILE LOOP CONDITION, STILL NOT WORKING

Setekorrales
  • Looking at the youtube page, it seems the ajax that loads the comment section is only triggered when you scroll down to have it visible in the page. You may want to try to simulate the scrolling – guido May 12 '16 at 22:31
  • @ᴳᵁᴵᴰᴼ Wow, thank you, I didn't think about that, thank you very much! If you don't mind, can you answer this question? How do I edit an editable
    – Setekorrales May 13 '16 at 09:41
  • Sure; about the other question about filling the div with htmlunit, I guess it is better to create a new one. (welcome to stackoverflow) – guido May 13 '16 at 09:54
  • @ᴳᵁᴵᴰᴼ Now I'm stuck on the scrolling part. I tried the executeJavaScript scrolling method and fireEvent("scroll"), and even changed the maximum size of the current page. I also tried putting all of these inside a while loop with a waitForJavaScript(6000), and none of it works... If you have any solution... – Setekorrales May 13 '16 at 10:27
  • I'll look into it later if you still have the problem, I am unable to access youtube at the moment; I would try this (setting the inner height to maximum size): http://stackoverflow.com/questions/12119610/crawl-dynamic-web-page-using-htmlunit in the meanwhile – guido May 13 '16 at 10:42
  • @ᴳᵁᴵᴰᴼ Yeah, I already read that post and nothing... I'm really fed up with this YouTube thing that I'm trying to build, but I guess if I manage to get the info from the comments on YouTube, it won't be hard to extract info from a normal web page – Setekorrales May 13 '16 at 10:49
  • Please update your question with the changes you made; even if it is only for learning purposes, you might still find a solution... – guido May 13 '16 at 10:53
  • I updated the answer, see if with that change it works. – guido May 13 '16 at 16:43
  • @ᴳᵁᴵᴰᴼ No, it still doesn't work. As you can see, I'm now trying to get the best comments button, but it is not loaded, and scrolling inside the loop doesn't affect the loading because no more comments are generated until a button is pressed. – Setekorrales May 13 '16 at 20:30

1 Answer

Looking at the YouTube page structure, it seems the AJAX request that loads the comments section is only triggered once you scroll down far enough for the section to become visible. You may want to simulate the scrolling first, and then rely on your loop, which waits for the "loading" string to disappear from the inner HTML of the container div.
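A wait loop like the one in the question never ends if the condition never becomes true, so it may help to bound it. The following is a minimal sketch in plain Java, not HtmlUnit API: the class `BoundedWait` and the method `waitUntil` are hypothetical names introduced here for illustration. The comment inside shows how the condition could be wired to the calls already used in the question (`executeJavaScript`, `waitForBackgroundJavaScript`), assuming `pagina` and `cliente` are the page and client from the question.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of HtmlUnit): polls a condition with a timeout.
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Re-checks the condition every pollMillis until it holds or
     * timeoutMillis has elapsed. Returns true only if the condition held.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: give up instead of looping forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition that becomes true on the third check.
        // With HtmlUnit, the lambda could instead scroll and then inspect the page, e.g.:
        //   pagina.executeJavaScript("window.scrollBy(0, 60000)");
        //   cliente.waitForBackgroundJavaScript(1000);
        //   return !pagina.asXml().contains("loading");
        int[] attempts = {0};
        boolean ready = waitUntil(() -> ++attempts[0] >= 3, 1_000, 10);
        System.out.println("ready=" + ready + " after " + attempts[0] + " checks");
    }
}
```

Returning `false` on timeout lets the caller decide whether to retry, log the page source, or fail, rather than hanging in the loop.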

Also consider that this behaviour may change at any time when YouTube rolls out an update.

EDIT:

After checking with the Chrome inspector, it seems there are many more div elements containing the "loading" (sub)string, even after the comment section has been populated via AJAX. I'd suggest changing your condition to wait for a new expected string to appear, instead of waiting for "loading" to go away. For instance, you could search for "Top comments" (the button text) or "Add a public comment..." (the placeholder of the comment posting textarea).
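As a rough illustration of checking for an expected string rather than for the absence of "loading", the sketch below tests a page's markup against the two marker strings mentioned above. The class `CommentMarkers` and the method `commentsLoaded` are hypothetical names, and the exact marker text is an assumption that depends on the page language and on YouTube's current markup (the placeholder may, for example, use a single ellipsis character instead of three dots).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: decides whether the comment section looks populated.
final class CommentMarkers {
    // Marker strings taken from the answer text; assumed, not guaranteed to be stable.
    static final List<String> MARKERS =
            Arrays.asList("Top comments", "Add a public comment...");

    private CommentMarkers() {}

    /** Returns true if the markup contains at least one expected marker. */
    static boolean commentsLoaded(String html) {
        if (html == null) {
            return false;
        }
        for (String marker : MARKERS) {
            if (html.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String stillLoading = "<div class=\"loading\">Loading...</div>";
        String populated = "<div class=\"loading\"></div><button>Top comments</button>";
        System.out.println(commentsLoaded(stillLoading)); // no marker present
        System.out.println(commentsLoaded(populated));    // marker present despite "loading"
    }
}
```

With HtmlUnit, the argument could be the serialized page, for example `CommentMarkers.commentsLoaded(pagina.asXml())`, re-evaluated after each scroll and `waitForBackgroundJavaScript` call.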

guido