11

I'm fetching pages with cURL in PHP. Everything works fine, except that some parts of the page are calculated with JavaScript a fraction of a second after the page has loaded. cURL sends the page's source back to my PHP script before those JavaScript calculations are done, so I end up with wrong results. The calculations on the site are fetched via AJAX, so I can't easily reproduce them myself. I also have no access to the target page's code, so I can't tweak it to fit my (cURL) fetching needs.

Is there any way I can tell cURL to wait until all dynamic traffic has finished? It might be tricky, because some scripts keep sending data to another domain, which could result in long hangs. But then I could at least test whether I get the correct results back.

My Developer toolbar in Safari indicates the page is done in about 1.57s. Maybe I can tell cURL statically to wait for 2 seconds too?

I wonder what the possibilities are :)

3 Answers

6

cURL does not execute any JavaScript or download any files referenced in the document. So cURL is not the solution for your problem.

You'll have to use a browser on the server side, tell it to load the page, wait for X seconds and then ask it to give you the HTML.

Look at: http://phantomjs.org/ (it is a standalone headless browser that you script in JavaScript; I'm not aware of any PHP solutions).
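As a rough PHP-side sketch of the "load it in a browser, wait, then ask for the HTML" idea, one could shell out to a headless Chrome instead. Everything specific here is an assumption: the `google-chrome` binary name, the 2000 ms budget (taken from the roughly 2 seconds mentioned in the question), and the helper names `buildDumpDomCommand` and `fetchRenderedHtml`. The `--virtual-time-budget` flag gives scripts simulated time rather than wall-clock time, so treat the value as a starting point to experiment with.

```php
<?php
// Sketch only: the binary name, URL and time budget are assumptions.

/**
 * Build the shell command that asks headless Chrome to print the DOM
 * after the page's scripts have been given some (virtual) time to run.
 */
function buildDumpDomCommand(string $url, int $budgetMs = 2000, string $chrome = 'google-chrome'): string
{
    return sprintf(
        '%s --headless --disable-gpu --virtual-time-budget=%d --dump-dom %s',
        escapeshellcmd($chrome),
        $budgetMs,
        escapeshellarg($url)
    );
}

/**
 * Run the command and return the rendered HTML, or null if nothing came back.
 */
function fetchRenderedHtml(string $url, int $budgetMs = 2000): ?string
{
    $output = shell_exec(buildDumpDomCommand($url, $budgetMs));

    return is_string($output) && $output !== '' ? $output : null;
}
```

Calling `fetchRenderedHtml('https://example.com/page')` would then return the DOM as Chrome sees it after the scripts have run, which can be parsed the same way as the cURL result.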

Jan Hančič
  • Luckily it's just a small piece of code. I'll rewrite the code in JavaScript and fetch the data with jQuery and PhantomJS then. Thank you :) –  Jan 31 '13 at 13:00
  • Is there any way to include PhantomJS just plainly in my local HTML-page where I do my jQuery? –  Jan 31 '13 at 13:24
  • No. PhantomJS uses a real WebKit browser internally, which you can't embed in a page on the client. – Jan Hančič Jan 31 '13 at 13:26
3

I don't know much about the page you are retrieving or the calculations you want to include, but one option is to cURL straight to the URL that serves those Ajax requests. Use something like Firebug to inspect the Ajax calls being made on your target page; from there you can figure out the URL and any parameters passed. If you do need the full web page, you could cURL both the web page and the Ajax URL and combine the two in your PHP code, but then it starts to get messy.
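A minimal sketch of that approach, assuming the Ajax endpoint accepts GET parameters and answers with JSON. The endpoint, the parameter names, and the helper names `buildAjaxUrl` and `fetchAjaxData` are hypothetical; the real values have to come from the network inspector.

```php
<?php
// Sketch only: the endpoint, parameters and JSON response format are hypothetical.

/** Append query parameters to the Ajax endpoint found in the network inspector. */
function buildAjaxUrl(string $endpoint, array $params): string
{
    if ($params === []) {
        return $endpoint;
    }
    // Use '&' when the endpoint already carries a query string.
    $separator = strpos($endpoint, '?') === false ? '?' : '&';

    return $endpoint . $separator . http_build_query($params);
}

/** Request the endpoint much like the page's own script would and decode the JSON body. */
function fetchAjaxData(string $endpoint, array $params = []): ?array
{
    $ch = curl_init(buildAjaxUrl($endpoint, $params));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
        // Some endpoints check this header to recognise Ajax requests.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest', 'Accept: application/json'],
    ]);
    $body = curl_exec($ch);

    if ($body === false) {
        return null; // transport error
    }
    $data = json_decode($body, true);

    return is_array($data) ? $data : null;
}
```

If the endpoint expects a POST, cookies, or a token taken from the original page, those would have to be replicated as well, which is where this gets messy.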

Peter Herdenborg
2

There is one quite tricky way to achieve this using PHP. If you really want it to work in PHP, you could use a Codeception setup in conjunction with Selenium and the Chrome WebDriver in headless mode.

Here are some general steps to get it working.

  1. Make sure you have Codeception in your PHP project: https://codeception.com

  2. Download the Chrome WebDriver: https://chromedriver.chromium.org/downloads

  3. Download Selenium: https://www.seleniumhq.org/download/

  4. Configure it according to the Codeception documentation.

  5. Write a Codeception test in which you can use an expression like $I->wait(5) to wait 5 seconds, or $I->waitForJS('js expression here') to wait until a JavaScript expression on the page evaluates to true.

  6. Run the test written in the previous step with the command php vendor/bin/codecept run path/to/test
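The test from step 5 could look roughly like the sketch below. The class name, the page path, and the `#result` selector are hypothetical, the jQuery check assumes the target page loads jQuery, and `AcceptanceTester` is the actor class that Codeception generates for a suite with the WebDriver module enabled.

```php
<?php
// Sketch only: class name, page path and '#result' selector are hypothetical,
// and the jQuery check assumes the target page loads jQuery.

class RenderedPageCest
{
    public function grabCalculatedValue(AcceptanceTester $I)
    {
        $I->amOnPage('/page-to-fetch');

        // Wait up to 10 seconds until no jQuery Ajax requests are in flight.
        $I->waitForJS('return window.jQuery && jQuery.active == 0;', 10);

        // Read the value that the page's JavaScript filled in.
        $value = $I->grabTextFrom('#result');
        $I->comment('Calculated value: ' . $value);
    }
}
```

If the page does not use jQuery, waiting for a specific element with `$I->waitForElement('#result', 10)` may be a simpler condition than a JavaScript expression.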

K. Igor