Questions tagged [goutte]

Goutte is a simple headless web browser / web scraper, written in PHP.

It can be used for writing automated testing scripts for websites.

It is a thin wrapper around a number of existing Symfony classes and components, including BrowserKit and DomCrawler.

Full source code can be found here: https://github.com/fabpot/Goutte

295 questions
31
votes
1 answer

Behat & Mink : Use the test environment

I'm currently using Behat with Mink & Goutte Driver. When I'm trying to use it with my dev environment, via the app_dev.php file, which is a typical app_dev.php file from a Symfony2 Standard Edition, my tests are working just fine (Gists). But, if I…
Talus
  • 754
  • 6
  • 18
11
votes
2 answers

How to use Goutte

Issue: Cannot fully understand the Goutte web scraper. Request: Can someone please help me understand or provide code to help me better understand how to use Goutte the web scraper? I have read over the README.md. I am looking for more information…
scrfix
  • 1,088
  • 3
  • 9
  • 22
10
votes
2 answers

How to use proxy authentication with Goutte?

I have the following code but it always returns a 407 HTTP status code. $url = 'http://whatismyip.org'; $client = new Client(); $options = array( 'proxy' => array( 'http' => 'tcp://@x.x.x.x:8010', ), 'auth' =>…
Abs
  • 51,038
  • 92
  • 260
  • 394
8
votes
3 answers

Goutte - Get inner values from $crawler->filter()

I am using PHP 7.1.33 and "fabpot/goutte": "^3.2". My composer file looks like the following: { "name": "ubuntu/workspace", "require": { "fabpot/goutte": "^3.2" }, "authors": [ { "name": "admin", …
Carol.Kar
  • 4,830
  • 22
  • 98
  • 199
8
votes
3 answers

How to crawl with php Goutte and Guzzle if data is loaded by Javascript?

Many times when crawling we run into problems where content that is rendered on the page is generated with JavaScript and therefore scrapy is unable to crawl for it (e.g. Ajax requests, jQuery)
Batman
  • 81
  • 1
  • 1
  • 5
7
votes
3 answers

Can Goutte/Guzzle be forced into UTF-8 mode?

I'm scraping from a UTF-8 site, using Goutte, which internally uses Guzzle. The site declares a meta tag of UTF-8, thus: However, the content type header is thus: Content-Type:…
halfer
  • 18,701
  • 13
  • 79
  • 158
5
votes
1 answer

How to extract data with Goutte Crawler?

This code returned hrefs to content. Now I want to extract content from these hrefs and send it to my view. Name divs which I need to extract:
Contact:
Jensej
  • 1,067
  • 6
  • 16
  • 32
5
votes
2 answers

Goutte - dom crawler - remove node

I have html on my site (http://testsite.com/test.php) :
click back
Mark888
  • 53
  • 1
  • 3
5
votes
1 answer

Access Guzzle Response from Goutte

I'm trying to access the Guzzle Response object from Goutte, because that object has nice methods that I want to use, getEffectiveUrl for example. As far as I can see there is no way of doing it without hacking the code. Or without accessing the…
Can Vural
  • 1,650
  • 1
  • 19
  • 35
5
votes
2 answers

How can I scrape website content in PHP from a website that requires a cookie login?

My problem is that it doesn't just require a basic cookie, but rather asks for a session cookie, and for randomly generated IDs. I think this means I need to use a web browser emulator with a cookie jar? I have tried to use Snoopy, Goutte and a…
Forest
  • 888
  • 1
  • 8
  • 23
4
votes
1 answer

How to run PHPUnit from a PHP script?

I am creating a custom testing application using PHPUnit and Goutte. I would like to load the Goutte library (plus any files required for the tests) within my own bootstrap file and then start the PHPUnit test runner once it is all bootstrapped. I'm…
Saintwolf
  • 539
  • 1
  • 5
  • 18
4
votes
1 answer

Goutte - Check if there are two nodes

I am using PHP 7.4.1 and "fabpot/goutte": "^3.3". I have the following script:
Carol.Kar
  • 4,830
  • 22
  • 98
  • 199
4
votes
1 answer

How can I fix the SSL problem in Symfony/Goutte

I'm trying to make a request to a website with Symfony/Goutte but I'm receiving this error: In ErrorChunk.php line 65: SSL peer certificate…
Mesolaries
  • 288
  • 4
  • 12
4
votes
0 answers

How to scrape URL protected by login using Goutte (I have login account)

I found a similar question here, but I didn't get enough information, so I decided to ask a new question. Let's assume the URLs are as follows. url1. http://base_url/login url2. http://base_url/home url3. http://base_url/target Note: if I logged in…
Lead Developer
  • 1,520
  • 8
  • 23
4
votes
3 answers

Setting CURL Parameters for fabpot/goutte Client

I am working on a web crawler using goutte (fabpot/goutte). When I try to connect to an https site, it throws an error because the site is using a self-signed certificate. I am trying to find a way to set the curl parameters to ignore the fact…
osantos
  • 235
  • 2
  • 15