Goutte, a simple PHP Web Scraper
================================

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements
------------

Goutte depends on PHP 5.5+ and Guzzle 6+.

.. tip::

    If you need support for PHP 5.4 or Guzzle 4-5, use Goutte 2.x (latest `phar
    <https://github.com/FriendsOfPHP/Goutte/releases/download/v2.0.4/goutte-v2.0.4.phar>`_).

    If you need support for PHP 5.3 or Guzzle 3, use Goutte 1.x (latest `phar
    <https://github.com/FriendsOfPHP/Goutte/releases/download/v1.0.7/goutte-v1.0.7.phar>`_).

Installation
------------

Add fabpot/goutte as a require dependency in your composer.json file:

.. code-block:: bash

    composer require fabpot/goutte

Usage
-----

Create a Goutte Client instance (which extends ``Symfony\Component\BrowserKit\Client``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the request() method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object (``Symfony\Component\DomCrawler\Crawler``).

To use your own Guzzle settings, create a Guzzle 6 client and pass it to Goutte. For example, to add a 60-second request timeout:

.. code-block:: php

    use Goutte\Client;
    use GuzzleHttp\Client as GuzzleClient;

    $goutteClient = new Client();
    $guzzleClient = new GuzzleClient(array(
        'timeout' => 60,
    ));
    $goutteClient->setClient($guzzleClient);

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
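
Because ``each()`` returns an array of whatever the callback returns, the same pattern can collect data instead of printing it. The following is a minimal sketch, not part of the original examples: it runs the DomCrawler component directly on a made-up HTML string so that no network request is needed. The markup, the URLs, the ``$posts`` variable, and the ``vendor/autoload.php`` path are all assumptions for illustration.

.. code-block:: php

    <?php

    // Assumes dependencies were installed with Composer in the current directory.
    require __DIR__.'/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched blog page.
    $html = '<html><body>'
        .'<h2><a href="/blog/first-post">First post</a></h2>'
        .'<h2><a href="/blog/second-post">Second post</a></h2>'
        .'</body></html>';

    $crawler = new Crawler($html);

    // each() returns an array with the callback's return value for every matched node.
    $posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
        return array(
            'title' => $node->text(),
            'url' => $node->attr('href'),
        );
    });

    foreach ($posts as $post) {
        print $post['title'].' -> '.$post['url']."\n";
    }

The crawler returned by ``$client->request()`` is the same kind of object, so the ``filter()``/``each()`` call should carry over unchanged to a live page.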

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'https://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });

More Information
----------------

Read the documentation of the BrowserKit_ and DomCrawler_ Symfony Components for more information about what you can do with Goutte.

Pronunciation
-------------

Goutte is pronounced ``goot``, i.e. it rhymes with ``boot`` and not ``out``.

Technical Information
---------------------

Goutte is a thin wrapper around the following fine PHP libraries:

- Symfony Components: BrowserKit_, CssSelector_ and DomCrawler_;

- Guzzle_ HTTP Component.

License
-------

Goutte is licensed under the MIT license.

.. _Composer: https://getcomposer.org
.. _Guzzle: http://docs.guzzlephp.org
.. _BrowserKit: https://symfony.com/components/BrowserKit
.. _DomCrawler: https://symfony.com/doc/current/components/dom_crawler.html
.. _CssSelector: https://symfony.com/doc/current/components/css_selector.html

Top Contributors
----------------

fabpot everzet larowlan mtdowling hnw hason tiger-seo christianchristensen jakoch davedevelopment keradus igorw zeopix arithmetric BRMatt zachbadgett stof robo47 thewilkybarkid csarrazi fabioelizandro benoitMariaux camcima benja-M-1 Herzult clemherreman taavit fabian benji07 ossinkine

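Dependencies
------------

========================  =============
Package                   Version
========================  =============
``php``                   ``>=5.5.0``
``symfony/browser-kit``   ``~2.1|~3.0``
``symfony/css-selector``  ``~2.1|~3.0``
``symfony/dom-crawler``   ``~2.1|~3.0``
``guzzlehttp/guzzle``     ``^6.0``
========================  =============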

Releases
--------

- v3.1.2
- v3.1.1
- v3.1.0
- v3.0.0
- v2.0.4
- v2.0.3
- v2.0.2
- v2.0.1
- v2.0.0
- v1.0.7
- v1.0.6
- v1.0.5
- v1.0.4
- v1.0.3
- v1.0.2
- v1.0.1
- v1.0.0
- v0.1.0