13

This question describes my conclusion after researching available options for creating a headless Chrome instance in Python and asks for confirmation or resources that describe a 'better way'.

From what I've seen, the quickest way to get started with a headless instance of Chrome in a Python application is to use CEF (http://code.google.com/p/chromiumembedded/) with CEFPython (http://code.google.com/p/cefpython/). CEFPython seems immature, though, so using it would likely mean further customization before I can start a headless Chrome instance that loads web pages (and their required files), builds the complete DOM, and then lets me run arbitrary JS against it from Python.

Have I missed any other projects that are more mature or would make this easier for me?

Michael0x2a
Trindaz
  • Why specifically do you need a headless Chrome instance? – Daniel Roseman Mar 19 '12 at 19:26
  • @Marcin, I'm developing on Windows 7 but will publish the application as a website on Ubuntu. – Trindaz Mar 19 '12 at 19:26
  • @Trindaz, CefPython has a real API now, there is still much work in the coming weeks, but some things already work like calling javascript from python: browser.GetMainFrame().ExecuteJavascript("alert('hello!')") – Czarek Tomczak Jul 07 '12 at 09:40
  • @CzarekTomczak thanks - I posed a CefPython specific followup question at http://magpcss.org/ceforum. Is there a google group devoted to this? – Trindaz Jul 07 '12 at 10:46
  • @Trindaz, I asked Marshall whether it would be possible to create a subforum there at magpcss; if not, I will think about hosting my own forum and will put a link on the google-cefpython site. – Czarek Tomczak Jul 07 '12 at 11:34
  • @CzarekTomczak why not just start a google group? That's what all the other groups use, zombie, phantom, jsdom, etc. And can you just email me dave dot trindall at gmail dot com to continue this conversation? We have to be breaking SO rules by having this back and forth here – Trindaz Jul 07 '12 at 13:09

5 Answers

12

Any reason you haven't considered Selenium with the Chrome Driver?

http://code.google.com/p/selenium/wiki/ChromeDriver

http://code.google.com/p/selenium/wiki/PythonBindings

jdi
  • 2
    Combined with http://www.youtube.com/watch?v=DL7gyuqkzzU, this gives me exactly what I need – Trindaz Mar 19 '12 at 20:28
  • 2
    To summarise the youtube, you need: "from pyvirtualdisplay import Display; display = Display(visible=0, size=(1024, 768)); display.start()" – spookylukey Jun 04 '12 at 17:35
10

This question is 5 years old now. At the time it was a big challenge to run a headless Chrome from Python, but the good news is:

Starting with version 59, released in June 2017, Chrome ships with a headless mode. This means we can run it in a non-graphical server environment and run tests without pages being rendered on screen, which saves a lot of time and memory when testing or scraping. Setting up Selenium for that is very easy:

(I assume that you have installed Selenium and ChromeDriver.)

from selenium import webdriver

# Configure Chrome to run headless
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# `options=` replaces the deprecated `chrome_options=` keyword
browser = webdriver.Chrome(options=options)

Now Chrome will run headlessly. If you remove the options argument from the last line, the browser window will be shown.
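To connect this back to the original goal (load a page, wait for the DOM, then run arbitrary JS from Python), here is a minimal sketch. The helper names `build_chrome_arguments` and `evaluate_in_page` are made up for illustration, and the window-size flag is my own addition. It assumes Selenium and a matching ChromeDriver are installed.

```python
def build_chrome_arguments(headless=True, window_size=(1024, 768)):
    """Return the Chrome command-line flags for this session (hypothetical helper)."""
    arguments = ["--window-size=%d,%d" % window_size]
    if headless:
        arguments.append("--headless")
    return arguments


def evaluate_in_page(url, script, headless=True):
    """Load `url` in Chrome, run `script` in the page, and return its result."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(headless):
        options.add_argument(argument)

    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)  # returns once the page load has completed
        return browser.execute_script(script)
    finally:
        browser.quit()  # always shut the browser down, even on errors
```

For example, `evaluate_in_page("https://example.com", "return document.title")` should hand the page title back to Python. The script needs an explicit `return` for `execute_script` to pass a value back.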

Ibo
2

While I'm the author of CasperJS, I invite you to check out Ghost.py, a webkit web client written in Python.

While it's heavily inspired by CasperJS, it's not based on PhantomJS; it still uses PyQt bindings and WebKit, though.

NiKo
0

I use this to get the driver:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


def get_browser(storage_dir, headless=False):
    """
    Get the browser (a "driver").

    Parameters
    ----------
    storage_dir : str
        Directory where downloaded files are saved.
    headless : bool
        If True, run Chrome without a visible window.

    Returns
    -------
    browser : selenium webdriver object
    """
    # find the path with 'which chromedriver'
    path_to_chromedriver = '/usr/local/bin/chromedriver'

    chrome_options = Options()
    if headless:
        chrome_options.add_argument("--headless")
    # Disable the built-in PDF viewer and save downloads to storage_dir without prompting
    chrome_options.add_experimental_option('prefs', {
        "plugins.plugins_list": [{"enabled": False,
                                  "name": "Chrome PDF Viewer"}],
        "download": {
            "prompt_for_download": False,
            "default_directory": storage_dir,
            "directory_upgrade": False,
            "open_pdf_in_system_reader": False
        }
    })

    # Assumes Selenium 4: the driver path goes in a Service, the options in `options=`
    browser = webdriver.Chrome(service=Service(path_to_chromedriver),
                               options=chrome_options)
    return browser

By toggling the headless parameter you can choose whether or not to watch the browser while it runs.

Martin Thoma
0

CasperJS is a headless WebKit tool, but as far as I know it doesn't offer Python bindings. It seems command-line oriented, but that doesn't mean you couldn't run it from Python in a way that gives you what you're after. When you run casperjs, you provide a path to the JavaScript you want to execute, so you would need to emit that script from Python.

But all that aside, I bring up casperjs because it seems to satisfy the lightweight, headless requirement very nicely.
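As a rough sketch of what "emitting the script from Python" could look like: the code below writes a small CasperJS script to a temporary file and runs it with `subprocess`. The function names, the template, and the choice to pass the result back as JSON on stdout are my own assumptions, not an established interface. It also assumes a `casperjs` executable is on the PATH.

```python
import json
import os
import shutil
import subprocess
import tempfile

# CasperJS script template; the page expression's value is echoed as JSON.
CASPER_TEMPLATE = """\
var casper = require('casper').create();
casper.start(%(url)s, function () {
    this.echo(JSON.stringify(this.evaluate(function () {
        return %(expression)s;
    })));
});
casper.run();
"""


def build_casper_script(url, expression):
    """Render the script; json.dumps turns the URL into a quoted JS string."""
    return CASPER_TEMPLATE % {"url": json.dumps(url), "expression": expression}


def run_casper(url, expression, executable="casperjs"):
    """Run `expression` in the page at `url` and return the decoded result."""
    if shutil.which(executable) is None:
        raise RuntimeError("%s was not found on the PATH" % executable)

    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as handle:
        handle.write(build_casper_script(url, expression))
        script_path = handle.name
    try:
        completed = subprocess.run([executable, script_path],
                                   capture_output=True, text=True, check=True)
        return json.loads(completed.stdout)
    finally:
        os.remove(script_path)  # clean up the temporary script
```

With CasperJS installed, `run_casper("http://example.com/", "document.title")` would be the intended usage. The expression is pasted into the script verbatim, so it should only come from trusted code.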

sethcall
  • Casperjs is a testing framework for PhantomJS, which is a headless QtWebkit. It allows you to communicate via the REST API. – Tobias Cudnik Apr 25 '12 at 09:09