368

I'm trying to put together a list of possible solutions for automated browser test suites and headless browser platforms capable of scraping.


BROWSER TESTING / SCRAPING:

  • Selenium - polyglot flagship of browser automation, with bindings for Python, Ruby, JavaScript, C#, Haskell and more, plus an IDE for Firefox (as an extension) for faster test deployment. Can act as a server and has tons of features.

JAVASCRIPT

  • PhantomJS - JavaScript, headless testing with screen capture and automation, uses WebKit. As of version 1.8, Selenium's WebDriver API is implemented, so you can use any WebDriver binding and tests will be compatible with Selenium.
  • SlimerJS - similar to PhantomJS, uses Gecko (Firefox) instead of WebKit
  • CasperJS - JavaScript, built on both PhantomJS and SlimerJS, has extra features
  • Ghost Driver - JavaScript implementation of the WebDriver Wire Protocol for PhantomJS.
  • new PhantomCSS - CSS regression testing. A CasperJS module for automating visual regression testing with PhantomJS and Resemble.js.
  • new WebdriverCSS - plugin for Webdriver.io for automating visual regression testing
  • new PhantomFlow - Describe and visualize user flows through tests. An experimental approach to Web user interface testing.
  • new trifleJS - ports the PhantomJS API to use the Internet Explorer engine.
  • new CasperJS IDE (commercial)
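
The visual regression tools listed above (PhantomCSS, WebdriverCSS) compare a fresh screenshot against a stored baseline and report how much they differ. The sketch below illustrates that idea only. It is not the algorithm used by Resemble.js or any of those tools; the function name `mismatch_percentage`, the `tolerance` parameter and the in-memory pixel format are assumptions made for illustration.

```python
def mismatch_percentage(baseline, candidate, tolerance=0):
    """Return the percentage of pixels that differ between two images.

    Hypothetical format: each image is a list of rows, and each row is a
    list of (r, g, b) tuples. A pixel counts as different when any channel
    differs by more than `tolerance`.
    """
    same_shape = len(baseline) == len(candidate) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    differing = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                differing += 1
    return 100.0 * differing / total


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    old = [[white, white], [white, white]]
    new = [[white, black], [white, white]]
    print(mismatch_percentage(old, new))
```

A test would typically fail when the returned percentage exceeds some project-specific threshold; a small non-zero `tolerance` helps with anti-aliasing noise between runs.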

NODE.JS

  • Node-phantom - bridges the gap between PhantomJS and node.js
  • WebDriverJs - Selenium WebDriver bindings for node.js by Selenium Team
  • WD.js - node module for WebDriver/Selenium 2
  • yiewd - WD.js wrapper using latest Harmony generators! Get rid of the callback pyramid with yield
  • ZombieJs - Insanely fast, headless full-stack testing using node.js
  • NightwatchJs - Node.js-based testing solution using Selenium WebDriver
  • Chimera - can do everything PhantomJS does, but in a full JS environment
  • Dalek.js - Automated cross browser testing with JavaScript through Selenium Webdriver
  • Webdriver.io - better implementation of WebDriver bindings with 50+ predefined actions
  • Nightmare - Electron bridge with a high-level API.
  • jsdom - Tailored towards web scraping. A very lightweight DOM implemented in Node.js; it supports pages with JavaScript.
  • new Puppeteer - Node library which provides a high-level API to control Chrome or Chromium. Puppeteer runs headless by default.
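
Many of the bindings above (WebDriverJs, WD.js, Webdriver.io, Nightwatch) are clients for the same HTTP-based WebDriver protocol, which is why they are interchangeable across Selenium and PhantomJS/GhostDriver. As a rough sketch of what such a binding sends first, the snippet below builds (but does not send) a "new session" request in the legacy JSON Wire Protocol style. The helper name `new_session_request` and the server address are assumptions for illustration; newer W3C-compliant servers expect a different capabilities payload.

```python
import json
import urllib.request


def new_session_request(server_url, browser_name):
    """Build a legacy JSON Wire Protocol 'new session' request.

    Hypothetical helper: real bindings add many more capabilities and
    handle the response, errors and session teardown.
    """
    payload = {"desiredCapabilities": {"browserName": browser_name}}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed address of a locally running WebDriver server.
    request = new_session_request("http://localhost:8910/", "phantomjs")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually create the session, provided a WebDriver server is listening at that address.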

WEB SCRAPING / MINING

  • Scrapy - Python, mainly a scraper/miner - fast and well documented; can be linked with Django Dynamic Scraper for nice mining deployments, or Scrapy Cloud for PaaS (server-less) deployment; works in a terminal or as a stand-alone server process; can be used with Celery; built on top of Twisted
  • Snailer - node.js module, untested yet.
  • Node-Crawler - node.js module, untested yet.
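
For comparison with the browser-based tools, here is a minimal sketch of what scraping static HTML looks like without any browser, using only Python's standard library. It is not how Scrapy or the Node modules above are implemented; the class name `LinkExtractor` and the helper `extract_links` are made up for this example. Because nothing here executes JavaScript, pages that build their content client-side still need one of the headless browsers listed earlier.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Hypothetical helper: parse an HTML string and return its links."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/docs">Docs</a> <a href="http://example.org/x">X</a></p>'
    print(extract_links(page, "http://example.com/"))
```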

ONLINE TOOLS


RELATED LINKS & RESOURCES

Questions:

  • Any pure Node.js solution or Node.js to PhantomJS/CasperJS module that actually works and is documented?

Answer: Chimera seems to go in that direction; check out Chimera

  • Other solutions capable of easier JavaScript injection than Selenium?

  • Do you know any pure Ruby solutions?

Answer: Check out the list created by rkj with Ruby-based solutions

  • Do you know any related tech or solution?

Feel free to edit this question and add content as you wish! Thank you for your contributions!

user299709
Inoperable
  • 1
    Don't know if that's what you want, but I like this module for crawling webpages, inspecting the DOM and so on: https://npmjs.org/package/crawler. It uses jsdom, and you can use jQuery as the selector engine. This one (which uses crawler) looks interesting too: https://npmjs.org/package/snailer – hereandnow78 Aug 30 '13 at 20:14
  • I've had a lot of success with the node-phantom module. It's pretty straightforward and fairly well documented. It does support JavaScript injection. – Josh C. Aug 30 '13 at 21:04
  • 1
    I am sure you are aware that GhostDriver is an implementation of Selenium WebDriver that uses PhantomJS – Robbie Wareham Aug 30 '13 at 22:32
  • 2
    You might also find this helpful: http://blog.screen-scraper.com/2010/06/28/comparison-of-web-scraping-software/ – todd Aug 30 '13 at 23:53
  • 1
    For visual scraping and comparisons: PhantomCSS and PhantomFlow – FelipeAls Sep 01 '13 at 14:00
  • [Webdriver.io](http://webdriver.io) also has a CSS regression plugin called [WebdriverCSS](https://github.com/webdriverjs/webdrivercss) for visual scraping and comparison – ChristianB Apr 07 '14 at 18:39
  • http://github.com/briankircho/browserjet – laggingreflex May 27 '14 at 15:21
  • Check this out, https://github.com/christian-bromann/awesome-selenium – Alan Dong Mar 07 '15 at 07:44
  • I'm working on https://testingbot.com which has all the latest and older browsers, you can use selenium webdriver and test on any browser you like – Jochen Sep 21 '15 at 10:36
  • There's also a list of solutions at https://github.com/dhamaniasad/HeadlessBrowsers – Sean Bannister Nov 06 '15 at 04:38
  • Use import.io. – djangofan Dec 19 '16 at 00:29
  • This one is really cool - https://github.com/graphcool/chromeless – Andrey E Aug 14 '17 at 20:44

3 Answers

35

If Ruby is your thing, you may also try:

Also, the Nokogiri gem can be used for scraping:

There is a dedicated book from Packt Publishing about how to use Nokogiri for scraping.

Matt
rkj
11

http://triflejs.org/ is like PhantomJS but based on IE

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Sathish Aug 27 '14 at 09:58
  • 5
    That sounds good normally, yet this question is itself a collection of resources. A link with a short description to be incorporated in the list fits the format and makes sense. – Federico Galassi Aug 28 '14 at 10:21
7

Dalek.js is a kind of JS-based Selenium. It is aimed not only at automated front-end tests; you can also take screenshots with it. It has WebDrivers for all major browsers. Unfortunately, those WebDrivers seem to need improvement (not to say "buggy", in the case of Firefox).