Questions tagged [splash-js-render]

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. It is sometimes used as an alternative to Selenium.

https://splash.readthedocs.io/en/stable/

Splash - A JavaScript rendering service

Splash is a JavaScript rendering service. It's a lightweight web browser with an HTTP API, implemented in Python using Twisted and Qt. The (Twisted) Qt reactor is used to make the server fully asynchronous, allowing it to take advantage of WebKit concurrency via the Qt main loop. Some of Splash's features:

process multiple webpages in parallel;
get HTML results and/or take screenshots;
turn off images or use Adblock Plus rules to make rendering faster;
execute custom JavaScript in page context;
write Lua browsing scripts;
develop Splash Lua scripts in Splash-Jupyter Notebooks;
get detailed rendering info in HAR format.
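As a rough illustration of the HTTP API mentioned above, the following sketch builds a request URL for Splash's `render.html` endpoint. It assumes a Splash instance listening on `http://localhost:8050` (the port used in the Docker commands quoted in the questions below); the helper name `build_render_url` and its defaults are hypothetical, not part of Splash itself.

```python
from urllib.parse import urlencode


def build_render_url(page_url, splash_base="http://localhost:8050",
                     wait=0.5, images=True, timeout=None):
    """Build a URL for Splash's render.html endpoint (hypothetical helper).

    splash_base is an assumed local Splash address; adjust it to your setup.
    """
    params = {
        "url": page_url,               # page to render
        "wait": wait,                  # seconds to wait after the page loads
        "images": 1 if images else 0,  # 0 skips image downloads
    }
    if timeout is not None:
        # Per-request render timeout in seconds; the server caps this
        # at the value it was started with via --max-timeout.
        params["timeout"] = timeout
    return f"{splash_base}/render.html?{urlencode(params)}"


if __name__ == "__main__":
    print(build_render_url("https://example.com", images=False))
```

Fetching the resulting URL with any HTTP client should return the rendered HTML, provided the Splash service is actually running at the assumed address.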

134 questions

3 answers

Scrapy Shell and Scrapy Splash

We've been using scrapy-splash middleware to pass the scraped HTML source through the Splash javascript engine running inside a docker container. If we want to use Splash in the spider, we configure several required project settings and yield a…

asked Feb 11 '16 at 23:56

alecxe

3 answers

Adding a wait-for-element while performing a SplashRequest in python Scrapy

I am trying to scrape a few dynamic websites using Splash for Scrapy in Python. However, I see that Splash fails to wait for the complete page to load in certain cases. A brute-force way to tackle this problem was to add a large wait time (e.g. 5…

python scrapy wait scrapy-splash splash-js-render

asked Dec 10 '16 at 11:58

NightFury13

1 answer

How to set splash timeout in scrapy-splash?

I use scrapy-splash to crawl web pages and run the Splash service on Docker. Command: docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600 But I got a 504 error. "error": {"info": {"timeout": 30}, "description": "Timeout exceeded rendering…

python scrapy scrapy-splash splash-js-render

asked Jun 19 '17 at 10:08

Jhon Smith

3 answers

Scrapy CrawlSpider + Splash: how to follow links through linkextractor?

I have the following code that is partially working, class ThreadSpider(CrawlSpider): name = 'thread' allowed_domains = ['bbs.example.com'] start_urls = ['http://bbs.example.com/diy'] rules = ( Rule(LinkExtractor( …

python scrapy web-crawler scrapy-splash splash-js-render

asked Aug 25 '17 at 16:45

eN_Joy

2 answers

How does scrapy-splash handle infinite scrolling?

I want to reverse engineer the content generated by scrolling down the webpage. The problem is in the URL https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933. screwrand doesn't seem to follow…

scrapy scrapy-splash splash-js-render

asked Oct 30 '16 at 02:56

Bowen Liu

0 answers

Splash containers stop working after 30 minutes

I have an issue with Aquarium and Splash. They stop working 30 minutes after the start. The number of pages to load is 50K-80K. I made a cron job to automatically reboot each Splash container every 10 minutes, but it didn't work. How…

docker haproxy splash-js-render

asked Mar 01 '18 at 05:56

amarynets

2 answers

Using docker, scrapy splash on Heroku

I have a scrapy spider that uses splash which runs on Docker localhost:8050 to render javascript before scraping. I am trying to run this on heroku but have no idea how to configure heroku to start docker to run splash before running my web: scrapy…

docker heroku scrapy splash-js-render

asked Sep 05 '17 at 02:06

HearthQiu

2 answers

How to install python-gtk2, python-webkit and python-jswebkit on OSX

I've read through many of the related questions but am still unclear how to do this as there are many software combinations available and many solutions seem outdated. What is the best way to install the following on my virtual environment on…

python scrapy gtk webkit splash-js-render

asked Nov 12 '13 at 02:51

jyek

1 answer

scrapy, splash, lua, button click

I am new to all the tools here. My goal is to extract all URLs from a lot of pages which are connected more or less by a "Weiter"/"next" button, and to do that for several URLs. I decided to try that with Scrapy. The page is dynamically generated. Then I…

python lua scrapy scrapy-splash splash-js-render

asked Nov 05 '17 at 10:12

P. Guyan

0 answers

Docker Scrapinghub/splash exited with 139

I'm using Scrapy to do some crawling with Splash using the scrapinghub/splash Docker container. However, the container exits by itself after a while with exit code 139. I'm running the scraper on an AWS EC2 instance with 1GB swap assigned. I also tried…

docker amazon-ec2 scrapy web-crawler splash-js-render

asked Aug 16 '17 at 19:59

MtziSam

1 answer

scrapy-splash returns its own headers and not the original headers from the site

I use scrapy-splash to build my spider. Now what I need is to maintain the session, so I use the scrapy.downloadermiddlewares.cookies.CookiesMiddleware and it handles the set-cookie header. I know it handles the set-cookie header because I set…

python scrapy scrapy-splash splash-js-render

asked Sep 25 '16 at 12:57

Roman Smelyansky

1 answer

Splash Lua script to do multiple clicks and visits

I'm trying to crawl Google Scholar search results and get the BibTeX entry for each result matching the search. Right now I have a Scrapy crawler with Splash. I have a Lua script which will click the "Cite" link and load up the modal window…

python scrapy scrapy-splash splash-js-render

asked Jun 26 '16 at 22:11

Syafiq Kamarul Azman

2 answers

Google App Engine: Load another Docker Image for Scrapy + Splash

I'd like to scrape a JavaScript website using Scrapy + Splash in Google App Engine. The Splash plugin is a Docker image. Is there any way to use this within Google App Engine? App Engine itself uses a Docker image, but I'm not sure how to load and…

docker google-app-engine scrapy scrapy-splash splash-js-render

asked Nov 13 '19 at 15:28

bgolson

1 answer

Scrapy does not fetch markup on response.css

I've built a simple scrapy spider running on scrapinghub: class ExtractionSpider(scrapy.Spider): name = "extraction" allowed_domains = ['domain'] start_urls = ['http://somedomainstart'] user_agent = "Mozilla/5.0 (Windows NT 10.0;…

python web-scraping scrapy scrapinghub splash-js-render

asked Aug 27 '19 at 15:37

qubits

0 answers

FileNotFoundError: [Errno 2] after pushing splash to heroku

I'm trying to deploy the latest scrapinghub/splash. I am using git-bash on Windows 10. I forked the repo to https://github.com/kc1/splash/blob/master and I have been trying to follow "Using docker, scrapy splash on Heroku" to modify the Dockerfile. After…

linux docker heroku splash-js-render

asked May 31 '19 at 14:05

user1592380
