
With splinter and Python, I have two threads running, each visiting a different route on the same main URL: thread one hits mainurl.com/threadone and thread two hits mainurl.com/threadtwo, using:

from splinter import Browser
browser = Browser('chrome')

But I came across the following error:

Traceback (most recent call last):
  File "multi_thread_practice.py", line 299, in <module>
    main()
  File "multi_thread_practice.py", line 290, in main
    first_method(r)
  File "multi_thread_practice.py", line 195, in parser
    second_method(title, name)
  File "multi_thread_practice.py", line 208, in confirm_product
    third_method(current_url)
  File "multi_thread_practice.py", line 214, in buy_product
    browser.visit(url)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 184, in visit
    self.driver.get(url)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 261, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 247, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 488, in _request
    resp = self._conn.getresponse()
  File "/Users/joshua/anaconda/lib/python2.7/httplib.py", line 1108, in getresponse
    raise ResponseNotReady()
httplib.ResponseNotReady

What is the error and how should I go about handling the issue?

Thank you in advance; I will be sure to upvote/accept an answer.

CODE ADDED

import time
from splinter import Browser
import threading

browser = Browser('chrome')

start_time = time.time()

urlOne = 'http://www.practiceurl.com/one'
urlTwo = 'http://www.practiceurl.com/two'
baseUrl = 'http://practiceurl.com'

browser.visit(baseUrl)

def secondThread(url):
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 2ND REQUEST: ' + str(time.time() - start_time)


def mainThread(url):
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 1ST REQUEST: ' + str(time.time() - start_time)


def main():
    threadObj = threading.Thread(target=secondThread, args=[urlTwo])
    threadObj.daemon = True

    threadObj.start()

    mainThread(urlOne)

main()
Jo Ko
  • httplib.ResponseNotReady is usually to do with reusing responses. I can't tell if you are, cause there's no code, but I assume that's what's going wrong. – Generic Snake Apr 27 '17 at 19:52
  • It would help to provide an [MCVE](https://stackoverflow.com/help/mcve) – Adonis Apr 27 '17 at 21:50
  • @GenericSnake Apologies. Just added the code in the original post. Please take a look. – Jo Ko Apr 28 '17 at 00:38
  • @asettouf Added the code in the original post. Appreciate it in advance! – Jo Ko Apr 28 '17 at 00:38
  • I imagine it is because you are using the same browser object for two separate threads at the same time, and it can't deal with it. I've never used splinter, but I'll have a look into it. – Generic Snake Apr 28 '17 at 00:50
  • Just so I know, why do you want to open these with threads, is there no reason why you can't just open them at separate times? – Generic Snake Apr 28 '17 at 00:51

2 Answers


As far as I can tell, what you're trying to do isn't possible with a single browser. Splinter drives an actual browser, just as a human would (automated, of course), so passing it several commands at the same time causes problems. You can open many browser windows, but you cannot send a request from another thread before the response to the previous request has been received. That is what produces errors such as CannotSendRequest, or the ResponseNotReady in your traceback.

So, if you need threads, I recommend opening two browsers and using one thread per browser to send the requests. Otherwise, it can't be done.

This thread is about Selenium, but the information is transferable: "Selenium multiple tabs at once". Again, it says that what you want to do (I assume) is impossible, and the accepted answer there makes the same recommendation I do.

Hope that doesn't put you off track too much, and helps you out.
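To see where these exceptions come from: in Python 3 the httplib module from the traceback is named http.client, and its connection object is a small state machine that raises ResponseNotReady or CannotSendRequest when calls arrive out of order. That is roughly what two threads sharing one driver connection can provoke. A minimal sketch, assuming Python 3; the function name and the host are placeholders of mine, and no network connection is actually opened:

```python
import http.client


def demo_connection_state_errors():
    """Trigger httplib's out-of-order errors without touching the network."""
    raised = []

    # Case 1: asking for a response before any request was sent.
    conn = http.client.HTTPConnection("localhost")  # placeholder host, never connected
    try:
        conn.getresponse()
    except http.client.ResponseNotReady:
        raised.append("ResponseNotReady")

    # Case 2: starting a second request before the first one was completed.
    conn = http.client.HTTPConnection("localhost")
    conn.putrequest("GET", "/one")  # only buffers the request line and Host header
    try:
        conn.putrequest("GET", "/two")
    except http.client.CannotSendRequest:
        raised.append("CannotSendRequest")

    return raised


if __name__ == "__main__":
    print(demo_connection_state_errors())
```

With a shared browser the same out-of-order sequence happens implicitly, because each thread's visit() sends a command over the one connection to the driver.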

EDIT: Just to show:

import time
from splinter import Browser
import threading

browser = Browser('firefox')
browser2 = Browser('firefox')

start_time = time.time()

urlOne = 'http://www.practiceurl.com/one'
urlTwo = 'http://www.practiceurl.com/two'
baseUrl = 'http://practiceurl.com'

browser.visit(baseUrl)


def secondThread(url):
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time)
    browser2.visit(url)
    print 'END 2ND REQUEST: ' + str(time.time() - start_time)


def mainThread(url):
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time)
    browser.visit(url)
    print 'END 1ST REQUEST: ' + str(time.time() - start_time)


def main():
    threadObj = threading.Thread(target=secondThread, args=[urlTwo])
    threadObj.daemon = True

    threadObj.start()

    mainThread(urlOne)
    threadObj.join()  # wait for the second visit before the program exits

main()

Note that I used Firefox because I don't have chromedriver installed.

It might be a good idea to set a wait after the browsers open, just to make sure they're fully ready, before the timers begin.
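The same idea can be generalized to any number of URLs. This is only a sketch under stated assumptions: FakeBrowser, visit_in_own_browser and visit_all are hypothetical names of mine, and FakeBrowser merely stands in for splinter's Browser so the example runs without a driver installed. Each thread creates, uses and closes its own browser, and the threads are joined rather than left as daemons:

```python
import threading


class FakeBrowser(object):
    """Hypothetical stand-in for splinter's Browser so the sketch runs anywhere."""

    def __init__(self):
        self.visited = []
        self.closed = False

    def visit(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


def visit_in_own_browser(url, browser_factory, results):
    # Each thread creates, uses and closes its own browser; nothing is shared.
    browser = browser_factory()
    try:
        browser.visit(url)
        results[url] = browser
    finally:
        browser.quit()


def visit_all(urls, browser_factory=FakeBrowser):
    results = {}
    threads = [
        threading.Thread(target=visit_in_own_browser,
                         args=(url, browser_factory, results))
        for url in urls
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every visit instead of relying on daemon threads
    return results
```

With splinter installed, something like visit_all(urls, browser_factory=lambda: Browser('firefox')) should behave the same way with real browsers.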

Generic Snake
  • Appreciate your input! I did in fact give that a try, but it seems the two windows are not connected: acting on one window has no effect on the other. So I was thinking of having the second thread open a new tab in the same window. Would that be possible? – Jo Ko Apr 28 '17 at 18:20
  • You should be able to open new tabs in one browser window, yes, and then open different URLs in them. But keep in mind that you can't open the URLs at the exact same time. You have to wait for one tab to receive the response to its request, then send the request through the second tab. It sort of makes threading a bit pointless. I think @asettouf is more knowledgeable on the subject than me though, so his example might show more and help you out. – Generic Snake Apr 28 '17 at 19:23
  • Appreciate the insight regardless. How can I open a new tab with `splinter` though? – Jo Ko Apr 28 '17 at 21:23
  • As far as i can tell it's not built in for tabs, but windows (like popups etc). A work around though can be what's found here. http://stackoverflow.com/questions/27026317/how-to-open-two-tabs-in-python-splinter This uses key entry from selenium to open new tabs in the browser. You'll of course need selenium installed to do this. – Generic Snake Apr 29 '17 at 05:49

@GenericSnake is correct on the issue. To add a little bit to it, I would highly suggest you refactor your code to use the multiprocessing library, mainly because the threading implementation uses the GIL:

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

A nice thing about using multiprocessing is that you can refactor your code to avoid the duplicated secondThread and mainThread functions. Also, don't forget to clean up the resources you use, e.g. call browser.quit() to close the browser once you are done. For example:

import time
from splinter import Browser
from multiprocessing import Process
import os

# PATH entries must be separated by os.pathsep (':' on Unix, ';' on Windows)
os.environ['PATH'] = os.pathsep.join(
    [os.environ['PATH'], "path/to/geckodriver", "path/to/firefox/binary"])

start_time = time.time()

urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx'
urlTwo = 'http://pythoncarsecurity.com/Products/'



def url_visitor(url):
    print("url called: " + url)
    browser = Browser('firefox')  # each process gets its own browser
    try:
        print('STARTING  REQUEST TO: ' + url + " at "+ str(time.time() - start_time))
        browser.visit(url)
        print('END REQUEST TO: ' + url + " at "+ str(time.time() - start_time))
    finally:
        browser.quit()  # always close the browser, even if the visit fails

def main():
    p1 = Process(target=url_visitor, args=[urlTwo])
    p2 = Process(target=url_visitor, args=[urlOne])
    p1.start()
    p2.start()
    p1.join() #join processes to the main process to see the output
    p2.join()

if __name__=="__main__":
    main()

That gives us the following output (timing will depend on your system though):

url called: http://pythoncarsecurity.com/Support/FAQ.aspx
url called: http://pythoncarsecurity.com/Products/
STARTING  REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 10.763000011444092
STARTING  REQUEST TO: http://pythoncarsecurity.com/Products/ at 11.764999866485596
END REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 16.20199990272522
END REQUEST TO: http://pythoncarsecurity.com/Products/ at 16.625999927520752

Edit: The problem with multithreading and Selenium is that a browser instance is not thread-safe. The only way I found to circumvent this is to acquire a lock in url_visitor; however, you then lose the advantage of multithreading. That's why I believe using multiple browsers is much more beneficial (although I guess you have some very specific requirements). See the code below:

import time
from splinter import Browser
import threading
import os

# PATH entries must be separated by os.pathsep (':' on Unix, ';' on Windows)
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + "/path/to/chromedriver"

start_time = time.time()

urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx'
urlTwo = 'http://pythoncarsecurity.com/Products/'
browser = Browser('chrome')
lock = threading.Lock()#create a lock for the url_visitor method

def init():
    browser.visit("https://www.google.fr")
    driver = browser.driver
    # open a new, empty tab (the original '{0}' placeholder was never filled in)
    driver.execute_script("window.open('about:blank', '_blank');")
    tabs = driver.window_handles


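def url_visitor(url, tab):
    with lock:
        # always select this thread's tab: the other thread may have switched tabs
        # (switch_to.window replaces the deprecated switch_to_window)
        browser.driver.switch_to.window(browser.driver.window_handles[tab])
        print("url called: " + url)
        print('STARTING  REQUEST TO: ' + url + " at "+ str(time.time() - start_time))
        browser.visit(url)
        print('END REQUEST TO: ' + url + " at "+ str(time.time() - start_time))


def main():
    p1 = threading.Thread(target=url_visitor, args=[urlTwo, 0])
    p2 = threading.Thread(target=url_visitor, args=[urlOne, 1])
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    # quit only once both threads are done; quitting inside url_visitor
    # would close the browser before the second thread could use it
    browser.quit()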

if __name__=="__main__":
    init() #create a browser with two tabs
    main()
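The reason the lock has to cover both the tab switch and the visit can be shown without a real browser. The following is a sketch only: RecordingBrowser, its switch_to_tab method, visit_in_tab and run_shared_browser are hypothetical names of mine, with switch_to_tab standing in for browser.driver.switch_to.window(...) on a real driver. The stand-in just records the order in which commands arrive:

```python
import threading


class RecordingBrowser(object):
    """Hypothetical stand-in for a shared browser; it records commands in order."""

    def __init__(self):
        self.log = []

    def switch_to_tab(self, index):
        self.log.append(("switch", index))

    def visit(self, url):
        self.log.append(("visit", url))


def visit_in_tab(browser, lock, tab_index, url):
    # Switching tabs and visiting must happen as one unit; otherwise another
    # thread could change the active tab between the two commands.
    with lock:
        browser.switch_to_tab(tab_index)
        browser.visit(url)


def run_shared_browser(jobs):
    browser = RecordingBrowser()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=visit_in_tab, args=(browser, lock, tab, url))
        for tab, url in jobs
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return browser.log
```

Because every switch is immediately followed by its own visit in the log, each URL ends up in the tab intended for it. The cost is that the two visits run one after the other, which is why the lock removes most of the benefit of threading here.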
Adonis
  • Thank you for the suggestion! I would like both threads/processes to be acting on the same window, to have relations in the actions they perform. With two windows, they have no relations. Would having one process/thread opening up a new tab but on the same window as the main process/thread be a possibility? Thank you in advance! – Jo Ko Apr 28 '17 at 18:22
  • @JoKo Unlikely: two processes don't share the same memory, so as far as I know you will be stuck with multithreading if you want to use a single window. I will come back later with an example. – Adonis Apr 28 '17 at 18:39
  • Got it. Looking forward to it. Thanks asettouf! – Jo Ko Apr 28 '17 at 21:22
  • Just checking in to see if you had the chance. Thanks in advance! – Jo Ko Apr 30 '17 at 03:53
  • @JoKo Edited my answer, though unless it is really necessary I would definitely advise against using it – Adonis Apr 30 '17 at 11:06
  • Appreciate the response! I would like two threads working on the same window, or something along those lines. What do you suggest? Opening two windows, as mentioned before, does not do the job. – Jo Ko May 01 '17 at 05:06
  • @JoKo As you can see above you need to synchronize your code to use multiple tabs, as whenever you send a request to the webdriver, the command will be executed in the active tab, see https://stackoverflow.com/questions/30808606/can-selenium-use-multi-threading-in-one-browser That's why the general recommendation is to use multiple instances to achieve parallelism using webdriver. Now nothing forbids you to use multiple threads as long as you use locks when issuing commands to your webdriver instance. – Adonis May 01 '17 at 10:39