I'm building a Python script to check the price of an Amazon item every 5-10 seconds. The problem is that the script stops 'working' after a few minutes: there is no output to the console, but it still shows up as 'running' in my processes.

I'm using requests sessions to make the HTTP requests and time to display the time of each request.

My code is as follows:

target_price = raw_input('Enter target price: ')
url = raw_input('Enter the product url: ')
while True:
    delay=randint(5,10)

    print datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')+': '+'Sleeping for ' + str(delay) + ' seconds'
    time.sleep(delay)
    try:
        with requests.Session() as s:
            page = s.get(url,headers=headers,proxies=proxyDict,verify=False,timeout=5)
            tree = html.fromstring(page.content)
            price = tree.xpath('//div[@class="a-row a-spacing-mini olpOffer"]/div[@class="a-column a-span2 olpPriceColumn"]/span[@class="a-size-large a-color-price olpOfferPrice a-text-bold"]/text()')[0]
            new_price = re.findall("[-+]?\d+[\.]?\d+[eE]?[-+]?\d*", price)[0]
            old_price = new_price
            print new_price
            if float(new_price)<float(target_price):
                print 'Lower price found!'
                mydriver = webdriver.Chrome()
                send_simple_message()
                login(mydriver)
                print 'Old Price: ' + old_price
                print 'New Price: ' + new_price
            else:
                print 'Trying again'
    except Exception as e:
        print e
        print 'Error!'

EDIT: I've removed the wait() function and used time.sleep instead.

EDIT2: When I use a keyboard interrupt to stop the script, here's the output:

Traceback (most recent call last):
  File "checker.py", line 85, in <module>
    page = s.get(url,headers=headers,proxies=proxyDict,verify=False,timeout=5)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 488, in get
    return self.request('GET', url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 589, in urlopen
    self._prepare_proxy(conn)
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 797, in _prepare_proxy
    conn.connect()
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\connection.py", line 267, in connect
    self._tunnel()
  File "C:\Python27\lib\httplib.py", line 729, in _tunnel
    line = response.fp.readline()
KeyboardInterrupt

Is it requests that is running into an infinite loop?

  • What is an active loop? How can I change that? I originally used time.sleep() but changed that to a custom function as I thought time.sleep() might be causing the program to stop working. – Shikhar Gupta Oct 05 '16 at 13:36
  • @ShikharGupta An active loop is a loop that doesn't do anything, like the `while` loop you have with `pass` in it. – afsafzal Oct 05 '16 at 13:41
  • Do you know at which line of your code it is stuck? What is the last printed output you see? – afsafzal Oct 05 '16 at 13:43
  • I originally tried time.sleep() instead of wait(), but that still led to random freezes after about 30 minutes. The script shows up as running and uses about 30MB CPU in the Task Manager even after it has stopped working. – Shikhar Gupta Oct 05 '16 at 13:44
  • @afsafzal The last output I see is print datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S')+': '+'Sleeping for ' + str(delay) + ' seconds' – Shikhar Gupta Oct 05 '16 at 13:45
  • I think you've got a typo there....tiem.sleep instead of TIME.sleep – danidee Oct 05 '16 at 13:46
  • It's possible that Amazon detects that you're probably running a bot and locks you out for a bit. I'm not sure about that, though. – apnorton Oct 05 '16 at 13:46
  • @danidee Fixed that. That typo was just on Stack Overflow; I wrote it down after seeing the comments. That's not what's causing it to freeze. – Shikhar Gupta Oct 05 '16 at 13:47
  • Can you add extra prints before and after `s.get(...)`? It is possible that Amazon is blocking you. – afsafzal Oct 05 '16 at 13:47
  • @apnorton I'm using different headers and proxies for each request. When Amazon blocks the bot, I see a different error; it doesn't freeze because of that either. – Shikhar Gupta Oct 05 '16 at 13:48
  • @afsafzal If Amazon blocked me, wouldn't requests time out? Or wouldn't lxml fail to find what I'm looking for and spit out an error? – Shikhar Gupta Oct 05 '16 at 13:48
  • My suspicion is that the `re.findall()` is taking too long or getting into an infinite loop. You should set a timeout for that as well. – afsafzal Oct 05 '16 at 13:51
  • What is interesting in those situations is to check the CPU usage of the Python program: if it's 0, then you're blocked by some external resource. If it's 50% or 12% or whatever, it means an active infinite loop. – Jean-François Fabre Oct 05 '16 at 13:52
  • @Jean-FrançoisFabre I'm running 3 instances of the script and each instance is taking up 33%, so almost 100% is being consumed by the scripts when they're frozen. I'm guessing that's an active infinite loop? I've changed wait() to time.sleep as you suggested but the problem still remains. – Shikhar Gupta Oct 05 '16 at 13:57
  • @afsafzal Could you please point me in the right direction on how to add timeout to re.findall? – Shikhar Gupta Oct 05 '16 at 13:57
  • @ShikharGupta if I were you, I would have made sure that's the problem by putting extra prints at every line to see where it's stuck. But if that was the problem http://stackoverflow.com/questions/11901328/how-to-timeout-function-in-python-timeout-less-than-a-second could help you. – afsafzal Oct 05 '16 at 14:02
  • @afsafzal Keyboard interrupt shows it is stuck on requests. Please see the edit to the question. Adding print to each line and testing the script again. – Shikhar Gupta Oct 05 '16 at 14:04

1 Answer

The timeout argument to s.get() is tricky. Here I found a good explanation of its unusual behavior. The timeout stops the request if the requested URL does not respond at all, but it does not stop it if the server keeps responding indefinitely.

In your case, the connection is established, but the requested page just keeps sending responses in an infinite loop.
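To make the difference concrete, here is a minimal sketch of a total (wall-clock) limit on reading a response body, as opposed to the per-read timeout that s.get() applies. The names `read_with_deadline` and `TotalTimeout` are made up for this illustration, and the 15-second figure in the comment is an arbitrary assumption. It only covers the phase where the body is being read; a hang earlier, during connection setup, is not covered by it.

```python
import time

# Use a monotonic clock where available (Python 3); fall back to time.time().
_now = getattr(time, "monotonic", time.time)


class TotalTimeout(Exception):
    """Raised when the whole download exceeds the allowed wall-clock time."""


def read_with_deadline(chunks, max_seconds):
    """Join byte chunks from an iterable, giving up after max_seconds overall."""
    deadline = _now() + max_seconds
    parts = []
    for chunk in chunks:
        # The check runs each time a chunk arrives, so a server that keeps
        # trickling data cannot hold the loop open past the deadline.
        if _now() > deadline:
            raise TotalTimeout("response still arriving after %s seconds" % max_seconds)
        parts.append(chunk)
    return b"".join(parts)


# Hypothetical use with the question's session (not executed here):
#   page = s.get(url, headers=headers, proxies=proxyDict, stream=True, timeout=5)
#   body = read_with_deadline(page.iter_content(8192), max_seconds=15)
```

With `stream=True`, requests defers downloading the body until it is iterated, which is what lets the loop check the clock between chunks. The per-read `timeout=5` is still needed so that a single read cannot block forever.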

You can set a timeout for the whole function call; see "Timeout function if it takes too long to finish".
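One way to do that, sketched here under some assumptions: run the call in a daemon thread and stop waiting after a fixed number of seconds. This is not taken from the linked answer; `call_with_deadline` and `CallTimedOut` are hypothetical names, and the approach was chosen because it also works on Windows, where signal-based alarms are not available. Note that the stuck call is abandoned rather than killed, so it may keep a socket open in the background until the process exits.

```python
import threading


class CallTimedOut(Exception):
    """Raised when the wrapped call does not finish within the deadline."""


def call_with_deadline(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs), waiting at most `seconds` for it to finish."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True  # a stuck worker must not keep the interpreter alive
    thread.start()
    thread.join(seconds)

    if thread.is_alive():
        raise CallTimedOut("call did not finish within %s seconds" % seconds)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Hypothetical use inside the question's loop (not executed here):
#   page = call_with_deadline(s.get, 15, url, headers=headers,
#                             proxies=proxyDict, verify=False, timeout=5)
```

Since the question's loop already catches `Exception`, a `CallTimedOut` would be printed there and the next iteration would start as usual.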

afsafzal