
I have tried to solve this issue, but I haven't found a way to resume with the next item after the error is raised in Python.

I am querying this site: https://w.wiki/msg. I adjust the query by changing the city on each loop; the cities are inside `listElements`. The code breaks when it reaches a city like "Awaradam". (You could basically hard-code the city instead of using the list.)

Adding a sleep timer inside the loop did not solve the issue (I thought I was sending requests too often).

The error is the following:

Traceback (most recent call last):
  File "C:/Users/xxx/PycharmProjects/pythonProject3/xxx.py", line 30, in <module>
    data = r.json()
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\site-packages\requests\models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\ProgramData\Anaconda3\envs\pythonProject3\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Code (I edited it so it can be reproduced; as it stands the code makes little sense on its own, but after a certain number of loop iterations it just breaks):

import requests
listPops = [[], []]
url = 'https://query.wikidata.org/sparql'
zaehler = -1
for i in range(100):
    zaehler = zaehler + 1
    #print(str(listElements[1][i]))
    #query = r"SELECT ?population WHERE { SERVICE wikibase:mwapi {bd:serviceParam mwapi:search '" + str(listElements[1][i]) + "' . bd:serviceParam mwapi:language 'en' . bd:serviceParam wikibase:api 'EntitySearch' . bd:serviceParam wikibase:endpoint 'www.wikidata.org' . bd:serviceParam wikibase:limit 1 . ?item wikibase:apiOutputItem mwapi:item .} ?item wdt:P1082 ?population} "
    query = """ SELECT ?population WHERE { SERVICE wikibase:mwapi {
          bd:serviceParam mwapi:search '""" + "Awaradam" + """'.    
          bd:serviceParam mwapi:language "en" . 
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam wikibase:limit 1 .
          ?item wikibase:apiOutputItem mwapi:item .
      }
      ?item wdt:P1082 ?population
    }
    """
    r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
    #time.sleep(5)
    data = r.json()
    try:
        #population = r['results']['bindings'][0]['population']['value']
        if data['results']['bindings'][0]['population']['value']:
            population = data['results']['bindings'][0]['population']['value']
            print(str(zaehler) + ": " + "Population in " + str(listElements[1][i]) + ": " + f"{int(population):,}")
            listPops[0].append(str(listElements[1][i]))
            listPops[1].append(population)
    except:
        continue

print('Finished scrape.')
smartini
  • Your main `print` is rather weird. If you are using f-strings anyway, why not use them everywhere? `print(f"{zaehler}: Population in {listElements[1][i]}: {population}")` – tripleee Nov 17 '20 at 08:40
  • Do you have reason to believe `listElements[1][i]` is not already a `str`? – tripleee Nov 17 '20 at 08:41
  • No, there is no reason; I just did it before the error. The "listElements" is not in the code anymore here on Stack Overflow, for simplicity. There is always a city name inside, which is the reason for the for loop. – smartini Nov 17 '20 at 08:44
  • Regarding the main print: what's the issue with that? After the code has run through, I paste all items from the listElements into an Excel file. – smartini Nov 17 '20 at 08:45
  • Just that you have unnecessary complications there. Stylistically, either `print(str(zaehler) + ": Population in " + listElements[1][i] + ": " + str(population))` or use an f-string, but mixing these styles just brings the worst of both worlds (hard to read *and* complex). – tripleee Nov 17 '20 at 08:49

2 Answers


The traceback means that the result you got back is not JSON. You can't make the remote server send JSON if it doesn't want to, but you can skip this item (or try a different query, if you can figure out one which will work) when that happens.

try:
    data = r.json()
except json.decoder.JSONDecodeError as err:
    logging.warning('Not JSON: %s (result %r)', err, r.text)
    continue

You will have to `import logging` (or just print the warning instead) and `import json` if you don't already do that.
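
In case the logging part is the confusing bit: `logging.warning(...)` needs no setup at all and prints to stderr rather than stdout, and a single `basicConfig` call lets you raise the threshold, e.g. to hide warnings but keep errors. A tiny standalone illustration, not specific to your script:

import logging

logging.basicConfig(level=logging.ERROR)   # raise the threshold: hide warnings, keep errors
logging.warning('this warning is suppressed')
logging.error('this error is still shown (on stderr)')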

Your blanket try / except would also work (just move the try up above the failing line), but it's really bad form. See Why is "except: pass" a bad programming practice?. In practice, it is shielding the fact that there are no results for Awaradam in Wikidata, and you are running a fruitless loop trying to fetch them again and again.

Here is a quick and dirty fix:

import requests
import time
import json

listPops = [[], []]
listElements = [[], ['Bangalore', 'Hyderabad', 'Awaradam', 'Rawalpindi']]
url = 'https://query.wikidata.org/sparql'

for i, city in enumerate(listElements[1]):
    query = """ SELECT ?population WHERE { SERVICE wikibase:mwapi {
          bd:serviceParam mwapi:search '""" + city + """'.    
          bd:serviceParam mwapi:language "en" . 
          bd:serviceParam wikibase:api "EntitySearch" .
          bd:serviceParam wikibase:endpoint "www.wikidata.org" .
          bd:serviceParam wikibase:limit 1 .
          ?item wikibase:apiOutputItem mwapi:item .
      }
      ?item wdt:P1082 ?population
    }
    """
    r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
    time.sleep(5)
    try:
        data = r.json()
    except json.decoder.JSONDecodeError as err:
        print('Not JSON: %s (result %r)' % (err, r.text))
        continue  # skip this city; without this, `data` would be stale or undefined below
    assert 'results' in data
    assert 'bindings' in data['results']
    if not data['results']['bindings']:
        #logging.warning('No results for %s', city)
        print('No results for', city)
        continue
    assert data['results']['bindings'], 'type %s %r' % (type(data['results']['bindings']), data['results']['bindings'])
    assert 'population' in data['results']['bindings'][0]
    assert 'value' in data['results']['bindings'][0]['population']
    if data['results']['bindings'][0]['population']['value']:
        population = data['results']['bindings'][0]['population']['value']
        print(f"{i}: Population in {city}: {int(population):,}")
        listPops[0].append(city)
        listPops[1].append(population)
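
Since the comments mention pasting the results into an Excel file afterwards, here is an optional sketch for dumping `listPops` once the loop has finished, assuming a plain CSV file (which Excel opens directly) is good enough; the filename is just a placeholder:

import csv

# listPops[0] holds the city names, listPops[1] the matching populations
with open('populations.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['city', 'population'])
    writer.writerows(zip(listPops[0], listPops[1]))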
tripleee
  • Thanks, but it does not work when I use your code snippet. It just breaks when entering the logging part. When removing the try, the except should also be removed, right? That does not work either. – smartini Nov 17 '20 at 08:46
  • Like it says right after the code snippet, you have to `import logging` or change it into a `print`. – tripleee Nov 17 '20 at 08:50
  • Sorry, I am a newbie to Python. I have now typed "import logging" at the very top of my code. What do you mean by "change it into a print"? – smartini Nov 17 '20 at 08:52
  • If you don't want to `import logging` you can replace `logging.warning(x, y, z)` with `print(x % (y, z))`. The advantage of `logging` is that it doesn't go to standard output (so if your script prints something useful, it doesn't get mixed up with random status messages), and you can tweak the logging level to simply turn off warnings (but still see errors, for example). – tripleee Nov 17 '20 at 08:53
  • OK, I want to import logging, and I also did, but it does not work. I get "IndexError: list index out of range" when the if statement runs. – smartini Nov 17 '20 at 08:56
  • It works for me, in the sense that I get an error message in `r.text` which tells me why I am not getting a valid JSON result (viz *Error: 429, Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy*) – tripleee Nov 17 '20 at 08:57
  • I haven't attempted to troubleshoot your indexing; you asked about the JSON error and I answered that. Probably (accept this answer, or post one of your own and accept that, and) ask a new question about the indexing problem; try to reduce it to a [mre] so that the answers you get will solve your problem completely. – tripleee Nov 17 '20 at 09:03
  • Ah, OK. So it was actually related to the number of requests. Nevertheless, I cannot reproduce your solution :( – smartini Nov 17 '20 at 09:03
  • Because you didn't take out the blanket `except`, and it's looping forever with hidden errors? – tripleee Nov 17 '20 at 09:09
  • The immediate problem seems to be that `data['results']['bindings']` is an empty list. It's also not clear why you are looping over `i` or what indexing on that is supposed to accomplish. – tripleee Nov 17 '20 at 09:24
  • I have a list (listElements) with 9000 entries. I loop through each element of that list and put that item into the query text, so that the query becomes dynamic. Instead of "Awaradam" I fill in the item from the list. I need to query 9000 times. – smartini Nov 17 '20 at 09:26
  • See the updated answer now. It is unclear which Awaradam you mean anyway; there is a place named Awarradam in Suriname, but it doesn't have a Wikipedia article either. – tripleee Nov 17 '20 at 09:54

As @tripleee has mentioned, the problem is that your query does not return valid JSON (it returns an HTML error message instead). The server should inform you of the status of your query. To handle it, you should check the status code of the response:

r = requests.get(url, params={'format': 'json', 'query': query}, timeout=10)
if r.status_code != 200:
    handle_your_error(r)  # placeholder: log the problem, skip this city, retry later, etc.

For example, after running your example I got HTTP error 429: Too many requests.
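
Since the comments above point to the rate limit and the Wikimedia User-Agent policy, one possible way to handle the 429 is to send a descriptive User-Agent header and back off before retrying. This is only a sketch: `url` and `query` are the same variables as in the question's code, and the agent string and wait time are placeholders to adapt:

import time

import requests

# Descriptive User-Agent per https://meta.wikimedia.org/wiki/User-Agent_policy
# (the script name and contact address below are placeholders)
headers = {'User-Agent': 'CityPopulationScript/0.1 (you@example.com)'}

r = requests.get(url, params={'format': 'json', 'query': query},
                 headers=headers, timeout=10)
if r.status_code == 429:
    # Too many requests: wait and retry once; a real script might also honour
    # the Retry-After header or loop with increasing delays
    time.sleep(10)
    r = requests.get(url, params={'format': 'json', 'query': query},
                     headers=headers, timeout=10)
if r.status_code != 200:
    print('Request failed with status', r.status_code)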

igrinis