
I am using Indeed's API to scrape job listings. The API only returns 25 results per call, which is why I have to iterate through the range.

I need to know the number of results returned (for the range) to use as my numresults variable. Right now I am just running the same search in my browser and entering the result manually.

I want to iterate through multiple countries or search terms, so I need to pass the value of "totalResults", which is found in the JSON, to numresults.

The problem is I don't understand how to extract this value.

Can I do this right after the call (and if so, where is the JSON stored?), or do I need to create the JSON file first?

Here is my working scraper:

import requests

api_url = 'http://api.indeed.com/ads/apisearch?publisher=XXXXXXXXXXX&v=2&limit=100000&format=json'
Country = 'au'
SearchTerm = 'Insight'
numresults = 3925
# numresults must be the actual number of job results rounded down to a
# multiple of 25, or the last page will repeat over and over.
# So if there are 392 results, then put 375.

for number in range(-25, numresults, 25):
    url = api_url + '&co=' + Country + '&q=' + SearchTerm + '&start=' + str(number + 25)
    response = requests.get(url)
    # Append the raw response bytes to one file per search term and country.
    with open(SearchTerm + '_' + Country + '.json', 'ab') as f:
        f.write(response.content)
    print('Complete ' + url)

Here is a sample of the returned JSON:

{
    "version" : 2,
    "query" : "Pricing",
    "location" : "",

    "dupefilter" : true,

    "highlight" : true,

    "start" : 1,
    "end" : 25,
    "totalResults" : 1712,

    "pageNumber" : 0,


    "results" : [

                {
                    "jobtitle" : "New Energy Technical Specialist",
                    "company" : "Rheem",
                     etc.
martineau
ThomasRones

1 Answer

Why not use Python's json module?

import json
# inside the loop, after the request
json_content = json.loads(response.content)
print(json_content["version"])  # should display 2
numresults = json_content["totalResults"]  # 1712 in the sample JSON above

Be careful: first check that the content returned by the request really is JSON. The docs are here: https://docs.python.org/2/library/json.html
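To tie this back to the loop in the question, here is a minimal sketch of reading "totalResults" and turning it into page offsets. The helper names total_results and page_starts are made up for illustration, and the sample string stands in for response.content. It assumes the start parameter is a zero-based offset in steps of 25, as the question's code uses it.

```python
import json


def total_results(body):
    """Return the "totalResults" field from a raw JSON response body."""
    # requests gives response.content as bytes; decode before parsing.
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    return json.loads(body)['totalResults']


def page_starts(total, page_size=25):
    """Return the offsets for the 'start' parameter that cover `total` results."""
    return list(range(0, total, page_size))


# Stand-in for response.content from the first request (hypothetical data).
sample = '{"version": 2, "start": 1, "end": 25, "totalResults": 1712, "results": []}'

total = total_results(sample)
print(total)
print(page_starts(60))
```

In the scraper you would make one request first, pass response.content to total_results, and then loop over page_starts(total) instead of hard-coding numresults. With 392 results the last offset is 375, which matches the rule in the question's comment.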

SnoozeTime
  • Thanks, I'd been trying to use that module, but for some reason couldn't get it right until I saw your example. – ThomasRones May 26 '16 at 02:53
  • No problem. Also be careful about the difference between loads and load (note the final s). And don't forget to mark the question as solved if it is answered =) – SnoozeTime May 26 '16 at 04:23
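For anyone unsure about the loads/load difference mentioned above, a small sketch: json.loads parses a string that is already in memory, while json.load reads from a file-like object. The io.StringIO object here only stands in for an open file, and the sample text is made up.

```python
import io
import json

text = u'{"totalResults": 1712}'

# json.loads: parse a string you already hold in memory.
from_string = json.loads(text)

# json.load: parse from any object with a .read() method, such as an open file.
from_file = json.load(io.StringIO(text))

print(from_string['totalResults'] == from_file['totalResults'])
```

So for the response returned by requests, loads is the one to use, since the body is already in memory and no file needs to be created first.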