0

I am trying to send JSON requests to scrape an infinite-scroll box like the one at this link. The JSON URL it loads is:

http://www.marketwatch.com/news/headline/getheadlines?ticker=XOM&countryCode=US&dateTime=12%3A00+a.m.+Nov.+8%2C+2016&docId=&docType=2007&sequence=6e09aca3-7207-446e-bb8a-db1a4ea6545c&messageNumber=1826&count=10&channelName=%2Fnews%2Fpressrelease%2Fcompany%2Fus%2Fxom&topic=&_=1479366266513

Some of the parameters are not necessary, so I created a dictionary of the ones that matter. For example, the count parameter is the number of items loaded on each scroll. My code is:

import json
import requests

parameters = {'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c', 
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'_:1479366266513' }
data = json.dumps(parameters)
firstUrl = "http://www.marketwatch.com/investing/stock/xom"
html = requests.post(firstUrl, params = data).text 

My problem is that the parameters seem to have no effect: when I remove all of them, I get the same page (the firstUrl page) as when I include all of them. Do you have any idea why this happens and how I can fix it?

mk_sch
  • I guess the content that you want to scrape can't be received via a single request (even if you specify `count:1000`), since each time you scroll, your browser sends a new `XHR` request for another piece of data (10 entries). – Andersson Nov 17 '16 at 16:42
  • Thank you Andersson. My problem is that even without defining any parameters I get the same result, which is the main page and not the container that I am interested in (there are 3 different infinite scroll boxes and I am interested in one of them). I am giving the parameters of that specific element, but the request does not pick it up. – mk_sch Nov 18 '16 at 07:36
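
The point about one request per scroll can be sketched as a loop that keeps asking for the next batch until nothing more comes back. This is only an illustration: `collect_headlines` and `fake_fetch` are hypothetical names, and how the real endpoint identifies the next batch (possibly through `messageNumber` and `sequence`) is an assumption, so the HTTP call is replaced by a stand-in function here.

```python
def collect_headlines(fetch_page, first_cursor, max_pages=5):
    """Call fetch_page repeatedly, feeding each page's cursor into the next call."""
    items, cursor = [], first_cursor
    for _ in range(max_pages):
        page_items, cursor = fetch_page(cursor)
        items.extend(page_items)
        # Stop when a page is empty or the source reports no further page
        if not page_items or cursor is None:
            break
    return items


def fake_fetch(cursor, total=25, page_size=10):
    """Stand-in for the real request: returns (items, next_cursor)."""
    batch = list(range(cursor, min(cursor + page_size, total)))
    next_cursor = cursor + page_size if cursor + page_size < total else None
    return batch, next_cursor


headlines = collect_headlines(fake_fetch, first_cursor=0)
print(len(headlines))
```

In a real scraper, `fetch_page` would issue one GET per batch and read the next cursor values from the last item of the response, if the response exposes them.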

2 Answers

1

I think the firstUrl you are using is not correct: it is the page URL, not the JSON endpoint from your link. Moreover, you should use requests.get instead of post, and send the same parameters as in your link.

import json
import requests

# Same parameter names as in the link; note that topic and _ are separate parameters
parameters = {'ticker':'XOM', 'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c',
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'', '_':'1479366266513' }
# The JSON endpoint, not the stock page
firstUrl = "http://www.marketwatch.com/news/headline/getheadlines"
html = requests.get(firstUrl, params = parameters)
print(json.loads(html.text)) # array of size 10
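
To avoid retyping the parameters by hand, one option is to parse them out of the link itself. The sketch below uses only the standard library and sends nothing over the network; `split_link` is a hypothetical helper name, and whether the server still accepts an old `sequence`/`messageNumber` pair is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# The JSON link from the question, split across lines for readability
link = (
    "http://www.marketwatch.com/news/headline/getheadlines"
    "?ticker=XOM&countryCode=US&dateTime=12%3A00+a.m.+Nov.+8%2C+2016"
    "&docId=&docType=2007&sequence=6e09aca3-7207-446e-bb8a-db1a4ea6545c"
    "&messageNumber=1826&count=10"
    "&channelName=%2Fnews%2Fpressrelease%2Fcompany%2Fus%2Fxom"
    "&topic=&_=1479366266513"
)


def split_link(url):
    """Return (base_url, parameters) for a URL that carries a query string."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as docId= and topic=
    parameters = dict(parse_qsl(parts.query, keep_blank_values=True))
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base_url, parameters


base_url, parameters = split_link(link)
print(base_url)
for name, value in parameters.items():
    print(f"{name!r}: {value!r}")
```

The resulting `base_url` and `parameters` can then be passed as `requests.get(base_url, params=parameters)`, and individual entries such as `count` can be changed before sending.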
vtni
0

params is meant to receive a Python dictionary of query parameters, not a JSON string, so you should pass parameters directly:

parameters = {'countryCode':'US','dateTime':'', 'docId':'','sequence':'6e09aca3-7207-446e-bb8a-db1a4ea6545c', 
         'messageNumber':'1826','count':'10','channelName':'', 'topic':'_:1479366266513' }

html = requests.post(firstUrl, params=parameters).text

Also, check whether the endpoint actually expects post rather than get.
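
The difference can be inspected without sending anything by preparing the request and looking at the URL that requests builds. This is a minimal sketch with a shortened, illustrative parameter dictionary; it also shows that a dictionary passed as `data` (which is what the second positional argument of `requests.post` is) goes into the request body instead of the query string.

```python
import json

import requests

url = "http://www.marketwatch.com/investing/stock/xom"
parameters = {"countryCode": "US", "count": "10"}  # shortened for illustration

# 1. Dictionary passed as params: encoded into the query string
dict_url = requests.Request("POST", url, params=parameters).prepare().url

# 2. JSON string passed as params: appended as one raw, percent-quoted blob
str_url = requests.Request("POST", url, params=json.dumps(parameters)).prepare().url

# 3. Dictionary passed as data: form-encoded into the body, URL left untouched
body_request = requests.Request("POST", url, data=parameters).prepare()

print(dict_url)
print(str_url)
print(body_request.url, body_request.body)
```

Only the first form produces `name=value` pairs that a server can read as query parameters.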

DeepSpace
  • @DeepSpace, thank you so much. I passed parameters directly, but it didn't change the result. When I change count from 10 to 100, I expect to get 100 items, but it's still the same. – mk_sch Nov 17 '16 at 09:09