How can I use Python to dump all of the network requests and responses made while loading a page? What I'm looking for is comparable to the following PhantomJS example (written in JavaScript): https://github.com/ariya/phantomjs/blob/master/examples/netlog.js

I have tried a ton of different tools, including the following:

Example:

import requests
import logging

logging.basicConfig(level=logging.DEBUG)
r = requests.get('http://www.google.com')

Example:

import urllib2   

request = urllib2.Request('http://jigsaw.w3.org/HTTP/300/302.html')
response = urllib2.urlopen(request)
print "Response code was: %d" % response.getcode()

Example:

import urllib2

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
authhandler = urllib2.HTTPBasicAuthHandler(passman)
# debuglevel=1 makes the handler print the raw request and response headers
handler = urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(authhandler, handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://groupon.com')
print response

...there are more.

An example of the type of information I would like to capture is the following (I used Fiddler2 to get it; all of this, and more, came from visiting groupon.com):

# | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom
6 | 200 | HTTP | www.groupon.com | / | 23,236 | private, max-age=0, no-cache, no-store, must-revalidate | text/html; charset=utf-8 | chrome:6080 | |
7 | 200 | HTTP | www.groupon.com | /homepage-assets/styles-6fca4e9f48.css | 6,766 | public, max-age=31369910 | text/css; charset=UTF-8 | chrome:6080 | |
8 | 200 | HTTP | Tunnel to | img.grouponcdn.com:443 | 0 | | | chrome:6080 | |
9 | 200 | HTTP | img.grouponcdn.com | /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg | 94,555 | public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT | image/jpeg | chrome:6080 | |
10 | 200 | HTTP | img.grouponcdn.com | /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg | 17,832 | public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT | image/jpeg | chrome:6080 | |
11 | 200 | HTTP | www.groupon.com | /homepage-assets/main-fcfaf867e3.js | 9,604 | public, max-age=31369913 | application/javascript | chrome:6080 | |
12 | 200 | HTTP | www.groupon.com | /homepage-assets/locale.js?locale=en_US&country=US | 1,507 | public, max-age=994 | application/javascript | chrome:6080 | |
13 | 200 | HTTP | www.groupon.com | /tracky | 3 | | application/octet-stream | chrome:6080 | |
14 | 200 | HTTP | www.groupon.com | /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe | 17 | private, max-age=0, no-cache, no-store, must-revalidate | application/json; charset=utf-8 | chrome:6080 | |
15 | 200 | HTTP | www.googletagmanager.com | /gtm.js?id=GTM-B76Z | 39,061 | private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT | text/javascript; charset=UTF-8 | chrome:6080 | |
maudulus
  • Must the answer be specifically for `urllib2`, or are you open to using better libraries like `requests`? If so, you need to use requests history, as seen here: http://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url – VooDooNOFX Oct 22 '14 at 22:39
  • @VooDooNOFX isn't that only going to give me the history of a redirect? http://docs.python-requests.org/en/latest/user/quickstart/ – maudulus Oct 23 '14 at 13:45
  • Perhaps you can clarify what you're looking for. Do you want a history of all network requests from the entire machine, or only the requests made within your Python script? If the latter, the `history` attribute of a `requests` response provides the details you need to replicate the report yourself in Python. – VooDooNOFX Oct 23 '14 at 22:19
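
For reference, a minimal sketch of what the `history` suggestion in these comments amounts to. As noted in the reply above, it only covers the redirect chain of a single request, not the images, scripts, and stylesheets a browser would fetch. The helper names `history_rows` and `fetch_rows` are made up for this illustration, and the stand-in objects only mimic the `status_code`, `url`, `headers`, and `history` attributes of a real `requests` response so the example runs offline.

```python
from types import SimpleNamespace


def history_rows(response):
    """Return (status, url, content_type) for each hop, redirects first."""
    hops = list(response.history) + [response]
    return [
        (hop.status_code, hop.url, hop.headers.get("Content-Type", ""))
        for hop in hops
    ]


def fetch_rows(url):
    """Hypothetical wrapper: perform a real GET and summarize its hops."""
    # Imported here so history_rows() can be tried without the library.
    import requests

    return history_rows(requests.get(url, timeout=10))


# Offline stand-ins for a 302 redirect followed by the final 200 response.
redirect = SimpleNamespace(
    status_code=302,
    url="http://example.com/old",
    headers={"Location": "http://example.com/new"},
)
final = SimpleNamespace(
    status_code=200,
    url="http://example.com/new",
    headers={"Content-Type": "text/html"},
    history=[redirect],
)

for row in history_rows(final):
    print(row)
```

With a network connection, `fetch_rows('http://jigsaw.w3.org/HTTP/300/302.html')` would summarize the redirect test page used in the question in the same shape.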

1 Answer

This isn't exactly it, but it's close enough, and yes, it was urllib2:

from bs4 import BeautifulSoup
import urllib2

# Download the page body (HTML) and parse it
data = urllib2.urlopen("http://stackoverflow.com").read()
soup = BeautifulSoup(data, "html.parser")

The .read() call returns the response body (the page's HTML), which contains enough information to scrape the URLs of the resources the page references.
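
A rough sketch of how that scraping step could be turned into a Fiddler-like log. It is an illustration under several assumptions rather than the exact code used here: it targets Python 3 (where `urllib2` became `urllib.request`), it uses the standard-library `html.parser` instead of BeautifulSoup so it has no dependencies, and the names `ResourceCollector`, `extract_resource_urls`, `fetch_and_log`, and `log_page` are invented for the example. It only sees resources named in the HTML itself, so requests issued by JavaScript (such as `/tracky` in the capture above) will not show up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ResourceCollector(HTMLParser):
    """Collect the URLs of scripts, images, and <link> targets in a page."""

    # Tag name -> attribute that holds the resource URL.
    URL_ATTRS = {"script": "src", "img": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_resource_urls(html, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.urls


def fetch_and_log(url):
    """Fetch one URL and print status, URL, body size, caching, content type."""
    # Note: urlopen raises HTTPError for 4xx/5xx responses.
    with urlopen(url, timeout=10) as resp:
        body = resp.read()
        print(resp.status, url, len(body),
              resp.headers.get("Cache-Control", ""),
              resp.headers.get("Content-Type", ""))
    return body


def log_page(page_url):
    """Log the page itself, then every resource its HTML references."""
    html = fetch_and_log(page_url).decode("utf-8", errors="replace")
    for url in extract_resource_urls(html, page_url):
        fetch_and_log(url)


# Offline check of the extraction step on a made-up snippet of HTML.
sample = ('<link rel="stylesheet" href="/styles.css">'
          '<script src="main.js"></script>'
          '<img src="http://img.example.com/a.jpg">')
print(extract_resource_urls(sample, "http://www.example.com/"))
```

Calling `log_page("http://stackoverflow.com")` would perform the actual network requests. For a complete capture that includes script-initiated traffic, a recording proxy or a headless browser is likely still needed.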

maudulus