
If I use wget to fetch an HTML file from a URL, how can I generate a HAR file from that HTML file?

  1. Are there any open source implementations for generating HAR files from HTML files?
  2. Once the HAR file is generated, I can read data from it using harlib.

If possible, please suggest C, C++, or Java implementations.

cytinus
Vivek Sharma

1 Answer


The primary point of the HAR format is to provide a standard HTTP tracing format that many tools can produce and analyze. In other words, its original intent was, and primarily still is, performance analysis, not "archiving" webpages per se.

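If you fetch a page with wget, you're missing nearly all of the performance data: wget retrieves the HTML document but does not parse it, request the sub-resources (scripts, stylesheets, images), or record per-request timings. To capture the necessary data you really need a browser to execute the requests, fetch all the associated resources, and record all the timings. That is what lets you build waterfall charts and similar views.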
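To make the gap concrete, here is a minimal sketch of the most a single wget-style fetch could populate in a HAR 1.2 file: one entry, with only the wait and receive timings known. The class name `MinimalHar`, the method `buildHar`, and the creator name are hypothetical, made up for illustration. The sketch also assumes a plain GET over HTTP/1.1, and it marks anything that was not measured with `-1` or an empty value.

```java
public class MinimalHar {

    // Escapes the characters that must not appear raw inside a JSON string.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a HAR 1.2 document with exactly one entry (the HTML page itself).
    // Headers, cookies, and DNS/connect timings are left empty or -1 because
    // a simple fetch does not record them. HTTP/1.1 and GET are assumptions.
    static String buildHar(String url, int status, String mimeType, long bodySize,
                           String startedDateTime, long waitMs, long receiveMs) {
        long total = waitMs + receiveMs; // "time" is the sum of the known timings
        return "{\"log\":{"
            + "\"version\":\"1.2\","
            + "\"creator\":{\"name\":\"minimal-har-sketch\",\"version\":\"0.1\"},"
            + "\"entries\":[{"
            + "\"startedDateTime\":\"" + jsonEscape(startedDateTime) + "\","
            + "\"time\":" + total + ","
            + "\"request\":{\"method\":\"GET\",\"url\":\"" + jsonEscape(url) + "\","
            + "\"httpVersion\":\"HTTP/1.1\",\"cookies\":[],\"headers\":[],"
            + "\"queryString\":[],\"headersSize\":-1,\"bodySize\":0},"
            + "\"response\":{\"status\":" + status + ",\"statusText\":\"\","
            + "\"httpVersion\":\"HTTP/1.1\",\"cookies\":[],\"headers\":[],"
            + "\"content\":{\"size\":" + bodySize + ",\"mimeType\":\""
            + jsonEscape(mimeType) + "\"},"
            + "\"redirectURL\":\"\",\"headersSize\":-1,\"bodySize\":" + bodySize + "},"
            + "\"cache\":{},"
            + "\"timings\":{\"blocked\":-1,\"dns\":-1,\"connect\":-1,"
            + "\"send\":0,\"wait\":" + waitMs + ",\"receive\":" + receiveMs + "}"
            + "}]}}";
    }

    public static void main(String[] args) {
        // Example values only; a real caller would measure these around the fetch.
        System.out.println(buildHar("http://example.com/", 200, "text/html",
                1256L, "2013-01-01T00:00:00.000Z", 120L, 30L));
    }
}
```

The result is a structurally valid HAR that harlib or a HAR viewer should be able to open, but it contains a single bar rather than a waterfall: none of the page's sub-resources appear because nothing parsed the HTML and requested them.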

If you need to capture this data on the server, you can use pcap to capture the TCP trace and then convert that to HAR. You still need a client that actually parses the HTML and requests all the sub-resources, because pcap only listens in the background. Alternatively, you can route your browser through a proxy and let the proxy emit a HAR file for you.

Last but not least, you can drive the browser through its debug interface and export the HAR file that way. Here is a Java example for driving Firefox: https://github.com/Filirom1/browsermob-page-perf

igrigorik