
I am using requests to download files. For large files I would like to display the progress as a percentage, and I would also like to know the download speed, but right now the only way I can see to do that is to check the size of the file on disk on every iteration. How can I go about this? Here's my code:

import requests
import sys
import time
import os

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    r = requests.get(url, stream=True)

    start = time.time()  # wall-clock time; on Unix, time.clock() returns CPU time
    with open(os.path.join(directory, localFilename), 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # skip keep-alive chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
    return time.time() - start

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL: ")
    directory = raw_input("Where would you like to save the file? ")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    print "Time elapsed: %.2f seconds" % time_elapsed  # time_elapsed is a float, not a str


if __name__ == "__main__":
    main()

I think one way to do it would be to check the file on every iteration of the for loop and compute the percentage from the Content-Length header. But that again seems like a problem for large files (around 500 MB). Is there a better way to do it?
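The loop already sees every chunk before it is written, so a running byte counter plus the Content-Length header is enough; the file on disk never needs to be re-read. A minimal sketch, assuming a hypothetical `ProgressTracker` helper (not part of `requests`) that uses `time.perf_counter` as its clock; the clock is a parameter only so the logic can be checked without a network:

```python
import time
from typing import Callable, Optional


class ProgressTracker:
    """Hypothetical helper: counts bytes as they arrive, never touching the file on disk."""

    def __init__(self, total: Optional[int], clock: Callable[[], float] = time.perf_counter):
        self.total = total        # Content-Length as an int, or None if the header is missing
        self.received = 0         # running count of bytes seen so far
        self._clock = clock
        self._start = clock()     # reference point for the average speed

    def update(self, chunk: bytes) -> None:
        # Call once per chunk inside the iter_content() loop.
        self.received += len(chunk)

    def percent(self) -> Optional[float]:
        # A percentage only makes sense when the total size is known.
        if not self.total:
            return None
        return 100.0 * self.received / self.total

    def bytes_per_second(self) -> float:
        # Average speed since the tracker was created; guard against zero elapsed time.
        elapsed = self._clock() - self._start
        return self.received / elapsed if elapsed > 0 else 0.0
```

Inside the download loop it would be created as `ProgressTracker(int(r.headers['Content-Length']) if 'Content-Length' in r.headers else None)` and fed with `tracker.update(chunk)` right after `f.write(chunk)`. The counter costs one addition per chunk, regardless of file size.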

Mayank Kumar

2 Answers


See here: Python progress bar and downloads

I think the code would be something like this. It shows the average speed since the start of the download, in bytes per second:

import requests
import sys
import time
import os

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    with open(os.path.join(directory, localFilename), 'wb') as f:
        start = time.time()
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None:  # no Content-Length header, so no progress bar
            f.write(r.content)
        else:
            total_length = int(total_length)  # header values are strings
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(50 * dl / total_length)  # 50 = bar width in characters
                elapsed = max(time.time() - start, 1e-6)  # avoid division by zero
                sys.stdout.write("\r[%s%s] %d B/s" % ('=' * done, ' ' * (50 - done), dl / elapsed))
                sys.stdout.flush()
            print ''  # end the progress line once, after the loop
    return time.time() - start

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL: ")
    directory = raw_input("Where would you like to save the file? ")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    print "Time elapsed: %.2f seconds" % time_elapsed


if __name__ == "__main__":
    main()
freeforall tousez
  • This code looks good, but IMO it won't show the progress dynamically, since `requests.get(...)` downloads the entire file before it returns. – sonus21 Aug 19 '15 at 11:18
  • @sonukumar, notice the `stream` parameter in the get call, `requests.get(url, stream=True)`. Check out [the documentation](http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow). – allonhadaya Feb 04 '16 at 15:44
  • @freeforalltousez Why do you multiply by 50 when calculating the downloaded percentage? – Juancho Jun 29 '17 at 06:52
  • @Juancho it's the length of the progress bar. See the linked answer. – freeforall tousez Jun 29 '17 at 12:08
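As the last comment says, the 50 in `int(50 * dl / total_length)` is the bar width: the downloaded fraction is scaled to that many `=` characters. Below is a sketch of the same loop factored so it accepts any iterable of byte chunks (for example `r.iter_content(1024)` from a `stream=True` request) and any writable binary file. The names `render_bar` and `copy_with_progress` are hypothetical, and speed is reported in bytes per second as in the answer:

```python
import io
import sys
import time
from typing import IO, Iterable, Optional


def render_bar(done: int, total: int, width: int = 50) -> str:
    """Scale done/total to `width` characters, e.g. '[=====     ]'."""
    filled = min(width, int(width * done / total)) if total else 0
    return "[" + "=" * filled + " " * (width - filled) + "]"


def copy_with_progress(chunks: Iterable[bytes], out: IO[bytes], total: Optional[int],
                       status: IO[str] = sys.stdout, width: int = 50) -> int:
    """Write every chunk to `out`, redrawing a single status line; return bytes written."""
    start = time.perf_counter()
    written = 0
    for chunk in chunks:
        if not chunk:                      # skip keep-alive chunks
            continue
        out.write(chunk)
        written += len(chunk)
        elapsed = time.perf_counter() - start
        speed = written / elapsed if elapsed > 0 else 0.0
        bar = render_bar(written, total, width) if total else ""
        # '\r' moves the cursor back to the start of the line, so the bar redraws in place.
        status.write(f"\r{bar} {speed:.0f} B/s")
        status.flush()
    status.write("\n")                     # finish the status line once, after the loop
    return written


if __name__ == "__main__":
    # Offline demo: an in-memory "download" made of three 1 KiB chunks.
    buf = io.BytesIO()
    copy_with_progress([b"x" * 1024] * 3, buf, total=3072)
```

Keeping the loop independent of `requests` means the same function works for a file opened with `open(path, 'wb')` or for an `io.BytesIO` buffer like the one in the next answer.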

An improved version of the accepted answer for Python 3: it writes to memory with io.BytesIO, reports the result in Mbps, and supports ipv4/ipv6, size, and port arguments.

import io
import sys
import time

import requests


def speed_test(size=5, ipv="ipv4", port=80):
    if size == 1024:
        size = "1GB"
    else:
        size = f"{size}MB"

    url = f"http://{ipv}.download.thinkbroadband.com:{port}/{size}.zip"

    with io.BytesIO() as f:
        start = time.perf_counter()  # time.clock() no longer exists in Python 3.8+
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None:  # no content length header
            f.write(r.content)
        else:
            total_length = int(total_length)
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(30 * dl / total_length)
                elapsed = max(time.perf_counter() - start, 1e-6)
                mbps = dl * 8 / elapsed / 1_000_000  # bytes/s -> megabits/s
                sys.stdout.write("\r[%s%s] %.2f Mbps" % ('=' * done, ' ' * (30 - done), mbps))
                sys.stdout.flush()

    print(f"\n{size} = {(time.perf_counter() - start):.2f} seconds")

Usage Examples:

speed_test()
speed_test(10)
speed_test(50, "ipv6")
speed_test(1024, port=8080)

Output Sample:

[==============================] 61.34037 Mbps
100MB = 17.10 seconds

Available Options:

size: 5, 10, 20, 50, 100, 200, 512, 1024

ipv: ipv4, ipv6

port: 80, 81, 8080
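Network speeds are normally quoted in megabits per second, while `dl / elapsed` is in bytes per second, so the conversion multiplies by 8 bits per byte and divides by 1,000,000. A small sketch of that conversion, using a hypothetical helper name and the decimal definition 1 Mbit = 1,000,000 bits:

```python
def bytes_per_sec_to_mbps(bytes_done: int, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbit = 1,000,000 bits)."""
    if seconds <= 0:
        return 0.0                      # nothing measurable yet
    return bytes_done * 8 / seconds / 1_000_000
```

For example, `bytes_per_sec_to_mbps(1_250_000, 1.0)` gives `10.0`: 1.25 million bytes in one second is 10 million bits per second.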

Pedro Lobito
  • The function `time.clock()` was deprecated in Python 3.3 and removed in Python 3.8: use `time.perf_counter()` in the solution code above. – Shiro May 07 '21 at 15:56
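`time.perf_counter()` is a high-resolution clock intended for measuring elapsed time, which is what all of these snippets need when timing a download. A minimal sketch of wrapping any download function with it; `timed` is a hypothetical helper, not part of the standard library:

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()          # high-resolution clock meant for intervals
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

With the question's function this would read `_, seconds = timed(downloadFile, url, directory)`.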