
I need to download all the files under these links, where only the suburb name changes in each link.

Just as a reference: https://www.data.vic.gov.au/data/dataset/2014-town-and-community-profile-for-thornbury-suburb

All the files under this search link: https://www.data.vic.gov.au/data/dataset?q=2014+town+and+community+profile
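Since only the suburb slug varies, the dataset URLs can be generated from a suburb list. A minimal sketch (the suburb names here are illustrative; the full list would come from the search page):

```python
# URL pattern from the reference link; only the suburb slug changes
BASE = ("https://www.data.vic.gov.au/data/dataset/"
        "2014-town-and-community-profile-for-{}-suburb")

suburbs = ["thornbury", "northcote"]  # illustrative list
urls = [BASE.format(s) for s in suburbs]
print(urls[0])
```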

Any possibilities?

Thanks :)

Bharath

2 Answers


You can download a file like this (urllib2 is Python 2 only; in Python 3 the same call lives in urllib.request):

import urllib.request

response = urllib.request.urlopen('http://www.example.com/file_to_download')
html = response.read()

To get all the links on a page:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://site-to.crawl")
data = r.text
soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    print(link.get('href'))
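Putting the two snippets together, a minimal sketch that collects every link on a page and saves each target to disk (the function names and the flat file-naming scheme are my own; there is no retry or error handling):

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def extract_links(html):
    # Collect every href on the page (same idea as the loop above)
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]

def download_all(page_url, dest_dir="downloads"):
    # Fetch the page, then save each linked file into dest_dir
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(page_url) as resp:
        links = extract_links(resp.read())
    for link in links:
        target = urljoin(page_url, link)          # resolve relative links
        name = target.rstrip("/").rsplit("/", 1)[-1] or "index"
        urllib.request.urlretrieve(target, os.path.join(dest_dir, name))
```

In practice you would filter the extracted links first, since a search page links to much more than the dataset files themselves.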
naren

You should first read the HTML, parse it with Beautiful Soup, and then filter the links by the file type you want to download. For instance, to download all PDF files, check whether each link ends with the .pdf extension.
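A minimal sketch of that check (the inline HTML here is a stand-in for a fetched page):

```python
from bs4 import BeautifulSoup

# Stand-in for html fetched with requests.get(...).text
html = '''
<a href="/data/profile.pdf">Profile</a>
<a href="/data/about.html">About</a>
'''

soup = BeautifulSoup(html, "html.parser")
# Keep only links whose href ends with .pdf
pdf_links = [a["href"] for a in soup.find_all("a", href=True)
             if a["href"].lower().endswith(".pdf")]
print(pdf_links)
```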

There's a good explanation and code available here:

https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48

x89