
I need to download all the files under these links, where only the suburb name changes in each link.

Just as a reference: https://www.data.vic.gov.au/data/dataset/2014-town-and-community-profile-for-thornbury-suburb

All the files under this search link: https://www.data.vic.gov.au/data/dataset?q=2014+town+and+community+profile
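Since only the suburb slug varies, the dataset URLs can be generated from a suburb list. A minimal sketch (the suburb names here are illustrative; the full list would come from the search page):

```python
# URL pattern from the reference link; only the suburb slug changes
BASE = ("https://www.data.vic.gov.au/data/dataset/"
        "2014-town-and-community-profile-for-{}-suburb")

suburbs = ["thornbury", "northcote"]  # illustrative list
urls = [BASE.format(s) for s in suburbs]
print(urls[0])
```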

Any possibilities?

Thanks :)

Bharath

2 Answers


You can download a file like this (urllib2 is Python 2 only; in Python 3 the same call lives in urllib.request):

import urllib.request

response = urllib.request.urlopen('http://www.example.com/file_to_download')
html = response.read()

To get all the links on a page:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://site-to.crawl")
data = r.text
soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    print(link.get('href'))
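Putting the two snippets together, a minimal sketch that collects every link on a page and saves each target to disk (the function names and the flat file-naming scheme are my own; there is no retry or error handling):

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def extract_links(html):
    # Collect every href on the page (same idea as the loop above)
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]

def download_all(page_url, dest_dir="downloads"):
    # Fetch the page, then save each linked file into dest_dir
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(page_url) as resp:
        links = extract_links(resp.read())
    for link in links:
        target = urljoin(page_url, link)          # resolve relative links
        name = target.rstrip("/").rsplit("/", 1)[-1] or "index"
        urllib.request.urlretrieve(target, os.path.join(dest_dir, name))
```

In practice you would filter the extracted links first, since a search page links to much more than the dataset files themselves.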
naren

You should first read the HTML, parse it with Beautiful Soup, and then filter the links by the file type you want to download. For instance, to download all PDF files, check whether each link ends with the .pdf extension.
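A minimal sketch of that check (the inline HTML here is a stand-in for a fetched page):

```python
from bs4 import BeautifulSoup

# Stand-in for html fetched with requests.get(...).text
html = '''
<a href="/data/profile.pdf">Profile</a>
<a href="/data/about.html">About</a>
'''

soup = BeautifulSoup(html, "html.parser")
# Keep only links whose href ends with .pdf
pdf_links = [a["href"] for a in soup.find_all("a", href=True)
             if a["href"].lower().endswith(".pdf")]
print(pdf_links)
```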

There's a good explanation and code available here:

https://medium.com/@dementorwriter/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48

x89