
I want to download a .doc file from a website with a Python spider. I have the file's URL, which means the file is downloaded automatically when I enter the URL in the browser after I log in. If I am not logged in, it returns a 404 error. I only know that urllib.urlretrieve(url, 'path/filename') can download it, but I do not know how to simulate the logged-in state with urlretrieve. Or is there another way to download it? Help me please, thanks.

thiiiiiking

1 Answer


Maybe you can try the Grab framework (other libraries can do this too; this is just one example). It makes it easy to fill in the login form and submit it:

from grab import Grab
import logging

logging.basicConfig(level=logging.DEBUG)  # show the requests Grab makes

g = Grab()
g.go('https://github.com/login')          # load the login page (cookies + hidden form fields)
g.set_input('login', '***')               # fill in the username field
g.set_input('password', '***')            # fill in the password field
g.submit()                                # submit the form; the session is now authenticated

Then you can download your .doc files with the same, now-authenticated Grab instance.
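
Once you are logged in, the Grab object keeps the session cookies, so fetching the file URL should return the document instead of a 404. Here is a minimal sketch of the download step; FILE_URL and OUT_PATH are placeholders for your own values, and it assumes a recent Grab version where the raw response body is exposed as g.doc.body (older versions used g.response.body):

FILE_URL = 'https://example.com/path/to/your.doc'  # placeholder: your protected file URL
OUT_PATH = 'path/filename.doc'                     # placeholder: where to save it

g.go(FILE_URL)                  # request is sent with the cookies from the login above
with open(OUT_PATH, 'wb') as f:
    f.write(g.doc.body)         # write the raw response body to disk

If you would rather use a more common library, requests can do the same job with a Session that keeps the login cookies. The login URL and form field names below are hypothetical and must match your site's actual login form (some sites also require a CSRF token taken from the login page):

import requests

LOGIN_URL = 'https://example.com/login'            # placeholder login endpoint
FILE_URL = 'https://example.com/path/to/your.doc'  # placeholder file URL

with requests.Session() as s:
    # Log in once; the session reuses the cookies for later requests.
    s.post(LOGIN_URL, data={'login': '***', 'password': '***'})
    r = s.get(FILE_URL)
    r.raise_for_status()                           # fail loudly instead of saving an error page
    with open('path/filename.doc', 'wb') as f:
        f.write(r.content)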

Sinux