
I want to download a .doc file from a website with a Python spider. I have the file's URL, which means the file is downloaded automatically when I enter the URL in the browser after I log in. If I am not logged in, it returns a 404 error. I only know that urllib.urlretrieve(url, 'path/filename') can download it, but I do not know how to simulate the logged-in state with urlretrieve. Or is there another way to download it? Help me please, thanks.

thiiiiiking

1 Answer


Maybe you can try the Grab framework (other libraries can do this too; this is just one example). It makes it easy to fill in the login form and submit it:

from grab import Grab
import logging

logging.basicConfig(level=logging.DEBUG)  # show the requests Grab makes

g = Grab()
g.go('https://github.com/login')          # load the login page (cookies + hidden form fields)
g.set_input('login', '***')               # fill in the username field
g.set_input('password', '***')            # fill in the password field
g.submit()                                # submit the form; the session is now authenticated

Then you can download your .doc files with the same, now-authenticated Grab instance.
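
Once you are logged in, the Grab object keeps the session cookies, so fetching the file URL should return the document instead of a 404. Here is a minimal sketch of the download step; FILE_URL and OUT_PATH are placeholders for your own values, and it assumes a recent Grab version where the raw response body is exposed as g.doc.body (older versions used g.response.body):

FILE_URL = 'https://example.com/path/to/your.doc'  # placeholder: your protected file URL
OUT_PATH = 'path/filename.doc'                     # placeholder: where to save it

g.go(FILE_URL)                  # request is sent with the cookies from the login above
with open(OUT_PATH, 'wb') as f:
    f.write(g.doc.body)         # write the raw response body to disk

If you would rather use a more common library, requests can do the same job with a Session that keeps the login cookies. The login URL and form field names below are hypothetical and must match your site's actual login form (some sites also require a CSRF token taken from the login page):

import requests

LOGIN_URL = 'https://example.com/login'            # placeholder login endpoint
FILE_URL = 'https://example.com/path/to/your.doc'  # placeholder file URL

with requests.Session() as s:
    # Log in once; the session reuses the cookies for later requests.
    s.post(LOGIN_URL, data={'login': '***', 'password': '***'})
    r = s.get(FILE_URL)
    r.raise_for_status()                           # fail loudly instead of saving an error page
    with open('path/filename.doc', 'wb') as f:
        f.write(r.content)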

Sinux