I have some code that pulls incomplete URLs from an API and appends these to a base URL. I am trying to extend this to test each URL to make sure it does not lead to a 404 before printing out to screen.

I looked over other answers on how to use urllib with Python 3 and thought I had done everything correctly; however, I am getting the error in the title.

testurl is my request and resp is my response. This is the code I am using:

testurl=urllib.request.urlopen("http://www.google.com")
try:
    resp = urllib.request.urlopen(testurl)
except urllib.error.HTTPError as e:
    if e.code == 404:
        blah = 1
    else:
        print("it worked!")

What am I missing?

The full error output:

Traceback (most recent call last):
  File "imgtst.py", line 27, in <module>
    resp = urllib.request.urlopen(testurl)
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 517, in open
    protocol = req.type
AttributeError: 'HTTPResponse' object has no attribute 'type'

edit:

After Bruno's answer pointed out the problem, I tried the following code instead:

try:
    resp = urllib.request.urlopen("http://www.google.com")
except urllib.error.HTTPError as e:
    if e.code == 404:
        print("error")
    else:
        print("it worked")

However, this results in nothing being printed at all.

Jake Rankin
  • Can you explain what's the error you're getting with this code? Also, where is the `urljoin(base_url, modurl)` function defined? – ParvBanks Jan 02 '19 at 09:05
  • @Parvbanks I didn't think the urljoin code was relevant, as it is tested and working, trying to only put the relevant code in question. The error is as per the title, I will edit the question to show the exact output – Jake Rankin Jan 02 '19 at 09:09
  • Possible duplicate of https://stackoverflow.com/questions/18070600/attributeerror-httpresponse-object-has-no-attribute-type – duong_dajgja Jan 02 '19 at 09:23
  • @duong_dajgja the cause of the error in the question you linked is not the cause of the problem in this question here. The error is the same but the question is not at all a dupe. – Jake Rankin Jan 02 '19 at 09:25

1 Answer

Here:

testurl=urllib.request.urlopen("http://www.google.com")
try:
    resp = urllib.request.urlopen(testurl)

The first line calls urlopen and binds the result (an HTTPResponse object) to testurl. Then in the try block, you call urlopen a second time, with the HTTPResponse object as argument - which is, of course, invalid.
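To make that concrete, here is a minimal sketch of what urlopen does accept: a URL string or a urllib.request.Request object. The helper name open_once is hypothetical and only there for illustration; nothing in this snippet touches the network until open_once is actually called.

```python
import urllib.request

url = "http://www.google.com"

# A Request object carries the parsed URL scheme in its `type` attribute,
# which is what the opener reads at `protocol = req.type` in the traceback.
request = urllib.request.Request(url)
print(request.type)  # http


def open_once(target):
    """Open `target` (a URL string or a Request object) exactly once.

    `open_once` is a hypothetical helper name, not part of urllib.
    """
    return urllib.request.urlopen(target)
```

Calling open_once(url) or open_once(request) is valid; passing the HTTPResponse it returns back into urlopen is what triggers the AttributeError.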

EDIT:

With your edited code, i.e.:

try:
    resp = urllib.request.urlopen("http://www.google.com")
except urllib.error.HTTPError as e:
    if e.code == 404:
        print("error")
    else:
        print("it worked")

"it worked" will only get printed if an HTTPError is raised AND its code is not 404, because the else clause belongs to the if e.code == 404 test, not to the try. So of course if there's no error, nothing gets printed at all.

What you want is along the lines of:

try:
    result = something_that_may_raise(...)
except SomeExceptionType as e:
    handle_the_error(e)
else:
    do_something_with(result)

So in your case, it would look like:

try:
    response = urllib.request.urlopen("http://www.google.com")
except urllib.error.HTTPError as e:
    print("error code {}".format(e.code))
else:
    print("it worked: {}".format(response))

Note that here the else clause matches the try clause.
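For the original goal (skipping URLs that lead to a 404 before printing them), the same try/except/else shape could be wrapped in a small function. This is only a sketch under assumptions: the name is_reachable and its opener parameter are hypothetical and not from the question, and opener exists purely so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def is_reachable(url, opener=urllib.request.urlopen):
    """Return True if `url` opens successfully, False if it yields a 404.

    `is_reachable` and `opener` are hypothetical names for illustration;
    by default `opener` is the real urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        # Any other HTTP error is left for the caller to deal with.
        raise
    else:
        # Reached only when no exception was raised by opener(url).
        response.close()
        return True
```

In the loop from the question, each joined URL would be passed to is_reachable and printed only when it returns True. Failures that are not HTTP status errors, such as an unresolvable host, raise urllib.error.URLError, which this sketch does not catch.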

bruno desthuilliers
  • I based my code here on the accepted answer to this question: https://stackoverflow.com/questions/1726402/in-python-how-do-i-use-urllib-to-see-if-a-website-is-404-or-200 - If what I am doing is incorrect, why would it be the accepted answer to another question? – Jake Rankin Jan 02 '19 at 09:31
  • @JakeRankin Did you really bother reading my answer at all ??? FWIW, in the accepted answer of the post you link to, no one is passing an `HTTPResponse` to `urlopen` - which doesn't make any sense, quite obviously. – bruno desthuilliers Jan 02 '19 at 09:52
  • My apologies. If I remove the first line assigning testurl and pass a URL as a parameter to the urlopen call assigned to resp, e.g. `resp = urllib.request.urlopen("http://www.google.com")`, nothing is printed. If that is not the correct approach, what is? – Jake Rankin Jan 02 '19 at 10:00
  • The "correct approach" is to try to understand what you're doing actually. wrt/ your second code snippet, the reason why "nothing is printed" is absolutely obvious, cf my edited answer. – bruno desthuilliers Jan 02 '19 at 12:54