I'm working on a small application using Google App Engine that makes use of the Quora RSS feed. There is a form, and based on the input entered by the user, it outputs a list of links related to that input. The application works fine for one-word queries and for most two-word queries if the words are separated by a '-'. However, for three-word queries and some two-word queries, I get the following error:

```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)
```
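A quick way to see what the message means, as a minimal sketch with a made-up sample string: the byte 0xe2 is the first byte of the UTF-8 encoding of characters such as the typographic apostrophe (U+2019), and the ascii codec rejects any byte above 127. The sketch is written for Python 3, where Python 2's `str`/`unicode` pair became `bytes`/`str`; the variable names are illustrative only.

```python
# Hypothetical sample title containing U+2019 (RIGHT SINGLE QUOTATION MARK).
# UTF-8 encodes that character as the three bytes 0xe2 0x80 0x99.
raw = u"Why isn\u2019t this working?".encode("utf-8")

ascii_error = None
try:
    # The ascii codec only accepts byte values 0-127, so 0xe2 is rejected.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)
# 'ascii' codec can't decode byte 0xe2 in position 7: ordinal not in range(128)

# Decoding with the encoding the bytes were actually written in succeeds:
# 25 bytes become 23 characters (the 3-byte apostrophe is one character).
text = raw.decode("utf-8")
print(len(raw), len(text))  # 25 23
```

The same bytes decode cleanly once the correct codec is named, which would explain why only some queries fail: presumably only those feeds contain non-ASCII characters.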

Here's my Python code:

```python
import os
import webapp2
import jinja2
from google.appengine.ext import db
import urllib2
import re

template_dir = os.path.join(os.path.dirname(__file__), 'templates')
jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape=True)

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)
    def render_str(self, template, **params):
        t = jinja_env.get_template(template)
        return t.render(params)
    def render(self, template, **kw):
        self.write(self.render_str(template, **kw))

class MainPage(Handler):
    def get(self):
        self.render("formrss.html")
    def post(self):
        x = self.request.get("rssquery")
        url = "http://www.quora.com/" + x + "/rss"
        content = urllib2.urlopen(url).read()
        allTitles =  re.compile('<title>(.*?)</title>')
        allLinks = re.compile('<link>(.*?)</link>')
        list = re.findall(allTitles,content)
        linklist = re.findall(allLinks,content)
        self.render("frontrss.html", list = list, linklist = linklist)


app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```

Here's the HTML template:

```html
<h1>Quora Live Feed</h1><br><br><br>

{% extends "rssbase.html" %}

{% block content %}
    {% for e in range(1, 19) %}
        {{ (list[e]) }} <br>
        <a href="{{ linklist[e] }}">{{ linklist[e] }}</a>
        <br><br>
    {% endfor %}
{% endblock %}
```
Manas Chaturvedi
    Can you give us the *full* traceback? Just the exception tells us nothing about where the exception was raised or how Python came to that location. – Martijn Pieters Jan 03 '14 at 15:26
    This error is quite terrible and happens sometimes with Python. It's really confusing, and I see I'm not alone in having had it. I've had it multiple times, and the answers are not even always clear about whether to encode or decode. This error never happens in Java, for instance, where "everything is Unicode", so why does Python impose this source of confusion on us when Java never has this problem? I had this error multiple times coding internationalized webapps on Google App Engine, and it was never clear what to do, not even when it was working. – Niklas R. Jan 03 '14 at 15:59

2 Answers

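Python is likely trying to implicitly decode a byte string (a normal `str`) into `unicode` with the ascii codec, and failing on the first non-ASCII byte. When you're working with text that arrives as bytes, such as the body returned by `urllib2.urlopen(url).read()`, decode it explicitly with the correct encoding as soon as you read it:

```python
content = content.decode('utf-8')
```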
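A minimal sketch of that approach applied to the question's parsing step, assuming the feed is UTF-8. The helper name `parse_feed` and the sample feed are hypothetical; the function takes the bytes that the HTTP read returns, decodes them once, and then runs the same `<title>` and `<link>` regexes, so everything handed to the template is already Unicode. It is written for Python 3 (`bytes`/`str`).

```python
import re

# Same patterns as in the question, compiled as text (Unicode) patterns.
TITLE_RE = re.compile(u"<title>(.*?)</title>")
LINK_RE = re.compile(u"<link>(.*?)</link>")


def parse_feed(raw, encoding="utf-8"):
    """Decode the raw response body once, then run the regexes on text.

    `raw` is the byte string read from the HTTP response; `encoding`
    defaults to UTF-8, which is an assumption about the feed.
    """
    content = raw.decode(encoding)
    titles = TITLE_RE.findall(content)
    links = LINK_RE.findall(content)
    return titles, links


# Hypothetical feed fragment standing in for the Quora RSS response.
sample = (
    u"<rss><channel><title>Feed</title><link>http://example.com/</link>"
    u"<item><title>What\u2019s new?</title><link>http://example.com/1</link></item>"
    u"</channel></rss>"
).encode("utf-8")

titles, links = parse_feed(sample)
print(len(titles), links)  # 2 ['http://example.com/', 'http://example.com/1']
```

Hard-coding UTF-8 is itself an assumption; if the HTTP response declares a charset in its Content-Type header, decoding with that value is safer.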
Jon Wayne Parrott

In my App Engine app, I convert it like this:

```python
content = unicode(content, 'utf-8')
```

I think it's clearer and easier to use.
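The encoding argument is what makes this work. In Python 2, `unicode(content)` with no encoding decodes with the default ascii codec and raises the same `UnicodeDecodeError` on byte 0xe2, while `unicode(content, 'utf-8')` is equivalent to `content.decode('utf-8')`. In Python 3 the constructor is `str`, and `str()` on bytes without an encoding does not decode at all; it returns the `b'...'` representation. A small Python 3 sketch checking both points, using a hypothetical sample byte string:

```python
# Hypothetical UTF-8 bytes containing U+2019 (encoded as 0xe2 0x80 0x99).
raw = u"It\u2019s".encode("utf-8")

# With an explicit encoding, the constructor decodes exactly like .decode().
via_constructor = str(raw, "utf-8")
via_method = raw.decode("utf-8")
print(via_constructor == via_method)  # True

# Without an encoding, Python 3's str() returns the bytes' repr instead of decoding.
print(str(raw))  # b'It\xe2\x80\x99s'

# Forcing the ascii codec reproduces the error from the question.
try:
    str(raw, "ascii")
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))  # 2 0xe2
```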

KimKha