Error with encoding with regular expression

Question

I am using Scrapy and Django. While cleaning the data, I use this:

html = re.sub(r'(™|®|©|&trade;|&reg;|&copy;|&#8482;|&#174;|&#169;)', '', html, flags=re.IGNORECASE)

Running this in a normal Python shell works fine. However, every time I run it with `scrapy crawl`, I get this error:

SyntaxError: Non-ASCII character '\xe2' in file /somefile/ on line 105, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Can someone please help me out? Thanks!

Did you try extracting your search string and declaring it explicitly as a unicode string, e.g. `mystring=u'regexp|regexp'`, and then using that in the substitution? – user1603472, Feb 23 '15 at 23:44

score 0 · Answer 1 · answered Feb 24 '15 at 00:37

I declared the encoding at the top of my file:

#!/usr/bin/python
# -*- coding: utf-8 -*-

Make sure these are the first two lines of the file: Python only honors the encoding declaration on line 1 or 2 (see PEP 263). This fixed my problem.
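For illustration, here is a minimal sketch of how the top of such a file might look, combining the encoding declaration with the explicit unicode pattern suggested in the comment on the question. The names `SYMBOL_PATTERN` and `strip_symbols` are hypothetical, the sample string is made up, and the sketch assumes the HTML has already been decoded to unicode text:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

# Hypothetical name: the pattern from the question, written as a unicode
# literal so the non-ASCII symbols are compared as characters, not bytes.
SYMBOL_PATTERN = re.compile(
    u'(™|®|©|&trade;|&reg;|&copy;|&#8482;|&#174;|&#169;)',
    flags=re.IGNORECASE,
)


def strip_symbols(html):
    """Remove trademark, registered, and copyright marks (hypothetical helper)."""
    return SYMBOL_PATTERN.sub(u'', html)


# Made-up sample input; assumes html is already unicode text.
cleaned = strip_symbols(u'Acme™ Widget&REG; &copy;2015')
print(cleaned)
```

Compiling the pattern once is only a convenience here; calling `re.sub` directly as in the question should behave the same way.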

Thank you, everyone!

Nazariy

Important: This just sets the encoding for the code, not for files you read. – Matthias Feb 24 '15 at 09:02
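To illustrate that point, here is a rough sketch of decoding a file explicitly when reading it, since the coding line only affects how the source file itself is parsed. The helper `read_text` and the temporary sample file are hypothetical, and the sketch assumes the data is UTF-8:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_text(path, encoding='utf-8'):
    """Read a file and decode it explicitly (hypothetical helper).

    The coding declaration above does not apply to data read at runtime.
    """
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Demo with a throwaway file containing UTF-8 bytes (assumed encoding).
fd, sample_path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'wb') as handle:
    handle.write(u'Acme™'.encode('utf-8'))

sample_text = read_text(sample_path)
os.remove(sample_path)
```

`io.open` is used because it accepts an `encoding` argument on both Python 2 and Python 3.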
