8

I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however after the 308th restaurant I get the following error:

Traceback (most recent call last):
  File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
    writer.writerow(rest_array)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing has worked so far. It would be great if someone could take a look at my code and suggest a solution.

        for item in soup2.findAll('div', attrs={'class', 'title'}):
            if 'Cuisine' in item.text:
                item.text.strip()
                content = item.findNext('div', attrs=('class', 'content'))
                cuisine_type = content.text.encode('utf8', 'ignore').strip().split(r'\xa0')
        rest_array = [account_name, rest_address, postcode, phonenumber, cuisine_type]
        #print rest_array
        with open('ListingsPull-Amsterdam.csv', 'a') as file:
                writer = csv.writer(file)
                writer.writerow(rest_array)
    break
dtrinh
  • 1
    `cuisine_type` is a list because you use `.split` (and I'm not sure why you're splitting on non-break spaces...). However, the contents of a row that you pass to `.writerow` need to be strings or numbers. Also, when using the Python 2 `csv` module you're supposed to open the CSV files in binary mode, as mentioned in [the docs](https://docs.python.org/2/library/csv.html). You may find this article helpful: [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), which was written by SO veteran Ned Batchelder. – PM 2Ring Nov 15 '16 at 21:24
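To illustrate the two points in the comment above, here is a minimal sketch. The sample text is made up for illustration and is not taken from TripAdvisor. It shows that `r'\xa0'` is a raw string of four ordinary characters, so splitting on it never matches, while `u'\xa0'` is the actual no-break space; joining the parts then gives a single string that fits in one CSV cell.

```python
# Hypothetical text, standing in for what content.text might return.
raw = u"Dutch\xa0European\xa0Vegetarian Friendly"

# r"\xa0" is the four characters backslash, x, a, 0, so nothing is split.
unsplit = raw.split(r"\xa0")

# u"\xa0" is the real no-break space character, so this splits as intended.
parts = raw.split(u"\xa0")

# Join the list back into one string so the row holds only strings.
cuisine_type = u", ".join(parts)
```

With `cuisine_type` kept as a single unicode string, every element of the row can be encoded the same way before it is written.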

3 Answers

16

Your rest_array contains unicode strings. On Python 2.7, csv.writer expects byte strings, so when it receives a unicode string it encodes it implicitly with the 'ascii' codec, which fails on characters such as u'\u2019'.

I suggest encoding each value explicitly as "utf8" before writing:

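import csv

# Python 2: open the CSV in binary mode, as the csv module docs recommend.
with open('ListingsPull-Amsterdam.csv', mode='ab') as fd:
    writer = csv.writer(fd)
    # Every element must be a unicode string here; encode each one to UTF-8 bytes.
    rest_array = [text.encode("utf8") for text in rest_array]
    writer.writerow(rest_array)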

Note: please don't use file as a variable name, because in Python 2 it shadows the built-in file type (the type of the objects that open() returns).

If you want to open this CSV file with Microsoft Excel, you may want to use another encoding, for instance "cp1252" (which can also encode the u"\u2019" character).
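As a quick check of the encodings mentioned here, the sketch below encodes a made-up string containing u"\u2019" with three codecs. The sample text is an assumption for illustration only; the snippet should run the same way on Python 2.7 and Python 3.

```python
# Hypothetical value containing U+2019 (right single quotation mark).
text = u"Joe\u2019s"

utf8_bytes = text.encode("utf8")      # three bytes for U+2019
cp1252_bytes = text.encode("cp1252")  # one byte for U+2019

# The 'ascii' codec cannot represent U+2019, which is the error in the traceback.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Both "utf8" and "cp1252" can represent the character, so either one avoids the UnicodeEncodeError; which one to pick depends on what will read the file.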

Laurent LAPORTE
4

You're writing one or more non-ASCII characters to your CSV output file. Make sure you open the output file with a character encoding that can represent them. UTF-8 is usually a safe choice. Try this:

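import csv

# Python 3 only: open() accepts an encoding; newline='' is recommended by the csv docs.
with open('ListingsPull-Amsterdam.csv', 'a', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(rest_array)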

Edit: this only applies to Python 3.x, sorry.
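For anyone who is on Python 3, here is a small self-contained sketch of the same idea that writes a row and reads it back. The row contents and the temporary directory are made up for illustration; only the file name comes from the question.

```python
import csv
import os
import tempfile

# Hypothetical row; the second field contains U+2019.
rest_array = ["Example Cafe", "Joe\u2019s Street 1", "1011 AB"]

# Write into a temporary directory so the sketch leaves no file behind in the project.
path = os.path.join(tempfile.mkdtemp(), "ListingsPull-Amsterdam.csv")

# Text mode with an explicit encoding; newline='' lets csv handle line endings.
with open(path, "a", encoding="utf-8", newline="") as fd:
    csv.writer(fd).writerow(rest_array)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as fd:
    rows = list(csv.reader(fd))
```

Here `rows` should contain the single row that was written, with the quotation mark intact.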

Irmen de Jong
0

Add these lines at the start of your script (Python 2 only):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')