I am trying to load the "Crimes in Boston" dataset in a Kaggle notebook (https://www.kaggle.com/AnalyzeBoston/crimes-in-boston). By the way, the most up-to-date version of this data can be found here: https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system/resource/12cb3883-56f5-47de-afa5-3b1cf61b257b. When I try to read the data using Pandas, I get this error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

Here is how I try to load the data into the kernel:

import pandas as pd
my_filepath = '../input/crimes-in-boston/crime.csv'
my_data = pd.read_csv(my_filepath, encoding='utf8')  # raises the UnicodeDecodeError above
Safa
  • Please share the data of your CSV file. – Sagar Gupta Aug 26 '19 at 08:31
  • Then it is not a UTF-8 file. You did not link to it, and I don't want to search for it, but... try another encoding. – Amadan Aug 26 '19 at 08:31
  • @SagarGuptaFTW https://www.kaggle.com/AnalyzeBoston/crimes-in-boston – Safa Aug 26 '19 at 08:33
  • @amadan Sorry for forgetting to put the link. here is the link to the data : https://www.kaggle.com/AnalyzeBoston/crimes-in-boston – Safa Aug 26 '19 at 08:35
  • @SafaGharavi Unable to download. Asking for login. – Sagar Gupta Aug 26 '19 at 08:35
  • @SagarGuptaFTW what about this? https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system/resource/12cb3883-56f5-47de-afa5-3b1cf61b257b – Safa Aug 26 '19 at 08:44
  • @SafaGharavi What solved your question? – Sagar Gupta Aug 26 '19 at 09:03
  • @SafaGharavi By providing a different dataset from the one in your original question, you have actually tampered with the original question, and basically hacked your own question. This isn't appreciated. – Sagar Gupta Aug 26 '19 at 09:10
  • @SafaGharavi Can you please mark the correct answer as the "real answer"? What is the difference between your problem, the current answer and my answer? Why don't you mark the real answer from Tankred as the answer? – Kampi Aug 26 '19 at 09:37
  • @SagarGuptaFTW I mentioned that I am working in the Kaggle kernels environment and trying to read the CSV file there. But it seems (as people commenting here claim; I haven't tested it yet) that the problem is caused by Kaggle's version of the data. – Safa Aug 26 '19 at 09:41
  • @SafaGharavi Just imagine a new user comes to this page. They will be completely baffled by what's going on here. The question has been tampered with and doesn't make much sense anymore. You should in fact delete the question altogether, I would say, because it isn't adding any value to the StackOverflow wiki. And please ask questions in a better way in the future. – Sagar Gupta Aug 26 '19 at 09:45
  • @SagarGuptaFTW Man, I'm just a beginner trying to learn something here, not hacking my own question or anything else. – Safa Aug 26 '19 at 09:49
  • @SafaGharavi We are all learners here. But understand that StackOverflow is not a question-answer site, it's a wiki. It should contain useful resources for all users (including future visitors). This question isn't a good quality question anymore, so should be deleted. – Sagar Gupta Aug 26 '19 at 09:52
  • @SagarGuptaFTW StackOverflow doesn't let me delete my question because people have commented on it so far. – Safa Aug 26 '19 at 09:54

1 Answer

I could read the file using encoding='ansi'. See this question for some info on ANSI encoding.

My solution:

import pandas as pd
df = pd.read_csv('crime.csv', encoding='ansi')

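Update: If you get the error "LookupError: unknown encoding: ansi", use encoding='cp1252' instead. Python only provides the 'ansi' alias on Windows, so it is not available on Linux environments such as Kaggle kernels.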
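To illustrate why switching the encoding helps: byte 0xa0 can never start a character in UTF-8, but cp1252 maps it to a non-breaking space. Below is a minimal sketch that tries candidate encodings in order and returns the first one that decodes the bytes. The helper name first_working_encoding and the sample bytes are made up for illustration; they are not taken from the actual crime.csv.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample containing byte 0xa0, the byte named in the error message.
sample = b"INCIDENT_NUMBER,OFFENSE\xa0CODE\n"

encoding = first_working_encoding(sample)
print(encoding)
```

With a real file, the bytes could be read with open(path, 'rb').read() and the result passed on as pd.read_csv(path, encoding=encoding). Note that a successful decode only shows the bytes are valid in that encoding, not that it is the encoding the file was written in, so it is worth checking a few rows by eye.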

Tankred