I'm writing a web crawler and need to save the HTML
from the pages I crawl into my MongoDB
database. This is what I'm trying to do (I'm using pymongo):
c = urllib2.urlopen(myUrl)
html = c.read()
db.urls.insert({
    "url": myUrl,
    "HTML": html
})
When I run my script, I get the following error:
InvalidStringData: strings in documents must be valid UTF-8
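The error says that the bytes returned by `c.read()` are not valid UTF-8. As far as I understand, pymongo under Python 2 treats a plain `str` as UTF-8 and validates it before encoding it to BSON, and many pages are served in other encodings such as ISO-8859-1. A minimal illustration in Python 3, using a made-up helper `is_valid_utf8` (not part of pymongo):

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# "café" encoded as UTF-8: the é becomes the two bytes C3 A9.
print(is_valid_utf8("café".encode("utf-8")))    # True
# Same text encoded as Latin-1: é is the single byte E9, which is not valid UTF-8 on its own.
print(is_valid_utf8("café".encode("latin-1")))  # False
```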
Searching for the error, I found that I need to process the HTML before saving it so that it is valid UTF-8, but I couldn't find out how to do that.
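One way to do that processing is to decode the raw bytes into a unicode string using the page's declared encoding, then store the unicode string (which the driver encodes as UTF-8 itself). The sketch below is written for Python 3 and assumes a hypothetical function `decode_html`; it tries the charset from the HTTP `Content-Type` header (in Python 2 that value should be available via something like `c.info().getheader('Content-Type')`), then a `<meta ... charset=...>` tag, then UTF-8, and finally falls back to replacing undecodable bytes. It is a heuristic, not a full encoding detector: a wrongly declared single-byte charset such as Latin-1 never fails to decode, so it can still produce garbled text.

```python
import re
from typing import Optional

# Matches <meta charset="..."> and the charset= part of <meta http-equiv="Content-Type" ...>.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def decode_html(raw: bytes, content_type: Optional[str] = None) -> str:
    """Hypothetical helper: decode raw HTML bytes to str by guessing the encoding."""
    candidates = []
    # 1. charset parameter of the HTTP header, e.g. "text/html; charset=ISO-8859-1"
    if content_type:
        m = re.search(r'charset=["\']?([\w\-]+)', content_type, re.IGNORECASE)
        if m:
            candidates.append(m.group(1))
    # 2. charset declared in a <meta> tag near the top of the document
    m = META_CHARSET.search(raw[:4096])
    if m:
        candidates.append(m.group(1).decode("ascii"))
    # 3. UTF-8 as the default guess
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or the bytes don't match it; try the next guess
    # Last resort: replace undecodable bytes with U+FFFD instead of failing
    return raw.decode("utf-8", errors="replace")

page = '<meta charset="iso-8859-1"><p>café</p>'.encode("latin-1")
text = decode_html(page)
print("café" in text)  # True
```

The resulting `text` can go into the document in place of the raw `html`. If you need the exact original bytes instead, storing them as BSON binary (for example with `bson.binary.Binary`) is another option, at the cost of not being able to query the HTML as a string.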
I don't think this is a duplicate of "python encoding utf-8", since I don't see how that question relates to HTML. If I'm wrong, or if my problem has nothing to do with HTML, please point me in the right direction.