I'm developing a web application using Flask and a PostgreSQL database (accessed through SQLAlchemy). It works fine except for one problem: all text data comes back in some encoding that is not UTF-8 (at least I think it isn't).

I looked around for possible solutions, and this is what I've tried:

  • Include this line in every Python file in the project:
# -*- coding: utf-8 -*-
  • Set the engine like this:
def __init__(self):
    self.engine = sql.create_engine(os.environ['DATABASE_URL'], encoding='utf8')
    self.conn = self.engine.connect()
    self.metadata = sql.MetaData()

I also tried a few variations of the items above, such as client_encoding='utf8' instead of encoding='utf8'. I also tried these solutions:

For example, this is one of my select functions (but the problem happens in every select / insert / update function):

def get_produtos(self, id):
    produtos = sql.Table('produtos', self.metadata, autoload=True,
                         autoload_with=self.engine).columns
    parceiros = sql.Table('parceiros', self.metadata, autoload=True,
                          autoload_with=self.engine).columns
    q = sql.select([produtos.id, produtos.nome, parceiros.nome_parceiro,
                    produtos.valor, produtos.qtd_desconto,
                    parceiros.id_parceiro])
    q = q.where(produtos.id == id)
    return self.conn.execute(q).fetchall()

The program does not raise any exception, but the data comes back with the wrong encoding.

This is how the data is in my database: https://i.imgur.com/8BA1i3N.png

This is how the data shows up in my app: https://i.imgur.com/NZBxFtl.png

  • Can you [edit] your question to add the output of this command from the psql console: `\l`? This will show us the defaults for your database. – snakecharmerb Sep 25 '19 at 06:40
  • The application is running on Heroku, so I can only access the database itself. Running the command you mentioned while logged in to my database (not sure if that's the right thing to do), I got this: https://i.imgur.com/S1ZQxOT.png – Marcos Sombra Sep 25 '19 at 16:00
  • A flagrant [mojibake](https://en.wikipedia.org/wiki/Mojibake) case. You are right to suspect that _all text data comes in some encoding that it's not utf-8_. `'Se‡Æo Eletr“nicos'` is definitely the result of this: `'Seção Eletrônicos'.encode('cp850').decode('cp1252')`… – JosefZ Jan 21 '21 at 16:09
  • @JosefZ This problem was actually solved a while ago, but thanks for answering, since it might help other people. – Marcos Sombra Jan 21 '21 at 18:26
  • @MarcosSombra Well, then you should [answer the question yourself](https://stackoverflow.com/help/self-answer) and finally accept the answer. ([See this page](https://meta.stackexchange.com/questions/5234/) for an explanation of why this is important.) – JosefZ Jan 21 '21 at 20:16
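
The mojibake JosefZ describes can be reproduced with the Python standard library alone. This is a minimal sketch, assuming the cp850/cp1252 pair named in the comment above; the variable names are illustrative only:

```python
# Reproduce the garbling: text encoded as cp850 (a common console code page)
# is misread as cp1252.
original = "Seção Eletrônicos"

raw = original.encode("cp850")   # bytes as a cp850 console would send them
garbled = raw.decode("cp1252")   # the same bytes misinterpreted as cp1252

print(raw)      # b'Se\x87\xc6o Eletr\x93nicos'
print(garbled)  # Se‡Æo Eletr“nicos
```

The bytes themselves are unchanged at every step; only the code page used to interpret them differs, which is why no exception is raised.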

1 Answer

As JosefZ said, it's a mojibake case. This particular problem occurred because some entries were inserted via a shell with no encoding specified, which led to the inconsistency.
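
If the affected rows can be identified, the garbling can in principle be reversed by undoing the wrong decode. The following is a minimal sketch, assuming the cp850/cp1252 pair that JosefZ identified; `repair_mojibake` is a hypothetical helper written for illustration, not part of any library:

```python
def repair_mojibake(text, wrong="cp1252", right="cp850"):
    """Undo a wrong decode: re-encode with the code page that was
    mistakenly applied, then decode with the one actually used.
    Returns the text unchanged if it cannot be round-tripped."""
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("Se‡Æo Eletr“nicos"))  # prints: Seção Eletrônicos
```

This should only be applied to rows known to be affected: running it on text that was stored correctly would garble that text instead, since the function cannot tell the two cases apart.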