1

I have a User class in SQLAlchemy. I want to be able to encrypt the user's email address attribute in the database but still make it searchable through the filter query.

My problem is that if I use @hybrid_property my query theoretically works, but my construction doesn't, and if I use @property my construction works but my query doesn't

from cryptography.fernet import Fernet  # <- pip install cryptography
from werkzeug.security import generate_password_hash
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email_hash = db.Column(db.String(184), unique=True, nullable=False)
    password_hash = db.Column(db.String(128))

    # @property       # <- Consider this as option 2...
    @hybrid_property  # <- Consider this as option 1...
    def email(self):
        f = Fernet('SOME_ENC_KEY')
        value = f.decrypt(self.email_hash.encode('utf-8'))
        return value
    @email.setter
    def email(self, email):
        f = Fernet('SOME_ENC_KEY')
        self.email_hash = f.encrypt(email.encode('utf-8'))

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute.')
    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def __init__(self, **kwargs):
        super(User, self).__init__(**kwargs)
        # other checks and modifiers

For option 1: When I attempt to construct a user with User(email='a@example.com',password='secret') I receive the traceback,

~/models.py in __init__(self, **kwargs)
    431     # Established role assignment by default class initiation
    432     def __init__(self, **kwargs):
--> 433         super(User, self).__init__(**kwargs)
    434         if self.role is None:
    435             _default_role = Role.query.filter_by(default=True).first()

~/lib/python3.6/site-packages/sqlalchemy/ext/declarative/base.py in _declarative_constructor(self, **kwargs)
    697             raise TypeError(
    698                 "%r is an invalid keyword argument for %s" %
--> 699                 (k, cls_.__name__))
    700         setattr(self, k, kwargs[k])
    701 _declarative_constructor.__name__ = '__init__'
TypeError: 'email' is an invalid keyword argument for User

For option 2: If instead I change @hybrid_property to @property the construction is fine but then my query User.query.filter_by(email=form.email.data.lower()).first() fails and returns None.

What should I change to get it working as required?

==============

Note I should say that I have tried to avoid using dual attributes since I didn't want to make extensive edits to the underlying codebase. so I have explicitly tried to avoid separating creation with querying in terms of User(email_input='a@a.com', password='secret') and User.query.filter_by(email='a@a.com').first():

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email_hash = db.Column(db.String(184), unique=True, nullable=False)
    password_hash = db.Column(db.String(128))

    @hybrid_property
    def email(self):
        f = Fernet('SOME_ENC_KEY')
        value = f.decrypt(self.email_hash.encode('utf-8'))
        return value
    @property
    def email_input(self):
        raise AttributeError('email_input is not a readable attribute.')
    @email_input.setter
    def email_input(self, email):
        f = Fernet('SOME_ENC_KEY')
        self.email_hash = f.encrypt(email.encode('utf-8'))

    @property
    def password(self):
        raise AttributeError('password is not a readable attribute.')
    @password.setter
    def password(self, password):
        self.password_hash = generate_password_hash(password)

    def __init__(self, **kwargs):
        super(User, self).__init__(**kwargs)
        # other checks and modifiers
Attack68
  • 2,931
  • 1
  • 9
  • 25

1 Answers1

0

In your hybrid_property, email, the line self.f.decrypt(self.email_hash.encode('utf-8')) is fine if self.email_hash is a str type, however, as email is a hybrid_property, when SQLAlchemy uses it to generate SQL self.email_hash is actually a sqlalchemy.orm.attributes.InstrumentedAttribute type.

From the docs regarding hybrid properties:

In many cases, the construction of an in-Python function and a SQLAlchemy SQL expression have enough differences that two separate Python expressions should be defined.

And so you can define an hybrid_property.expression method which is what SQLAlchemy will use to generate sql, allowing you to keep your string treatment intact in your hybrid_property method.

Here is the code I ended up with that worked for me given your example. I've stripped quite a bit out of your User model for simplicity but all the important parts are there. I also had to make up implementations for other functions/classes that were called in your code but not supplied (see MCVE):

class Fernet:
    def __init__(self, k):
        self.k = k

    def encrypt(self, s):
        return s

    def decrypt(self, s):
        return s

def get_env_variable(s):
    return s

def generate_password_hash(s):
    return s

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email_hash = db.Column(db.String(184), unique=True, nullable=False)

    f = Fernet(get_env_variable('FERNET_KEY'))

    @hybrid_property
    def email(self):
        return self.f.decrypt(self.email_hash.encode('utf-8'))

    @email.expression
    def email(cls):
        return cls.f.decrypt(cls.email_hash)

    @email.setter
    def email(self, email):
        self.email_hash = self.f.encrypt(email.encode('utf-8'))



if __name__ == '__main__':
    db.drop_all()
    db.create_all()
    u = User(email='a@example.com')
    db.session.add(u)
    db.session.commit()
    print(User.query.filter_by(email='a@example.com').first())
    # <User 1> 

Unfortunately, the code above only works because the mock Fernet.decrypt method returns the exact object that was passed in. The problem with storing a Fernet encoded hash of the user's email addresses is that Fernet.encrypt does not return the same fernet token from one execution to the next, even with the same key. E.g.:

>>> from cryptography.fernet import Fernet
>>> f = Fernet(Fernet.generate_key())
>>> f.encrypt('a@example.com'.encode('utf-8')) == f.encrypt('a@example.com'.encode('utf-8'))
False

So, you want to query a database for a record, but with no way of knowing what the stored value of field that you are querying actually is at query time. You could build a classmethod that queries the entire users table and loop through each record, decrypting it's stored hash and comparing it to the clear text email. Or you can build a hashing function that will always return the same value, hash new users emails using that function and query the email_hash field directly with the hash of the email string. Of those, the first would be very inefficient given lots of users.

The Fernet.encrypt function is:

def encrypt(self, data):
    current_time = int(time.time())
    iv = os.urandom(16)
    return self._encrypt_from_parts(data, current_time, iv)

So, you could define static values of current_time and iv and directly call Fermat._encrypt_from_parts yourself. Or you could use python's built in hash and just set a fixed seed so that it is deterministic. You could then hash the email string that you want to query and first and directly query Users.email_hash. As long as you didn't do any of the above for password fields!

SuperShoot
  • 7,110
  • 1
  • 22
  • 42
  • I think this works in your example because your encrypt and decrypt functions do not differentiate. However, I tried this approach and now get the new error: `TypeError("token must be bytes")`, on this line `return f.decrypt(cls.email_hash)` which is due to the decryption function. And I simply can't put `.encode('utf-8')` on it since it is an `InstrumentedAttribute` type as you point out. **note I amended my code to include the precise libraries** – Attack68 Jul 19 '18 at 11:39
  • Yes I also realised the inherent mandatory random nonce/salt on the Fernet encryption was problematic. So I switched it to a static AES with constant initialisation vector in the pyCryptodome library. That way the encryptions and decryptions give the same values! Im still not sure this solution fixes the problem, but it has been helpful discussion nonetheless and I am grateful for your time. Will accept answer after a little more testing! – Attack68 Jul 19 '18 at 14:49
  • Yeh, you'll still have the same issue where you won't be able to query the database on a clear text email address but you don't really have to. You can now just query on the `email_hash` and create a regular python `@property` called email which can return the decrypted email of an instantiated User if you need it. Happy I could have been helpful! – SuperShoot Jul 19 '18 at 23:31
  • @Attack68 Were you able to make it work? I'm also facing similar problem. I've tried using `property` `hybrid_property` both but couldn't get it working. Could you please share the code snippet. – Suyash Soni Jan 04 '19 at 12:10
  • @SuyashSoni sorry i never came up with a solution that worked as I originally intended. If you post a new question please link it here – Attack68 Jan 05 '19 at 18:20