
How should I escape or cleanse user-provided passwords before I hash them and store them in my database?

When PHP developers consider hashing users' passwords for security purposes, they often think of those passwords as they would any other user-provided data. This subject comes up often in PHP questions related to password storage: the developer wants to cleanse the password using functions such as escape_string() (in its various forms), htmlspecialchars(), addslashes() and others before hashing it and storing it in the database.

Taryn
Jay Blanchard

2 Answers


You should never escape, trim or apply any other cleansing mechanism to passwords you'll be hashing with PHP's password_hash(), for a number of reasons, the single largest being that cleansing the password requires unnecessary additional code.

You will argue (and you see it in every post where user data is accepted for use in your systems) that we should cleanse all user input, and you would be right for every other piece of information we accept from our users. Passwords are different. Hashed passwords cannot pose any SQL injection threat because the string is turned into a hash before it is stored in the database, and a bcrypt hash contains only the characters $ . / A-Z a-z 0-9.

The act of hashing a password is the act of making the password safe to store in your database. The hash function doesn't give special meaning to any bytes, so no cleansing of its input is required for security reasons.

If you follow the mantra of allowing users to choose the passwords or passphrases they desire, without limiting length, spaces or special characters, hashing will make the password/passphrase safe to store no matter what it contains (bear in mind that bcrypt only uses the first 72 bytes of its input, so anything beyond that does not affect the hash). At present the default algorithm (PASSWORD_DEFAULT) is PASSWORD_BCRYPT, which turns the password into a 60-character string containing a cost (the algorithmic cost of creating the hash), a random salt and the hashed password information:

PASSWORD_BCRYPT is used to create new password hashes using the CRYPT_BLOWFISH algorithm. This will always result in a hash using the "$2y$" crypt format, which is always 60 characters wide.
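To see that layout for yourself, here is a minimal sketch that splits a freshly generated bcrypt hash into its parts. It assumes PHP 5.5+ (for password_hash() and password_get_info()); the sample password is arbitrary, and the offsets simply follow the fixed "$2y$" + two-digit cost + "$" + 22-character salt + 31-character hash layout.

```php
<?php
// Arbitrary sample password; any string works.
$password = 'correct horse battery staple';
$hash = password_hash($password, PASSWORD_BCRYPT);

// Layout: "$2y$" + 2-digit cost + "$" + 22-char salt + 31-char hash = 60 chars.
$prefix = substr($hash, 0, 4);        // "$2y$"
$cost   = (int) substr($hash, 4, 2);  // default cost varies by PHP version
$salt   = substr($hash, 7, 22);       // random on every call
$digest = substr($hash, 29);          // the remaining 31 characters

// password_get_info() reads the same information back out of the hash.
$info = password_get_info($hash);

printf("length=%d prefix=%s cost=%d algo=%s\n", strlen($hash), $prefix, $cost, $info['algoName']);
printf("salt=%s\nhash=%s\n", $salt, $digest);
```

Because the salt and cost travel inside the hash string, password_verify() needs nothing but the stored hash and the password the user typed.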

The space requirements for storing the hash are subject to change as different hashing methods are added to the function, so it is always better to go larger on the column type for the stored hash, such as VARCHAR(255) or TEXT.
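Storing the hash is then an ordinary parameterized insert, with no escaping code of your own. This is a sketch under assumptions, not a prescribed schema: the users table, its column names and the store_user()/check_login() helpers are hypothetical, and an in-memory SQLite database (the pdo_sqlite extension) stands in for your real database.

```php
<?php
// In-memory SQLite stands in for a real database (requires pdo_sqlite).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: the hash column is generously sized.
$db->exec('CREATE TABLE users (username VARCHAR(64) PRIMARY KEY, password_hash VARCHAR(255) NOT NULL)');

// Hypothetical helper: hash the raw password and bind the hash as a parameter.
function store_user(PDO $db, string $username, string $password): void
{
    $stmt = $db->prepare('INSERT INTO users (username, password_hash) VALUES (:u, :h)');
    $stmt->execute([':u' => $username, ':h' => password_hash($password, PASSWORD_DEFAULT)]);
}

// Hypothetical helper: fetch the stored hash and let password_verify() compare.
function check_login(PDO $db, string $username, string $password): bool
{
    $stmt = $db->prepare('SELECT password_hash FROM users WHERE username = :u');
    $stmt->execute([':u' => $username]);
    $hash = $stmt->fetchColumn(); // false when no such user exists
    return is_string($hash) && password_verify($password, $hash);
}

// The password itself looks like SQL; it is never escaped or altered.
$password = "Robert'); DROP TABLE users;--";
store_user($db, 'joe', $password);
var_dump(check_login($db, 'joe', $password)); // bool(true)
```

SQLite does not enforce the VARCHAR(255) length, so the column size only starts to matter once the schema moves to a database that does, such as MySQL.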

You could even use a complete SQL query as your password; it would be hashed, making it unexecutable by the SQL engine. For example,

SELECT * FROM `users`;

could be hashed to $2y$10$1tOKcWUWBW5gBka04tGMO.BH7gs/qjAHZsC5wyG0zmI2C.KgaqU5G
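You can check this with a few lines of PHP. The sketch assumes PASSWORD_DEFAULT still selects bcrypt, as it does in current PHP releases; the regular expression just encodes the "$2y$", two-digit cost, "$", 53-character layout, whose characters all come from bcrypt's ./A-Za-z0-9 alphabet.

```php
<?php
$password = 'SELECT * FROM `users`;';
$hash = password_hash($password, PASSWORD_DEFAULT);

// The hash differs on every run because the salt is random.
echo $hash, PHP_EOL;

// "$2y$" + two-digit cost + "$" + 53 characters from bcrypt's ./A-Za-z0-9 alphabet.
$looksLikeBcrypt = preg_match('~^\$2y\$\d{2}\$[./A-Za-z0-9]{53}$~', $hash) === 1;

var_dump($looksLikeBcrypt);                  // bool(true)
var_dump(password_verify($password, $hash)); // bool(true)
```

No quote, backtick or semicolon from the original query survives into the stored value, yet password_verify() still accepts the original query text as the password.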

Let's see how different sanitizing methods affect the password:

The password is I'm a "dessert topping" & a <floor wax>! (There are 5 spaces at the end of the password which are not displayed here.)

When we apply the following sanitizing functions, we get some wildly different results:

var_dump(trim($_POST['upassword']));
var_dump(htmlentities($_POST['upassword']));
var_dump(htmlspecialchars($_POST['upassword']));
var_dump(addslashes($_POST['upassword']));
var_dump(strip_tags($_POST['upassword']));

Results:

string(40) "I'm a "dessert topping" & a <floor wax>!" // spaces at the end are missing
string(65) "I'm a &quot;dessert topping&quot; &amp; a &lt;floor wax&gt;!     " // double quotes, ampersand and angle brackets have been changed
string(65) "I'm a &quot;dessert topping&quot; &amp; a &lt;floor wax&gt;!     " // same here
string(48) "I\'m a \"dessert topping\" & a <floor wax>!     " // escape characters have been added
string(34) "I'm a "dessert topping" & a !     " // the <floor wax> "tag" has been removed entirely
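The same experiment can be run without a form by putting the example password in a variable. This sketch assumes each function is called with its default arguments; the HTML-encoding results vary with the PHP version's default flags, so for those two only the fact that the string changed matters.

```php
<?php
// The example password, including its five trailing spaces.
$raw = "I'm a \"dessert topping\" & a <floor wax>!     ";

$results = [];
foreach (['trim', 'htmlentities', 'htmlspecialchars', 'addslashes', 'strip_tags'] as $fn) {
    $results[$fn] = $fn($raw); // call each sanitizer by name with default arguments
    printf("%-16s len=%d changed=%s\n", $fn, strlen($results[$fn]), $results[$fn] === $raw ? 'no' : 'yes');
}
```

Every line reports changed=yes: each "cleansed" value is a different password from the one the user typed.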

What happens when we send these to password_hash()? They all get hashed, just as the query did above. The problem comes in when you try to verify the password. If we apply one or more of these methods when hashing, we must reapply exactly the same methods before calling password_verify(). If the stored hash was created from a cleansed password, the following would fail:

password_verify($_POST['upassword'], $hashed_password); // where $hashed_password comes from a database query

You would have to run the posted password through the cleansing method you chose before using the result of that in password verification. It is an unnecessary set of steps and will make the hash no better.
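Here is a minimal sketch of that failure, using htmlspecialchars() as the example cleansing step and the example password from above:

```php
<?php
// The example password, including its five trailing spaces.
$raw = "I'm a \"dessert topping\" & a <floor wax>!     ";

// Anti-pattern: the stored hash was built from a "cleansed" copy.
$cleansedHash = password_hash(htmlspecialchars($raw), PASSWORD_DEFAULT);

// Verifying the raw login input against that hash fails...
$rawVsCleansed = password_verify($raw, $cleansedHash);
// ...and only succeeds if the same cleansing is repeated at every login.
$cleansedVsCleansed = password_verify(htmlspecialchars($raw), $cleansedHash);

// Recommended: hash and verify exactly what the user typed.
$rawHash = password_hash($raw, PASSWORD_DEFAULT);
$rawVsRaw = password_verify($raw, $rawHash);

var_dump($rawVsCleansed, $cleansedVsCleansed, $rawVsRaw); // false, true, true
```

The repeated-cleansing route is also fragile: PHP 8.1 changed the default flags of htmlspecialchars() and htmlentities() so that single quotes are encoded too, so for a password containing an apostrophe (like this one) a hash created on an older PHP version would no longer match the same call after an upgrade.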


Using a PHP version older than 5.5? You can use the password_hash() compatibility pack.

You really shouldn't use MD5 password hashes.

Community
Jay Blanchard
  • If you hash the password with the trailing spaces, and buddy does not include them on his next attempt to authenticate, does he still get access? – Dan Bracuk Apr 14 '16 at 16:18
  • No. If he created his password with trailing spaces, which is allowed, he must use them on login @DanBracuk – Jay Blanchard Apr 14 '16 at 16:19
  • So your recommendation has a major disadvantage with respect to usability. – Dan Bracuk Apr 14 '16 at 16:20
  • How so @DanBracuk? If we allow the user to set up the password s/he desires, including leading/trailing spaces? – Jay Blanchard Apr 14 '16 at 16:22
  • That's why most things require you to enter your chosen password twice. If the user added the spaces by accident, they will figure it out before getting any further. If the user did it on purpose, then it's a non-issue. – I wrestled a bear once. Apr 14 '16 at 16:24
  • I didn't really get the whole point of this question-answer thing. The rule of thumb is to sanitize what the business requires and to **escape everything going in the db**. Period. You have to escape the hash; nobody knows how things will change in the future. Regarding password cleansing, that's a hot topic. Google, for example, [trims the password](https://support.google.com/accounts/answer/41078?hl=en); I personally don't like it. But its help center surely handles more angry users than I do. – Margaret Bloom Apr 14 '16 at 22:08
  • @MargaretBloom, a rule of thumb is just a heuristic. We sometimes still need to think things through, like for passwords. You say "nobody knows how things will change in the future", but it seems if anything is going to change it's the way we escape data before we put it into the database, in which case users would find themselves locked out when their passwords no longer match what we've stored. What is the danger in not escaping password hashes vs. the danger of escaping them? – DavidS Apr 14 '16 at 23:05
  • And maybe things are different in the PHP world, but in the Java world the rule of thumb isn't to escape everything going in the db, it's to use prepared statements. So much for rules of thumb. – DavidS Apr 14 '16 at 23:24
  • Exactly: you will of course "escape the hash" in the limited sense of correctly passing it to a parameterized SQL query, where some code in your SQL connector may or may not do anything with it that corresponds to "escaping", you don't know and don't care. You just won't have to write any specific code to achieve that, because it's completely routine for all your SQL queries unless you've previously made some poor life decisions. – Steve Jessop Apr 14 '16 at 23:25
  • The PHP world is trying to get to the prepared statements rule of thumb, but older database APIs are widespread in the wild and have promoted bad habits @DavidS – Jay Blanchard Apr 15 '16 at 12:06
  • What about `filter_input()`? I personally filter the password input just to strip ASCII < 32. – Louis Loudog Trottier Feb 27 '17 at 20:40
  • Are you filtering when hashing and when verifying @LouisLoudogTrottier? Even if you are, you shouldn't, because the password is made safe by the act of hashing it. – Jay Blanchard Feb 27 '17 at 20:47
  • @JayBlanchard I filter the raw user input from an HTML form like so `$password=filter_input(INPUT_POST,"password",FILTER_UNSAFE_RAW,FILTER_FLAG_STRIP_LOW)`, which will "Strip characters with ASCII value less than 32". Then it's processed by PHP (hash, verify) and passed to mysqli_statement. So yes, the only filter I ever apply strips ASCII less than 32, i.e. NULL, [TAB], [New line], etc. See http://www.asciitable.com – Louis Loudog Trottier Feb 28 '17 at 03:54
  • In order to prevent this => "Please note that password_hash will ***truncate*** the password at the first NULL-byte." => 3rd comment on php.net http://php.net/manual/en/function.password-hash.php – Louis Loudog Trottier Feb 28 '17 at 03:58
  • Good point @LouisLoudogTrottier, but how often would a NULL byte character be entered into a passphrase? It would seem rather low odds for someone to specify an octal in their passphrase. – Jay Blanchard Feb 28 '17 at 13:38
  • I don't know either; maybe character encoding in a foreign language, a misconfigured machine, defective hardware, or maybe just the good old ALT-255 on Windows. Or some odd string like `%00`. Truly, I don't know. I'm just paranoid (or OCD) that way. On the other hand, if anyone does use those low ASCII characters, why not just strip them away to prevent the infinitely small chance that a NULL goes through? I appreciate the reply. – Louis Loudog Trottier Feb 28 '17 at 14:14
  • I am going to perform some testing and, if warranted, I will update the answer with this information @LouisLoudogTrottier – Jay Blanchard Feb 28 '17 at 14:16
  • I've made a fiddle and ran some tests on your string and on some other random strings made with my fr-ca keyboard (±@£¢¤£¦¬¬¤³¦²½³½¾§¶[]}{~l­¯µ), and password_hash() and password_verify() work successfully on a filter_input() value. I don't know how or with what tools we can send null bytes, so I didn't post my results. It seems like all the other functions failed when compared against 'RAW' except filter_input(). I've dumped the code on a sandbox => http://sandbox.onlinephpfunctions.com/code/47ebec3e5001174d3a58041ae5a4a3d955bb4ff6 and can run it on phpfiddle. Greetings. – Louis Loudog Trottier Mar 01 '17 at 03:52
  • A problem occurs, and we get funny behavior (from a funny input), if we force $_POST by adding `$_POST['upassword']="\01234567 ";` between lines 2 and 3. Most of them return 'Good', trim fails (because of the trailing spaces) and filter_input 'becomes' NULL. Maybe my way of testing is wrong or inaccurate, but it seems my test concludes that it is even worse to use filter_input. Now we still need a way to keep NULL bytes from reaching password_hash(). – Louis Loudog Trottier Mar 01 '17 at 03:59
  • @JayBlanchard I am trimming all my user data, including passwords. So after reading this discussion and answer, I think I should **not trim passwords**. Also, I have read your article here http://www.jayblanchard.net/proper_password_hashing_with_PHP.html. It is a really good, easily understandable explanation of `password_hash()` and `password_verify()`. Please share more articles like this in easy language. It's the best explanation of `password_hash()` and `password_verify()` I have ever found. Thank you very much. – Vi_real Aug 09 '19 at 13:59
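If the NUL-byte concern above worries you, one option that stays consistent with the answer is to validate at password creation instead of altering the input: refuse the password and ask for another, so the string that is hashed is always exactly what the user types at login. This is a minimal sketch; password_problem() is a hypothetical helper, not a PHP function. Note also that in a PHP double-quoted string "\01234567 " starts with the octal escape \012, which is a newline rather than a NUL byte, which may explain the filter_input() result in the test described above.

```php
<?php
/**
 * Hypothetical registration-time check (not part of PHP): report a problem
 * instead of silently changing the password, so the hashed string is
 * exactly the string the user will type at login.
 */
function password_problem(string $password): ?string
{
    if (strpos($password, "\0") !== false) {
        return 'Passwords may not contain NUL characters.';
    }
    return null;
}

$candidate = 'I\'m a "dessert topping" & a <floor wax>!     ';
$problem = password_problem($candidate);
if ($problem === null) {
    $hash = password_hash($candidate, PASSWORD_DEFAULT); // store $hash as usual
} else {
    echo $problem, PHP_EOL; // ask the user to choose a different password
}
```

Login code is unchanged: password_verify() still receives the raw posted value.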

Before hashing the password, you should normalise it as described in section 4 of RFC 7613. In particular:

  2. Additional Mapping Rule: Any instances of non-ASCII space MUST be mapped to ASCII space (U+0020); a non-ASCII space is any Unicode code point having a Unicode general category of "Zs" (with the exception of U+0020).

and:

  4. Normalization Rule: Unicode Normalization Form C (NFC) MUST be applied to all characters.

This attempts to ensure that if the user types the same password using a different input method, the password is still accepted.
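The two rules above might be sketched in PHP as follows. This is an illustration under assumptions, not a full RFC 7613 implementation: prepare_password() is a hypothetical helper, it needs the intl extension for Normalizer, and the same function would have to be applied both when hashing and when verifying.

```php
<?php
/**
 * Hypothetical helper applying the two quoted rules (requires ext-intl).
 * Run it on the password both before password_hash() and before password_verify().
 */
function prepare_password(string $password): string
{
    // Additional Mapping Rule: every "Zs" code point becomes U+0020.
    $mapped = preg_replace('/\p{Zs}/u', ' ', $password);
    if ($mapped === null) {
        throw new InvalidArgumentException('Password is not valid UTF-8.');
    }
    // Normalization Rule: apply Unicode Normalization Form C.
    $normalized = Normalizer::normalize($mapped, Normalizer::FORM_C);
    if ($normalized === false) {
        throw new InvalidArgumentException('Password could not be normalized.');
    }
    return $normalized;
}

if (class_exists('Normalizer')) {
    // The same visible "ά" typed two ways: precomposed vs. alpha + combining acute.
    var_dump(prepare_password("\u{03AC}") === prepare_password("\u{03B1}\u{0301}")); // bool(true)
    // A no-break space (U+00A0) is mapped to an ordinary space.
    var_dump(prepare_password("pass\u{00A0}word") === 'pass word');                 // bool(true)
} else {
    echo "This sketch needs the intl extension (Normalizer).\n";
}
```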

Ali
legoscia
  • What would be an example of "the same password but using a different input method"? – DavidS Apr 14 '16 at 21:31
  • @DavidS, A super shiny North American MacBook (that Joe used just before leaving) and a poorly internationalized Taiwanese internet café computer (that Joe is trying to use to download his flight-back boarding card). – Margaret Bloom Apr 14 '16 at 22:15
  • Sounds jingoistic. :-) Thanks though. – DavidS Apr 14 '16 at 22:46
  • Hmm. If you do this, then you should also validate passwords to reject any that contain as-yet-unassigned characters. It would be terrible if a user uses NEWFANGLED SPACE, which your app doesn't recognize and therefore hashes as-is, and then you upgrade your Unicode Character Database and suddenly NEWFANGLED SPACE gets mapped to SPACE before hashing, such that (s)he can no longer enter a password that your app will hash to the old hash. – ruakh Apr 14 '16 at 22:46
  • Curious: how would a non-ASCII space get entered into the password if the application is UTF-8 all the way through? – Jay Blanchard Apr 27 '16 at 19:39
  • @JayBlanchard Unicode has [17 different space characters](http://www.fileformat.info/info/unicode/category/Zs/list.htm). – user3942918 Apr 27 '16 at 19:49
  • Right @PaulCrovella, but if an application and its database are set up to use UTF-8 all the way through, why would you need any additional mapping? – Jay Blanchard Apr 27 '16 at 19:59
  • @JayBlanchard Because when you press a space bar on one machine and when you press it on another machine you might get two different Unicode code points, and they'll have two different UTF-8 encodings, without the user being aware of anything. It could be argued that this is a problem you wish to ignore, but RFC 7613 was born out of such real-life issues; it's not a make-work recommendation. – Kuba hasn't forgotten Monica Apr 27 '16 at 20:03
  • @ruakh Once you decide on handling passwords in a certain way, they must remain handled that way, or else things will break for existing use cases. If you intend to change the preprocessing method in the future, you should store it alongside the preprocessed and hashed representation of the password. That way, once you receive the input, you select the preprocessing/hashing method based on what you're comparing to. – Kuba hasn't forgotten Monica Apr 27 '16 at 20:06
  • @JayBlanchard For the same reason you want to apply NFC: it's not always clear to a user that, for example, their phone's on-screen keyboard might be sending different codepoints that just happen to look the same as what they send via their laptop's keyboard. For a more visible example (and reason to apply NFC normalization) one might send ά as [U+03AC](http://www.fileformat.info/info/unicode/char/03ac/index.htm) while the other sends it as [U+03B1](http://www.fileformat.info/info/unicode/char/3b1/index.htm) followed by [U+0301](http://www.fileformat.info/info/unicode/char/0301/index.htm). – user3942918 Apr 27 '16 at 20:10
  • Thanks, those are good points to consider @PaulCrovella. I want to do some testing based on this. – Jay Blanchard Apr 27 '16 at 20:18
  • @KubaOber: NFC is guaranteed to be stable for future versions of Unicode, as is the "general category" of each character, provided the input consists only of already-assigned characters. So legoscia's approach is fine, even if you continue to add support for new characters over time, as long as you reject as-yet-unassigned characters. – ruakh Apr 28 '16 at 14:59
  • By "reject" do you mean they should be stripped out @ruakh? – Jay Blanchard Apr 28 '16 at 18:46
  • @JayBlanchard: Certainly not! That would cause the same problem. By "reject" I mean that you should show an error message. – ruakh Apr 28 '16 at 22:57
  • So, if I am understanding correctly @ruakh, the methodology you would employ is to analyse the proposed password for non-assigned characters and if it contains any of those reject the password and deliver a message to the user stating they cannot use certain characters in their password (I would tell them the non-assigned characters found in their attempt.)? I will try to find resources on the unassigned characters and try to determine what kind of UX impact that might have. – Jay Blanchard Apr 29 '16 at 12:51
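ruakh's "reject, don't strip" suggestion might be sketched like this. It is a minimal illustration under assumptions: has_unassigned_code_points() is a hypothetical helper, it relies on PCRE's \p{Cn} (unassigned) category, and which code points count as unassigned depends on the Unicode version built into your PHP/PCRE.

```php
<?php
/**
 * Hypothetical validation step: detect code points that are unassigned
 * (\p{Cn}) in this PCRE build's Unicode tables, so the password can be
 * refused with an error message rather than silently altered.
 */
function has_unassigned_code_points(string $password): bool
{
    $result = preg_match('/\p{Cn}/u', $password);
    if ($result === false) {
        throw new InvalidArgumentException('Password is not valid UTF-8.');
    }
    return $result === 1;
}

$candidate = "hello\u{0378}world"; // U+0378 is an unassigned code point
if (has_unassigned_code_points($candidate)) {
    echo "Please choose a password without unassigned Unicode characters.\n";
}
```

Run this only when a password is created or changed; existing hashes are verified as before.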