Disk space is cheap, data breaches are not!

Many, many years ago (way back in 2015), I wrote an article about keeping data secure. A few things have changed since then, most notably the introduction of GDPR in the EU. You can't have been anywhere near a computer in the last couple of years and not heard about GDPR, but the gist is "only gather the data you need about a subject, keep it secure, keep it only as long as required, or you will get big fines". That's great, but what about when companies don't allow people to secure their own data on their platforms?

This article is my gripe against sites and systems which restrict password length, preventing people from using secure passwords. Yes, passwords need to be stored on a disk somewhere, but disk space is cheap (2TB hard drives for under £60), and data breaches are expensive.

One of the points I raised in that article was to create better passwords. Make them at least 12 characters, and mix upper case, lower case, numbers, and symbols. Something like o7baPmIU6@2PE4Ww would be a reasonable password (don't use that - it's public on the internet and no longer secure. Someone will add it to a list of passwords to check).

The problem is, some services still limit the number of characters you can use for your password. There's no real technical reason for this. "Why not?", you may ask. Simple: passwords should be stored in a way which keeps them secure, and this is done by hashing.

A hash is a computational process which takes an input, performs some computations on it, and outputs something completely different. There are a number of wonderful things about hashes:

  • for a given hash output, it's practically impossible to work out the input value
  • a slight change to the input causes a huge change in the output, so you can't tell if the input was even close to that of another hash
  • the hash output of a given hash type is always the same length

Let's cover those points in a bit more detail. Unless otherwise stated, the hashes below will be MD5. Do not use md5 to store passwords.

For a given hash output, it's impossible to know the input

For the md5 hash output 6944667566377a4864a2813ad34d168b you cannot tell what input I used. It could have been hello, world!, maybe even wake up, Neo!, or it could be the password o7baPmIU6@2PE4Ww. The only way to tell would be to run each of those through the md5 hash function and compare the results. It's actually none of those. That hash used the input https://www.garybell.co.uk. You would have to try a lot of inputs to get that same hash output.
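The guess-and-compare process described above can be sketched in a few lines of Python. This is only an illustration using the standard hashlib module; the helper names md5_hex and find_matching_input are made up for this example, and the list of guesses is just the three candidates mentioned above.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of text (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_matching_input(target_hash, candidates):
    """Return the first candidate whose MD5 equals target_hash, or None.

    The hash cannot be reversed, so the only option is to hash every
    guess and compare the result with the target.
    """
    wanted = target_hash.lower()
    for candidate in candidates:
        if md5_hex(candidate) == wanted:
            return candidate
    return None


# The hash quoted in the article, and the three guesses it mentions.
target = "6944667566377a4864a2813ad34d168b"
guesses = ["hello, world!", "wake up, Neo!", "o7baPmIU6@2PE4Ww"]
print(find_matching_input(target, guesses))
```

With only three guesses this finishes instantly. The point is that an attacker has no shortcut: every candidate input has to be hashed and compared.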

A slight change of input has a huge change on output

Taking the input from the point above (https://www.garybell.co.uk), if I make it all uppercase, I get the md5 hash output 562b231eefc45595c6207dddb80a189c. If I only capitalise the h, the output becomes 532813ddc6531a1fc05b57fbd2cce0f0.
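To see this for yourself, here is a small Python sketch which hashes the original input and the two variations, then counts how many of the 32 printable characters differ from the original hash. The helper names md5_hex and differing_positions are my own for this example, not part of any library.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of text (UTF-8 encoded) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_positions(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = "https://www.garybell.co.uk"
variants = [
    original,                        # unchanged
    original.upper(),                # everything in upper case
    original.replace("h", "H", 1),   # only the first h capitalised
]

base = md5_hex(original)
for variant in variants:
    digest = md5_hex(variant)
    changed = differing_positions(base, digest)
    print(f"{variant!r:32} {digest} ({changed} of 32 characters differ)")
```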

That means people with similar passwords will end up with completely different hashes, if a password hash is implemented. If password hashing isn't being used, run away from whatever service that is (I'm looking at you, Facebook Lite).

A hash output for the same type is always the same length

I've used md5 as the example hash here, but there are loads more out there. Others include SHA1, SHA256, and SHA512. SHA1 is in the same boat as md5, in that it should not be used for password storage, because it's not deemed to be secure enough. Once over it was, but times have changed.

The beauty of these is that every md5 hash is the same length as every other md5 hash; every sha1 hash is the same length as every other sha1 hash; every sha256 hash is the same length as every other sha256 hash, but a different length from a sha1 hash.

Hash type   Length (printable characters)
MD5         32
SHA1        40
SHA256      64
SHA512      128

The reason this consistent length is important for my argument is simple - every input will generate the same output length. This means it doesn't matter if the input is 8 characters or 8000 characters, the result is consistent. The cost for storage is always equal, and essentially trivial.
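Anyone can check this claim with a short test. The sketch below uses Python's standard hashlib module to hash an 8 character input and an 8000 character input with each of the four hash types, and reports the length of each result in printable characters. The function name hex_lengths is just a label chosen for this example.

```python
import hashlib

HASH_TYPES = ("md5", "sha1", "sha256", "sha512")


def hex_lengths(password):
    """Map each hash type to the length of its hex output for password."""
    data = password.encode("utf-8")
    return {
        name: len(hashlib.new(name, data).hexdigest())
        for name in HASH_TYPES
    }


short_password = "a" * 8      # 8 characters
long_password = "a" * 8000    # 8000 characters

print(hex_lengths(short_password))
print(hex_lengths(long_password))
# Both lines print: {'md5': 32, 'sha1': 40, 'sha256': 64, 'sha512': 128}
```

The storage needed per password depends only on the hash type, never on how long the user's password is.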

Stop restricting password lengths

Given that any service will be (at the very least, should be!) storing hashed passwords, they will know the output length of those hashes. They can certainly run a test and check. With that in mind, there's no real reason why they should be limiting password lengths in this day and age.
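As a rough illustration of what storing a hashed password can look like, here is a minimal sketch using PBKDF2 from Python's standard library. To be clear about the assumptions: PBKDF2 with SHA256 is my choice for the example rather than something this article prescribes, the function names are made up, and the salt size and iteration count are illustrative values which should be tuned for a real system.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration only; tune it for your own hardware.
ITERATIONS = 600_000


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, digest) for password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Re-hash password with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)


# A 12 character password and an 8000 character password cost the same to store.
for password in ("o7baPmIU6@2P", "x" * 8000):
    salt, digest = hash_password(password)
    print(len(password), "characters in,", len(salt) + len(digest), "bytes stored")
```

Whatever the password length, the service stores the same number of bytes (here a 16 byte salt plus a 32 byte digest), so a length limit saves nothing.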

To those services who are limiting user password lengths, stop! Unless you can honestly justify doing so (like you're on a seriously old platform/database) just stop. If you are on an old platform and can't allow longer passwords yet, upgrade quickly! It might be painful, but at least you won't lose your users because of an archaic technology stack you refused to update.

I think I'll start naming and shaming services I find which limit the password length.