Cryptographic hash functions are one of the most fundamental building blocks of modern computing security. They are used everywhere — in password storage, digital signatures, file integrity verification, blockchain technology, version control systems like Git, and SSL certificates. Yet many developers use hash functions without fully understanding what they are, how they work, or why choosing the right one matters. This guide explains hashing clearly and practically.
What is a Hash Function?
A hash function is a mathematical algorithm that takes any input (a single character, a paragraph of text, or an entire file) and produces a fixed-length output called a hash or digest; the term checksum is also used informally. No matter how large or small the input is, the output is always the same length for a given algorithm: SHA-256 always produces 256 bits, usually written as 64 hexadecimal characters, and MD5 always produces 128 bits, or 32 hex characters. The same input always produces the same output, a property called determinism. But even a tiny change in input, such as changing a single letter or a single bit, produces a completely different output in which roughly half of the bits flip. This is called the avalanche effect.
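Fixed length, determinism and the avalanche effect are easy to observe with Python's standard hashlib module. In this short sketch the helper names sha256_hex and differing_bits are illustrative, not part of any library. The exact number of flipped bits is left unstated because it depends on the inputs, but it typically lands near 128 of 256.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed length: a 1-byte input and a 1,000,000-byte input give digests of the same size.
tiny = b"a"
huge = b"a" * 1_000_000
print(len(sha256_hex(tiny)), len(sha256_hex(huge)))  # 64 64

# Determinism: hashing the same bytes twice gives the same digest.
print(sha256_hex(b"hello") == sha256_hex(b"hello"))  # True

# Avalanche: "d" (0x64) and "e" (0x65) differ in a single input bit,
# yet typically around half of the 256 output bits change.
d1 = hashlib.sha256(b"hello world").digest()
d2 = hashlib.sha256(b"hello worle").digest()
print(differing_bits(d1, d2), "of 256 output bits differ")
```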
Key Properties of Cryptographic Hash Functions
- Deterministic: the same input always produces the same hash output
- One-way (pre-image resistance): given a hash, it should be computationally infeasible to find any input that produces it (short or guessable inputs, such as weak passwords, can still be found by trying candidates)
- Avalanche effect: a tiny change in input produces a completely different hash
- Fixed output length: the output is always the same size regardless of input size
- Collision resistance: it should be computationally infeasible to find two different inputs that produce the same hash
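How infeasible a collision is depends heavily on output length. This minimal sketch builds a deliberately weakened hash by keeping only the first few bytes of SHA-256; truncated_sha256 and find_collision are hypothetical helpers written for this demonstration. It then runs a birthday search, hashing distinct messages and remembering each digest until two match. For an n-bit output this takes on the order of 2^(n/2) attempts, so a 24-bit digest collides after a few thousand messages, while full SHA-256 would need roughly 2^128.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, nbytes: int) -> bytes:
    """A deliberately weak hash: only the first `nbytes` bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct messages until two share a truncated digest.

    Returns (first message, second message, number of messages hashed).
    """
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = f"message-{i}".encode()  # every message is distinct
        digest = truncated_sha256(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg

first, second, attempts = find_collision(3)  # 24-bit truncated digest
print(first, second, "collide after", attempts, "messages")
```

Each extra byte of output multiplies the expected work by about 16 (2^4), which is why a full-length modern hash puts collisions far out of reach while short or broken ones do not.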
MD5 — Fast But Broken
MD5 (Message Digest 5) was designed in 1991 and produces a 128-bit (32 hex character) hash. It was widely used for password hashing and digital signatures for many years. However, in 2004 researchers demonstrated that MD5 is vulnerable to collision attacks — meaning two different inputs can be crafted to produce the same MD5 hash. This completely breaks its usefulness for security applications. Today MD5 should only be used for basic non-security file integrity checks where an adversary is not trying to manipulate the data.
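For the non-security checks where MD5 is still acceptable, Python lets you say so explicitly. In this minimal sketch, md5_fingerprint and looks_intact are hypothetical helper names. The usedforsecurity=False flag exists in Python 3.9 and later and signals that the digest is not protecting anything from an attacker.

```python
import hashlib

def md5_fingerprint(data: bytes) -> str:
    """MD5 hex digest for non-adversarial uses such as spotting accidental corruption.

    usedforsecurity=False (Python 3.9+) declares that this is not a security use;
    on restricted builds (for example FIPS mode) it can allow MD5 to be used at all.
    """
    return hashlib.md5(data, usedforsecurity=False).hexdigest()

def looks_intact(data: bytes, expected_md5: str) -> bool:
    """Detects accidental changes only; an attacker can craft colliding inputs."""
    return md5_fingerprint(data) == expected_md5.lower()

print(md5_fingerprint(b"hello"))  # 5d41402abc4b2a76b9719d911017c592
```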
SHA-1 — Deprecated
SHA-1 (Secure Hash Algorithm 1) produces a 160-bit (40 hex character) hash and was widely trusted for years after MD5's weaknesses became known. However, in 2017 researchers from Google and CWI Amsterdam demonstrated the first practical SHA-1 collision, called SHAttered: two different PDF files with the same SHA-1 hash. Major standards bodies and browsers had already begun phasing out SHA-1 before then, and it is now deprecated for security use. Like MD5, it should not be used for security-sensitive applications. Git still uses SHA-1 by default to identify commits and other objects, although recent versions have experimental support for SHA-256 repositories.
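Git's object IDs are ordinary hashes over a small header followed by the content. Here is a minimal sketch of how Git names a file's contents (a blob) in a default SHA-1 repository. git_blob_id is a hypothetical name, and its result should match what git hash-object prints for the same bytes.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID for a Git blob.

    Git hashes the header "blob <size in bytes>", then a NUL byte, then the raw content.
    """
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Recent Git versions compute this with a hardened SHA-1 implementation that detects SHAttered-style collision attempts; for ordinary files it produces the same digest as standard SHA-1.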
If you maintain a legacy system that still relies on MD5 or SHA-1 for security, prioritize migrating to SHA-256 (or, for stored passwords, to a dedicated password hashing function) as soon as possible.
SHA-256 and SHA-512 — The Modern Standard
SHA-256 and SHA-512 are both part of the SHA-2 family, designed by the NSA and published by NIST in 2001. No practical collision or pre-image attacks against either have been found. SHA-256 produces a 256-bit (64 hex character) hash and is the most widely used secure hash algorithm today; it is used in Bitcoin, TLS certificates, code signing, and countless security protocols. SHA-512 produces a 512-bit (128 hex character) hash and offers a larger security margin against brute-force attacks, though in practice SHA-256 is considered secure for almost all applications.
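You can see the size difference between the two SHA-2 variants directly in hashlib. This sketch also checks SHA-256 against the widely published test vector for the three-byte input "abc" from NIST's SHA-2 examples. describe is a hypothetical helper.

```python
import hashlib

# Published NIST test vector: SHA-256 of the ASCII string "abc"
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

def describe(name: str, data: bytes = b"abc") -> tuple[int, int, str]:
    """Return (digest size in bits, hex length, hex digest) for a hashlib algorithm."""
    h = hashlib.new(name, data)
    return h.digest_size * 8, len(h.hexdigest()), h.hexdigest()

for name in ("sha256", "sha512"):
    bits, hex_len, _ = describe(name)
    print(f"{name}: {bits} bits, {hex_len} hex characters")
# sha256: 256 bits, 64 hex characters
# sha512: 512 bits, 128 hex characters

assert describe("sha256")[2] == ABC_SHA256  # sanity check against the test vector
```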
Real World Uses of Hash Functions
- Password storage: well-designed websites store a salted hash of your password, not the password itself. When you log in, the password you enter is hashed the same way and compared to the stored hash.
- File integrity verification: software downloads often publish SHA-256 checksums so you can verify the file was not corrupted or tampered with during download
- Git version control: every commit, file (blob), and tree in Git is identified by its SHA-1 hash, or by its SHA-256 hash in repositories created with the newer object format
- Digital signatures: documents and code are signed by hashing the content and then signing that hash with a private key; anyone with the matching public key can verify the signature
- Blockchain: Bitcoin uses SHA-256 (applied twice) to link each block to the previous one and in its proof-of-work mining algorithm
- SSL/TLS certificates: certificate authorities sign a SHA-256 hash of the certificate data, so any change to the certificate invalidates the signature
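File integrity verification usually means hashing a large download and comparing the result with a published checksum. This minimal sketch assumes the checksum is given as a hex string; sha256_file and matches_published_checksum are hypothetical names. It reads the file in chunks, so even a large download never has to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_checksum(path: Path, published_hex: str) -> bool:
    """Compare the file's digest with the checksum string from the download page."""
    return sha256_file(path) == published_hex.strip().lower()

# Demo: a temporary file containing b"abc", checked against NIST's SHA-256 example digest.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "download.bin"
    demo.write_bytes(b"abc")
    ok = matches_published_checksum(
        demo, "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
    )
print(ok)  # True
```

Python 3.11 and later also provide hashlib.file_digest, which does the chunked reading for you. Either way, the check only detects tampering if the published checksum comes from a source an attacker cannot also modify.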
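Never use MD5 or SHA-1 for storing passwords. And never use SHA-256 or SHA-512 alone for passwords either: they are designed to be fast, which makes guessing enormous numbers of candidate passwords cheap. Always use a dedicated password hashing function such as bcrypt, scrypt, or Argon2, which combine a unique random salt with a deliberately slow (and, for scrypt and Argon2, memory-hard) computation to resist brute-force attacks.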
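Python's standard library does not ship bcrypt or Argon2, which need third-party packages. It does provide hashlib.scrypt when Python is built against OpenSSL 1.1 or newer. Below is a minimal sketch of salted password storage and verification. The cost parameters (N=2^14, r=8, p=1) and the 32-byte key length are illustrative assumptions, not a recommendation. hash_password and verify_password are hypothetical names.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumed); tune them for your hardware and latency budget.
N, R, P = 2**14, 8, 1
SALT_BYTES = 16
KEY_BYTES = 32

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived key). Store both; every password gets a fresh random salt."""
    salt = os.urandom(SALT_BYTES)
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=KEY_BYTES)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=KEY_BYTES)
    return hmac.compare_digest(key, stored_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))  # False
```

The code uses hmac.compare_digest so that the time taken by the comparison does not reveal how many leading bytes of a guess matched.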