Understanding Hashing: Why MD5 is Dead and SHA-256 Rules
In the world of cybersecurity, a good hash is the difference between a secure password and a data breach. Here is what every developer needs to know about hashing algorithms.
Hashing is the process of taking an arbitrary input (a password, a file, an entire disk image) and transforming it into a fixed-length value called a digest, usually written as a string of hexadecimal characters. It is a one-way function: you can turn a steak into ground beef, but you cannot turn ground beef back into a steak. That irreversibility is what makes hash functions such a useful building block in modern cryptography.
But not all hash functions are equal, and the difference between a good choice and a bad one is the difference between a secure system and a data breach waiting to happen. This guide covers what every developer needs to know about MD5, SHA-1, SHA-256, and password hashing.
What Makes a Hash Function Cryptographically Secure?
Before looking at specific algorithms, it helps to understand what properties a cryptographic hash function must have:
- Deterministic: The same input always produces the same output. SHA-256("hello") will always be 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824.
- One-way (pre-image resistance): Given a hash output, it should be computationally infeasible to find any input that produces it.
- Avalanche effect: A single-bit change in the input should flip about half of the output bits, in no predictable pattern. SHA-256("hello") and SHA-256("hellO") look nothing alike, even though "o" and "O" differ by a single bit.
- Collision resistance: It should be computationally infeasible to find two different inputs that produce the same hash output.
- Fixed output size: Regardless of whether the input is 1 byte or 10 gigabytes, the output is always the same length.
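The first three properties are easy to observe directly. The sketch below assumes Node.js and its built-in crypto module. The names sha256Bytes and differingBits are illustrative helpers, not part of any library. It hashes "hello" and "hellO", whose inputs differ by exactly one bit, and counts how many of the 256 output bits change.

```javascript
// Sketch: observe determinism, fixed output size and the avalanche effect
// using Node.js's built-in crypto module (no external packages).
const { createHash } = require('node:crypto');

// Return the SHA-256 digest of a UTF-8 string as a 32-byte Buffer.
function sha256Bytes(text) {
  return createHash('sha256').update(text, 'utf8').digest();
}

// Count how many bits differ between two equal-length digests.
function differingBits(a, b) {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i]; // bits set where the two bytes differ
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

const h1 = sha256Bytes('hello');
const h2 = sha256Bytes('hellO'); // 'o' (0x6f) vs 'O' (0x4f): one input bit differs

console.log(h1.toString('hex'));                                 // identical on every run
console.log(sha256Bytes('x'.repeat(1_000_000)).length);          // 32 bytes, whatever the input size
console.log(differingBits(h1, h2), 'of 256 output bits differ'); // expect roughly half
```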
When researchers say an algorithm is "broken," they almost always mean that collision resistance has failed. They have found a way to produce collisions faster than a generic brute-force search, which takes about 2^(n/2) attempts for an n-bit hash. Pre-image resistance may still hold, but a practical collision attack is enough to retire an algorithm from security use.
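To make "faster than brute force" concrete: for any hash with an n-bit output, you can hash distinct inputs and wait for two digests to repeat. Because of the birthday paradox, this finds a collision after roughly 2^(n/2) attempts, so that is the bar a real attack has to beat. For MD5 (n = 128) the generic cost is about 2^64, and published MD5 attacks are far cheaper than that. The sketch below runs the generic search against SHA-256 truncated to 32 bits. This artificially weakened "toy" hash is chosen only so the search finishes after about 2^16 tries. The truncation and the helper names are assumptions made for illustration.

```javascript
// Sketch: a generic birthday search for a collision on SHA-256 truncated to 32 bits.
// The truncation is for demonstration only; full SHA-256 would need ~2^128 attempts.
const { createHash } = require('node:crypto');

const TRUNCATED_BYTES = 4; // 32-bit toy hash, so a collision is expected after ~2^16 tries

function toyHash(input) {
  // Keep only the first 4 bytes of the real SHA-256 digest.
  return createHash('sha256').update(input).digest().subarray(0, TRUNCATED_BYTES).toString('hex');
}

function findCollision(maxAttempts = 10_000_000) {
  const seen = new Map(); // toy digest -> first input that produced it
  for (let i = 0; i < maxAttempts; i++) {
    const input = `message-${i}`;
    const digest = toyHash(input);
    const earlier = seen.get(digest);
    if (earlier !== undefined) {
      return { a: earlier, b: input, digest, attempts: i + 1 };
    }
    seen.set(digest, input);
  }
  return null; // astronomically unlikely for a 32-bit output
}

const result = findCollision();
console.log(result); // two different inputs that share the same truncated digest
```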
1. The Fallen Hero: MD5
Message Digest algorithm 5 (MD5) was designed in 1991 by Ron Rivest and was, for a decade, the standard for checksums and digital signatures. It produces a 128-bit (32-character hex) digest and runs extremely fast. Speed was not what brought it down, though. Weaknesses in its internal design make collisions cheap to find.
The collision problem: In 2004, Xiaoyun Wang and colleagues demonstrated the first practical MD5 collision. By 2008, researchers had used an MD5 collision to create a rogue certificate authority (CA) certificate that browsers would trust, which would have let them issue valid-looking SSL certificates for any website. In 2012, the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate and spread disguised as a legitimate Windows update. This was not a theoretical attack: it was deployed in real-world espionage malware.
A collision occurs when two different inputs produce the exact same hash output. With MD5, an attacker can craft two files, one harmless and one malicious, that share the same hash. If the harmless file gets signed, whitelisted, or published with its checksum, the malicious one will pass the same integrity check, and your check becomes worthless.
Verdict: Never use MD5 for anything security-related — signatures, integrity verification of untrusted files, certificate fingerprints, or password storage. It is acceptable for non-security checksums (e.g., deduplicating cached files in a private system where attackers cannot influence the input), but there is little reason not to use SHA-256 instead.
2. The Retired Veteran: SHA-1
SHA-1 (Secure Hash Algorithm 1) was designed by the NSA and published in 1995 as a revision of the original 1993 SHA (now known as SHA-0). It became the widely used replacement for MD5. It produces a 160-bit digest and was the backbone of SSL certificates, Git commit identifiers, and code signing for over 15 years.
In 2005, theoretical weaknesses were identified. In 2017, researchers from CWI Amsterdam and Google published SHAttered, the first practical SHA-1 collision. They produced two different PDF files with identical SHA-1 hashes. This took approximately 9.2 quintillion (about 2^63) SHA-1 computations: roughly 6,500 CPU-years for the first phase and 110 GPU-years for the second. That is far beyond a single machine, but achievable with rented GPU clusters at an estimated cost of around $110,000 at 2017 prices.
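Attacks kept improving. In 2020, Gaëtan Leurent and Thomas Peyrin demonstrated a practical chosen-prefix collision ("SHA-1 is a Shambles"), in which the two colliding inputs can begin with different, attacker-chosen content. They used it against the PGP/GnuPG Web of Trust, which put SHA-1 collision attacks within reach of well-funded attackers. All major browsers had already dropped support for SHA-1 SSL certificates by 2017.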
Verdict: Do not use SHA-1 for new applications. Git still uses SHA-1 for commit and object IDs by default. Since version 2.13, however, it has used a hardened SHA-1 implementation that detects the known collision attacks. It also now supports SHA-256 repositories (created with git init --object-format=sha256) as part of a gradual migration. For any new security use case, use SHA-256 or SHA-3.
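You can see Git's reliance on SHA-1 in how it names content. In a repository using the default SHA-1 object format, a file's blob ID is the SHA-1 of a short header followed by the raw content. The header is the word "blob", a space, the content length in bytes, and a NUL byte. The sketch below reproduces this with Node.js's crypto module. gitBlobId is a hypothetical helper, and it does not cover SHA-256 repositories or Git's collision-detection checks.

```javascript
// Sketch: how Git derives a blob's object ID in a default (SHA-1) repository.
// Git hashes the header "blob <size in bytes>\0" followed by the file content.
const { createHash } = require('node:crypto');

function gitBlobId(content) {
  const body = Buffer.from(content, 'utf8');
  const header = Buffer.from(`blob ${body.length}\0`, 'utf8');
  return createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

console.log(gitBlobId('')); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's well-known empty blob
```

Running git hash-object on an empty file should print the same ID.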
3. The Current Standard: SHA-256
SHA-256 is part of the SHA-2 family, designed by the NSA and published in 2001. It produces a 256-bit (64-character hex) digest. As of 2026, no practical collision or pre-image attacks against SHA-256 are known. It is the current gold standard for general-purpose cryptographic hashing.
It is used in:
- Bitcoin: Proof-of-work mining, block hashes, and transaction IDs use double SHA-256 (SHA-256 applied twice). Other blockchains vary; Ethereum, for example, uses Keccak-256.
- TLS/SSL certificates: The vast majority of modern HTTPS certificates are signed using SHA-256 (some use SHA-384).
- Digital signatures: Code signing, software update verification, and document authentication.
- File integrity: Verifying downloaded files, comparing file states in build pipelines.
- HMAC: Message authentication codes for API request signing (AWS, GitHub webhooks).
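The HMAC use case deserves a closer look, because HMAC is the safe way to combine a shared secret with SHA-256. A naive SHA-256(secret + message) tag is open to length-extension attacks, and HMAC's construction avoids them. The sketch below verifies a webhook-style signature using Node.js's createHmac and timingSafeEqual. It assumes the sender supplies "sha256=" followed by the hex HMAC of the raw request body, which is the format GitHub documents for its X-Hub-Signature-256 header. The secret, body, and helper names are hypothetical.

```javascript
// Sketch: verifying an HMAC-SHA256 webhook signature in Node.js.
// Assumes the signature arrives as "sha256=<hex HMAC of the raw body>".
const { createHmac, timingSafeEqual } = require('node:crypto');

function signBody(secret, rawBody) {
  return 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
}

function isValidSignature(secret, rawBody, signatureHeader) {
  if (typeof signatureHeader !== 'string') return false;
  const expected = Buffer.from(signBody(secret, rawBody), 'utf8');
  const received = Buffer.from(signatureHeader, 'utf8');
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (expected.length !== received.length) return false;
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(expected, received);
}

// Usage with a hypothetical secret and payload:
const secret = 'my-webhook-secret';
const body = '{"action":"opened"}';
const header = signBody(secret, body);                     // what the sender would compute
console.log(isValidSignature(secret, body, header));       // true
console.log(isValidSignature(secret, body + ' ', header)); // false: the body was altered
```

Two design choices matter here. First, the HMAC must be computed over the raw body bytes exactly as received, because re-serialised JSON may differ. Second, the comparison uses timingSafeEqual rather than ===, so response timing does not reveal how much of a forged signature was correct.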
Using SHA-256 in JavaScript:
```javascript
// Using the Web Crypto API (built into modern browsers and Node.js)
async function sha256(message) {
  const msgBuffer = new TextEncoder().encode(message);                  // string -> UTF-8 bytes
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);  // 32-byte ArrayBuffer
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join(''); // bytes -> hex string
}

// Top-level await needs an ES module or a browser console; in CommonJS, call it inside an async function
const hash = await sha256('Hello, World!');
// "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986d"
```
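The Web Crypto API is available in all modern browsers (in secure contexts such as HTTPS pages and localhost) and in Node.js, so you do not need external cryptography libraries for basic hashing tasks. Node.js 19 and later expose it as the global crypto object used above. Node.js 15 to 18 provide the same API as require('crypto').webcrypto.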
4. SHA-3: The Alternative Standard
SHA-3 was standardised by NIST in 2015, not as a replacement for SHA-2 but as an alternative with a completely different internal construction (Keccak sponge function vs. Merkle-Damgaard). The rationale was hedging: if a fundamental weakness were found in SHA-2's construction, SHA-3 would be unaffected because it works differently at the mathematical level.
SHA-3 is not faster than SHA-256 on most hardware, and SHA-256 has shown no signs of weakness, so SHA-3 adoption has been limited. You will mostly encounter it in government and defence contexts, or in specific protocols that mandate it. For everyday use, SHA-256 remains the practical choice.
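If a protocol does require SHA-3, Node.js can compute it through the same createHash API. This sketch assumes a Node.js build whose bundled OpenSSL provides sha3-256, and it checks getHashes() first. The standard Web Crypto digest() algorithms are the SHA-1 and SHA-2 family, so browser code should not assume SHA-3 is available there. hexDigest is an illustrative helper.

```javascript
// Sketch: SHA-256 and SHA3-256 side by side in Node.js.
// 'sha3-256' comes from the OpenSSL build bundled with Node.js, so check for it first.
const { createHash, getHashes } = require('node:crypto');

function hexDigest(algorithm, message) {
  return createHash(algorithm).update(message, 'utf8').digest('hex');
}

if (getHashes().includes('sha3-256')) {
  // Same 256-bit output size, but a different construction and therefore a different digest.
  console.log('SHA-256 :', hexDigest('sha256', 'hello'));
  console.log('SHA3-256:', hexDigest('sha3-256', 'hello'));
} else {
  console.log('This Node.js/OpenSSL build does not expose sha3-256');
}
```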
5. Passwords Are a Special Case
Even SHA-256 is the wrong choice for password storage. The reason is speed — SHA-256 is deliberately fast. A modern GPU can compute billions of SHA-256 hashes per second. That is exactly what an attacker needs for a brute-force attack against a stolen password database.
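The attack works like this: an attacker steals a database of hashed passwords, then runs a dictionary of common passwords (plus variations) through SHA-256 and compares the results against the stolen hashes. At billions of guesses per second, common and even moderately complex passwords fall quickly. This is a dictionary attack. A rainbow table is a precomputed variant: the attacker hashes huge numbers of candidates in advance and stores them in a compact lookup structure, so cracking an unsalted hash becomes mostly a lookup.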
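The dictionary attack takes only a few lines. This is a toy sketch: the "stolen" hashes, usernames, and wordlist are invented, and real attacks use dedicated GPU tools with far larger lists. Because there is no salt, each guess is hashed once and then checked against every account at the same time.

```javascript
// Sketch: a dictionary attack against unsalted SHA-256 password hashes.
// The "stolen" hashes and the wordlist below are made up for illustration.
const { createHash } = require('node:crypto');

const sha256Hex = (s) => createHash('sha256').update(s, 'utf8').digest('hex');

// Pretend these leaked from a database that stored sha256(password) with no salt.
const stolenHashes = new Map([
  [sha256Hex('sunshine'), 'alice'],
  [sha256Hex('P@ssw0rd'), 'bob'],
  [sha256Hex('x9$Lq!v2#Zr8'), 'carol'],
]);

// A real attacker would use millions of leaked passwords plus mutation rules.
const wordlist = ['123456', 'password', 'sunshine', 'qwerty', 'Password', 'P@ssw0rd'];

function dictionaryAttack(hashes, words) {
  const cracked = {};
  for (const word of words) {
    const user = hashes.get(sha256Hex(word)); // one fast hash per guess, checked against all users
    if (user) cracked[user] = word;
  }
  return cracked;
}

console.log(dictionaryAttack(stolenHashes, wordlist));
// { alice: 'sunshine', bob: 'P@ssw0rd' }  (carol's random password is not in the list)
```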
The fix: slow hashing algorithms + salts.
Password-specific algorithms are designed to be slow and memory-intensive:
- bcrypt: Designed in 1999 specifically for passwords. It has a "cost factor" that controls how slowly it runs, which you increase as hardware gets faster. It is widely supported across languages and frameworks. Its main quirk is a 72-byte input limit: most implementations silently ignore anything beyond the first 72 bytes of the password.
- scrypt: Designed by Colin Percival in 2009. Like bcrypt, it has a tunable cost, but it is also memory-hard. An attacker needs both CPU time and significant RAM for every guess, which makes large-scale GPU and custom-hardware attacks more expensive.
- Argon2: Winner of the 2015 Password Hashing Competition. It has three variants (Argon2d, Argon2i, Argon2id) for different threat models. Argon2id is the current recommendation for most use cases: it is memory-hard and combines Argon2i's resistance to side-channel attacks with Argon2d's resistance to GPU cracking.
A salt is a random value generated per-password and prepended to the input before hashing. It ensures that two users with the same password get different hashes, which defeats pre-computed rainbow tables entirely. Modern password hashing libraries (bcrypt, Argon2) handle salting automatically — you do not implement it manually.
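Node.js ships scrypt in its standard library (crypto.scrypt and crypto.scryptSync), which makes it a convenient way to see salting in action without third-party packages. Unlike the bcrypt and Argon2 libraries mentioned above, this low-level function does not generate or store the salt for you, so the sketch does both explicitly. The "salt:hash" storage format, the helper names, and the key length are assumptions. N, r, and p are set to Node's documented defaults; tune them for your own hardware rather than treating them as a recommendation.

```javascript
// Sketch: salted password hashing with scrypt from Node.js's built-in crypto module.
// The "saltHex:hashHex" storage format and the parameter choices are assumptions.
const { scryptSync, randomBytes, timingSafeEqual } = require('node:crypto');

const KEY_LENGTH = 64;                           // bytes of derived key to store
const SCRYPT_OPTIONS = { N: 16384, r: 8, p: 1 }; // CPU/memory cost, block size, parallelism

function hashPassword(password) {
  const salt = randomBytes(16);                  // fresh random salt for every password
  const hash = scryptSync(password, salt, KEY_LENGTH, SCRYPT_OPTIONS);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  if (!saltHex || !hashHex) return false;        // malformed record
  const expected = Buffer.from(hashHex, 'hex');
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, 'hex'), KEY_LENGTH, SCRYPT_OPTIONS);
  return timingSafeEqual(actual, expected);      // constant-time comparison
}

const a = hashPassword('correct horse battery staple');
const b = hashPassword('correct horse battery staple');
console.log(a !== b);                                            // true: different salts, different hashes
console.log(verifyPassword('correct horse battery staple', a));  // true
console.log(verifyPassword('wrong guess', a));                   // false
```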
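```javascript
// Node.js example using the bcrypt package (npm install bcrypt)
const bcrypt = require('bcrypt');

// Cost factor: each +1 doubles the work (higher = slower = harder to brute-force)
const SALT_ROUNDS = 12;

// Storing a new password: bcrypt generates a random salt and embeds it in the result
async function hashPassword(plaintextPassword) {
  // Store the returned hash in the database; never store the plain password
  return bcrypt.hash(plaintextPassword, SALT_ROUNDS);
}

// Verifying a login attempt against the stored hash
async function verifyPassword(inputPassword, storedHash) {
  return bcrypt.compare(inputPassword, storedHash); // resolves to true or false
}
```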
Quick Reference: When to Use What
| Use Case | Algorithm | Notes |
|---|---|---|
| File integrity / checksums | SHA-256 | Fast, secure, widely supported |
| Password storage | Argon2id or bcrypt | Never use SHA-256 for passwords |
| HMAC / API signatures | HMAC-SHA256 | Standard for webhook verification |
| Digital certificates | SHA-256 | SHA-1 certs are rejected by browsers |
| Non-security deduplication | MD5 or SHA-256 | MD5 only when adversarial input is impossible |
Frequently Asked Questions
Can you reverse a hash?
Not directly; that is the definition of a one-way function. However, a hash can be "reversed" by brute force: an attacker hashes millions or billions of candidate inputs and compares each result against the known hash. For short or common passwords hashed with fast algorithms like MD5 or SHA-256, this is practical. For long, random passwords it is computationally infeasible, and slow algorithms like Argon2 or bcrypt make every guess far more expensive. So the answer is: technically no, but practically yes for weak inputs hashed with fast algorithms.
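The "practically yes" case is easy to demonstrate. When the input space is small, an attacker simply enumerates it. The sketch assumes a hypothetical 4-digit PIN stored as unsalted SHA-256. The whole keyspace is only 10,000 values, so the loop finishes almost instantly. A per-user salt would not stop this kind of search against one stolen hash, because the attacker reads the salt alongside it. What raises the cost is a slow algorithm and, above all, a larger input space.

```javascript
// Sketch: "reversing" a fast hash by exhaustive search over a tiny input space.
// Assumes the secret is a 4-digit PIN hashed with plain, unsalted SHA-256.
const { createHash } = require('node:crypto');

const sha256Hex = (s) => createHash('sha256').update(s, 'utf8').digest('hex');

function crackPin(targetHash) {
  for (let n = 0; n < 10_000; n++) {
    const candidate = String(n).padStart(4, '0'); // "0000" .. "9999"
    if (sha256Hex(candidate) === targetHash) return candidate;
  }
  return null; // the input was not a 4-digit PIN
}

const leaked = sha256Hex('4821');  // the only thing the attacker sees
console.log(crackPin(leaked));     // prints 4821, after at most 10,000 hashes
```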
Why is MD5 still used for checksums if it's "broken"?
MD5's practical attacks are collision attacks: the attacker has to craft both colliding inputs. No practical attack lets them take an existing file they did not create and produce a different file with the same MD5 hash (a second pre-image). For a simple download checksum on a file you control (like a Linux ISO you published yourself), an MD5 collision attack is not in the threat model. The attacker would need to compromise your server to replace the file, at which point they could replace the checksum too. MD5 checksums catch accidental corruption, not adversarial tampering. That said, SHA-256 checksums are just as easy to generate, and there is no reason to use MD5 for new work.
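Generating and checking a SHA-256 checksum is straightforward in Node.js. The sketch below streams the file through the hash, so a multi-gigabyte ISO is never loaded into memory at once. sha256File and verifyChecksum are hypothetical helper names.

```javascript
// Sketch: computing and checking a file's SHA-256 checksum in Node.js.
// The file is streamed in chunks, so large downloads are not read into memory at once.
const { createHash } = require('node:crypto');
const { createReadStream } = require('node:fs');

function sha256File(path) {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    createReadStream(path)
      .on('error', reject)                        // e.g. file not found
      .on('data', (chunk) => hash.update(chunk))  // feed each chunk into the hash
      .on('end', () => resolve(hash.digest('hex')));
  });
}

async function verifyChecksum(path, expectedHex) {
  const actual = await sha256File(path);
  // Published checksums are sometimes upper-case; compare case-insensitively.
  return actual === expectedHex.trim().toLowerCase();
}
```

For example, await verifyChecksum('downloaded.iso', publishedHash) resolves to true only when the file's digest matches the published value. The file name and publishedHash are placeholders.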
What is the difference between hashing and encryption?
Encryption is reversible: a ciphertext can be decrypted back to plaintext using the correct key. Hashing is one-way: a digest cannot (practically) be reversed to the original input. Use encryption when you need to recover the original data later (e.g., storing API keys you need to use). Use hashing when you only need to verify that data matches (e.g., passwords — you never need to know the original password, only whether the user's input matches the stored hash).
Conclusion
The rule of thumb is simple: SHA-256 for integrity checking, Argon2id for passwords, and HMAC-SHA256 for authentication codes. MD5 and SHA-1 are legacy algorithms that should not appear in new code.
Want to see these hashes in action? Try the Hash Generator — it computes MD5, SHA-1, SHA-256, and SHA-512 in your browser, with no data sent to a server.