MD5 vs SHA-256 vs SHA-512: Which Hash Algorithm Should You Use?
The pull request was tiny. Two lines changed. A colleague was swapping out SHA-256 for MD5 in a content-addressable cache because, as the PR description put it, "MD5 is faster and we do not need security here, this is just a cache key." I left a comment, he pushed back, and we ended up in a 40-minute Slack debate about whether a cache key can ever be a security decision. The deeper I dug, the more I realized that most "which hash should I use" debates are actually debates about three questions nobody is asking out loud: what does the attack surface look like, how often will the hash be computed, and who else is going to read this hash and decide what to trust based on it.
Table of Contents
- 1. A Short History of the SHA Family
- 2. How MD5 Actually Broke (and Why It Matters)
- 3. The Birthday Paradox and Why Output Size Matters
- 4. Construction Matters: Merkle-Damgard vs Sponge
- 5. Real Performance Numbers in 2026
- 6. The Decision Matrix by Use Case
- 7. Why MD5 Is Still Everywhere
- 8. Generating Hashes with StackConvert
- 9. Frequently Asked Questions
I want to skip the basics and get into the parts that actually decide which algorithm is right for a given job. If you already know what a hash function is in the abstract and you want a quick number for something, the StackConvert hash generator will give you MD5, SHA-1, SHA-256, or SHA-512 in the browser without touching a command line. The rest of this article is for the harder question: when you sit down to design a system, how do you actually choose?
A Short History of the SHA Family
The algorithms we argue about did not arrive all at once. They came out of a 35-year arc of cryptographic research, competitions, and broken standards. Knowing the order helps, because "newer" and "longer" are not always the same thing as "stronger".
- MD5, 1991. Ronald Rivest at MIT designed MD5 as a successor to MD4, and RFC 1321 formalized it in April 1992. It produces a 128-bit digest. The design was meant to be fast on 32-bit processors of the era, and it succeeded. It was never designed to resist the kinds of attacks that would become feasible on commodity hardware 15 years later.
- SHA-0, 1993. The NSA published SHA in 1993 as FIPS PUB 180. It was withdrawn within two years for reasons the NSA never fully disclosed, widely believed to be a flaw discovered after publication.
- SHA-1, 1995. A patched redesign published as FIPS PUB 180-1, specified again in RFC 3174. A 160-bit digest. It became the universal default for TLS certificates, code signing, Git, and more, and held up until the 2000s when theoretical attacks began to chip at it.
- SHA-2 family, 2001. FIPS PUB 180-2 introduced SHA-256, SHA-384, and SHA-512 as one family with a common structure but different output sizes; SHA-224 followed in a 2004 change notice. SHA-512/224 and SHA-512/256 were added later in FIPS 180-4 (2012, revised 2015). RFC 6234 is the reference implementation text most programmers quote.
- SHA-3, 2015. Standardized as FIPS PUB 202 after a public competition that NIST ran from 2007 to 2012. The winner was Keccak by Bertoni, Daemen, Peeters, and Van Assche, a design that has nothing in common with MD5 or SHA-2 internally. NIST picked it specifically to have an algorithm with a different construction, so a future break of SHA-2 would not affect SHA-3 and vice versa.
- BLAKE2 (2012) and BLAKE3 (2020). Not FIPS standards but both widely deployed. BLAKE2 was optimized for speed, BLAKE3 went further with a Merkle-tree structure that parallelizes beautifully on multi-core processors.
The practical lesson is that MD5 is three generations of design thinking behind SHA-3, and SHA-2 sits in the comfortable middle. When a protocol from 2005 says "use SHA-1 for this" and you are maintaining it in 2026, you are not looking at a modern decision. You are looking at a frozen decision from before we knew what we know now.
How MD5 Actually Broke (and Why It Matters)
The line "MD5 is broken" gets thrown around a lot, but most engineers I talk to cannot explain what the break actually was or why it stopped mattering as a theoretical problem and started mattering as a practical one.
The first real blow landed in August 2004 at the CRYPTO conference. Xiaoyun Wang and her collaborators presented a method that found MD5 collisions in about an hour on an IBM p690 cluster, and refinements over the next couple of years brought that down to seconds on a laptop. A collision is two different inputs that produce the same MD5 output. In 2004, the cost of finding one was about $10 in electricity. That was alarming but still abstract. You could generate random collisions but you could not pick what the colliding documents would say.
The second blow was the invention of chosen-prefix collisions. Marc Stevens at CWI Amsterdam built HashClash, a tool that lets an attacker pick two arbitrary prefixes and then compute suffixes that cause the two prefixed documents to collide under MD5. Now the attacker controls the content. This is a fundamentally different threat model.
The third blow arrived in 2012, and it was operational, not academic. The Flame malware, uncovered while inspecting infected machines in Iran, contained a forged Microsoft code-signing certificate. The forgery worked because Microsoft's Terminal Server Licensing Service was still issuing certificates signed with MD5. The attackers used a chosen-prefix collision to produce a certificate that was valid from Microsoft's perspective but contained fields of their choosing. Once signed, that certificate could sign malware that Windows trusted as "Microsoft". This is the only time anyone has proven a nation-state level attack used an MD5 collision to sign malicious code, but it is exactly the kind of attack every warning assumed was possible.
Watch out: If your code flow includes "hash a blob with MD5, sign the hash, trust anyone with a matching hash", you already have a Flame-shaped hole. It does not matter whether the blob is code, configuration, or a license key. The attack does not require breaking the signature. It requires finding a second blob that hashes the same.
SHA-1 followed a similar arc. Theoretical attacks appeared in 2005. In February 2017, Google's SHAttered project demonstrated the first known public SHA-1 collision, using about 6,500 CPU years plus 110 GPU years of computation. That is expensive but not impossible. By 2020, chosen-prefix SHA-1 collisions were down to about $45,000 in cloud compute. Browsers stopped trusting SHA-1 TLS certificates by January 2017. Git 2.29 (October 2020) added experimental SHA-256 support for commit IDs. The migration is still in progress in 2026.
The Birthday Paradox and Why Output Size Matters
If you throw 23 people into a room, there is a better than even chance two of them share a birthday. That tiny piece of combinatorics is the same reason a 128-bit hash does not give you 128 bits of collision resistance.
The math says that for a hash with an output of n bits, the effort to find a collision by brute force is roughly 2 to the power of n/2, not 2 to the power of n. This is called the birthday bound. For each common algorithm, the numbers shake out like this:
| Algorithm | Output bits | Birthday bound | Real-world feasibility |
|---|---|---|---|
| MD5 | 128 | 2^64 | Cheap today; literally seconds on a laptop for random collisions |
| SHA-1 | 160 | 2^80 (theoretical), 2^63 (after SHAttered) | Tens of thousands of dollars on cloud GPUs |
| SHA-256 | 256 | 2^128 | Beyond reach; consumes more energy than humanity can produce |
| SHA-512 | 512 | 2^256 | Past-the-heat-death-of-the-universe infeasible |
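To make the birthday bound concrete, here is a minimal sketch that checks the 23-person figure exactly and then estimates how many random digests give roughly even odds of a collision. It uses the standard approximation sqrt(2 ln 2 * 2^n). The function names are my own, not from any library.

```python
import math


def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share one of `days` values."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def log2_hashes_for_even_odds(output_bits: int) -> float:
    """log2 of the digest count that gives about a 50% collision chance.

    Uses the approximation count ~= sqrt(2 * ln 2 * 2^n), which is only
    meaningful for an ideal hash with uniformly distributed output.
    """
    return output_bits / 2 + 0.5 * math.log2(2 * math.log(2))


print(f"23 people: {birthday_collision_probability(23):.4f}")
for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256), ("SHA-512", 512)]:
    print(f"{name}: about 2^{log2_hashes_for_even_odds(bits):.2f} digests")
```

The exponents land a fraction of a bit above n/2, which is why the table rounds to 2^(n/2). These are generic brute-force figures; the MD5 and SHA-1 breaks described earlier are far cheaper because they exploit structure rather than chance.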
When someone asks me why SHA-512 exists when SHA-256 is already unbreakable, the honest answer is that it is not really for raw collision resistance. Nobody is going to brute-force SHA-256 this decade or next. SHA-512 exists for defense in depth (different failure modes), for performance reasons on 64-bit CPUs that I will cover in a moment, and for high-security domains where auditors and regulators prefer the longer digest on paper.
Construction Matters: Merkle-Damgard vs Sponge
Two hashes with the same output size can still have very different security properties because of how they are built internally. MD5, SHA-1, and the entire SHA-2 family use a construction called Merkle-Damgard. It is a very specific pattern: initialize a fixed state, absorb the input in blocks, mix each block into the state with a compression function, output the final state. Simple and elegant, but it has a subtle weakness called length extension.
The length extension attack works like this. If I know the MD5 or SHA-256 hash of "secret || message" for some unknown secret, and I know the length of the secret, I can compute the hash of "secret || message || padding || my_addition" without ever knowing the secret itself. That is because the Merkle-Damgard state at the end of the hash is exactly the state I need to continue hashing more data.
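The attack is easier to believe once you watch it run. The sketch below reimplements SHA-256 in plain Python purely because hashlib does not expose the internal state. The constants are derived from their definition (square and cube roots of the first primes) rather than typed in. The secret, message, and appended text are hypothetical values chosen for illustration.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(count):
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Integer cube root by binary search (avoids floating point error)."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fractional bits of the cube roots (round constants) and square roots (initial state).
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
IV = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state, block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]


def md_padding(total_len):
    """Merkle-Damgard padding for a message of total_len bytes."""
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)


def sha256_from_state(state, data, already_hashed=0):
    """Hash `data` starting from `state`; already_hashed must be a multiple of 64."""
    padded = data + md_padding(already_hashed + len(data))
    for i in range(0, len(padded), 64):
        state = _compress(state, padded[i:i + 64])
    return b"".join(struct.pack(">I", word) for word in state)


def extend(known_digest, known_len, addition):
    """Forge the digest of original || padding || addition without the secret."""
    glue = md_padding(known_len)
    state = list(struct.unpack(">8I", known_digest))
    forged = sha256_from_state(state, addition, already_hashed=known_len + len(glue))
    return glue, forged


# The server computes this tag; the attacker sees only the tag and the lengths.
secret, message = b"hypothetical-secret", b"user=alice"
tag = hashlib.sha256(secret + message).digest()

addition = b"&admin=true"
glue, forged = extend(tag, len(secret) + len(message), addition)
assert forged == hashlib.sha256(secret + message + glue + addition).digest()
```

The final assertion is the whole attack: `extend` never receives `secret`, yet the forged digest matches what the server would compute for the longer input.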
This is not a theoretical weakness. In 2009, Netflix had to patch a subscription API that used raw SHA-1 for signatures. Attackers could extend legitimate requests and forge signed actions. Flickr's 2009 API authentication had the same flaw. The fix in both cases was to stop using a raw hash for message authentication and switch to HMAC.
import hashlib
import hmac
# Vulnerable: raw hash used as a MAC (secret and message are bytes)
signature = hashlib.sha256(secret + message).hexdigest()
# Correct: HMAC keys the hash in a way that is not length-extensible
signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
SHA-3 does not have this problem. It uses a sponge construction where the internal state is larger than the output, and not all of it is revealed. You cannot resume hashing from an output, because the output is only a window into the state. This is why SHA-3 is sometimes described as "immune by design" to length extension.
Pro tip: If you are building anything that uses a hash as a message authenticator, use HMAC with SHA-256 or SHA-512, not a raw hash. HMAC is defined in RFC 2104 and exists specifically to make length extension a non-issue. Every language stdlib has it. It costs you one function call and saves you a category of bugs.
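As a rough illustration of that one function call, here is a minimal sign-and-verify sketch using only the Python standard library. The function names and the hex encoding of the tag are my own choices; the important detail is that verification uses a constant-time comparison instead of `==`.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA-256 tag for message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, received_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(key, message)
    # Encoding both sides to bytes lets compare_digest accept any input string.
    return hmac.compare_digest(expected.encode(), received_tag.encode())


key = b"hypothetical-shared-secret"
tag = sign(key, b"amount=100&to=alice")
print(verify(key, b"amount=100&to=alice", tag))
print(verify(key, b"amount=999&to=mallory", tag))
```

A plain string comparison can leak, through timing, how many leading characters of a guessed tag were correct, which is why `hmac.compare_digest` exists.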
Real Performance Numbers in 2026
The classic statement is "MD5 is fast, SHA-256 is slow". That was true in 2005. It is not true in 2026 on any CPU made in the last decade, and it has been causing people to pick the wrong algorithm for a long time.
Intel added SHA-NI instructions in 2016 (Goldmont and later; AMD since Ryzen). ARMv8 added dedicated SHA-256 and SHA-1 instructions as an optional crypto extension, widely deployed since 2017. These are single-instruction hardware accelerators for the round functions of SHA-1 and SHA-256. When software takes advantage of them, SHA-256 can exceed 2 GB/s per core on a modern server CPU. Without them, the same code runs closer to 400 MB/s. MD5 has no such instruction because nobody wanted to accelerate a broken algorithm, so it sits around 600 MB/s regardless of hardware.
Here are representative throughput numbers on an x86_64 server (Xeon class, SHA-NI enabled) running OpenSSL 3.0:
| Algorithm | Throughput (single core unless noted) | Notes |
|---|---|---|
| MD5 | ~600 MB/s | No hardware acceleration; pure software |
| SHA-1 | ~900 MB/s with SHA-NI | Accelerated on recent x86 |
| SHA-256 | ~2 GB/s with SHA-NI | Faster than MD5 on modern hardware |
| SHA-512 | ~1.2 GB/s | Often faster than SHA-256 on 64-bit CPUs that lack SHA-256 acceleration (for example pre-2016 x86) |
| BLAKE3 | 6+ GB/s on 16 cores | Multi-threaded figure; parallel by design and scales well across cores |
A quick way to see what your machine is capable of, if you have OpenSSL installed:
openssl speed -evp md5 sha1 sha256 sha512
Before assuming MD5 will be faster for your workload, run that command. On most modern Linux servers you will find SHA-256 already matches or beats MD5. For in-memory hashing the difference rarely matters anyway; for disk-backed hashing, I/O dominates and the algorithm choice is invisible in the profile.
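If OpenSSL's command-line tool is not available, a rough equivalent can be sketched with Python's hashlib, which usually delegates to the same OpenSSL primitives. This is a crude single-threaded measurement under assumptions of my own (8 MiB buffer, ten rounds), not a rigorous benchmark, and `md5` may be unavailable on FIPS-restricted builds.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, payload: bytes, rounds: int = 10) -> float:
    """Hash payload `rounds` times and return the rate in MB/s (10^6 bytes)."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return len(payload) * rounds / elapsed / 1_000_000


payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes, held in memory
for name in ("md5", "sha1", "sha256", "sha512"):
    print(f"{name:8s} {throughput_mb_per_s(name, payload):8.0f} MB/s")
```

Expect the ordering to vary between machines; that variation is the point of measuring before arguing.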
The Decision Matrix by Use Case
This is the section I wanted to read when I was learning all of this. Forget the generic table. Here is what to use for the things you actually build.
| Use case | Recommended | Why |
|---|---|---|
| Downloaded file integrity (you control both ends) | SHA-256 | Universal tooling; faster than MD5 with SHA-NI. For the full publisher-side workflow see verifying file downloads. |
| TLS certificate signatures | SHA-256 or SHA-384 | SHA-1 distrusted by all browsers since Jan 2017 per CA/B Forum |
| Code signing for long-lived binaries | SHA-384 or SHA-512 | Extra margin; certificates may be valid for years |
| Git commit IDs | SHA-256 (Git 2.29+) or SHA-1 legacy | SHA-256 mode available; repos migrate slowly due to tooling |
| Bitcoin proof-of-work and transaction IDs | Double SHA-256 | Satoshi's original design; no plans to change |
| Ethereum and EVM chains | Keccak-256 | Pre-standardization Keccak, not the FIPS 202 SHA-3 variant |
| HMAC message authentication | HMAC-SHA-256 | Standard across AWS SigV4, Stripe webhooks, most APIs |
| Subresource Integrity (HTML scripts/styles) | SHA-256, SHA-384, or SHA-512 | HTML spec allows only these three |
| Content-addressable storage (IPFS, object dedup) | SHA-256 | IPFS default; widely tooled |
| HTTP ETag generation | MD5 is fine | Server-controlled; security not a factor |
| Internal cache keys, dedup in trusted env | MD5 or xxHash | Speed matters, attacker cannot influence inputs |
| Password storage | None of these; use bcrypt or Argon2 | General-purpose hashes are too fast; see the bcrypt guide |
The line I find myself repeating in code reviews: if an attacker could ever choose what gets hashed and benefit from a collision, use SHA-256 or stronger. If the inputs are fully under your control and the hash is never compared against an externally supplied value, MD5 is still acceptable. The hard part is being honest about which world you are in.
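Two rows of the matrix translate almost directly into code. The sketch below streams a file through SHA-256 to check it against a published checksum, and builds a Subresource Integrity value with SHA-384. The helper names are hypothetical, and the `sha384-` prefix followed by a base64 digest is the format the integrity attribute expects.

```python
import base64
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and stray whitespace."""
    return file_sha256(path) == published_hex.strip().lower()


def sri_value(content: bytes) -> str:
    """Build an integrity attribute value such as 'sha384-<base64 digest>'."""
    encoded = base64.b64encode(hashlib.sha384(content).digest()).decode("ascii")
    return "sha384-" + encoded


print(sri_value(b"console.log('hello');"))
```

The checksum comparison uses plain equality on purpose: a published checksum is public, so there is no secret for a timing side channel to leak.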
Why MD5 Is Still Everywhere
Given everything, you might expect MD5 to have disappeared by now. It has not. In 2026 it is still shipping in new code every day, and that is not because developers are careless. It is because the non-security use cases for a fast 128-bit fingerprint are real.
- HTTP ETag headers. RFC 7232 (now folded into RFC 9110) leaves the format unspecified. Most web servers default to something cheap, and MD5 of the response body is a common choice. Nothing an attacker can exploit here; the server picks the ETag and the client just echoes it back.
- Content deduplication in storage systems. Backup tools, rsync, and legacy object stores often use MD5 as a first-pass dedupe check, falling back to a byte compare or a stronger hash on collision. Fast is the feature.
- Database partitioning and sharding keys. When you need to distribute rows across N shards by hashing a primary key, all you need is uniform distribution, not cryptographic strength. MurmurHash and xxHash are even more common than MD5 for this now, but MD5 still appears.
- Legacy protocols. SNMPv3 authentication in HMAC-MD5-96 form is still deployed. NTLMv2 uses MD5 internally. RADIUS shared-secret authentication uses MD5. Ripping them out requires vendor coordination; in practice, they are layered with TLS or IPsec and MD5's weakness is not the weakest link.
- JWT "kid" (key ID) headers. Often just MD5 of the public key. It is an identifier, not a signature.
- Language stdlib availability. Every language exposes MD5 in one line. For non-security fingerprinting, MD5 is literally the path of least resistance.
None of these are "wrong" uses of MD5. What is wrong is assuming that because MD5 is "fine here", it is fine somewhere else you have not thought about. Every time a codebase expands an MD5 usage past its original context, someone has to re-verify the reasoning. That is the real tax of keeping MD5 around.
Generating Hashes with StackConvert
When you need a quick hash to verify something, paste into a test, or compare against a published checksum, the StackConvert online hash tool runs entirely in your browser. Nothing leaves your machine, which matters when the input is a license key, an API secret, or a configuration snippet you do not want logged to a server. It supports MD5, SHA-1, SHA-256, SHA-512, and a handful of others, so you can compare algorithms on the same input in seconds.
For automation and CI, I use the language-native APIs. A few minimum-viable examples:
# Python
import hashlib
digest = hashlib.sha256(b'hello world').hexdigest()
// Node.js
const crypto = require('crypto');
const digest = crypto.createHash('sha256').update('hello world').digest('hex');
// Go (sum is a [32]byte array; hex-encode it for display)
import (
    "crypto/sha256"
    "encoding/hex"
)
sum := sha256.Sum256([]byte("hello world"))
digest := hex.EncodeToString(sum[:])
# Shell (one-liner; printf avoids the non-portable echo -n)
printf '%s' "hello world" | sha256sum
If your workflow involves generating many hashes and comparing them, or computing hashes of strings rather than files, the hash generator walkthrough goes through the tool step by step and shows the OS-native checksum commands for each platform.
Frequently Asked Questions
Is there a practical attack where MD5 collisions would compromise my software downloads?
Yes, if an attacker can choose the content before the hash is published. The 2012 Flame malware is the canonical example. If you publish an MD5 for a file you generated, and an attacker later wants to replace it with a malicious file of the same MD5, they need a second preimage against your specific file, which remains computationally infeasible even for MD5. If an attacker can submit a file for you to sign or publish, chosen-prefix attacks let them prepare two files in advance, one benign and one malicious, that have the same MD5. Then they get you to sign the benign one. Move to SHA-256 and the attack is currently infeasible.
Why is SHA-512 sometimes faster than SHA-256 on a 64-bit CPU?
SHA-256 uses 32-bit words in its compression function. SHA-512 uses 64-bit words and processes 128-byte blocks instead of 64-byte ones. On a 64-bit CPU without hardware SHA-256 acceleration, operating on 64-bit words hits native register width and is more efficient per byte of input. With SHA-NI available, SHA-256 gets a several-fold boost and usually wins again. So on older ARM or pre-2016 x86 you often see SHA-512 ahead; on modern server CPUs, SHA-256 wins.
Should I care about length extension attacks in my code?
Only if you are implementing a message authentication code, session token, signed URL, or any protocol where you hash a secret concatenated with user input. If so, use HMAC-SHA-256 instead of raw hashing. If you are just computing a checksum of a file or an identifier for a cache, length extension is not relevant.
Is SHA-3 just a bigger SHA-2?
No, and that is the whole point of its existence. SHA-3 uses the Keccak sponge construction, which has nothing in common with the Merkle-Damgard structure behind MD5, SHA-1, and SHA-2. NIST picked Keccak after a public competition that ran from 2007 to 2012, specifically to have a structurally different backup algorithm. If a future flaw hits SHA-2, SHA-3 is unlikely to be affected by the same kind of attack.
BLAKE3 is faster and newer. Why is it not used everywhere?
Standardization inertia. BLAKE3 is not a FIPS standard, so projects with compliance requirements (government, healthcare, finance) cannot use it without an exception. It is also still too new for most OS-native checksum tools to include by default. That said, content-addressable storage systems and build tooling are starting to adopt it. Expect it to spread in the next five years.
Why does Bitcoin use SHA-256 twice in a row?
Double SHA-256 (sometimes called SHA-256d) is widely understood as a guard against the length extension attacks that would otherwise apply to raw SHA-256, although Satoshi never documented the reasoning. When Bitcoin was designed in 2008, hashing twice was the belt-and-braces choice. It also doubles the work per hash, which is relevant for proof-of-work economics. Ethereum made a different choice: Keccak-256, the pre-standardization version of SHA-3, which is length-extension-immune by construction and does not need to be doubled.
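For reference, the double hash is a one-liner. This is a minimal sketch with a function name of my own choosing. Note that the second pass hashes the raw 32-byte digest of the first pass, not its hex string.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice: the outer hash consumes the inner raw digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


print(sha256d(b"hello").hex())
```

Because the attacker never sees the inner digest's state, there is nothing to resume from, which is the same idea HMAC relies on.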
What does FIPS 140 compliance mean and do I need it?
FIPS 140 is a US government standard for cryptographic modules. FIPS 140-3 is the current revision, issued in 2019. If you sell software to US federal agencies or regulated industries like healthcare or banking, you likely need FIPS-validated cryptographic libraries. In practice the approved general-purpose hash functions are SHA-2 and SHA-3; MD5 is tolerated only for non-security uses, and unstandardized algorithms like BLAKE3 are excluded. Libraries such as OpenSSL offer FIPS-validated modules. If you are not selling to those customers, FIPS compliance is optional.
How does HMAC fix the weaknesses of a raw hash?
HMAC wraps the underlying hash in a specific double-hash construction: hash((key XOR opad) concat hash((key XOR ipad) concat message)). That structure defeats length extension because the output is a hash of a hash, and the inner hash's state is never revealed. HMAC-SHA-256 is the canonical form used by AWS Signature Version 4, Stripe webhook signatures, GitHub webhook signatures, and most modern APIs. It is defined in RFC 2104.
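The formula above can be written out by hand and checked against the standard library. This is a teaching sketch only; in real code call `hmac.new` directly. The key and message values are hypothetical.

```python
import hashlib
import hmac


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 written out from the RFC 2104 definition."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # overlong keys are hashed first
    key = key.ljust(block_size, b"\x00")  # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


key, message = b"hypothetical-key", b"payload"
assert hmac_sha256_manual(key, message) == hmac.new(key, message, hashlib.sha256).digest()
```

The outer hash sees only a fixed-length inner digest, so an attacker who learns the final tag cannot append data to the message the way raw `hash(secret || message)` allows.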