Web Standards • Published Jan 25, 2026

Base64 Explained: How Encoding Works Under the Hood

Ever wondered what those long strings ending in `==` are? Discover the mechanics of Base64 and why it's the glue of the internet.

Base64 is one of those encodings that appears everywhere (in email attachments, CSS data URIs, JWTs, HTTP Basic Auth headers, and PEM-encoded TLS certificates), yet its purpose is frequently misunderstood. Developers often confuse it with encryption or compression. It is neither. Understanding what Base64 actually does (and does not do) will save you from subtle bugs and misplaced trust in its "security."

The Problem: Binary Data in Text Protocols

Many protocols that move data across the internet — SMTP (email), HTTP headers, JSON, HTML — were designed around ASCII text. They have special characters that carry protocol meaning: newlines terminate headers, null bytes can truncate strings, certain bytes are used as delimiters.

Binary data (images, audio files, cryptographic keys, compiled executables) contains arbitrary byte values, including all of these problematic characters. Sending raw binary through a text-only channel can corrupt the data. Base64 solves this by converting any binary input into a string drawn from an alphabet of 64 safe, printable ASCII characters: A-Z, a-z, 0-9, +, and /. A 65th character, =, is used only for padding at the end.

The trade-off is size: Base64 output is approximately 33% larger than the original binary input. You pay in bytes to gain transport safety.

How Base64 Works: Step by Step

Base64 encodes binary data by grouping it into 3-byte (24-bit) chunks, then splitting each chunk into four 6-bit groups. Each 6-bit value (0-63) maps to one character in the Base64 alphabet.

Let us encode the word "Man" (three ASCII characters, perfect for one chunk):

Character:  M        a        n
ASCII:      77       97       110
Binary:     01001101 01100001 01101110

Group into 6 bits:
010011 010110 000101 101110

Decimal:    19     22      5      46
Base64:      T      W      F      u

Result: "TWFu"

Every 3 bytes of input become 4 Base64 characters. For a 3KB image, the Base64 output is approximately 4KB.
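The steps above can be sketched by hand in a few lines. This is an illustrative implementation, not how any particular library does it; the helper name encodeBase64Manual is hypothetical, and in practice you would use the built-in btoa(). It also handles inputs that are not a multiple of 3 bytes by emitting = padding, which is explained in the padding section.

```javascript
// The standard Base64 alphabet: value 0-63 maps to one character.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode a Uint8Array by hand, 3 bytes at a time.
function encodeBase64Manual(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    // Pack up to 3 bytes into one 24-bit number (missing bytes count as 0).
    const chunk =
      (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    // Split the 24 bits into four 6-bit values and look each one up.
    out += ALPHABET[(chunk >> 18) & 63];
    out += ALPHABET[(chunk >> 12) & 63];
    out += remaining > 1 ? ALPHABET[(chunk >> 6) & 63] : "=";
    out += remaining > 2 ? ALPHABET[chunk & 63] : "=";
  }
  return out;
}

const man = new TextEncoder().encode("Man");
console.log(encodeBase64Manual(man)); // "TWFu"
```

The shifts by 18, 12, 6, and 0 are the "group into 6 bits" step from the worked example; masking with 63 keeps only the lowest six bits of each group.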

The Math of the 33% Overhead

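The encoded size is 4/3 of the input: every 3 bytes in produce 4 characters out, which is where the roughly 33% overhead comes from. Because the output is always a whole number of 4-character groups, a 100-byte input becomes exactly 136 Base64 characters (34 groups of 4, padding included).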

This matters when you are embedding Base64 in contexts that have size limits:

HTTP headers have practical limits (often 8KB in nginx/Apache defaults)
URL query parameters have practical limits (~2KB for broad browser compatibility)
JWT tokens embedded in cookies must fit within the cookie size limit (4KB)
Data URIs in CSS increase stylesheet size, which affects render performance
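The 4/3 ratio can be turned into a quick size check before you encode anything. The sketch below assumes padded, standard Base64; the function names and the 4096-byte budget are illustrative, not taken from any specific server or browser.

```javascript
// Padded Base64 length for an input of byteLength bytes: 4 * ceil(n / 3).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical check: will the encoded value fit under a size budget?
function fitsAfterEncoding(byteLength, limitBytes) {
  return base64Length(byteLength) <= limitBytes;
}

console.log(base64Length(3));    // 4
console.log(base64Length(100));  // 136
console.log(base64Length(3072)); // 4096

console.log(fitsAfterEncoding(3072, 4096)); // true
console.log(fitsAfterEncoding(3073, 4096)); // false
```

Note that the budget has to cover everything else in the header, URL, or cookie as well, so the real headroom is smaller than the raw limit.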

What Is the `==` Padding?

Base64 works in 3-byte groups. If the input is not divisible by 3, the last group is padded with zero bits, and = characters are appended to the output to indicate how many padding bytes were added.

1 remaining byte → 2 Base64 chars + == (2 padding chars)
2 remaining bytes → 3 Base64 chars + = (1 padding char)
3 remaining bytes → 4 Base64 chars, no padding needed

The padding ensures the decoder knows exactly how many bytes the original input had. Some implementations omit the padding (this is valid for many use cases), but others require it — always check the spec of the system you are integrating with.
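The three cases are easy to observe with btoa(). A small sketch, where countPadding is a hypothetical helper written for this example:

```javascript
// Count the trailing "=" characters of an encoded string.
function countPadding(encoded) {
  return encoded.match(/=*$/)[0].length;
}

// btoa() treats each character of these ASCII inputs as one byte.
for (const input of ["M", "Ma", "Man"]) {
  const encoded = btoa(input);
  console.log(`${input.length} byte(s) -> ${encoded}, padding: ${countPadding(encoded)}`);
}
// 1 byte(s) -> TQ==, padding: 2
// 2 byte(s) -> TWE=, padding: 1
// 3 byte(s) -> TWFu, padding: 0
```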

Base64 vs. Base64url

Standard Base64 uses + and / for the values 62 and 63, the last two characters of its alphabet. Both are special in URLs and file paths: + is decoded as a space in URL query strings, and / is a path separator.

Base64url is a URL-safe variant that replaces + with - and / with _, and usually omits the = padding. This is used in:

JWT (JSON Web Tokens) — the header and payload are Base64url encoded
OAuth 2.0 PKCE — the code challenge parameter
URL-safe file identifiers — storage systems that need unique, URL-safe IDs

If you decode a JWT and get garbage or an error, you are probably feeding a Base64url string to a standard Base64 decoder. Swap - back to + and _ back to / before decoding, and restore the = padding if your decoder requires it.
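Converting between the two alphabets is a pair of string replacements plus padding handling. A minimal sketch, assuming the input is otherwise well-formed; toBase64Url and fromBase64Url are illustrative names, not built-ins:

```javascript
// Standard Base64 -> Base64url: swap the two special characters, drop padding.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Base64url -> standard Base64: swap back and restore "=" padding
// so that strict decoders accept the result.
function fromBase64Url(base64url) {
  const base64 = base64url.replace(/-/g, "+").replace(/_/g, "/");
  return base64 + "=".repeat((4 - (base64.length % 4)) % 4);
}

// The bytes 0xFB 0xFF encode to a string containing both + and /.
const standard = btoa("\xfb\xff");
console.log(standard);                             // "+/8="
console.log(toBase64Url(standard));                // "-_8"
console.log(fromBase64Url(toBase64Url(standard))); // "+/8="
```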

Base64 in the Browser

JavaScript provides two built-in functions for Base64: btoa() (binary to ASCII, i.e., encode) and atob() (ASCII to binary, i.e., decode). The names are counterintuitive — remember: binary to ASCII = encode.

// Encoding
const encoded = btoa("Hello, World!");
// "SGVsbG8sIFdvcmxkIQ=="

// Decoding
const decoded = atob("SGVsbG8sIFdvcmxkIQ==");
// "Hello, World!"

// IMPORTANT: btoa() only handles Latin-1 characters.
// For Unicode, you must encode to UTF-8 first:
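function toBase64Unicode(str) {
    // encodeURIComponent yields UTF-8 as %XX escapes; turn each escape
    // back into a single character whose code is that byte value.
    return btoa(
        encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, (_, hex) =>
            String.fromCharCode(parseInt(hex, 16))
        )
    );
}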

// Modern alternative using TextEncoder (Node.js 16+ and modern browsers):
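function toBase64(str) {
    const bytes = new TextEncoder().encode(str);
    // Build the binary string in a loop: spreading a very large array into
    // String.fromCharCode(...bytes) can exceed the engine's argument limit.
    let binary = "";
    for (const byte of bytes) {
        binary += String.fromCharCode(byte);
    }
    return btoa(binary);
}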

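The btoa() caveat with Unicode is a common source of bugs. If your string contains any character above U+00FF (emoji, Chinese characters, or letters outside Latin-1 such as ł or ő), btoa() throws a DOMException (InvalidCharacterError). Characters inside Latin-1, such as é, do not throw, but they are encoded as single Latin-1 bytes rather than UTF-8, which may not be what the receiving side expects. Always handle this if user input is involved.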
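Decoding needs the mirror-image step: atob() returns a "binary string" with one character per byte, which has to be converted back to bytes and then decoded as UTF-8. A sketch using TextDecoder; the function names are illustrative, and the encoder is repeated here only so the round trip is self-contained.

```javascript
// Encode a JavaScript string as UTF-8 bytes, then as Base64.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Decode Base64 to bytes, then interpret those bytes as UTF-8.
function base64ToUtf8(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encodedText = utf8ToBase64("héllo");
console.log(encodedText);               // "aMOpbGxv"
console.log(base64ToUtf8(encodedText)); // "héllo"
```

Calling atob() alone on this value would return the raw UTF-8 bytes as mojibake instead of the original text.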

Data URIs: Inlining Images and Files

A data URI embeds file content directly into HTML or CSS, eliminating an HTTP request:

<!-- Inline PNG image as Base64 -->
<img src="data:image/png;base64,iVBORw0KGgo...">

/* Inline SVG icon in CSS */
.icon {
    background-image: url("data:image/svg+xml;base64,PHN2Zy4u...");
}

When data URIs help: Small icons (under ~2KB) that are used on nearly every page load. The eliminated HTTP request can outweigh the extra bytes, especially on high-latency connections.

When data URIs hurt: For images larger than a few KB, the 33% overhead and the inability to be cached separately make data URIs slower than a separate HTTP request with a long cache TTL. Do not inline large images or images that appear only on some pages.
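Building a data URI at runtime is just a prefix plus the Base64 text. The sketch below assumes SVG markup held in a string; svgToDataUri and shouldInline are hypothetical helpers, and the 2048-byte default simply mirrors the "under ~2KB" rule of thumb above.

```javascript
// Hypothetical helper: wrap SVG markup in a Base64 data URI.
function svgToDataUri(svg) {
  const bytes = new TextEncoder().encode(svg); // UTF-8 bytes of the markup
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return "data:image/svg+xml;base64," + btoa(binary);
}

// Hypothetical rule of thumb: only inline assets below a size threshold.
function shouldInline(byteLength, maxBytes = 2048) {
  return byteLength <= maxBytes;
}

const icon = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>';
if (shouldInline(new TextEncoder().encode(icon).length)) {
  console.log(svgToDataUri(icon));
}
```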

Base64 Inside JWTs

JSON Web Tokens have a structure of three Base64url-encoded parts separated by dots: header.payload.signature. The header and payload are just JSON objects encoded with Base64url — they are not encrypted. Anyone can decode them. The signature prevents tampering, but it does not hide the content.

// Decode a JWT payload (without verifying the signature)
function decodeJWTPayload(token) {
    const [, payload] = token.split('.');
    // Base64url to standard Base64
    const base64 = payload.replace(/-/g, '+').replace(/_/g, '/');
    return JSON.parse(atob(base64));
}

const claims = decodeJWTPayload("eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0Mn0.sig");
// { user_id: 42 }

This is why you should never put sensitive data (passwords, PII, secrets) in a JWT payload unless the token itself is encrypted (JWE, not JWS).
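To see how little protection the encoding offers, the header can be read just as easily as the payload. The following sketch decodes both segments as UTF-8 JSON without verifying anything; inspectJwt and decodeSegment are illustrative names, and real code must still verify the signature before trusting any claim.

```javascript
// Decode one Base64url JWT segment into a JavaScript object.
function decodeSegment(segment) {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Read the header and payload of a JWT. Does NOT verify the signature.
function inspectJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("expected header.payload.signature");
  }
  return { header: decodeSegment(parts[0]), payload: decodeSegment(parts[1]) };
}

const inspected = inspectJwt("eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0Mn0.sig");
console.log(inspected.header);  // { alg: 'HS256' }
console.log(inspected.payload); // { user_id: 42 }
```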

Frequently Asked Questions

Is Base64 encryption?

No. Base64 is an encoding — a reversible transformation from binary to ASCII text. Anyone who sees a Base64 string can decode it instantly with a single function call. There is no key, no secret, and no security provided. Do not use Base64 to "protect" sensitive data. If you need to protect data, use actual encryption (AES-256-GCM via the Web Crypto API, for example).

Why does the `=` padding matter?

Padding tells the decoder how many bytes of actual data the last Base64 group contains. Without it, the decoder might misinterpret the final bytes. However, many systems work correctly without padding because they can infer the padding from the string length. The rule: include padding unless the spec you are implementing explicitly says to omit it (JWT and Base64url typically omit it).
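The length-based inference works because an unpadded string's length modulo 4 identifies the final group. A sketch of that reasoning, with decodedByteLength as a hypothetical helper name:

```javascript
// Infer how many bytes an unpadded Base64 string decodes to.
// length % 4 === 0 -> only full groups (3 bytes each)
// length % 4 === 2 -> final group holds 1 byte
// length % 4 === 3 -> final group holds 2 bytes
// length % 4 === 1 -> impossible: 6 bits cannot hold a whole byte
function decodedByteLength(unpadded) {
  const rem = unpadded.length % 4;
  if (rem === 1) {
    throw new Error("invalid Base64 length");
  }
  const fullGroups = Math.floor(unpadded.length / 4);
  return fullGroups * 3 + (rem === 0 ? 0 : rem - 1);
}

console.log(decodedByteLength("TQ"));   // 1
console.log(decodedByteLength("TWE"));  // 2
console.log(decodedByteLength("TWFu")); // 3
```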

Does Base64 compress data?

The opposite: Base64 expands data by approximately 33%. If you need compression, use gzip or Brotli at the transport level (Content-Encoding in HTTP). Compressing the data before encoding it shrinks the input, so the final output is smaller, but the Base64 step still adds its 33% on top of whatever it is given. Base64 itself never reduces size.
