JSON vs XML: Why JSON Won the Web
In the early 2000s, the web was built on angle brackets. Today, it's built on curly braces. Here's how a simple JavaScript object notation conquered the world.
In the late 1990s and early 2000s, XML was the dominant data format for everything: APIs, config files, document storage, inter-service messaging, and build systems. Entire careers were built on XML expertise. Technologies like SOAP, XSLT, and XPath thrived. Then JSON happened, and within a decade, XML's dominance in web APIs collapsed almost entirely.
This was not accidental. JSON won the API format wars for specific, technical reasons — and understanding those reasons helps you make better decisions about when JSON is the right tool, and when XML still has a legitimate role.
A Brief History: How JSON Won
JSON (JavaScript Object Notation) was popularised by Douglas Crockford in the early 2000s as a
lightweight alternative to XML for browser-server communication. Its key insight was that
JavaScript already had a native syntax for objects and arrays — JSON was essentially that
syntax written as a portable string. No schema required. No transformation step. Early code
passed the string to eval(); since ECMAScript 5 (2009), a single JSON.parse() call gives you a live JavaScript object.
REST, described in Roy Fielding's 2000 dissertation and popularised in the mid-2000s by companies like Twitter and Flickr, whose APIs offered JSON alongside XML, gave developers a choice. They chose JSON overwhelmingly. By 2010, new public APIs defaulted to JSON. By 2015, XML-only APIs were considered legacy.
1. The Readability Factor
The same data structure in both formats illustrates the readability gap immediately:
<!-- XML -->
<user>
  <id>42</id>
  <name>Noah</name>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <active>true</active>
</user>
// JSON
{
  "id": 42,
  "name": "Noah",
  "roles": ["admin", "editor"],
  "active": true
}
The JSON version is shorter and visually lighter. But readability is not just aesthetics — less noise means less cognitive load when debugging, less to scan in log files, and less to type when writing tests. XML's opening and closing tags for every element are essential for documents with complex mixed content (text with embedded markup), but for data-only payloads they are pure overhead.
2. File Size and Bandwidth
The verbose tag syntax means XML payloads are consistently larger than their JSON equivalents. For the user example above, with whitespace stripped, the XML is 114 bytes versus 64 bytes for the JSON, roughly a 1.8x difference. At scale (millions of API calls per day), this difference in bandwidth costs money and increases latency.
HTTP compression (gzip, Brotli) significantly reduces this gap — both formats compress well since both are repetitive text. In practice, compressed XML and compressed JSON are much closer in size than their raw forms. But raw transfer size still matters for responses that do not warrant compression (small payloads), and for scenarios where CPU overhead for decompression is a concern.
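The raw size comparison is easy to reproduce. The sketch below measures the user example in both formats with insignificant whitespace removed; the variable names and the byteLength helper are illustrative, and the figures will differ for other payloads.

```javascript
// The user example from section 1, with insignificant whitespace stripped.
const xmlPayload =
  "<user><id>42</id><name>Noah</name><roles><role>admin</role>" +
  "<role>editor</role></roles><active>true</active></user>";
const jsonPayload = JSON.stringify({
  id: 42,
  name: "Noah",
  roles: ["admin", "editor"],
  active: true,
});

// TextEncoder produces UTF-8, so .length here is a byte count, not a character count.
const byteLength = (text) => new TextEncoder().encode(text).length;

const xmlBytes = byteLength(xmlPayload);
const jsonBytes = byteLength(jsonPayload);

console.log(`XML: ${xmlBytes} bytes, JSON: ${jsonBytes} bytes`);
console.log(`ratio: ${(xmlBytes / jsonBytes).toFixed(2)}x`);
```

Measuring the compressed sizes of your own payloads the same way is the most reliable way to decide whether the raw difference matters for your traffic.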
3. Native Parsing in JavaScript
This was JSON's decisive advantage in the browser era. JavaScript can parse JSON to a native object in a single function call:
// JSON — one step, returns a real JS object
const data = JSON.parse(responseText);
console.log(data.name); // "Noah" (the example payload has no "user" wrapper)
// XML — requires DOM parsing, then traversal
const parser = new DOMParser();
const doc = parser.parseFromString(responseText, "text/xml");
const name = doc.querySelector("user name").textContent; // "Noah"
The XML approach requires a DOM parser, creates an entirely different object model, and uses different traversal APIs than the rest of JavaScript. JSON simply becomes the object you wanted. For server-side JavaScript (Node.js), the same applies. For other languages, both require a parsing step, but JSON parsers tend to be faster because the grammar is simpler.
4. Data Types: JSON's Native Advantage
JSON has six native types: string, number, boolean, null, object, and array. When you parse
{"active": true, "count": 42}, you get a real boolean and a real number (JSON does not
distinguish integers from floats), not strings that need to be parsed again.
XML has one native type: text. Everything is a string. <active>true</active>
gives you the string "true", not the boolean true. You must handle
type coercion yourself, or use XML Schema Definition (XSD) to define types — but then you
need both the document and the schema, adding complexity.
This matters for:
- Integers that should sort numerically, not alphabetically
- Booleans used in conditional logic
- Null values vs. missing elements (plain XML has no null; xsi:nil can mark one, but it depends on XML Schema)
- Floating-point numbers that must not be re-parsed from string representations
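A short sketch of why these cases matter. The fromXml object below is a stand-in for what schema-less XML parsing typically yields (every value is text); it is an illustration, not the output of a specific XML library.

```javascript
// JSON parsing hands back typed values directly.
const parsed = JSON.parse('{"active": false, "count": 42, "deleted": null}');
console.log(typeof parsed.active); // "boolean"
console.log(typeof parsed.count); // "number"
console.log(parsed.deleted === null); // true

// Stand-in for values read from XML text nodes: everything is a string.
const fromXml = { active: "false", count: "42" };

// Pitfall 1: any non-empty string is truthy, including "false".
const naiveActive = Boolean(fromXml.active); // true, which is wrong
const coercedActive = fromXml.active === "true"; // false, as intended

// Pitfall 2: strings sort lexicographically, numbers need a numeric comparator.
const sortedAsText = ["10", "9", "100"].sort(); // ["10", "100", "9"]
const sortedAsNumbers = [10, 9, 100].sort((a, b) => a - b); // [9, 10, 100]

console.log(naiveActive, coercedActive, sortedAsText, sortedAsNumbers);
```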
5. JSON5 and JSONC: The Unofficial Extensions
Standard JSON is deliberately minimal — no comments, no trailing commas, no single quotes. This makes it easy to parse but frustrating to write and maintain in configuration files where you want to explain why a value is set a certain way.
Two informal supersets address this:
- JSON5: adds comments, trailing commas, single-quoted strings, hex literals, and multiline strings. Used by some build tools and config systems.
- JSONC (JSON with Comments): adds only // and /* */ comments. Used by VS Code's settings.json, TypeScript's tsconfig.json, and similar configuration files.
Neither is valid standard JSON — you cannot use JSON.parse() on them. They
require their own parsers (json5 npm package, or VS Code's internal parser).
Use them only in config files, never in API payloads.
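To see the incompatibility concretely, the sketch below feeds a few inputs to JSON.parse(). The isStrictJson helper is a hypothetical name introduced for this example, not part of any library.

```javascript
// Returns true only if the text is accepted by the standard JSON parser.
function isStrictJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected, so surface it
  }
}

const strict = '{"port": 8080}';
const withComment = '{\n  // dev server port\n  "port": 8080\n}';
const withTrailingComma = '{"port": 8080,}';
const withSingleQuotes = "{'port': 8080}";

console.log(isStrictJson(strict)); // true
console.log(isStrictJson(withComment)); // false
console.log(isStrictJson(withTrailingComma)); // false
console.log(isStrictJson(withSingleQuotes)); // false
```

A check like this is a cheap guard in a build step if config files written in JSONC or JSON5 must never leak into API responses.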
6. Schema Validation: JSON Schema vs XSD
XML has had formal validation since DTDs shipped with XML 1.0 in 1998, and XSD became a W3C Recommendation in 2001. JSON Schema is a more recent specification (currently at Draft 2020-12) that provides comparable validation capability:
// JSON Schema example
{
  "type": "object",
  "properties": {
    "id": { "type": "integer", "minimum": 1 },
    "name": { "type": "string", "minLength": 1 },
    "active": { "type": "boolean" }
  },
  "required": ["id", "name"]
}
Libraries like Ajv (JavaScript) and jsonschema (Python) validate JSON payloads against a schema at runtime. Many API frameworks (FastAPI, NestJS, Spring) can generate JSON Schema or OpenAPI descriptions from type annotations, in some cases through an add-on package.
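To make the validation step less abstract, here is a deliberately tiny hand-rolled checker. It is not Ajv and not a conforming JSON Schema implementation: it understands only the keywords used in the example schema (type, properties, required, minimum, minLength), and the function names are made up for this sketch. In real code, use a maintained validator.

```javascript
const userSchema = {
  type: "object",
  properties: {
    id: { type: "integer", minimum: 1 },
    name: { type: "string", minLength: 1 },
    active: { type: "boolean" },
  },
  required: ["id", "name"],
};

const isPlainObject = (v) => typeof v === "object" && v !== null && !Array.isArray(v);

function matchesType(value, type) {
  switch (type) {
    case "object": return isPlainObject(value);
    case "integer": return Number.isInteger(value);
    case "string": return typeof value === "string";
    case "boolean": return typeof value === "boolean";
    default: return true; // other types are not checked in this sketch
  }
}

// Returns a list of human-readable errors; an empty list means the value passed.
function validate(schema, value, path = "$") {
  if (schema.type && !matchesType(value, schema.type)) {
    return [`${path}: expected ${schema.type}`];
  }
  const errors = [];
  if (typeof value === "number" && typeof schema.minimum === "number" && value < schema.minimum) {
    errors.push(`${path}: must be >= ${schema.minimum}`);
  }
  // Note: the spec counts characters, while .length counts UTF-16 code units.
  if (typeof value === "string" && typeof schema.minLength === "number" && value.length < schema.minLength) {
    errors.push(`${path}: must have length >= ${schema.minLength}`);
  }
  if (isPlainObject(value)) {
    for (const key of schema.required ?? []) {
      if (!(key in value)) errors.push(`${path}.${key}: is required`);
    }
    for (const [key, subSchema] of Object.entries(schema.properties ?? {})) {
      if (key in value) errors.push(...validate(subSchema, value[key], `${path}.${key}`));
    }
  }
  return errors;
}

const okErrors = validate(userSchema, { id: 42, name: "Noah", active: true });
const badErrors = validate(userSchema, { id: 0, name: "" });
const missingErrors = validate(userSchema, { name: "Noah", active: "yes" });
console.log(okErrors, badErrors, missingErrors);
```

The useful property to notice is that the schema is itself plain JSON, so it can be stored, versioned, and shipped alongside the payloads it describes.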
When XML Is Still the Right Choice
XML has genuine strengths that JSON does not replicate well:
- Document markup: XML was designed for documents with mixed content, meaning text with embedded elements. XHTML and SVG are XML vocabularies, and HTML is a close relative that shares the same tag-based model. If you need to mark up paragraphs where parts of sentences are bold, italic, or linked, XML handles this naturally. JSON does not have a clean answer for "rich text."
- SOAP and enterprise integrations: Many legacy enterprise systems, government APIs, and banking interfaces use SOAP (XML over HTTP). If you are integrating with SAP, many EDI systems, or certain financial data providers, XML is mandatory.
- XSLT transformations: XML has a mature transformation pipeline (XSLT, XPath, XQuery) for converting one XML structure to another. There is no JSON equivalent with the same standardisation and tooling depth.
- Namespaces: XML namespaces allow multiple vocabularies to coexist in one document without collision — critical for standards like SAML (security assertions) and WSDL (service descriptions).
- Comments in data: XML supports comments natively. This is useful in configuration files and documents where explaining a value inline matters.
Frequently Asked Questions
Can JSON have comments?
Standard JSON cannot. Douglas Crockford has said he removed comments because people were using them to hold parsing directives, a practice that would have destroyed interoperability. If you need comments in a JSON-like config file, use JSONC or JSON5 (with an appropriate parser), YAML, or TOML, all of which support comments natively.
What is the difference between null and undefined in JSON?
JSON has null as an explicit value. It does not have undefined —
that is a JavaScript concept. When you call JSON.stringify() on an object with
an undefined value, the key is omitted entirely from the output:
JSON.stringify({a: 1, b: undefined}) produces '{"a":1}'. When
parsing, a missing key gives you undefined when you access it in JavaScript,
while an explicit null key gives you null. These are semantically
different: null means "this field exists and has no value";
undefined means "this field does not exist in the document."
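The round trip can be checked directly. The sketch below uses only built-in JSON.stringify() and JSON.parse(); the variable names are illustrative.

```javascript
// undefined-valued keys are dropped on serialisation; null survives.
const serialized = JSON.stringify({ a: 1, b: undefined, c: null });
console.log(serialized); // {"a":1,"c":null}

const roundTripped = JSON.parse(serialized);

// "b" no longer exists, so reading it yields undefined.
console.log("b" in roundTripped); // false
console.log(roundTripped.b === undefined); // true

// "c" exists and holds an explicit null.
console.log("c" in roundTripped); // true
console.log(roundTripped.c === null); // true

// Inside arrays the slot cannot be dropped, so undefined is written as null.
const serializedArray = JSON.stringify([1, undefined]);
console.log(serializedArray); // [1,null]
```

Because a missing key and an explicit null read differently after parsing, the in operator (or Object.hasOwn) is the reliable way to tell them apart; a loose == null check treats both the same.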
Should I use YAML or TOML instead of JSON for config files?
For configuration files (not API payloads), YAML and TOML are often better choices precisely
because JSON lacks comments and has strict quoting requirements. YAML is more concise but
has infamously surprising behaviour (the Norway problem: under YAML 1.1 rules, country: NO parses as
boolean false). TOML is stricter and more explicit, making it safer for config. Many modern
tools default to TOML (Cargo.toml in Rust, pyproject.toml in
Python). For human-edited config, TOML or YAML; for machine-generated and machine-consumed
data, JSON.