JSON Schema Validation: Protecting Your API with Data Contracts
We had a 2 a.m. incident once because an upstream service started sending us price as a string instead of a number. Our TypeScript types had said price: number for two years. Nothing about the deploy that broke us was visible in the type system, the lint output, or the unit tests. The only signal was a queue of currency-formatted invoices rendering as "NaN." That was the night I stopped trusting compile-time types as my only line of defense.
Types Are Not Enough at Runtime
TypeScript, Python type hints, Java records: these tools are wonderful for catching mistakes you can prove statically. The moment data crosses your application boundary, though, the guarantee evaporates. JSON arriving from an HTTP request, a queue, a webhook, or a third-party API has whatever shape the sender felt like producing. The compiler never sees it. If you cast or unmarshal blindly, you are betting that every upstream is well-behaved forever.
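To make the failure mode concrete, here is a minimal sketch of a cast hiding a wrong runtime type. The LineItem interface and the sample payload are illustrative assumptions, not code from the incident described above.

```typescript
// Hypothetical shape for illustration; the compiler trusts it, the wire does not.
interface LineItem {
  price: number;
}

// Pretend this string arrived from an upstream service.
const wire = '{"price": "$19.99"}';

// The cast compiles cleanly, but it performs no runtime check at all.
const item = JSON.parse(wire) as LineItem;

// At runtime the field is a string, whatever the interface says.
const runtimeType: string = typeof item.price;

// Arithmetic on the mistyped value silently produces NaN.
const doubled: number = item.price * 2;

console.log(runtimeType, doubled);
```

Nothing in this snippet fails to compile, which is exactly the problem: the type annotation describes an expectation, not a check.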
A data contract closes the gap. It is a machine-readable description of what your service is willing to accept, written once and enforced both at the edge of your system and in the documentation you publish to consumers. The lingua franca for that description in the JSON-shaped world is JSON Schema: a JSON document that says, in detail, "valid input looks like this."
The two tools work together rather than competing. Static types prevent your code from compiling if you misuse a value. JSON Schema prevents a malformed value from reaching your code in the first place. Drop either one and you will rediscover why the other exists.
Anatomy of a JSON Schema
A schema is itself a JSON document. The simplest non-trivial example describes an object with a few properties:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api.example.com/schemas/invoice.json",
  "title": "Invoice",
  "type": "object",
  "required": ["id", "amount", "currency", "issuedAt"],
  "additionalProperties": false,
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "amount": { "type": "number", "minimum": 0, "exclusiveMaximum": 1000000 },
    "currency": { "type": "string", "enum": ["USD", "EUR", "KRW", "JPY"] },
    "issuedAt": { "type": "string", "format": "date-time" },
    "memo": { "type": "string", "maxLength": 280 }
  }
}

Even at this size the schema is doing real work. It rejects invoices missing a currency, negative amounts, and payloads where someone added an extra "internalNote" field by accident. Once format checking is enabled, it also rejects issuedAt strings that are not ISO date-times. The rest of your code can assume the input is sane because the schema already said so.
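To show what those keywords mean mechanically, the sketch below is a toy checker for a handful of them. It is a teaching aid under stated assumptions, not a substitute for a real validator: the MiniSchema type and the check function are hypothetical, only a few keywords are covered, and format is deliberately ignored.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical subset of JSON Schema, just large enough for the invoice example.
interface MiniSchema {
  type?: "object" | "string" | "number";
  required?: string[];
  additionalProperties?: boolean;
  properties?: { [key: string]: MiniSchema };
  minimum?: number;
  exclusiveMaximum?: number;
  enum?: Json[];
  maxLength?: number;
}

function jsonType(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Returns a list of human-readable problems; an empty list means "valid".
function check(schema: MiniSchema, value: Json, path = ""): string[] {
  const where = path || "/";
  if (schema.type !== undefined && jsonType(value) !== schema.type) {
    return [`${where}: expected ${schema.type}, got ${jsonType(value)}`];
  }
  const errors: string[] = [];
  if (schema.enum !== undefined && !schema.enum.some((allowed) => allowed === value)) {
    errors.push(`${where}: not one of the allowed values`);
  }
  if (typeof value === "number") {
    if (schema.minimum !== undefined && value < schema.minimum) {
      errors.push(`${where}: below minimum`);
    }
    if (schema.exclusiveMaximum !== undefined && value >= schema.exclusiveMaximum) {
      errors.push(`${where}: at or above exclusiveMaximum`);
    }
  }
  if (typeof value === "string" && schema.maxLength !== undefined) {
    if (Array.from(value).length > schema.maxLength) {
      errors.push(`${where}: longer than maxLength`);
    }
  }
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const key of schema.required ?? []) {
      if (!(key in value)) errors.push(`${path}/${key}: required property missing`);
    }
    for (const key of Object.keys(value)) {
      const child = schema.properties?.[key];
      const childValue = value[key];
      if (child !== undefined && childValue !== undefined) {
        errors.push(...check(child, childValue, `${path}/${key}`));
      } else if (child === undefined && schema.additionalProperties === false) {
        errors.push(`${path}/${key}: additional property not allowed`);
      }
    }
  }
  return errors;
}

const miniInvoiceSchema: MiniSchema = {
  type: "object",
  required: ["id", "amount", "currency", "issuedAt"],
  additionalProperties: false,
  properties: {
    id: { type: "string" },
    amount: { type: "number", minimum: 0, exclusiveMaximum: 1000000 },
    currency: { type: "string", enum: ["USD", "EUR", "KRW", "JPY"] },
    issuedAt: { type: "string" },
    memo: { type: "string", maxLength: 280 },
  },
};

// Missing currency, negative amount, and one unexpected field.
const problems = check(miniInvoiceSchema, {
  id: "3f2b8c1e-9d4a-4c7b-8e21-5a6f0b9c1d2e",
  amount: -5,
  issuedAt: "2026-01-15T09:30:00Z",
  internalNote: "oops",
});
console.log(problems);
```

Each reported problem maps to one keyword in the schema, which is the property that makes schema errors easy to act on.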
The Anchors
- $schema declares which dialect (draft) the schema is written in. Validators use it to pick the correct rules.
- $id gives the schema a stable URI so other schemas can reference it via $ref.
- type, required, and properties define the structural shape of the value.
- additionalProperties: false turns "extra fields are silently ignored" into "extra fields are an error." Use it for inbound validation. Avoid it on outbound responses to keep forward compatibility.
Which Draft Should You Pick?
JSON Schema has gone through several drafts. The differences are mostly about sharper semantics for edge cases, but the choice still matters because tooling support varies.
| Draft | Year | Notable changes | When to pick it |
|---|---|---|---|
| draft-04 | 2013 | First widely deployed draft | Legacy systems only |
| draft-07 | 2018 | if/then/else, contentEncoding | Older validators and toolchains that predate 2019-09 |
| 2019-09 | 2019 | unevaluatedProperties / unevaluatedItems | Tooling that has not yet moved to 2020-12 |
| 2020-12 | 2020 | Tuple prefixItems, dynamic refs | New projects in 2026; OpenAPI 3.1 |
The default for new work today is 2020-12. Ajv, Hyperjump, and the Python jsonschema library all support it. If you have to interoperate with OpenAPI 3.0, note that its Schema Object is an extended subset of an older JSON Schema draft with OpenAPI-specific deviations, so draft-07 keywords such as if/then/else are not available there. OpenAPI 3.1 finally aligned with 2020-12, so the gap is closing.
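As a rough illustration of how tooling can pick rules from $schema, the sketch below maps the well-known meta-schema URIs to the draft names in the table. The detectDraft helper is hypothetical; real validators do this internally and in more detail.

```typescript
// Meta-schema URIs (without the trailing "#") mapped to draft names.
const DRAFT_BY_META_SCHEMA: { [uri: string]: string } = {
  "http://json-schema.org/draft-04/schema": "draft-04",
  "http://json-schema.org/draft-07/schema": "draft-07",
  "https://json-schema.org/draft/2019-09/schema": "2019-09",
  "https://json-schema.org/draft/2020-12/schema": "2020-12",
};

// Hypothetical helper: returns the draft name, or undefined when $schema is
// missing or unrecognized so the caller can decide on a default.
function detectDraft(schema: { $schema?: string }): string | undefined {
  if (schema.$schema === undefined) return undefined;
  // Older drafts conventionally end the URI with an empty fragment ("#").
  const normalized = schema.$schema.replace(/#$/, "");
  return DRAFT_BY_META_SCHEMA[normalized];
}

console.log(detectDraft({ $schema: "http://json-schema.org/draft-07/schema#" }));
```

Returning undefined rather than guessing keeps the decision explicit: a schema with no $schema is ambiguous, and silently assuming the newest draft can change how keywords are interpreted.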
The Keywords That Actually Matter
JSON Schema has dozens of keywords. In practice you will lean on a small handful daily.
Composition: anyOf, oneOf, allOf
{
  "oneOf": [
    {
      "type": "object",
      "required": ["kind", "cardLast4"],
      "properties": {
        "kind": { "const": "card" },
        "cardLast4": { "type": "string", "pattern": "^[0-9]{4}$" }
      }
    },
    {
      "type": "object",
      "required": ["kind", "iban"],
      "properties": {
        "kind": { "const": "bank" },
        "iban": { "type": "string", "minLength": 15, "maxLength": 34 }
      }
    }
  ]
}

oneOf means the value must match exactly one of the listed branches. anyOf accepts any number of matches, and allOf requires all of them. For tagged unions like the one above, prefer oneOf with a const discriminator. Validator output is much cleaner because the engine can pinpoint which branch you intended.
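The difference between the three keywords comes down to counting how many branches match. The sketch below models each branch as a plain predicate, which is a simplification: the predicate names are hypothetical and they check far less than the real schema does.

```typescript
type Payload = { [key: string]: unknown };
type Branch = (value: Payload) => boolean;

function matchCount(branches: Branch[], value: Payload): number {
  return branches.filter((branch) => branch(value)).length;
}

const anyOf = (branches: Branch[], value: Payload): boolean => matchCount(branches, value) >= 1;
const oneOf = (branches: Branch[], value: Payload): boolean => matchCount(branches, value) === 1;
const allOf = (branches: Branch[], value: Payload): boolean =>
  matchCount(branches, value) === branches.length;

// Loose branches with no discriminator: each only looks for its own field.
const looseCard: Branch = (v) => typeof v["cardLast4"] === "string";
const looseBank: Branch = (v) => typeof v["iban"] === "string";

// Tagged branches follow the schema's idea: "kind" must equal a constant.
const taggedCard: Branch = (v) => v["kind"] === "card" && looseCard(v);
const taggedBank: Branch = (v) => v["kind"] === "bank" && looseBank(v);

// A payload that carries both sets of fields.
const confused: Payload = {
  kind: "card",
  cardLast4: "4242",
  iban: "DE89370400440532013000",
};

console.log(anyOf([looseCard, looseBank], confused));   // both loose branches match
console.log(oneOf([looseCard, looseBank], confused));   // two matches, so oneOf fails
console.log(oneOf([taggedCard, taggedBank], confused)); // the discriminator leaves one match
```

Without the discriminator the payload is ambiguous and oneOf rejects it; with the discriminator exactly one branch can match, so the error output points at a single branch.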
Reuse: $ref
Define common pieces once, reference them from many places. A typical layout puts every reusable schema under $defs at the top of a root document, then refers to them with { "$ref": "#/$defs/Money" }. You can also reference external files via URL once you have stable $ids, which is how multi-service organizations share schema libraries.
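A same-document reference is just a JSON Pointer into the root schema. The sketch below resolves one by hand to show what a validator does when it meets $ref. The resolveLocalRef helper and the Money definition are illustrative assumptions; remote references and percent-decoding of the fragment are out of scope.

```typescript
// Resolves a reference such as "#/$defs/Money" by walking one segment at a time.
function resolveLocalRef(root: unknown, ref: string): unknown {
  if (ref.charAt(0) !== "#") throw new Error(`Not a local reference: ${ref}`);
  const pointer = ref.slice(1);
  if (pointer === "") return root; // "#" alone refers to the whole document
  let current: unknown = root;
  for (const rawSegment of pointer.split("/").slice(1)) {
    // JSON Pointer escapes: "~1" stands for "/" and "~0" stands for "~".
    const segment = rawSegment.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof current !== "object" || current === null || !(segment in current)) {
      throw new Error(`Cannot resolve ${ref}: missing "${segment}"`);
    }
    current = (current as { [key: string]: unknown })[segment];
  }
  return current;
}

// Hypothetical root document with one shared definition used twice.
const rootSchema = {
  $defs: {
    Money: {
      type: "object",
      required: ["amount", "currency"],
      properties: {
        amount: { type: "number" },
        currency: { type: "string" },
      },
    },
  },
  type: "object",
  properties: {
    total: { $ref: "#/$defs/Money" },
    tax: { $ref: "#/$defs/Money" },
  },
};

const money = resolveLocalRef(rootSchema, rootSchema.properties.total.$ref);
console.log(money === rootSchema.$defs.Money);
```

Both total and tax point at the same definition, so tightening Money in one place tightens every field that references it.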
Strings: format, pattern
- format covers common semantic types: email, uri, uuid, date-time, ipv4. Most validators only enforce these when you explicitly enable format assertion or load a format plugin, because the spec defines them as annotations by default.
- pattern is a regular expression. Keep these short and well-tested. A regex you cannot read in five seconds is a regex you will inadvertently break later.
Numbers and Arrays
For numbers, use minimum and maximum to bound the range and multipleOf to pin currency precision. For arrays, use items (homogeneous collections), prefixItems (heterogeneous tuples in 2020-12), minItems, maxItems, and uniqueItems.
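The split between prefixItems and items is easier to see in code. In 2020-12, prefixItems checks elements by position and items applies to whatever comes after the prefix. The sketch below models that with plain predicates; checkArray is a hypothetical helper, not a validator API.

```typescript
type ItemCheck = (element: unknown) => boolean;

// prefixItems validates by position; items covers every element after the
// prefix. Omitting items leaves the remaining elements unconstrained.
function checkArray(value: unknown[], prefixItems: ItemCheck[], items?: ItemCheck): boolean {
  return value.every((element, index) => {
    const positional = prefixItems[index];
    if (positional !== undefined) return positional(element);
    return items === undefined ? true : items(element);
  });
}

const isString: ItemCheck = (element) => typeof element === "string";
const isNumber: ItemCheck = (element) => typeof element === "number";

// A [currency, amount] tuple: order matters.
console.log(checkArray(["KRW", 1000], [isString, isNumber]));
console.log(checkArray([1000, "KRW"], [isString, isNumber]));

// Extra elements pass unless items constrains them.
console.log(checkArray(["KRW", 1000, "extra"], [isString, isNumber]));
console.log(checkArray(["KRW", 1000, "extra"], [isString, isNumber], isNumber));
```

To close a tuple so that no extra elements are allowed, a 2020-12 schema sets items to false, which corresponds to passing a predicate that always returns false here.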
Validating in Code: Ajv on Node.js
Ajv is the de facto JavaScript validator. It compiles a schema into a specialized validation function so the per-call cost stays low even for complex shapes.
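import Ajv2020 from 'ajv/dist/2020';
import addFormats from 'ajv-formats';
// Assumed module that exports the invoice schema shown earlier and its matching type.
import { invoiceSchema, type Invoice } from './invoice-schema';

// removeAdditional is left out on purpose: it would silently strip unknown
// fields instead of rejecting them, which defeats additionalProperties: false.
const ajv = new Ajv2020({ allErrors: true, strict: true });
addFormats(ajv); // email, uri, uuid, date-time …

// Compile once; the validation function is then reused hundreds of thousands of times.
const validateInvoice = ajv.compile<Invoice>(invoiceSchema);

export function parseInvoice(json: unknown) {
  if (validateInvoice(json)) {
    // The type guard has narrowed json to Invoice, so no cast is needed.
    return { ok: true as const, value: json };
  }
  return { ok: false as const, errors: validateInvoice.errors ?? [] };
}

A few habits that pay off later: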
- Compile once. Building the validator function from a schema is the expensive step. Cache the compiled function at module scope, not per-request.
- Turn on allErrors. Default behavior aborts at the first failure. You will return more useful errors to API clients with the full list.
- Use strict mode. Ajv then rejects schemas that contain unknown keywords, unknown formats, or keywords that would otherwise be silently ignored. Catching these at startup beats catching them at 3 a.m.
- Pair the schema with a generated TypeScript type so your handler code gets the static guarantee while runtime validation keeps the data honest.
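The compile-once habit can be sketched as a small cache around whatever compile function your validator provides. Everything named here is an assumption for illustration: createValidatorCache is hypothetical, and fakeCompile is a stand-in that only counts calls. In an Ajv setup the compile argument could be a wrapper around ajv.compile.

```typescript
type Validator = (data: unknown) => boolean;
type Compile = (schema: object) => Validator;

// Caches compiled validators by schema object identity, so compilation
// happens at most once per schema object no matter how often it is requested.
function createValidatorCache(compile: Compile): (schema: object) => Validator {
  const compiled = new WeakMap<object, Validator>();
  return (schema) => {
    let validator = compiled.get(schema);
    if (validator === undefined) {
      validator = compile(schema);
      compiled.set(schema, validator);
    }
    return validator;
  };
}

// Stand-in compiler that counts how often the expensive step runs.
let compileCalls = 0;
const fakeCompile: Compile = () => {
  compileCalls += 1;
  return (data) => typeof data === "object" && data !== null;
};

const getValidator = createValidatorCache(fakeCompile);
const objectSchema = { type: "object" };

// Simulate three requests hitting the same schema.
const results = [{}, null, { id: 1 }].map((body) => getValidator(objectSchema)(body));
console.log(results, compileCalls);
```

Because the cache is keyed by object identity, the schema must be a module-level constant. A schema literal built inside the request handler is a new object every time and would defeat the cache.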
Where Validation Lives in a Real Service
For HTTP handlers, validate as close to the edge as possible — ideally before any business logic touches the payload. The pattern below is what most production Express, Fastify, and Koa apps end up with after a few iterations.
// Express example: schema validation middleware.
// Assumes express.json() has already parsed the body and that ajv, router,
// invoiceSchema, and createInvoice are defined elsewhere in the service.
import type { NextFunction, Request, Response } from 'express';

function validateBody(schema: object) {
  // Compiled once when the route is registered, not on every request.
  const validate = ajv.compile(schema);
  return (req: Request, res: Response, next: NextFunction) => {
    if (validate(req.body)) return next();
    return res.status(400).json({
      error: 'INVALID_REQUEST_BODY',
      details: (validate.errors ?? []).map((e) => ({
        path: e.instancePath || '/',
        keyword: e.keyword,
        message: e.message,
        params: e.params,
      })),
    });
  };
}

router.post('/invoices', validateBody(invoiceSchema), createInvoice);

Returning structured error details (not just a string) lets API consumers display useful messages without parsing prose. The instancePath in particular is the JSON Pointer of the offending value, which clients can use to highlight a specific form input.
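On the consuming side, a client can group the details array by path to attach messages to form fields. The sketch below assumes the response shape produced by the middleware above; groupByField and the sample details are illustrative, not part of any library.

```typescript
// Mirrors one entry of the "details" array in the 400 response.
interface ErrorDetail {
  path: string;
  keyword: string;
  message?: string;
}

// Collects messages per JSON Pointer so a UI can show them next to each field.
function groupByField(details: ErrorDetail[]): { [path: string]: string[] } {
  const grouped: { [path: string]: string[] } = {};
  for (const detail of details) {
    const messages = grouped[detail.path] ?? [];
    messages.push(detail.message ?? `failed "${detail.keyword}"`);
    grouped[detail.path] = messages;
  }
  return grouped;
}

// Illustrative sample data, not captured from a real service.
const sampleDetails: ErrorDetail[] = [
  { path: "/amount", keyword: "minimum", message: "must be >= 0" },
  { path: "/currency", keyword: "enum", message: "must be equal to one of the allowed values" },
  { path: "/amount", keyword: "exclusiveMaximum" },
];

const byField = groupByField(sampleDetails);
console.log(byField);
```

One caveat worth handling in a real client: Ajv reports a missing required property at the path of the parent object and puts the property name in params.missingProperty, so the required keyword usually needs its own mapping.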
Outbound Validation Too
It is just as valuable to validate responses in tests, especially for endpoints that compose data from multiple sources. A failed response-side check during CI tells you a database migration or a refactor changed your contract before customers find out. Most teams skip this and pay for it later.
OpenAPI, AsyncAPI, and the Bigger Picture
OpenAPI describes HTTP APIs — routes, methods, parameters, and the schemas of request and response bodies. The schema part has always been JSON Schema-shaped, but it was technically a forked dialect until OpenAPI 3.1, which now aligns with JSON Schema 2020-12. AsyncAPI did the same thing for message-driven systems. If you maintain an OpenAPI document, you are already maintaining JSON Schemas; pulling them into a shared $defs registry lets you reuse them in non-HTTP places too: Kafka consumers, cron jobs, internal scripts.
A useful mental model: OpenAPI is the operational layer (where do requests go?), JSON Schema is the structural layer (what do payloads look like?). Keep them in separate files and reference one from the other rather than nesting everything into a single bloated spec.
Generating Schemas vs. Writing Them by Hand
Both directions exist and both have their uses. Generating a schema from an existing sample document is great for bootstrapping. Generating one from your TypeScript or Pydantic types keeps the schema in sync with code automatically. Writing the schema first and deriving code from it is what teams reach for when the schema is the contract that multiple services and multiple languages need to agree on.
- From sample JSON: tools like quicktype and BeautiCode's own JSON to JSON Schema give you a starting draft you then tighten by hand (adding required, enums, ranges).
- From types: ts-json-schema-generator for TypeScript, Pydantic's model_json_schema() for Python, Jackson modules for Java.
- Schema-first: write the schema, generate code with json-schema-to-typescript or datamodel-codegen. This is the right pattern when the same payload has to be parsed in three different languages.
Pragmatic suggestion: for a single-team service in a single language, generate the schema from your types. For cross-team APIs or anything you publish externally, write the schema first and treat it as the source of truth.
Performance Notes from Production
JSON Schema validation is fast when used correctly and slow when used carelessly. The difference is whether you are paying compile costs once or once per request.
- Compile validators at module load. A compiled Ajv validator can run tens of thousands of times per second on commodity hardware. Recompiling each request turns a microsecond cost into a millisecond cost.
- Be careful with options that mutate. removeAdditional: 'all' and useDefaults mutate the input. They are convenient but cost more than read-only validation. Profile before turning them on for hot paths.
- Watch out for catastrophic regex. Patterns like ^(a+)+$ can take exponential time on adversarial input. If you accept user-controlled strings, test each pattern against worst-case input before shipping it.
- Limit payload size before validating. A 5 GB JSON document is a denial-of-service waiting to happen, schema or not. Cap request bodies at the HTTP server.
Common Mistakes
Forgetting required
JSON Schema is permissive by default. Listing a property under properties does not make it required; it just describes what shape the value should have if it is present. If a field has to be there, add it to the required array.
Treating format as enforcement
Per the spec, format is an annotation. Some validators enforce it, some do not, some do only if you load a plugin. If you care about the format actually being checked, configure your validator explicitly (Ajv: load ajv-formats; Python jsonschema: pass a format_checker).
Schemas that drift from code
A schema that does not run is a schema that lies. Wire validation into the actual request path or message handler — do not leave it as documentation that nobody runs. CI should also verify that example payloads in the schema validate against the schema itself; this catches the most common docs-vs-reality drift.
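The CI check on example payloads can be as small as the sketch below. It assumes the schema carries its samples in the standard examples annotation and that a compiled validator is passed in; failingExamples and the stand-in validator are hypothetical.

```typescript
type Validate = (data: unknown) => boolean;

interface SchemaWithExamples {
  examples?: unknown[];
  [keyword: string]: unknown;
}

// Returns the indexes of examples that do not validate against their own schema.
function failingExamples(schema: SchemaWithExamples, validate: Validate): number[] {
  const failures: number[] = [];
  (schema.examples ?? []).forEach((example, index) => {
    if (!validate(example)) failures.push(index);
  });
  return failures;
}

const documentedSchema: SchemaWithExamples = {
  type: "object",
  required: ["currency"],
  // The second example has drifted: it no longer carries the required field.
  examples: [{ currency: "KRW" }, { amount: 10 }],
};

// Stand-in for the compiled validator; a real setup would compile documentedSchema.
const requiresCurrency: Validate = (data) =>
  typeof data === "object" && data !== null && "currency" in data;

const staleExamples = failingExamples(documentedSchema, requiresCurrency);
console.log(staleExamples);
```

A CI job would fail the build whenever the returned list is non-empty, which keeps published examples from drifting away from what the service actually accepts.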
Anti-pattern: using JSON Schema to express business rules ("email must belong to an existing tenant," "coupon must be unexpired"). Those are domain checks, not structural checks; doing them in schema couples your data model to live application state and makes the schema impossible to reuse.
Try It with BeautiCode Tools
Schemas are easier to grok when you can iterate on them in seconds. Each of the tools below runs entirely in your browser, no upload required.
- JSON Schema Validator: paste a schema and a sample document, see exactly which keywords pass or fail. Useful for debugging mysterious 400 responses from a service that does enforce validation.
- JSON to JSON Schema: bootstrap a draft schema from a representative payload, then tighten the result (mark fields required, add enums, restrict numeric ranges).
- JSON to TypeScript: pair a generated schema with the matching interface so static and runtime checks describe the same data.
- JSON Formatter: pretty-print the schema and the payload side by side. Validation errors are much easier to read once both ends are formatted consistently.
Frequently Asked Questions
How is JSON Schema different from TypeScript types?
TypeScript types disappear at runtime. They are erased during compilation, so the JS that actually executes has no idea what shape a value should have. JSON Schema is data, not code, and is interpreted by a validator at runtime. The two complement each other: types prevent you from writing the wrong code, schemas prevent the wrong data from reaching your code in the first place.
Should I use Zod instead?
Zod and similar libraries (Yup, io-ts, Valibot) are excellent inside a single TypeScript codebase. They give you runtime validation and type inference in one expression. Where JSON Schema wins is portability: the same .json file validates payloads in your Node service, your Python pipeline, and your Java consumer without any of them needing to share code. Many teams use both: Zod inside the service, JSON Schema as the published contract.
How do I evolve a schema without breaking clients?
Treat backward compatibility the way you would for any wire format. Adding optional fields is safe. Adding new enum members is technically additive but breaks consumers that treat the enum as exhaustive. Removing or renaming a field is a breaking change — version the schema (a new $id URI) and run both versions in parallel during the migration window.
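The enum caveat is worth seeing from the consumer's side. In the sketch below, one consumer treats the currency enum as exhaustive and another tolerates unknown members. The function names and the GBP value are illustrative assumptions.

```typescript
type Currency = "USD" | "EUR" | "KRW" | "JPY";

// Exhaustive consumer: correct today, but throws when the producer adds a member.
function symbolStrict(currency: Currency): string {
  switch (currency) {
    case "USD": return "$";
    case "EUR": return "€";
    case "KRW": return "₩";
    case "JPY": return "¥";
    default: {
      // The compiler believes this branch is unreachable; the wire disagrees.
      const unhandled: never = currency;
      throw new Error(`Unhandled currency: ${String(unhandled)}`);
    }
  }
}

// Tolerant consumer: falls back to the raw code for members it does not know.
function symbolTolerant(currency: string): string {
  const known: { [code: string]: string } = { USD: "$", EUR: "€", KRW: "₩", JPY: "¥" };
  return known[currency] ?? currency;
}

// A newer producer has started sending a member this consumer was never built for.
const fromNewerProducer = JSON.parse('"GBP"') as Currency;

console.log(symbolTolerant(fromNewerProducer));
```

This is why adding an enum member deserves the same announcement and migration window as a breaking change, even though the schema diff looks purely additive.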
What if the same schema is needed in five languages?
Publish the schema once, in a versioned package or a static URL, and generate language-specific code from it as part of CI. Most ecosystems have a generator: json-schema-to-typescript, datamodel-code-generator for Python, jsonschema2pojo for Java, go-jsonschema for Go. The shared schema becomes the single source of truth; the per-language types are just artifacts.
Do I really need to validate trusted internal traffic?
Internal services drift in subtle ways: someone deploys a refactor, a queue replays old messages from a previous schema, a feature flag flips on a code path nobody tested with production data. Validation at internal boundaries turns those drifts into loud, debuggable 400s instead of silent corruption that surfaces three weeks later in a billing report. The cost is low; the savings on incident hours are real.