Structured outputs: JSON, XML, and the tax of each
Get reliable structured data out of language models.
If your post-processing pipeline parses JSON out of an LLM response with regex, you're carrying technical debt. Structured outputs — native schema-constrained generation — replace that mess.
The three eras of structured output
- "Please return JSON". Works maybe 95% of the time. That 5% is production incidents.
- JSON mode. Guarantees valid JSON syntax but not schema conformance. The model can still return JSON with the wrong fields.
- Schema-constrained generation. The model's sampler is restricted to tokens that match your schema, so the output conforms by construction. Major providers offer a version of this (OpenAI response_format, Anthropic tool use, Google structured output), though how strict the guarantee is depends on the provider and the mode you enable.
If you're still on era 1 or 2, migrating is almost always worth it.
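The gap between era 2 and era 3 can be shown without calling any provider. The sketch below uses only the standard library; the field names and the `REQUIRED_FIELDS` contract are hypothetical, and the two strings are hand-written stand-ins for model responses.

```python
import json

# Hypothetical contract: the fields a downstream consumer expects.
REQUIRED_FIELDS = {"customer_sentiment": str, "priority": int}

def is_valid_json(text: str) -> bool:
    """Era 2 check: does the text parse as JSON at all?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def conforms(text: str) -> bool:
    """Era 3 check: does the parsed object have exactly the expected fields and types?"""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        return False
    return all(isinstance(data[name], kind) for name, kind in REQUIRED_FIELDS.items())

# Syntactically valid JSON, but with the wrong field names.
wrong_fields = '{"sentiment": "negative", "urgency": "high"}'
right_fields = '{"customer_sentiment": "negative", "priority": 2}'

print(is_valid_json(wrong_fields), conforms(wrong_fields))  # True False
print(is_valid_json(right_fields), conforms(right_fields))  # True True
```

With JSON mode alone, a check like `conforms` has to live in your own code. Schema-constrained generation moves that check into the sampler.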
What it costs you
Two things:
- Schema complexity. The model has to reason in schema-land. Deeply nested objects with dozens of fields degrade quality. Flat schemas with clear field names perform best.
- Expressiveness. Any field you want the model to generate, you have to declare up front. Truly open-ended responses don't fit.
Trade-off: you give up "anything goes" for "always parseable." For production systems, that's nearly always a net win.
Schema design that works
- Descriptive field names. customer_sentiment beats cs. The model uses the field name as a hint about what to produce.
- Enum over free text whenever the set of values is known. Forces discipline.
- Required vs. optional. Make truly optional fields optional — marking everything required and then telling the model "leave this empty if not present" is how you get empty strings that should have been null.
- One level of nesting max unless you truly need more. Deeper schemas produce more errors.
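As a rough illustration of the four guidelines above, here is a hypothetical ticket-triage schema written as a JSON Schema dictionary in Python. The field names and enum values are invented for the example, and providers differ in which JSON Schema keywords their strict modes accept, so treat it as a sketch to adapt rather than something to paste in as-is.

```python
import json

# Hypothetical ticket-triage schema, written as JSON Schema.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        # Descriptive name rather than an abbreviation such as "cs".
        "customer_sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],  # known value set, so enum
        },
        "issue_summary": {"type": "string"},
        # Genuinely optional: nullable instead of "leave this empty if not present".
        "order_id": {"type": ["string", "null"]},
    },
    "required": ["customer_sentiment", "issue_summary"],
    "additionalProperties": False,
}

def nesting_depth(schema: dict) -> int:
    """Count levels of nested objects below the root object (0 for a flat schema)."""
    children = schema.get("properties", {}).values()
    depths = [1 + nesting_depth(c) for c in children if c.get("type") == "object"]
    return max(depths, default=0)

print(nesting_depth(TICKET_SCHEMA))  # 0
print(json.dumps(TICKET_SCHEMA["required"]))
```

A helper like `nesting_depth` could run in a unit test to flag schemas that drift past one level of nesting.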
JSON vs. XML
OpenAI and Anthropic both support schema-constrained JSON. Prompts written for Anthropic models often use XML tags to structure the input, because tags tolerate messy or unescaped content better than JSON does, but outputs should still be JSON for downstream consumption.
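A minimal sketch of that split, assuming XML-style tags on the way in and JSON on the way out. The tag names (`instructions`, `ticket`) and the `build_prompt` helper are hypothetical, and the reply is a hand-written stand-in for a model response.

```python
import json
from xml.sax.saxutils import escape

def build_prompt(ticket_text: str, instructions: str) -> str:
    """Wrap inputs in XML-style tags; escape so user text cannot close a tag early."""
    return (
        f"<instructions>{escape(instructions)}</instructions>\n"
        f"<ticket>{escape(ticket_text)}</ticket>"
    )

prompt = build_prompt(
    "My order <#123> never arrived & I want a refund.",
    "Classify the ticket.",
)
print(prompt)

# The reply is still consumed as JSON downstream.
reply = '{"customer_sentiment": "negative"}'
print(json.loads(reply)["customer_sentiment"])  # negative
```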
Streaming structured outputs
JSON mode with streaming is a mild pain — you can't JSON.parse a partial response. Two strategies:
- Stream the tokens to the user for perceived latency, but only parse the complete response server-side.
- Stream only specific fields — some providers (OpenAI, Vercel AI SDK) let you stream partial structured outputs where complete fields arrive as they finish.
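The first strategy can be sketched as follows. `fake_stream` is a hypothetical stand-in for a provider's streaming response, and `print` stands in for whatever transport forwards chunks to the user.

```python
import json
from typing import Iterable, Iterator

def fake_stream() -> Iterator[str]:
    """Stand-in for a provider's streaming response: yields text chunks."""
    yield from ['{"customer_se', 'ntiment": "neg', 'ative", "priority"', ': 2}']

def stream_then_parse(chunks: Iterable[str]) -> dict:
    buffer = []
    for chunk in chunks:
        print(chunk, end="", flush=True)  # forward to the user as it arrives
        buffer.append(chunk)
    print()
    return json.loads("".join(buffer))  # parse once, when the response is complete

# A partial response is not valid JSON, which is why parsing waits for the end.
try:
    json.loads('{"customer_se')
except json.JSONDecodeError:
    print("partial chunk is not parseable")

result = stream_then_parse(fake_stream())
print(result["priority"])  # 2
```

The second strategy depends on provider or SDK support for partial objects, so it is not sketched here.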
The common trap
Asking for "reasoning": string as a field. The model will write out long reasoning, and you pay for output tokens spent on text you probably never show the user. Pick one:
- Use the model's built-in reasoning channel (reasoning models).
- Skip the field and ask for the decision directly.
- Keep the reasoning field only if you actually want to display/log it.
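One way to make the reasoning field an explicit choice instead of a default is to build the schema from a flag. This is a sketch under assumptions: `build_decision_schema` and the `approve`/`reject` values are hypothetical names for illustration.

```python
def build_decision_schema(include_reasoning: bool = False) -> dict:
    """Return a JSON Schema for a decision, with a reasoning field only on request."""
    properties = {"decision": {"type": "string", "enum": ["approve", "reject"]}}
    required = ["decision"]
    if include_reasoning:
        # Only pay for these output tokens when the text will be displayed or logged.
        properties["reasoning"] = {"type": "string"}
        required.append("reasoning")
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }

print(sorted(build_decision_schema()["properties"]))      # ['decision']
print(sorted(build_decision_schema(True)["properties"]))  # ['decision', 'reasoning']
```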