Structured outputs: JSON, XML, and the tax of each

If your post-processing pipeline parses JSON out of an LLM response with regex, you're carrying technical debt. Structured outputs — native schema-constrained generation — replace that mess.

The three eras of structured output

"Please return JSON". Works maybe 95% of the time. That 5% is production incidents.
JSON mode. Guarantees valid JSON syntax but not schema conformance. The model can still return JSON with the wrong fields.
Schema-constrained generation. The model's sampler is restricted to only produce tokens that match your schema. Guaranteed conformant. Available across major providers (OpenAI response_format, Anthropic tool use, Google structured output).

If you're still on era 1 or 2, migrating is almost always worth it.

What it costs you

Two things:

Schema complexity. The model has to reason in schema-land. Deeply nested objects with dozens of fields degrade quality. Flat schemas with clear field names perform best.
Expressiveness. Any field you want the model to generate, you have to declare up front. Truly open-ended responses don't fit.

Trade-off: you give up "anything goes" for "always parseable." For production systems, that's nearly always a net win.

Schema design that works

Descriptive field names. customer_sentiment beats cs. The model uses the field name as a hint about what to produce.
Enum over free text whenever the set of values is known. Forces discipline.
Required vs. optional. Make truly optional fields optional — marking everything required and then telling the model "leave this empty if not present" is how you get empty strings that should have been null.
One level of nesting max unless you truly need more. Deeper schemas produce more errors.

JSON vs. XML

OpenAI and Anthropic both support schema-constrained JSON. Many Anthropic prompts internally use XML for input structure because it's resilient to parsing errors — but outputs should still be JSON for downstream consumption.

Streaming structured outputs

JSON mode with streaming is a mild pain — you can't JSON.parse a partial response. Two strategies:

Stream the tokens to the user for perceived latency, but only parse the complete response server-side.
Stream only specific fields — some providers (OpenAI, Vercel AI SDK) let you stream partial structured outputs where complete fields arrive as they finish.

The common trap

Asking for "reasoning": string as a field. The model will write a long reasoning, and now you've paid for a bunch of output tokens to generate text you probably don't show the user. Either:

Use the model's built-in reasoning channel (reasoning models).
Skip the field and ask for the decision directly.
Keep the reasoning field only if you actually want to display/log it.

If your post-processing pipeline parses JSON out of an LLM response with regex, you're carrying technical debt. Structured outputs — native schema-constrained generation — replace that mess.

The three eras of structured output

"Please return JSON". Works maybe 95% of the time. That 5% is production incidents.
JSON mode. Guarantees valid JSON syntax but not schema conformance. The model can still return JSON with the wrong fields.
Schema-constrained generation. The model's sampler is restricted to only produce tokens that match your schema. Guaranteed conformant. Available across major providers (OpenAI response_format, Anthropic tool use, Google structured output).

If you're still on era 1 or 2, migrating is almost always worth it.

What it costs you

Two things:

Schema complexity. The model has to reason in schema-land. Deeply nested objects with dozens of fields degrade quality. Flat schemas with clear field names perform best.
Expressiveness. Any field you want the model to generate, you have to declare up front. Truly open-ended responses don't fit.

Trade-off: you give up "anything goes" for "always parseable." For production systems, that's nearly always a net win.

Schema design that works

Descriptive field names. customer_sentiment beats cs. The model uses the field name as a hint about what to produce.
Enum over free text whenever the set of values is known. Forces discipline.
Required vs. optional. Make truly optional fields optional — marking everything required and then telling the model "leave this empty if not present" is how you get empty strings that should have been null.
One level of nesting max unless you truly need more. Deeper schemas produce more errors.

JSON vs. XML

Streaming structured outputs

JSON mode with streaming is a mild pain — you can't JSON.parse a partial response. Two strategies:

Stream the tokens to the user for perceived latency, but only parse the complete response server-side.
Stream only specific fields — some providers (OpenAI, Vercel AI SDK) let you stream partial structured outputs where complete fields arrive as they finish.

The common trap

Asking for "reasoning": string as a field. The model will write a long reasoning, and now you've paid for a bunch of output tokens to generate text you probably don't show the user. Either:

Use the model's built-in reasoning channel (reasoning models).
Skip the field and ask for the decision directly.
Keep the reasoning field only if you actually want to display/log it.

Structured outputs: JSON, XML, and the tax of each

The three eras of structured output

What it costs you

Schema design that works

JSON vs. XML

Streaming structured outputs

The common trap

2-question self-check

Continue in this track

Structured outputs: JSON, XML, and the tax of each

The three eras of structured output

What it costs you

Schema design that works

JSON vs. XML

Streaming structured outputs

The common trap

2-question self-check

Continue in this track