The AI landscape: who makes what, and why it changes every month

Naming which model is "best" is a fool's errand: the answer changes every few weeks. The shape of the landscape changes far more slowly. Here's the map.

The tiers

Frontier labs. OpenAI (GPT family), Anthropic (Claude family), Google DeepMind (Gemini family), xAI (Grok family). These train the largest closed models. Each has a distinct flavor: OpenAI moves fast and shapes UX, Anthropic emphasizes safety and long context, Google is strong on multimodal input and very large context windows, xAI leans into real-time data.

Fast followers. Mistral, Cohere, AI21. Smaller orgs that ship competitive models, often on a faster release cadence or under more permissive licenses.

Open weights. Meta (Llama family), Mistral's open releases, DeepSeek, Qwen. You can download the weights and run them yourself — with a GPU. Capability varies; the top open models are within striking distance of frontier closed models but lag on reasoning-heavy tasks.

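Fine-tune shops. Companies like Together, Fireworks, and Replicate serve open-weight models behind an API and often offer fine-tuning. Useful when you want an open-weight model without running the infrastructure yourself. Your data still passes through the host, so check its data-handling terms before assuming you get self-hosted privacy.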

How to choose, shape-first

Forget benchmarks for a second. Decide on four axes:

Closed vs. open weights. Closed = easier to adopt, stronger at the frontier. Open = more control, better for privacy when you host it yourself, more work.
Managed API vs. self-hosted. Managed = simpler. Self-hosted = more control over cost, latency, and data handling.
General vs. specialized. Most projects want a general model. Specialized (coding-specific, medical-specific) makes sense when the niche is deep.
Reasoning vs. fast. "Reasoning" models (OpenAI's o-series, Claude with extended thinking, Gemini's thinking models) take longer to respond but handle harder problems. Fast models are better for latency-sensitive surfaces.

The capability you actually need

Most production AI features don't need a frontier model. They need:

Reliable instruction-following.
Reasonable latency.
Tolerable cost per call.
A provider you trust on privacy.

That's a much wider field than "whichever model topped the LMSYS Chatbot Arena this week."
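
A minimal sketch of shortlisting by those four requirements instead of by leaderboard rank. Every candidate name, score, and threshold below is made up for illustration; in practice the numbers would come from your own evaluation runs and your provider's price sheet, and the privacy flag from your own review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                     # hypothetical model label, not a real product
    instruction_pass_rate: float  # share of your own eval prompts handled correctly (0..1)
    p95_latency_s: float          # 95th-percentile latency measured on your workload
    cost_per_call_usd: float      # average cost of one call on your workload
    privacy_ok: bool              # does the provider meet your data-handling bar?


def shortlist(candidates, min_pass_rate, max_latency_s, max_cost_usd):
    """Keep candidates that clear every bar, cheapest first."""
    passing = [
        c for c in candidates
        if c.instruction_pass_rate >= min_pass_rate
        and c.p95_latency_s <= max_latency_s
        and c.cost_per_call_usd <= max_cost_usd
        and c.privacy_ok
    ]
    return sorted(passing, key=lambda c: c.cost_per_call_usd)


# Illustrative numbers only.
CANDIDATES = [
    Candidate("frontier-large", 0.97, 4.0, 0.0300, True),
    Candidate("mid-tier", 0.94, 1.2, 0.0040, True),
    Candidate("small-fast", 0.88, 0.4, 0.0005, True),
    Candidate("cheap-unvetted", 0.95, 0.9, 0.0010, False),
]

picks = shortlist(CANDIDATES, min_pass_rate=0.90, max_latency_s=2.0, max_cost_usd=0.01)
print([c.name for c in picks])
```

With these invented numbers the strongest model drops out on latency and cost, and the survivor is a mid-tier one. The design choice is to treat each requirement as a pass/fail bar rather than a score to maximize.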

Why the landscape shifts so fast

Three things drive churn:

New releases every 6-12 weeks at the frontier. Rankings reshuffle.
Price drops — per-token costs have fallen ~50× in four years. Models that were "too expensive to use" become default-acceptable fast.
Capability jumps are usually stepwise — a model gains a category of skills (vision, long context, tool use) in one release.
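
To make the price claim concrete, here is a small sketch that converts the article's rough "~50× in four years" figure into an implied yearly rate. It assumes the decline is smooth and constant, which real price cuts are not; the function names are just for this example.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly factor that compounds to total_factor over `years`."""
    return total_factor ** (1.0 / years)


def cost_after(start_cost: float, total_factor: float, years: float, elapsed: float) -> float:
    """Cost after `elapsed` years under the smooth-decline assumption."""
    return start_cost / annual_factor(total_factor, years) ** elapsed


yearly = annual_factor(50.0, 4.0)
print(f"implied yearly drop: about {yearly:.2f}x")
print(f"a $1.00 workload after 2 years: ${cost_after(1.00, 50.0, 4.0, 2.0):.3f}")
```

Under that assumption, a 50× drop over four years works out to roughly 2.7× cheaper each year, so a workload that is too expensive today can fall inside budget within a release cycle or two.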

How to keep current without being consumed

Pick two things to watch:

One leaderboard (LMSYS Chatbot Arena for chat models, MTEB for embedding models, or a task-specific one that maps to your use case).
One team's technical blog (Anthropic, OpenAI, Meta AI).

Skip the breathless announcement cycle. Check the leaderboard monthly. Read the technical posts when they drop. Everything else is noise.
