The first 30 days with a new model

Every month a new model lands and half the internet declares it changes everything. You don't have time to chase that. You have a product. Here's how we onboard a new model without losing a week.

Week 1 — read, don't integrate

Read the model card. Read the eval methodology. Find two independent reviews (not launch-day ones). Write down what you'd expect the model to be better and worse at based on what the lab actually measured, not what they're selling.

Week 2 — swap it into one prompt

Pick one production prompt. Run the new model side-by-side with the current one on your regression set. Don't change the prompt. If the new model loses on your set even though it's "better" on benchmarks, that tells you something about both: the prompt may be tuned to the old model's quirks, and the benchmarks may not measure what your product needs.
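A minimal sketch of that side-by-side run, in Python. Everything named here is an assumption for illustration: `Case`, `side_by_side`, the `{input}` placeholder in the prompt, and the two model callables, which stand in for whatever client code you already use to call each model. The grader is whatever pass/fail check your regression set already has.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Case:
    case_id: str
    user_input: str
    expected: str

# Hypothetical signatures: a model takes the full prompt text and returns a
# completion; a grader takes (output, expected) and returns pass/fail.
ModelFn = Callable[[str], str]
Grader = Callable[[str, str], bool]

def side_by_side(
    prompt_template: str,
    cases: List[Case],
    old_model: ModelFn,
    new_model: ModelFn,
    grade: Grader,
) -> Dict[str, object]:
    """Run both models on the same unchanged prompt and summarize the results."""
    if not cases:
        raise ValueError("regression set is empty")
    old_passed = 0
    new_passed = 0
    regressions: List[str] = []  # old model passed, new model failed
    fixes: List[str] = []        # old model failed, new model passed
    for case in cases:
        # Same prompt for both models: only the model changes this week.
        prompt = prompt_template.replace("{input}", case.user_input)
        old_ok = grade(old_model(prompt), case.expected)
        new_ok = grade(new_model(prompt), case.expected)
        old_passed += old_ok
        new_passed += new_ok
        if old_ok and not new_ok:
            regressions.append(case.case_id)
        elif new_ok and not old_ok:
            fixes.append(case.case_id)
    total = len(cases)
    return {
        "old_pass_rate": old_passed / total,
        "new_pass_rate": new_passed / total,
        "regressions": regressions,
        "fixes": fixes,
    }
```

The list of regressions is usually more useful than the two pass rates. Read those cases by hand before you draw any conclusion.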

Week 3 — adapt one prompt for the new model

Now let yourself rewrite the prompt for the new model's strengths. A better model often wants a shorter prompt. Measure the delta vs. week 2's straight swap.
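One way to measure that delta, sketched in Python. It assumes you have per-case pass/fail results for three runs: the old model with the old prompt (baseline), the new model with the old prompt (week 2's straight swap), and the new model with the rewritten prompt. The function name and the dict-of-booleans shape are our own choices for illustration, not a standard.

```python
from typing import Dict, List

def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed."""
    if not results:
        raise ValueError("no results")
    return sum(results.values()) / len(results)

def compare_runs(
    baseline: Dict[str, bool],  # old model, old prompt
    swap: Dict[str, bool],      # new model, old prompt (week 2)
    adapted: Dict[str, bool],   # new model, rewritten prompt (week 3)
) -> Dict[str, object]:
    """Separate what the model change did from what the prompt rewrite did."""
    if not (baseline.keys() == swap.keys() == adapted.keys()):
        raise ValueError("all three runs must cover the same cases")
    case_ids: List[str] = sorted(baseline)
    return {
        # Effect of changing only the model.
        "swap_delta": pass_rate(swap) - pass_rate(baseline),
        # Effect of rewriting the prompt, with the new model held fixed.
        "rewrite_delta": pass_rate(adapted) - pass_rate(swap),
        # Net effect versus what is in production today.
        "net_delta": pass_rate(adapted) - pass_rate(baseline),
        "fixed_by_rewrite": [c for c in case_ids if adapted[c] and not swap[c]],
        "broken_by_rewrite": [c for c in case_ids if swap[c] and not adapted[c]],
    }
```

Keeping the swap and the rewrite as separate deltas is the point. A large rewrite delta on top of a negative swap delta suggests the old prompt was holding the new model back, not that the model is worse.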

Week 4 — decide

Either the new model earns a production slot (with the rewritten prompt) or it doesn't. Write down why. The next time a new model drops, that note will be worth more than the eval results.
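If it helps to give the note a fixed shape so you can compare notes across models later, here is one possible structure in Python. The field names and the rendered layout are assumptions for illustration; a text file with the same information works just as well.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class DecisionNote:
    model: str                  # whatever name you use for the candidate
    decided_on: date
    adopted: bool
    baseline_pass_rate: float   # old model, old prompt
    swap_pass_rate: float       # new model, old prompt
    adapted_pass_rate: float    # new model, rewritten prompt
    reasons: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the note as plain text for a docs folder or a ticket."""
        verdict = "ADOPT" if self.adopted else "SKIP"
        lines = [
            f"{self.decided_on.isoformat()} {self.model}: {verdict}",
            (
                f"pass rate: baseline {self.baseline_pass_rate:.0%}, "
                f"straight swap {self.swap_pass_rate:.0%}, "
                f"rewritten prompt {self.adapted_pass_rate:.0%}"
            ),
            "why:",
        ]
        lines += [f"- {reason}" for reason in self.reasons]
        return "\n".join(lines)
```

The reasons matter more than the numbers. Write them in terms of your product, for example which kinds of cases regressed, so they still make sense when the eval set has changed.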

What you're not doing

You're not rewriting your whole app. You're not retiring the old model. You're collecting enough information to make one decision, then moving on.