Measuring AI impact (beyond usage dashboards)

"We rolled out AI to 500 people and saved $3M" is a claim that almost always evaporates under examination. Measuring AI impact honestly takes a little more work and produces much more defensible numbers.

The hierarchy of signals

From weakest to strongest:

Usage metrics. Logins, queries, seats activated. Confirms that people open the tool, not that it helps them.
Self-reported time savings. Surveys. Useful but optimistic — people over-report gains.
Before/after task metrics. Same task measured before and after. Cleaner but hard to control for confounds.
Controlled experiments. Pilot group vs. control group. The gold standard, rarely possible in corporate settings.

Climb as far up the ladder as you can, and stop once you have a defensible answer.
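One way to combine the third and fourth rungs is a difference-in-differences comparison: measure the same task before and after rollout in both a pilot group and a control group, then subtract the control group's change from the pilot group's change. The sketch below assumes you have observed task times in minutes for both groups; the function name and all sample numbers are hypothetical.

```python
from statistics import mean


def diff_in_diff(pilot_before, pilot_after, control_before, control_after):
    """Change in the pilot group minus change in the control group.

    Subtracting the control group's change removes drift that affected
    everyone (seasonality, process changes), leaving an estimate of the
    tool's own effect.
    """
    pilot_change = mean(pilot_after) - mean(pilot_before)
    control_change = mean(control_after) - mean(control_before)
    return pilot_change - control_change


# Hypothetical task times in minutes, not real measurements.
pilot_before = [12, 11, 13, 12]
pilot_after = [5, 4, 6, 5]
control_before = [12, 12, 11, 13]
control_after = [11, 10, 11, 12]

effect = diff_in_diff(pilot_before, pilot_after, control_before, control_after)
print(f"Estimated effect of the tool: {effect:.1f} minutes per task")
```

In this made-up data the pilot group improved by 7 minutes, but the control group also improved by 1 minute without the tool, so only 6 minutes can reasonably be credited to it.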

The obvious metrics that mislead

Seats deployed. Correlates only weakly with value. Plenty of seats go cold.
Queries per user. High query counts include messing around, tutorial runs, and people using the tool to write personal emails on company time.
Time saved, self-reported. People routinely over-report. Typical bias is 2-3× on the high side.
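If survey numbers are all you have, one rough correction is to divide them by the over-reporting factor quoted above. This is a minimal sketch that takes the 2-3× range at face value; the function name and the 6-hour survey answer are hypothetical.

```python
def discounted_range(self_reported_hours, bias_low=2.0, bias_high=3.0):
    """Return (pessimistic, optimistic) hours after dividing out over-reporting.

    bias_low and bias_high are the assumed over-reporting factors
    (2x to 3x in the text above); adjust them if you have your own data.
    """
    return self_reported_hours / bias_high, self_reported_hours / bias_low


# Hypothetical survey answer: "I save about 6 hours a week."
low, high = discounted_range(6.0)
print(f"Plausible real saving: {low:.1f} to {high:.1f} hours per week")
```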

The metrics that actually matter

Task completion time, observed or measured. Before AI: 12 minutes to draft a status update. After AI: 4 minutes. Real number.
Defect rate downstream. If AI-assisted work produces more errors that show up later, the "time saved" was borrowed.
Quality score on sampled output. A reviewer rates AI-assisted vs. baseline work blind. Catches quality regressions.
Employee retention / satisfaction. Long-horizon signal that AI tools are genuinely improving work rather than degrading it.
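The first two metrics can be combined into a single net figure: gross minutes saved per task, minus the extra rework caused by any rise in the downstream defect rate. The sketch below reuses the 12-minute and 4-minute figures from above; the defect rates and the 30-minute rework cost are assumptions made up for illustration.

```python
def net_minutes_saved(before_min, after_min,
                      defect_rate_before, defect_rate_after, rework_min):
    """Gross time saved per task minus expected extra rework per task."""
    gross = before_min - after_min
    # Extra defects per task times the cost of fixing one defect later.
    extra_rework = (defect_rate_after - defect_rate_before) * rework_min
    return gross - extra_rework


# 12 -> 4 minutes comes from the text; the defect rates (5% -> 15%)
# and the 30-minute rework cost are hypothetical.
net = net_minutes_saved(12, 4, 0.05, 0.15, 30)
print(f"Net saving: {net:.1f} minutes per task")
```

With these assumed numbers, 3 of the 8 gross minutes are paid back later as rework, so the defensible saving is about 5 minutes per task.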

Defining "savings" honestly

When you claim "AI saved us 20% of ticket handle time," interrogate it:

Is the 20% a reduction in time per ticket, or an increase in tickets resolved per hour? The first becomes the second only when the work queue is constantly full.
Does "saved time" translate to higher throughput (more tickets resolved) or to less work (people finish earlier)?
What's the net quality on handled tickets? Faster with more errors is worse.
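The gap between time saved and throughput can be made concrete with a simple capacity model: a team resolves whichever is smaller, the tickets that arrive or the tickets it has capacity for. This is a sketch under simplified assumptions (steady arrivals, identical agents); the function name and every number are hypothetical.

```python
def tickets_resolved_per_hour(arrivals_per_hour, agents, handle_min):
    """Resolved tickets are capped by both demand and team capacity."""
    capacity = agents * 60 / handle_min
    return min(arrivals_per_hour, capacity)


# A 20% cut in handle time: 10 minutes -> 8 minutes, with 10 agents.
# Full queue: more tickets arrive than the team can handle.
full_before = tickets_resolved_per_hour(100, agents=10, handle_min=10)
full_after = tickets_resolved_per_hour(100, agents=10, handle_min=8)

# Slack queue: fewer tickets arrive than the team can already handle.
slack_before = tickets_resolved_per_hour(50, agents=10, handle_min=10)
slack_after = tickets_resolved_per_hour(50, agents=10, handle_min=8)

print(f"Full queue:  {full_before:.0f} -> {full_after:.0f} tickets/hour")
print(f"Slack queue: {slack_before:.0f} -> {slack_after:.0f} tickets/hour")
```

In the full-queue case the 20% time saving becomes a 25% throughput gain (60 to 75 tickets per hour). In the slack-queue case throughput does not move at all; the saving shows up only as freed time, and whether that time is put to use is a separate question.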

The ROI formula that stands up

Net ROI = (Benefit × adoption rate × durability) − (License cost + integration + training + ongoing governance)

Details:

Benefit: realistic per-user value (not the vendor's promise), multiplied by the number of people with access.
Adoption rate: of people with access, how many actually use it productively. Often 30-70%, rarely 90%+.
Durability: does the benefit hold after the novelty wears off? Check at 6 and 12 months.
Costs: include the hidden ones, such as governance, security review, and retraining staff when the underlying models change.

Teams that skip any of these terms produce ROI numbers they can't defend in a follow-up review.
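The formula above can be written out as a small calculation. This sketch assumes that Benefit is a per-user annual value multiplied by the number of seats, and that adoption rate and durability are fractions between 0 and 1; the class, field names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    seats: int                # people with access (assumed scaling term)
    benefit_per_user: float   # realistic annual value per productive user
    adoption_rate: float      # fraction of seats used productively, 0-1
    durability: float         # fraction of benefit that persists, 0-1
    license_cost: float
    integration_cost: float
    training_cost: float
    governance_cost: float    # include security review and similar overhead


def net_roi(x: RoiInputs) -> float:
    """(Benefit x adoption rate x durability) minus total cost."""
    benefit = x.seats * x.benefit_per_user * x.adoption_rate * x.durability
    costs = (x.license_cost + x.integration_cost
             + x.training_cost + x.governance_cost)
    return benefit - costs


# Hypothetical rollout: 500 seats, half used productively,
# 80% of the benefit still present after the novelty wears off.
example = RoiInputs(
    seats=500, benefit_per_user=2000.0, adoption_rate=0.5, durability=0.8,
    license_cost=150_000.0, integration_cost=50_000.0,
    training_cost=40_000.0, governance_cost=60_000.0,
)
print(f"Net ROI: ${net_roi(example):,.0f}")
```

With these made-up inputs, a headline benefit of $1,000,000 (500 seats at $2,000 each) shrinks to $400,000 after adoption and durability, and to $100,000 after costs.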

The dashboard that works

Usage over time — trend matters more than snapshot.
Task-level metrics from the team using it — observed, not self-reported.
Net Promoter or CSAT — do users say they'd be upset if this were taken away?
Incident count — has this tool caused any privacy, security, or quality incidents?
Cost trend — absolute and per-active-user.
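The cost-trend line is simple arithmetic, but it is worth computing per active user rather than per seat. A minimal sketch, assuming you can export monthly cost and active-user counts; the record layout and the numbers are hypothetical.

```python
def cost_per_active_user(rows):
    """Return (month, cost per active user) pairs.

    A month with zero active users is reported as infinity rather than
    raising a division error.
    """
    result = []
    for row in rows:
        active = row["active_users"]
        per_user = row["cost"] / active if active else float("inf")
        result.append((row["month"], per_user))
    return result


# Hypothetical export: absolute cost is flat while active users decline.
monthly = [
    {"month": "2024-01", "cost": 10_000.0, "active_users": 400},
    {"month": "2024-02", "cost": 10_000.0, "active_users": 250},
    {"month": "2024-03", "cost": 10_000.0, "active_users": 200},
]

trend = cost_per_active_user(monthly)
for month, per_user in trend:
    print(f"{month}: ${per_user:.2f} per active user")
```

Here the absolute cost looks stable, yet the cost per active user doubles from $25 to $50 as seats go cold, which is why the dashboard tracks both.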

The question to ask every quarter

"If we turned this off next Monday, what would break?"

If the answer is "nothing noticeable," it was never a win. If the answer is "we'd need to staff up by X people" or "Y workflow would slow down dramatically" — you have measurable value.
