Story-point estimation has passed the point of no return.
Not everywhere. Not for every kind of work. But for a growing class of software features, the economics have shifted enough that the old ritual is starting to look strange.
I have seen estimation sessions take roughly the same amount of time it would take an AI-enabled engineer, or a very small team, to build the first usable version of the thing being estimated.
That is the uncomfortable part.
When the cost of estimating the work approaches the cost of doing the work, estimation stops being governance. It becomes overhead.
And in many cases the meeting is not even about the work itself. It is about coordinating the people who need to agree on the estimate. Product, engineering, delivery, QA, architecture, sometimes management. Everyone gets pulled into a room to produce a number that was never precise in the first place.
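The comparison above can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name and the numbers (six attendees, a one-hour session, eight hours to build a first version) are assumptions, not measurements from any real team.

```python
def estimation_overhead_ratio(attendees: int, meeting_hours: float, build_hours: float) -> float:
    """Person-hours spent estimating, divided by hours needed to build a first version.

    A ratio near or above 1.0 means the estimate costs about as much as the work.
    """
    if build_hours <= 0:
        raise ValueError("build_hours must be positive")
    return (attendees * meeting_hours) / build_hours


# Hypothetical numbers: six people in a one-hour session,
# versus eight hours for one AI-enabled engineer to build a first usable version.
ratio = estimation_overhead_ratio(attendees=6, meeting_hours=1.0, build_hours=8.0)
print(f"{ratio:.2f}")  # 0.75
```

Under these assumed numbers, the meeting alone consumes three quarters of the effort it is trying to estimate.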
Before AI, that tradeoff was easier to defend. Implementation capacity was scarce. Writing code took longer. A sprint backlog represented a meaningful allocation of expensive engineering time.
AI changes the cost curve.
The first version of something is now cheap. A rough prototype, a first implementation, a draft migration plan, a generated test suite, a working UI slice. These can appear faster than the organization can align on how many points they should be.
So implementation effort is the wrong planning currency.
The better question is no longer:
"How many story points is this?"
The better question is:
"What uncertainty are we trying to retire, and what is the cheapest way to get evidence?"
In the old model, a project manager pushed work through a planning machine. Define the story, estimate the story, put it in a sprint, track velocity, report progress.
In the AI model, the job shifts. The valuable work is deciding where to spend attention: human attention and model attention both.
Tokens matter.
They are not magic. They are not free. They are directed engineering effort. Once you spend them, you do not get them back.
A vague ticket can burn tokens and produce slop. Worse, it can produce something plausible enough to create review debt.
That is where the "AI makes everything faster" story falls apart.
- AI makes the first 40–50% cheap.
- The next 10% costs more.
- And the next 10% after that costs more again.
Going from a blank page to a plausible answer is often nearly instant. Going from plausible to correct requires context. Going from correct to production-ready requires tests, review, edge cases, integration, observability, rollout thinking, and sometimes domain experts.
The last 10% may be the most expensive part of the whole process.
And with AI, you never reach 100% in the first place.
So "ready" needs a new definition.
What "ready" means now.
In a traditional agile workflow, ready meant "ready for engineers to start." The ticket had a description, maybe acceptance criteria, maybe a design, maybe enough detail to estimate.
In an agentic workflow, ready should mean "ready to spend tokens and review capacity responsibly."
That definition forces different questions:
- Is the outcome clear?
- Is this reversible?
- What is the cost of being wrong?
- How much confidence do we actually need?
- What evidence will tell us the result is good enough?
- Who or what verifies the output?
- Are we trying to learn, ship, or harden?
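One way to make these questions operational is to treat them as required fields on a ticket. The sketch below is a minimal illustration, not a prescribed tool: the class name `ReadinessCheck`, its field names, and the example ticket are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

VALID_INTENTS = {"learn", "ship", "harden"}


@dataclass
class ReadinessCheck:
    """Hypothetical record: one field per readiness question."""
    outcome: str = ""                  # Is the outcome clear?
    reversible: Optional[bool] = None  # Is this reversible? (False is a valid answer)
    cost_of_being_wrong: str = ""      # What is the cost of being wrong?
    confidence_needed: str = ""        # How much confidence do we actually need?
    evidence: str = ""                 # What evidence says the result is good enough?
    verifier: str = ""                 # Who or what verifies the output?
    intent: str = ""                   # "learn", "ship", or "harden"

    def gaps(self) -> List[str]:
        """Return the names of questions that are unanswered or invalid."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "":
                missing.append(f.name)
        if self.intent and self.intent not in VALID_INTENTS:
            missing.append("intent")
        return missing

    def is_ready(self) -> bool:
        """Ready means every question has an answer."""
        return not self.gaps()


# Hypothetical ticket with three questions still open.
ticket = ReadinessCheck(
    outcome="Users can export invoices as CSV",
    reversible=True,
    cost_of_being_wrong="low: internal tool",
    intent="learn",
)
print(ticket.is_ready())  # False
print(ticket.gaps())      # ['confidence_needed', 'evidence', 'verifier']
```

The point of the sketch is the gap list: a ticket with open questions is not ready to spend tokens or review capacity on, however well it is described.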
AI punishes weak product judgment. If direction is vague, the team produces more irrelevant output faster. If acceptance criteria are weak, AI fills in the blanks. If the backlog is stale, AI accelerates work that should not exist anymore.
The new project manager.
The best project managers in this new world will not be the ones who get better at extracting point estimates from engineers.
They will be the ones who classify risk, define evidence, shape smaller slices, and decide what confidence level a piece of work deserves.
They will know when to say: "Don't estimate this. Build the smallest version and show me."
They will also know when to say: "Slow down. This is security-sensitive. The agent's first answer is not good enough."
That distinction is the whole game.
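As a rough sketch, that distinction can be written as a small decision rule. The function name and the two boolean inputs are assumptions made for illustration; real risk classification would involve more than two flags, and the fallback branch is one reading of "shape smaller slices" and "define evidence" rather than a rule stated outright.

```python
def planning_response(security_sensitive: bool, reversible: bool) -> str:
    """Pick a planning response from two hypothetical risk flags."""
    if security_sensitive:
        # High cost of being wrong: the first generated answer needs scrutiny.
        return "Slow down. This is security-sensitive. The agent's first answer is not good enough."
    if reversible:
        # Cheap to undo: building is the cheapest way to get evidence.
        return "Don't estimate this. Build the smallest version and show me."
    # Assumption: neither quoted response applies, so reduce the risk first.
    return "Shape a smaller slice and define the evidence before building."


print(planning_response(security_sensitive=False, reversible=True))
print(planning_response(security_sensitive=True, reversible=True))
```

Security sensitivity is checked first on purpose: in this sketch a reversible change that touches security still gets the slower path.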
Commitments built on confidence.
Commitments built on fictional precision in story points are going to age badly.
The stronger commitment is built on confidence:
- What do we know?
- What have we validated?
- What is still uncertain?
- What is the risk tier?
- What is the next cheapest learning step?
- What would make us stop, continue, or change direction?
"Estimation is dead" is both true and not quite true.
Bad estimation theater is dead. The ritual of pulling a room full of people together to guess at implementation effort is dying.
Alignment, forecasting, and accountability are not.
They are moving.
From story points to confidence. From sprint velocity to decision latency. From backlog grooming to uncertainty management. From "How long will this take?" to "What is the cheapest responsible way to find out?"
That is the real shift.
AI engineering is a different model for deciding where effort, tokens, and human judgment should go. Not just faster delivery. A different way of deciding what to build, how to validate, and when to stop.