Story Points Estimator

Estimate story points using complexity, uncertainty, and effort factors

Result
Story Points: 5
Raw Score: 6.4
Estimated Duration: 1-3 days
Confidence: High
Risk Level: Low
Should Split? No

About This Tool

Score a backlog item across three dimensions — complexity, uncertainty, and effort — and the estimator maps your scores onto a Fibonacci point value (1, 2, 3, 5, 8, 13). Use it as a tiebreaker during planning when the team disagrees on a number.

The weights default to equal across the three axes, but you can adjust them. Some teams care most about uncertainty; others find effort dominates because their work is repetitive but slow. Tune the weights to reflect how strongly each dimension has tracked your team's actual cycle time on past stories.
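To make the effect of the weights concrete, here is a minimal sketch of a weighted composite. It assumes each axis is scored 1 to 5 and that the composite is a plain weighted average. The `Weights` class and `composite` function are hypothetical names for illustration, not the estimator's actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Relative importance of each axis (hypothetical, not the tool's API)."""
    complexity: float = 1.0
    uncertainty: float = 1.0
    effort: float = 1.0


def composite(complexity: int, uncertainty: int, effort: int,
              weights: Weights = Weights()) -> float:
    """Weighted average of three 1-5 scores; the result stays on the 1-5 scale."""
    total = weights.complexity + weights.uncertainty + weights.effort
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = (complexity * weights.complexity
                + uncertainty * weights.uncertainty
                + effort * weights.effort)
    return weighted / total


# Same story, scored low on complexity and effort but high on uncertainty.
equal = composite(2, 5, 2)                                     # 3.0
uncertainty_heavy = composite(2, 5, 2, Weights(uncertainty=2.0))  # 3.5
print(equal, uncertainty_heavy)
```

Doubling the uncertainty weight moves this story from 3.0 to 3.5 on the 1 to 5 scale, which may be enough to push it into the next point bucket, depending on where the cut-offs sit.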

Story points are not hours. The whole point is to abstract away from time and capture relative size — how much harder is this story than that one. Treating the output as 'about X days' defeats the purpose; track velocity in points per sprint instead.

Why Fibonacci spacing: estimation accuracy degrades nonlinearly with size. The difference between a 1-point story and a 2-point story is real and meaningful. The difference between a 21-point story and a 22-point story is noise. Fibonacci's widening gaps (1, 2, 3, 5, 8, 13, 21), where each value is at least 50% larger than the one before it, force the team to commit to a class of size rather than fake precision. Linear scales (1-10) tempt teams into false specificity: a 7 vs an 8 conversation is wasted breath. Fibonacci ends those debates by removing the option.
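The spacing argument can be checked with a few lines of arithmetic. The sketch below measures each step on a scale as a fraction of the value it starts from; `relative_steps` is a hypothetical helper written for this illustration.

```python
def relative_steps(scale: list[int]) -> list[float]:
    """Size of each step as a fraction of the value it starts from."""
    return [(b - a) / a for a, b in zip(scale, scale[1:])]


fibonacci = [1, 2, 3, 5, 8, 13, 21]
linear = list(range(1, 11))  # a 1-10 scale

fib_steps = relative_steps(fibonacci)
lin_steps = relative_steps(linear)

for start, step in zip(fibonacci, fib_steps):
    print(f"fibonacci step from {start}: +{step:.0%}")
for start, step in zip(linear, lin_steps):
    print(f"linear step from {start}: +{step:.0%}")
```

On the Fibonacci scale every step is at least 50% of the starting value, so adjacent values stay distinguishable. On the linear scale the step from 7 to 8 is about 14%, which is likely smaller than the error in the estimate itself.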

Worked example: a backlog item is 'add login with Google.' Complexity: 3/5 (OAuth flow is well-known but has gotchas). Uncertainty: 2/5 (we've done this elsewhere). Effort: 3/5 (UI, server callback, token storage, error handling). Composite: ~3 points. Compare to 'add full conversation memory to the chat agent.' Complexity: 5/5 (state management, vector embeddings, retrieval logic). Uncertainty: 4/5 (we don't know how big the corpus gets). Effort: 5/5 (significant implementation across services). Composite: 13 points. For this team, 13 is the signal that the story should be split, because nothing at 13 or above is safe to commit to in a single sprint.
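The two examples above can be reproduced with a small sketch. It assumes equal weights and a set of cut-offs on the 1 to 5 average; the values in `UPPER_BOUNDS` are hypothetical, chosen only so that both worked examples land on the stated points, and are not the estimator's real thresholds.

```python
from bisect import bisect_left

# Hypothetical cut-offs on the 1-5 average: an average at or below
# UPPER_BOUNDS[i] maps to POINTS[i]; anything above the last bound maps to 13.
UPPER_BOUNDS = [1.5, 2.0, 3.0, 3.75, 4.5]
POINTS = [1, 2, 3, 5, 8, 13]
SPLIT_AT = 13  # the team's "should split" signal from the text above


def estimate(complexity: int, uncertainty: int, effort: int) -> tuple[int, bool]:
    """Return (story points, should_split) for three equally weighted 1-5 scores."""
    scores = (complexity, uncertainty, effort)
    if not all(1 <= s <= 5 for s in scores):
        raise ValueError("each score must be between 1 and 5")
    average = sum(scores) / len(scores)
    points = POINTS[bisect_left(UPPER_BOUNDS, average)]
    return points, points >= SPLIT_AT


print(estimate(3, 2, 3))  # 'add login with Google' -> (3, False)
print(estimate(5, 4, 5))  # 'add full conversation memory' -> (13, True)
```

Using explicit buckets rather than rounding to the nearest Fibonacci number keeps the boundaries visible, so a team can move a single cut-off after a retrospective without touching the rest of the mapping.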

Where story points fail: comparing across teams. Team A's 5 isn't Team B's 5 — points are calibrated against each team's local sense of size. Cross-team velocity comparisons are meaningless and create perverse incentives. Aggregate organization-level capacity from concrete throughput (PRs merged, features shipped) rather than summed points. Also: if your team consistently underestimates large stories, the bias is in the planning culture, not the math. Run a retrospective that compares planned points to actual cycle time, find which categories blow up, and price them higher next time.
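The retrospective described above can be sketched as a comparison of days per point across story categories within a single team. Everything here is illustrative: the `completed` data is made up, the category names are hypothetical, and the 1.5x `tolerance` is an arbitrary assumption a team would tune for itself.

```python
from collections import defaultdict
from statistics import median

# Hypothetical retrospective data for ONE team:
# (category, planned points, actual cycle time in days)
completed = [
    ("ui", 3, 2.0),
    ("ui", 5, 3.5),
    ("integration", 3, 6.0),
    ("integration", 5, 9.0),
    ("migration", 8, 8.0),
]


def days_per_point_by_category(stories):
    """Median actual days per planned point, grouped by category."""
    ratios = defaultdict(list)
    for category, points, days in stories:
        ratios[category].append(days / points)
    return {category: median(values) for category, values in ratios.items()}


def underpriced(stories, tolerance=1.5):
    """Categories whose days-per-point exceeds the team-wide median by `tolerance`x."""
    baseline = median(days / points for _, points, days in stories)
    per_category = days_per_point_by_category(stories)
    return sorted(c for c, ratio in per_category.items()
                  if ratio > tolerance * baseline)


print(days_per_point_by_category(completed))
print(underpriced(completed))  # ['integration']
```

The ratio is only meaningful inside one team's history. It is a way to spot categories that need a higher estimate next time, not a conversion rate from points to days.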

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.

Frequently Asked Questions