Relative Estimation in Agile: The Complete Guide to Sizing Work Without Hours

Relative estimation is a technique where teams size work items by comparing them to each other rather than estimating in absolute units like hours or days. Instead of asking "How long will this take?", teams ask "Is this bigger or smaller than that other thing we did?" This shift in thinking is one of the most powerful - and most misunderstood - concepts in agile. Teams that embrace relative estimation consistently produce more accurate forecasts, spend less time estimating, and surface hidden risks earlier. This guide covers why relative estimation works, the techniques available, how to implement it, and the mistakes that derail teams.

Quick Answer: Relative vs Absolute Estimation

Aspect | Relative Estimation | Absolute Estimation
What you estimate | Size compared to other work items | Hours, days, or calendar time
Core question | "Is this bigger or smaller than X?" | "How many hours will this take?"
Unit of measure | Story points, T-shirt sizes, or buckets | Hours, days, or person-days
Precision | Deliberately coarse (e.g., Fibonacci scale) | Falsely precise ("exactly 14 hours")
Person-dependent | No - the team estimates together | Yes - depends on who does the work
Accuracy over time | Improves as team calibrates velocity | Stays inconsistent regardless of practice
Best for | Sprint planning, release forecasting, backlog sizing | Task breakdown within a story, time tracking

What Is Relative Estimation?

The Core Insight

Relative estimation is the practice of sizing work by comparing items against each other rather than assigning absolute time values. When a team uses relative estimation, they don't ask "How many hours will this story take?" They ask "How does this story compare to other stories we've already sized or completed?"

The output is a relative size - a number, a label, or a category that positions the item on a scale compared to everything else in the Product Backlog. A story estimated at 8 points isn't "8 hours" or "8 days" - it means "this story is roughly 8/5ths the size of our 5-point reference story."

Relative estimation is NOT in the Scrum Guide. The Scrum Guide doesn't prescribe any specific estimation technique. Relative estimation is a complementary practice widely used by Scrum teams because it works exceptionally well with velocity-based planning and empirical process control. On the PSM-1 exam, you need to understand the concept of sizing work relative to other work - not any specific estimation technique.

A Simple Analogy

Imagine sorting a stack of rocks by weight. You have two options:

  1. Absolute approach: Weigh each rock on a scale, write down the grams, and sort by number.
  2. Relative approach: Pick up two rocks - one in each hand - and feel which is heavier. Repeat with other rocks until you have them sorted light to heavy.

The relative approach is faster, requires no tools, and produces a useful ranking. You don't know the exact weight of each rock, but you know with confidence that Rock C is about twice as heavy as Rock A. For planning purposes - "Can I carry these rocks in one trip?" - relative comparison is usually sufficient.

Software estimation works the same way. You rarely need to know exact hours. You need to know relative size so you can answer: "Can we fit this work into the next Sprint?"

Why Relative Estimation Works Better Than Absolute

The Psychology: Weber-Fechner Law

The Weber-Fechner law, established in the 19th century, states that humans perceive differences in stimuli proportionally rather than absolutely. You can easily tell the difference between lifting a 1 kg weight and a 2 kg weight (100% difference). But telling the difference between a 50 kg weight and a 51 kg weight (2% difference) is much harder, even though both differences are exactly 1 kg.

This law explains why the Fibonacci sequence works so well for estimation. The gaps between values grow proportionally: 1-2-3-5-8-13-21. Each number is roughly 60% larger than the previous one. This mirrors how our brains actually process magnitude - we distinguish proportional differences, not absolute ones.

When teams estimate in hours, they're forced to make absolute judgments that their brains aren't wired for. When they estimate in relative sizes with growing intervals, they're working with their cognitive strengths rather than against them.
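
The proportional spacing is easy to check for yourself. This small sketch (plain Python, nothing agile-specific assumed) computes the ratio between consecutive values of the Fibonacci estimation scale:

```python
# Consecutive ratios of the Fibonacci estimation scale: each step is a
# roughly constant proportional jump, not a fixed increment.
scale = [1, 2, 3, 5, 8, 13, 21]

ratios = [round(b / a, 2) for a, b in zip(scale, scale[1:])]
print(ratios)  # after the first couple of steps, the jumps settle near ~1.6
```

From 3 upward, every step is about 60% larger than the one before it, which is exactly the kind of proportional difference the Weber-Fechner law says we can reliably perceive.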

Cognitive Science: Comparison vs Prediction

Research in cognitive psychology consistently shows that humans are much better at comparing than predicting:

  • Comparison (relative): "Is building this API endpoint bigger or smaller than the login feature we built last Sprint?" Your brain retrieves a concrete memory and makes a quick comparison. This activates recognition memory, which is fast and reliable.
  • Prediction (absolute): "How many hours will this API endpoint take?" Your brain has to simulate the entire future execution of the task - every edge case, every interruption, every unknown. This activates constructive imagination, which is slow and unreliable.

Teams that switch from hours to relative estimation typically see their forecasting accuracy improve within 4-6 Sprints because they stop fighting their own cognitive architecture.

The Anchoring Advantage

Relative estimation gives teams a concrete anchor - a reference story - that makes estimation faster and more consistent. Without an anchor, each estimation session starts from scratch: "So... is this 16 hours?" With a reference story, the conversation is grounded: "Our 5-point reference story was the user profile API. This new story is similar in complexity but with more uncertainty from the third-party integration, so it's probably an 8."

Anchoring also reduces the spread of estimates within a team. When everyone compares against the same reference, their estimates naturally converge. Without a reference, each person anchors on their own private mental model, producing wider divergence.

Absorbing Uncertainty

Absolute estimates create pressure for false precision. Saying "14 hours" implies you know the task duration to a level of accuracy that software development rarely supports. When that 14-hour estimate becomes 22 hours, it feels like a failure.

Relative estimates embrace uncertainty by design. Saying "this is about the same size as that 8-point story" acknowledges that you don't know the exact hours - and you don't need to. The Fibonacci scale's growing gaps deliberately prevent false precision: you can't say "this is a 6" when your choices are 5 or 8, and that coarseness is a feature, not a bug.

The Reference Story Approach

Choosing Your Baseline

A reference story is a well-understood, completed piece of work that the team uses as a benchmark for all future estimates. Choosing the right reference story is critical - it becomes the ruler against which everything else is measured.

Good reference story characteristics:

  • The team has completed it recently enough to remember the details
  • It was a medium-sized piece of work (not the smallest thing ever, not the biggest)
  • The effort, complexity, and uncertainty were all moderate
  • Most team members worked on or are familiar with it
  • It represents a typical type of work for the team

Assign your reference story a value in the middle of your scale - typically 3 or 5 on a Fibonacci scale.

Building a Reference Catalog

A single reference story isn't enough. Build a catalog of 5-7 reference stories that span your estimation scale:

Points | Reference Story | Why This Size
1 | Add a tooltip to existing button | Trivial effort, no complexity, no uncertainty
2 | Add input validation to existing form field | Small effort, low complexity, no uncertainty
3 | Create new API endpoint with standard CRUD | Moderate effort, low complexity, minimal uncertainty
5 | Build user profile page with API integration | Significant effort, moderate complexity, some uncertainty
8 | Integrate third-party payment gateway | Large effort, high complexity, notable uncertainty
13 | Redesign notification system with real-time push | Very large effort, high complexity, significant uncertainty

Review and update this catalog every quarter or when team composition changes significantly. New team members should study these reference stories before their first estimation session.

Calibration Over Time

Relative estimation improves through calibration - the process of comparing estimates to actual outcomes and adjusting:

  1. Sprint Retrospective review: "We estimated this at 5 points but it was clearly an 8 - what did we miss?"
  2. Pattern identification: "We consistently under-estimate stories involving database migrations by about one Fibonacci level."
  3. Reference update: "Our 5-point reference story no longer reflects what a 5 feels like. Let's pick a better one."

This calibration loop is the engine that makes relative estimation increasingly accurate over time. Teams that skip calibration get stuck with noisy estimates that never improve.

Relative Estimation Techniques

Story Points

Story points are the most widely used relative estimation unit. Each story point represents a blend of effort, complexity, and uncertainty - expressed as a single number on a relative scale.

Key characteristics:

  • Team-specific: a 5-point story on Team A is not the same as a 5-point story on Team B
  • Scale-based: typically uses Fibonacci (1, 2, 3, 5, 8, 13, 21) or Modified Fibonacci
  • Velocity-connected: total story points completed per Sprint = velocity
  • Not convertible to hours: there is no valid story-point-to-hour conversion

Story points are ideal for Sprint-level planning and release forecasting. They require an investment in reference stories, calibration, and team consistency - but the payoff is reliable velocity data within 4-6 Sprints.

T-Shirt Sizing

T-shirt sizing uses labels instead of numbers: XS, S, M, L, XL, XXL. It's the most accessible form of relative estimation because everyone intuitively understands that a "Large" is bigger than a "Small."

Best for:

  • Initial backlog sizing when you have 50-200 items to estimate quickly
  • Roadmap-level planning where precision isn't needed
  • Teams new to relative estimation who find numbers intimidating
  • Stakeholder communication (easier to explain than story points)

Limitation: T-shirt sizes don't aggregate into velocity. You can't add up "2 Mediums and 1 Large" to get a capacity number. Many teams start with T-shirt sizing and transition to story points once they're comfortable with relative thinking.

Fibonacci Sequence Scale

The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most common scale for relative estimation because its growing gaps mirror how human perception works. The sequence forces estimators to make meaningful distinctions at the small end (is this a 2 or a 3?) while preventing false precision at the large end (there's no option between 13 and 21).

Why Fibonacci works for estimation:

  • Gap growth matches the Weber-Fechner law of diminishing perception
  • Prevents debates about meaningless differences ("is this a 14 or a 15?")
  • Forces stories above 13 to be split - large items carry too much uncertainty
  • Each value is roughly 60% larger than the previous, creating consistent proportional jumps

Some teams use Modified Fibonacci (1, 2, 3, 5, 8, 13, 20, 40, 100) which replaces 21 with 20 for easier mental math and adds 40 and 100 for backlog items that need splitting.

Affinity Estimation

Affinity estimation is a rapid technique for sizing large numbers of items. The team physically or virtually groups items into relative size categories by comparing them to each other - not by discussing each one in detail.

How it works:

  1. Lay out the scale (columns labeled 1, 2, 3, 5, 8, 13)
  2. Place the first item in the middle column as a starting reference
  3. Team members silently place remaining items into columns based on relative size
  4. Review the groupings, discuss disagreements, and adjust

Speed advantage: Affinity estimation can size 50-100 items in 30-60 minutes - 10x faster than Planning Poker for the same number of items.

Best for: Initial sizing of a large backlog, PI planning in scaled frameworks, and any situation where you need rough estimates for many items quickly.

Planning Poker

Planning Poker is the gold standard for detailed relative estimation. Each Developer selects a card with their estimate simultaneously, preventing anchoring bias. When estimates diverge, the team discusses - and these discussions are often the most valuable part of the process.

How it works:

  1. Product Owner presents the story and answers questions
  2. Each Developer privately selects a Fibonacci card
  3. All cards are revealed simultaneously
  4. If estimates converge (e.g., all show 5 or 8), consensus is reached quickly
  5. If estimates diverge (e.g., one shows 3 and another shows 13), outliers explain their reasoning
  6. Re-vote after discussion, typically converging within 2-3 rounds

Best for: Sprint-level refinement of 5-15 items where detailed discussion and risk surfacing matter.

How to Implement Relative Estimation: Step-by-Step

Step 1: Choose Your Technique

Select the technique that matches your current need:

Situation | Recommended Technique
New team, first time estimating | T-shirt sizing (low barrier to entry)
Large backlog needs initial sizing (50+ items) | Affinity estimation
Sprint-level refinement (5-15 items) | Planning Poker with story points
Roadmap or PI planning | T-shirt sizing or affinity estimation
Mature team, stable backlog | Story points with quick consensus

Step 2: Establish Reference Stories

Before your first estimation session, identify 3-5 completed stories that span your scale. Present them to the team and agree on their relative sizes. Write them down - these become your calibration anchor for every future session.

Step 3: Run Your First Session

For Planning Poker:

  • Start with the reference stories visible on a board or shared screen
  • Present each new story, allow questions about scope and acceptance criteria
  • Have each Developer select a card privately, then reveal simultaneously
  • Discuss divergence, then re-vote
  • Aim for 2-5 minutes per story - if you can't converge, go with the higher estimate and move on

For Affinity Estimation:

  • Lay out the scale columns
  • Place one reference story per column
  • Have team members silently place remaining stories
  • Walk through the groupings, discuss and adjust
  • Aim for 10-20 seconds per story

Step 4: Track Velocity (If Using Story Points)

After each Sprint, record the total story points completed (only stories meeting the Definition of Done). After 4-6 Sprints, you'll have enough data for reliable velocity-based forecasting.
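
The bookkeeping is minimal. As a sketch (the Sprint velocities below are invented), Python's statistics module gives you the average velocity and the coefficient of variation mentioned in Step 7:

```python
from statistics import mean, stdev

# Hypothetical completed-points totals for the last six Sprints
velocities = [26, 30, 28, 32, 25, 27]

avg = mean(velocities)        # average velocity
cv = stdev(velocities) / avg  # coefficient of variation: spread relative to the mean

print(f"average velocity: {avg:.1f} points/Sprint")
print(f"coefficient of variation: {cv:.0%}")  # lower = more predictable
```

A coefficient of variation under 25% is the stability threshold this guide uses before relying on velocity for planning.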

Step 5: Calibrate in Retrospectives

In each Sprint Retrospective, spend 5 minutes reviewing estimation accuracy:

  • Were any stories significantly larger or smaller than estimated?
  • What caused the surprise?
  • Should any reference stories be updated?
  • Are there systematic patterns (e.g., "we always under-estimate integration work")?

Step 6: Refine Your Scale

After 6-10 Sprints, evaluate whether your scale still works:

  • If everything clusters at 3-5, your reference stories may be too coarse
  • If you're regularly using 13+ values, your team may need to split more aggressively
  • If velocity is unstable, investigate whether estimation consistency or external factors are the cause

Step 7: Integrate with Planning

Once velocity is stable (coefficient of variation below 25%), use it for:

  • Sprint Planning: Select roughly one Sprint's worth of velocity in story points
  • Release Planning: Divide remaining backlog points by average velocity to forecast completion
  • Capacity planning: Use velocity ranges (best/average/worst) for probabilistic forecasting
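
All three planning uses above reduce to simple division. A sketch with made-up numbers (the backlog size and velocity range are assumptions, not prescriptions):

```python
import math

remaining_points = 120             # unfinished Product Backlog (hypothetical)
best, average, worst = 34, 28, 22  # velocity range observed in Sprint history

def sprints_needed(points: int, velocity: int) -> int:
    """Whole Sprints needed to burn down the remaining points."""
    return math.ceil(points / velocity)

forecast = {label: sprints_needed(remaining_points, v)
            for label, v in [("best", best), ("average", average), ("worst", worst)]}
print(forecast)  # → {'best': 4, 'average': 5, 'worst': 6}
```

Reporting the range ("4 to 6 Sprints, most likely 5") communicates honest uncertainty instead of a single falsely precise date.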

Relative vs Absolute Estimation: Detailed Comparison

Dimension | Relative Estimation | Absolute Estimation
Cognitive load | Low - pattern matching and comparison | High - requires simulating future execution
Speed | Fast - most items estimated in 1-3 minutes | Slow - detailed task decomposition required
Accuracy (individual item) | Low - any single estimate may be off by a Fibonacci level | Medium - hour estimates can be close for familiar work
Accuracy (aggregate) | High - over/under estimates cancel out across a Sprint | Low - errors compound rather than cancel
Team vs individual | Team-based - reduces individual bias | Often individual - one person's guess
Handles uncertainty | Well - coarse scale absorbs unknowns | Poorly - pressure for false precision
Learning curve | Moderate - requires 4-6 Sprints to calibrate | Low - everyone understands hours
Maintenance | Requires reference stories and calibration | Requires re-estimation when scope changes
Cross-team comparison | Not possible (team-specific units) | Possible but misleading (different capabilities)
Stakeholder communication | Requires translation to dates via velocity | Direct but often inaccurate

When to Use Relative vs Absolute Estimation

Use relative estimation when:

  • You need Sprint-level or release-level forecasting
  • The team is sizing work in the Product Backlog during refinement
  • Work items vary significantly in size
  • You want to surface risks and misunderstandings through team discussion
  • You need aggregate accuracy across many items

Use absolute estimation when:

  • You're breaking a story into implementation tasks during Sprint Planning
  • The work is highly familiar and predictable (e.g., "this migration script takes 2 hours")
  • Contractual obligations require time-based estimates
  • You're tracking time spent for billing or compliance purposes
  • Individual task assignments need time boundaries

Many teams use both: relative estimation for story-level sizing (story points for Sprint planning and forecasting) and absolute estimation for task-level planning (hours for individual work organization within a Sprint).

Industry Examples

SaaS Product Development

A SaaS team with 7 developers uses story points on a Fibonacci scale with Planning Poker for Sprint refinement. Their reference stories: 1-point config changes, 3-point feature tweaks, 5-point new features with API integration, 8-point integrations with third-party services. They run 2-week Sprints with stable velocity of 34 points, allowing them to forecast quarterly releases within 1 Sprint of accuracy.

Healthcare Software

A healthcare team building EHR integration software includes regulatory compliance in their relative estimates. Stories involving PHI (Protected Health Information) automatically get sized 1-2 Fibonacci levels higher than equivalent non-PHI stories because of required HIPAA documentation, audit logging, encryption verification, and security review. Their velocity (22 points per Sprint) is lower than non-regulated teams, but forecasts are accurate because compliance effort is embedded in the relative sizes.

Financial Services

A fintech team estimating payment processing features uses T-shirt sizing for initial roadmap discussions with stakeholders (S/M/L maps to 1-month/1-quarter/multi-quarter delivery) and converts to story points for Sprint-level work. PCI-DSS compliance requirements are captured in their reference stories - a "5-point payment feature" inherently includes the compliance testing that the team has learned accompanies every payment-related change.

E-commerce Platform

An e-commerce team tracks relative estimates across three work types: customer-facing features, performance optimization, and infrastructure. They maintain separate reference catalogs for each type because the effort/complexity/uncertainty profiles differ significantly. A 5-point customer feature involves UI work and API integration, while a 5-point infrastructure change involves Terraform modules and monitoring setup. Separate catalogs prevent cross-type estimation drift.

Government Software

A government contractor team uses relative estimation within the constraints of fixed-price contracts. They estimate the initial backlog using affinity estimation to produce a total story point count, divide by projected velocity to estimate the number of Sprints, and present the Sprint count (with confidence range) to the contracting officer. Internally, they track velocity and use it for Sprint Planning. The relative estimates allow them to reorder and re-scope within the fixed budget without re-estimating in hours.

EdTech Platform

An EdTech team building a learning management system uses relative estimation with a twist: they size accessibility work separately. Every feature gets two estimates - base functionality and accessibility compliance (WCAG 2.1 AA). A feature might be 5 points for base functionality and 3 points for accessibility, producing an 8-point total. This visibility helps the Product Owner understand the cost of accessibility compliance and plan accordingly, rather than treating it as invisible overhead.

Relative Estimation Maturity Model

Stage 1: Getting Started (Sprints 1-4)

Characteristics:

  • Team is new to relative estimation or transitioning from hours
  • Estimates feel arbitrary - "Is this a 3 or a 5? I have no idea"
  • No velocity data exists yet
  • Reference stories are being established
  • Estimation sessions run long (30+ minutes for 5-10 items)

What to focus on:

  • Pick 3-5 reference stories and physically display them during every session
  • Don't worry about accuracy - focus on consistency (always compare to the same references)
  • Track velocity but don't rely on it for planning yet
  • Use T-shirt sizing if story points feel too abstract initially

Expected outcome: By Sprint 4, the team should converge on estimates faster and feel comfortable with the relative scale.

Stage 2: Calibrating (Sprints 5-10)

Characteristics:

  • Velocity data exists but is noisy (high variance between Sprints)
  • Team agrees on estimates faster - most items converge in 1-2 rounds
  • Some stories still surprise (larger or smaller than estimated)
  • Reference catalog is being refined based on actual completion data
  • Estimation sessions take 15-20 minutes for 5-10 items

What to focus on:

  • Compare estimates to actuals in every Retrospective
  • Identify systematic patterns: "We always under-estimate stories that require [X]"
  • Begin using velocity for Sprint capacity planning (with buffer)
  • Update reference stories based on what you've learned

Expected outcome: By Sprint 10, velocity variance should be decreasing and Sprint completion rates improving.

Stage 3: Reliable (Sprints 11-20)

Characteristics:

  • Velocity is predictable within a 15-20% range
  • Estimation sessions are efficient - 10-15 minutes for 5-10 items
  • Carry-over is rare (fewer than 1 story per Sprint)
  • Team has an intuitive sense of relative sizing
  • Reference catalog is stable and updated quarterly

What to focus on:

  • Use velocity ranges (best/average/worst) for release forecasting
  • Coach new team members using the reference catalog
  • Refine your split threshold based on completion patterns
  • Track throughput alongside velocity for cross-validation

Expected outcome: Reliable release date forecasts within 1-2 Sprints of accuracy.

Stage 4: Optimized (Sprint 20+)

Characteristics:

  • Velocity coefficient of variation is under 15%
  • Estimation takes minimal time - team often agrees without discussion
  • Forecasts are accurate within 10-15%
  • The team may start questioning whether formal estimation adds enough value
  • Reference stories are rarely needed - the scale is internalized

What to focus on:

  • Consider lightweight alternatives: quick consensus without Planning Poker cards
  • Evaluate whether throughput-based forecasting (story count rather than points) works for your team
  • Focus estimation time only on high-uncertainty or high-risk stories
  • Use Monte Carlo simulation for probabilistic release planning
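
The Monte Carlo idea in the last bullet can be sketched in a few lines: resample historical Sprint velocities until a (hypothetical) backlog is burned down, repeat many times, and read percentiles off the resulting distribution of Sprint counts. The history and backlog numbers here are invented:

```python
import random

history = [22, 25, 27, 28, 30, 34]  # actual velocities from past Sprints (made up)
backlog = 120                       # remaining story points (made up)
random.seed(42)                     # fixed seed so the example is reproducible

def simulate_once() -> int:
    """One simulated future: draw a plausible velocity per Sprint until done."""
    remaining, sprints = backlog, 0
    while remaining > 0:
        remaining -= random.choice(history)
        sprints += 1
    return sprints

runs = sorted(simulate_once() for _ in range(10_000))
p50, p85 = runs[len(runs) // 2], runs[int(len(runs) * 0.85)]
print(f"50% confident in {p50} Sprints, 85% confident in {p85} Sprints")
```

The 85th percentile, not the average, is what you quote as a commitment-grade forecast.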

Expected outcome: Estimation becomes a lightweight, low-overhead practice that reliably supports planning.

10 Common Relative Estimation Mistakes

Mistake #1: Converting Relative Estimates to Hours

What happens: Management or the team establishes a conversion: "1 story point = 4 hours." A 5-point story is expected to take 20 hours.

Why it's harmful: This destroys the entire purpose of relative estimation. Points become a unit of time, reintroducing person-dependency, false precision, and schedule pressure. If you're going to convert to hours anyway, you might as well just estimate in hours.

Fix: Never define or allow a point-to-hour conversion. Use velocity for time-based planning: "Our average velocity is 28 points per Sprint. The remaining backlog is 84 points. That's about 3 Sprints."

Mistake #2: Estimating Without Reference Stories

What happens: Each estimation session starts from scratch with no shared anchor. Team members estimate based on their own private mental models.

Why it's harmful: Without reference stories, estimates drift over time. What was a 5 three months ago is now a 3, making velocity trends meaningless. Team members also diverge more because they're anchoring on different personal baselines.

Fix: Maintain a catalog of 5-7 reference stories at key scale values. Display them during every estimation session. Review and update the catalog quarterly.

Mistake #3: Having One Person Dominate Estimates

What happens: The senior developer or tech lead speaks first, and everyone else adjusts their estimate to match. Or the Product Owner suggests a size before the team estimates.

Why it's harmful: This introduces anchoring bias - the first number spoken becomes the gravitational center. It also silences junior team members who might have valuable perspective on testing complexity or uncertainty.

Fix: Use simultaneous reveal (Planning Poker) for every estimation. No one speaks their estimate before the reveal. The Product Owner presents the story and answers questions but never suggests a size.

Mistake #4: Spending Too Long on a Single Estimate

What happens: The team debates whether a story is a 5 or an 8 for 15-20 minutes, often going around in circles.

Why it's harmful: The precision difference between adjacent Fibonacci values is tiny across a Sprint. Spending 15 minutes debating saves zero forecasting accuracy. It also drains energy that should go toward surfacing risks and misunderstandings.

Fix: Set a time limit of 3-5 minutes per item. If the team can't converge after two rounds, go with the higher estimate and move on. If the debate reveals that the story is poorly understood, send it back for refinement rather than continuing to estimate.

Mistake #5: Comparing Estimates Across Teams

What happens: Management notices that Team A completes 40 points per Sprint and Team B completes 25, and concludes that Team B is underperforming.

Why it's harmful: Story points are team-specific. Team A's "5 points" and Team B's "5 points" represent different amounts of work - they calibrated against different reference stories with different team compositions. Comparing them is like comparing test scores from different exams.

Fix: If cross-team comparison is needed, use objective measures: throughput (stories completed per Sprint), cycle time (days from start to completion), or business value delivered. Never compare raw story point velocity.

Mistake #6: Using Relative Estimation for Individual Performance

What happens: Individual "velocity" is tracked: "Sarah completed 18 points, Carlos completed 12."

Why it's harmful: It creates perverse incentives. Developers inflate estimates to look more productive. Collaboration drops because helping a teammate doesn't increase your personal score. Pairing and mentoring become "velocity drains." Team trust erodes.

Fix: Relative estimation produces team-level data only. Individual performance should be assessed through qualitative measures: code review quality, knowledge sharing, mentoring, and contribution to team outcomes.

Mistake #7: Skipping Calibration

What happens: The team estimates every Sprint but never reviews whether their estimates were accurate. They never update reference stories or identify systematic biases.

Why it's harmful: Without calibration, estimation accuracy plateaus or degrades. The team misses learning opportunities. Velocity data becomes unreliable for forecasting because the relationship between points and actual work drifts.

Fix: Spend 5 minutes in each Sprint Retrospective reviewing estimation accuracy. Identify the biggest surprise (most over- or under-estimated story), discuss why, and update reference stories or estimation practices accordingly.

Mistake #8: Estimating Everything

What happens: The team estimates every type of work: features, bugs, technical debt, spikes, documentation, and meetings. Everything gets story points.

Why it's harmful: Bugs and spikes are inherently unpredictable - their "size" is unknowable until you start the work. Estimating them creates false precision and clutters velocity data. If bug points count toward velocity, teams are incentivized to create more bugs.

Fix: Estimate stories (features with defined acceptance criteria). Track bugs by count, not points. Timebox spikes (e.g., "spend 2 days researching") rather than estimating them. Reserve a capacity buffer for non-estimable work.

Mistake #9: Using the Wrong Scale

What happens: A team uses a linear scale (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), allowing debates about the difference between 6 and 7.

Why it's harmful: Linear scales encourage false precision. The difference between a 6 and a 7 is not meaningfully distinguishable for most stories, but the scale's existence invites the debate. Time is wasted on precision that doesn't improve forecasting.

Fix: Use Fibonacci (1, 2, 3, 5, 8, 13, 21) or a similarly non-linear scale. The growing gaps force estimators to make meaningful, detectable distinctions while preventing meaningless precision at larger sizes.

Mistake #10: Abandoning Relative Estimation Too Early

What happens: After 3-4 Sprints of noisy velocity data, the team or management declares "story points don't work" and switches back to hours.

Why it's harmful: Relative estimation needs 4-6 Sprints of calibration to produce reliable velocity data. Judging it after 3 Sprints is like judging a diet after 3 days. The early noise is the calibration process working - the team is learning to estimate consistently.

Fix: Commit to at least 8 Sprints before evaluating whether relative estimation works for your team. Track velocity variance over time - it should decrease. If it doesn't decrease after 8 Sprints, investigate root causes (team instability, scope changes, poor refinement) rather than blaming the estimation approach.

Scaling Relative Estimation Across Teams

When multiple teams work on the same product, relative estimation needs coordination:

Shared reference stories: If teams need to compare or aggregate estimates (e.g., for PI planning), establish 3-5 shared reference stories that all teams calibrate against. This creates "normalized" story points that are roughly comparable across teams.

Independent velocity: Even with shared reference stories, each team maintains its own velocity. A 5-point story may take Team A one day and Team B three days - that's fine because each team's velocity reflects their specific pace.

Portfolio-level estimation: For roadmap and portfolio planning, use T-shirt sizing or affinity estimation rather than story points. These techniques are faster and don't require cross-team point normalization.

Feature-level aggregation: When a feature spans multiple teams, each team estimates their portion independently using their own scale. The total estimate is the sum of team-level estimates, converted to Sprints using each team's velocity. Don't add raw points across teams - add projected Sprint counts.
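
The "add Sprint counts, not raw points" rule looks like this in miniature (team names, point totals, and velocities are all invented for illustration):

```python
import math

# Each team's share of the feature, sized in its OWN point scale,
# alongside that team's own velocity.
teams = {
    "Team A": {"points": 40, "velocity": 20},
    "Team B": {"points": 30, "velocity": 15},
}

# Convert each portion to Sprints first, then combine - never sum raw points,
# because a point means something different on each team.
sprints = {name: math.ceil(t["points"] / t["velocity"]) for name, t in teams.items()}
total = sum(sprints.values())
print(sprints, "-> total projected Sprint count:", total)
```

Summing the raw points (70) would be meaningless; summing the projected Sprint counts gives a figure grounded in each team's actual pace.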

Coordination ceremonies: During PI planning or big-room planning, use affinity estimation to create a shared view of relative feature sizes. Then each team breaks their portion into stories and estimates with their own story point scale during Sprint-level planning.

Conclusion

Relative estimation works because it aligns with how human cognition actually operates. We're wired to compare, not to predict. We perceive proportional differences, not absolute ones. We make faster and more accurate judgments when we have concrete reference points.

Key takeaways:

  1. Relative estimation sizes work by comparison ("Is this bigger or smaller than X?") rather than prediction ("How many hours will this take?")
  2. The Weber-Fechner law explains why Fibonacci scales work - our brains perceive proportional differences, and the scale's growing gaps mirror this
  3. Reference stories are the foundation - without them, estimates drift and velocity becomes meaningless
  4. Story points, T-shirt sizing, affinity estimation, and Planning Poker are all relative estimation techniques - choose based on your situation
  5. Relative and absolute estimation can coexist: story points for Sprint planning, hours for task breakdown
  6. Never convert story points to hours, compare velocity across teams, or use relative estimates for individual performance
  7. Calibration is essential - review estimation accuracy in every Retrospective and update reference stories quarterly
  8. Relative estimation needs 4-6 Sprints of practice to produce reliable velocity data - don't abandon it prematurely

