Relative Estimation in Agile: The Complete Guide to Sizing Work Without Hours
Relative estimation is a technique where teams size work items by comparing them to each other rather than estimating in absolute units like hours or days. Instead of asking "How long will this take?", teams ask "Is this bigger or smaller than that other thing we did?" This shift in thinking is one of the most powerful - and most misunderstood - concepts in agile. Teams that embrace relative estimation consistently produce more accurate forecasts, spend less time estimating, and surface hidden risks earlier. This guide covers why relative estimation works, the techniques available, how to implement it, and the mistakes that derail teams.
Quick Answer: Relative vs Absolute Estimation
| Aspect | Relative Estimation | Absolute Estimation |
|---|---|---|
| What you estimate | Size compared to other work items | Hours, days, or calendar time |
| Core question | "Is this bigger or smaller than X?" | "How many hours will this take?" |
| Unit of measure | Story points, T-shirt sizes, or buckets | Hours, days, or person-days |
| Precision | Deliberately coarse (e.g., Fibonacci scale) | Falsely precise ("exactly 14 hours") |
| Person-dependent | No - the team estimates together | Yes - depends on who does the work |
| Accuracy over time | Improves as team calibrates velocity | Improves little - remains dependent on who estimates and who does the work |
| Best for | Sprint planning, release forecasting, backlog sizing | Task breakdown within a story, time tracking |
Table of Contents
- What Is Relative Estimation?
  - The Core Insight
  - A Simple Analogy
- Why Relative Estimation Works Better Than Absolute
  - The Psychology: Weber-Fechner Law
  - Cognitive Science: Comparison vs Prediction
  - The Anchoring Advantage
  - Absorbing Uncertainty
- The Reference Story Approach
  - Choosing Your Baseline
  - Building a Reference Catalog
  - Calibration Over Time
- Relative Estimation Techniques
  - Story Points
  - T-Shirt Sizing
  - Fibonacci Sequence Scale
  - Affinity Estimation
  - Planning Poker
- How to Implement Relative Estimation: Step-by-Step
- Relative vs Absolute Estimation: Detailed Comparison
- When to Use Relative vs Absolute Estimation
- Industry Examples
- Relative Estimation Maturity Model
- 10 Common Relative Estimation Mistakes
- Scaling Relative Estimation Across Teams
- Conclusion
What Is Relative Estimation?
The Core Insight
Relative estimation is the practice of sizing work by comparing items against each other rather than assigning absolute time values. When a team uses relative estimation, they don't ask "How many hours will this story take?" They ask "How does this story compare to other stories we've already sized or completed?"
The output is a relative size - a number, a label, or a category that positions the item on a scale compared to everything else in the Product Backlog. A story estimated at 8 points isn't "8 hours" or "8 days" - it means "this story is roughly 8/5ths the size of our 5-point reference story."
Relative estimation is NOT in the Scrum Guide. The Scrum Guide doesn't prescribe any specific estimation technique. Relative estimation is a complementary practice widely used by Scrum teams because it works exceptionally well with velocity-based planning and empirical process control. On the PSM-1 exam, you need to understand the concept of sizing work relative to other work - not any specific estimation technique.
A Simple Analogy
Imagine sorting a stack of rocks by weight. You have two options:
- Absolute approach: Weigh each rock on a scale, write down the grams, and sort by number.
- Relative approach: Pick up two rocks - one in each hand - and feel which is heavier. Repeat with other rocks until you have them sorted light to heavy.
The relative approach is faster, requires no tools, and produces a useful ranking. You don't know the exact weight of each rock, but you know with confidence that Rock C is about twice as heavy as Rock A. For planning purposes - "Can I carry these rocks in one trip?" - relative comparison is usually sufficient.
Software estimation works the same way. You rarely need to know exact hours. You need to know relative size so you can answer: "Can we fit this work into the next Sprint?"
Why Relative Estimation Works Better Than Absolute
The Psychology: Weber-Fechner Law
The Weber-Fechner law, established in the 19th century, states that humans perceive differences in stimuli proportionally rather than absolutely. You can easily tell the difference between lifting a 1 kg weight and a 2 kg weight (100% difference). But telling the difference between a 50 kg weight and a 51 kg weight (2% difference) is much harder, even though both differences are exactly 1 kg.
This law explains why the Fibonacci sequence works so well for estimation. The gaps between values grow proportionally: 1-2-3-5-8-13-21. From 3 upward, each number is roughly 60% larger than the previous one. This mirrors how our brains actually process magnitude - we distinguish proportional differences, not absolute ones.
When teams estimate in hours, they're forced to make absolute judgments that their brains aren't wired for. When they estimate in relative sizes with growing intervals, they're working with their cognitive strengths rather than against them.
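To see these proportional jumps concretely, the ratios between adjacent values on the scale can be printed in a few lines of Python. This is a minimal illustration rather than part of any estimation tool; the scale values are the ones discussed above.

```python
# Ratio between adjacent values on the estimation scale.
# From 3 upward the jump settles at roughly 60%, which is the kind of
# proportional difference the Weber-Fechner law says we perceive well.
scale = [1, 2, 3, 5, 8, 13, 21]

for smaller, larger in zip(scale, scale[1:]):
    ratio = larger / smaller
    print(f"{smaller} -> {larger}: {ratio:.2f}x ({(ratio - 1):.0%} larger)")
```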
Cognitive Science: Comparison vs Prediction
Research in cognitive psychology consistently shows that humans are much better at comparing than predicting:
- Comparison (relative): "Is building this API endpoint bigger or smaller than the login feature we built last Sprint?" Your brain retrieves a concrete memory and makes a quick comparison. This activates recognition memory, which is fast and reliable.
- Prediction (absolute): "How many hours will this API endpoint take?" Your brain has to simulate the entire future execution of the task - every edge case, every interruption, every unknown. This activates constructive imagination, which is slow and unreliable.
Teams that switch from hours to relative estimation typically see their forecasting accuracy improve within 4-6 Sprints because they stop fighting their own cognitive architecture.
The Anchoring Advantage
Relative estimation gives teams a concrete anchor - a reference story - that makes estimation faster and more consistent. Without an anchor, each estimation session starts from scratch: "So... is this 16 hours?" With a reference story, the conversation is grounded: "Our 5-point reference story was the user profile API. This new story is similar in complexity but with more uncertainty from the third-party integration, so it's probably an 8."
Anchoring also reduces the spread of estimates within a team. When everyone compares against the same reference, their estimates naturally converge. Without a reference, each person anchors on their own private mental model, producing wider divergence.
Absorbing Uncertainty
Absolute estimates create pressure for false precision. Saying "14 hours" implies you know the task duration to a level of accuracy that software development rarely supports. When that 14-hour estimate becomes 22 hours, it feels like a failure.
Relative estimates embrace uncertainty by design. Saying "this is about the same size as that 8-point story" acknowledges that you don't know the exact hours - and you don't need to. The Fibonacci scale's growing gaps deliberately prevent false precision: you can't say "this is a 6" when your choices are 5 or 8, and that coarseness is a feature, not a bug.
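That coarseness can be expressed as a tiny helper that snaps any in-between judgment to an allowed value. This is a sketch only: the scale comes from the article, while the rounding-up convention is an assumption (it matches the "go with the higher estimate" habit described later, but teams may choose differently).

```python
import bisect

# Allowed values on the scale - there is deliberately no "6" between 5 and 8.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]

def snap_to_scale(raw_size: float) -> int:
    """Push an in-between judgment up to the next allowed value (assumed convention)."""
    index = bisect.bisect_left(FIBONACCI_SCALE, raw_size)
    return FIBONACCI_SCALE[min(index, len(FIBONACCI_SCALE) - 1)]

print(snap_to_scale(6))    # 8 - "a 6" is simply not an option
print(snap_to_scale(14))   # 21
```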
The Reference Story Approach
Choosing Your Baseline
A reference story is a well-understood, completed piece of work that the team uses as a benchmark for all future estimates. Choosing the right reference story is critical - it becomes the ruler against which everything else is measured.
Good reference story characteristics:
- The team has completed it recently enough to remember the details
- It was a medium-sized piece of work (not the smallest thing ever, not the biggest)
- The effort, complexity, and uncertainty were all moderate
- Most team members worked on or are familiar with it
- It represents a typical type of work for the team
Assign your reference story a value in the middle of your scale - typically 3 or 5 on a Fibonacci scale.
Building a Reference Catalog
A single reference story isn't enough. Build a catalog of 5-7 reference stories that span your estimation scale:
| Points | Reference Story | Why This Size |
|---|---|---|
| 1 | Add a tooltip to existing button | Trivial effort, no complexity, no uncertainty |
| 2 | Add input validation to existing form field | Small effort, low complexity, no uncertainty |
| 3 | Create new API endpoint with standard CRUD | Moderate effort, low complexity, minimal uncertainty |
| 5 | Build user profile page with API integration | Significant effort, moderate complexity, some uncertainty |
| 8 | Integrate third-party payment gateway | Large effort, high complexity, notable uncertainty |
| 13 | Redesign notification system with real-time push | Very large effort, high complexity, significant uncertainty |
Review and update this catalog every quarter or when team composition changes significantly. New team members should study these reference stories before their first estimation session.
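Kept in a shared place, the catalog can be as simple as a small lookup that gets displayed at the start of every estimation session. The sketch below mirrors the table above; the dictionary format and helper name are illustrative assumptions, not a prescribed structure.

```python
# Reference catalog mirroring the table above (illustrative format).
REFERENCE_CATALOG = {
    1: "Add a tooltip to existing button",
    2: "Add input validation to existing form field",
    3: "Create new API endpoint with standard CRUD",
    5: "Build user profile page with API integration",
    8: "Integrate third-party payment gateway",
    13: "Redesign notification system with real-time push",
}

def show_anchors() -> None:
    """Print the catalog so it can be put on screen before estimating."""
    for points, story in sorted(REFERENCE_CATALOG.items()):
        print(f"{points:>2} points ~ {story}")

show_anchors()
```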
Calibration Over Time
Relative estimation improves through calibration - the process of comparing estimates to actual outcomes and adjusting:
- Sprint Retrospective review: "We estimated this at 5 points but it was clearly an 8 - what did we miss?"
- Pattern identification: "We consistently under-estimate stories involving database migrations by about one Fibonacci level."
- Reference update: "Our 5-point reference story no longer reflects what a 5 feels like. Let's pick a better one."
This calibration loop is the engine that makes relative estimation increasingly accurate over time. Teams that skip calibration get stuck with noisy estimates that never improve.
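A minimal way to run this loop is to record, for each completed story, the original estimate and the size the team would assign in hindsight, then flag anything that is off by a full scale level. The sample stories, field names, and one-level threshold below are assumptions for illustration.

```python
# Calibration sketch: compare the original estimate to the hindsight size.
FIB = [1, 2, 3, 5, 8, 13, 21]

completed = [  # hypothetical stories from the last Sprint
    {"story": "Payment webhook retry", "estimated": 5, "hindsight": 8},
    {"story": "Profile avatar upload", "estimated": 3, "hindsight": 3},
    {"story": "Orders database migration", "estimated": 5, "hindsight": 13},
]

def levels_off(estimated: int, hindsight: int) -> int:
    """Signed number of scale levels between the estimate and the hindsight size."""
    return FIB.index(hindsight) - FIB.index(estimated)

for item in completed:
    gap = levels_off(item["estimated"], item["hindsight"])
    if gap != 0:
        direction = "under" if gap > 0 else "over"
        print(f'{item["story"]}: {direction}-estimated by {abs(gap)} level(s)')
```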
Relative Estimation Techniques
Story Points
Story points are the most widely used relative estimation unit. Each story point represents a blend of effort, complexity, and uncertainty - expressed as a single number on a relative scale.
Key characteristics:
- Team-specific: a 5-point story on Team A is not the same as a 5-point story on Team B
- Scale-based: typically uses Fibonacci (1, 2, 3, 5, 8, 13, 21) or Modified Fibonacci
- Velocity-connected: total story points completed per Sprint = velocity
- Not convertible to hours: there is no valid story-point-to-hour conversion
Story points are ideal for Sprint-level planning and release forecasting. They require an investment in reference stories, calibration, and team consistency - but the payoff is reliable velocity data within 4-6 Sprints.
T-Shirt Sizing
T-shirt sizing uses labels instead of numbers: XS, S, M, L, XL, XXL. It's the most accessible form of relative estimation because everyone intuitively understands that a "Large" is bigger than a "Small."
Best for:
- Initial backlog sizing when you have 50-200 items to estimate quickly
- Roadmap-level planning where precision isn't needed
- Teams new to relative estimation who find numbers intimidating
- Stakeholder communication (easier to explain than story points)
Limitation: T-shirt sizes don't aggregate into velocity. You can't add up "2 Mediums and 1 Large" to get a capacity number. Many teams start with T-shirt sizing and transition to story points once they're comfortable with relative thinking.
Fibonacci Sequence Scale
The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most common scale for relative estimation because its growing gaps mirror how human perception works. The sequence forces estimators to make meaningful distinctions at the small end (is this a 2 or a 3?) while preventing false precision at the large end (there's no option between 13 and 21).
Why Fibonacci works for estimation:
- Gap growth matches the Weber-Fechner law of diminishing perception
- Prevents debates about meaningless differences ("is this a 14 or a 15?")
- Forces stories above 13 to be split - large items carry too much uncertainty
- From 3 upward, each value is roughly 60% larger than the previous, creating consistent proportional jumps
Some teams use Modified Fibonacci (1, 2, 3, 5, 8, 13, 20, 40, 100) which replaces 21 with 20 for easier mental math and adds 40 and 100 for backlog items that need splitting.
Affinity Estimation
Affinity estimation is a rapid technique for sizing large numbers of items. The team physically or virtually groups items into relative size categories by comparing them to each other - not by discussing each one in detail.
How it works:
- Lay out the scale (columns labeled 1, 2, 3, 5, 8, 13)
- Place the first item in the middle column as a starting reference
- Team members silently place remaining items into columns based on relative size
- Review the groupings, discuss disagreements, and adjust
Speed advantage: Affinity estimation can size 50-100 items in 30-60 minutes - 10x faster than Planning Poker for the same number of items.
Best for: Initial sizing of a large backlog, PI planning in scaled frameworks, and any situation where you need rough estimates for many items quickly.
Planning Poker
Planning Poker is the gold standard for detailed relative estimation. Each Developer selects a card with their estimate simultaneously, preventing anchoring bias. When estimates diverge, the team discusses - and these discussions are often the most valuable part of the process.
How it works:
- Product Owner presents the story and answers questions
- Each Developer privately selects a Fibonacci card
- All cards are revealed simultaneously
- If estimates converge (e.g., all show 5 or 8), consensus is reached quickly
- If estimates diverge (e.g., one shows 3 and another shows 13), outliers explain their reasoning
- Re-vote after discussion, typically converging within 2-3 rounds
Best for: Sprint-level refinement of 5-15 items where detailed discussion and risk surfacing matter.
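For remote sessions, the reveal-and-check step is easy to automate. The sketch below flags a round for discussion when the revealed cards span more than one adjacent scale step; that specific threshold is an assumption, not part of Planning Poker itself.

```python
# Divergence check on a revealed round of Planning Poker cards.
FIB = [1, 2, 3, 5, 8, 13, 21]

def needs_discussion(revealed: list[int]) -> bool:
    """True when cards span more than one adjacent step on the scale (assumed rule)."""
    positions = [FIB.index(card) for card in revealed]
    return max(positions) - min(positions) > 1

print(needs_discussion([5, 5, 8, 5]))    # False - adjacent values, quick consensus
print(needs_discussion([3, 5, 13, 5]))   # True  - outliers explain their reasoning
```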
How to Implement Relative Estimation: Step-by-Step
Step 1: Choose Your Technique
Select the technique that matches your current need:
| Situation | Recommended Technique |
|---|---|
| New team, first time estimating | T-shirt sizing (low barrier to entry) |
| Large backlog needs initial sizing (50+ items) | Affinity estimation |
| Sprint-level refinement (5-15 items) | Planning Poker with story points |
| Roadmap or PI planning | T-shirt sizing or affinity estimation |
| Mature team, stable backlog | Story points with quick consensus |
Step 2: Establish Reference Stories
Before your first estimation session, identify 3-5 completed stories that span your scale. Present them to the team and agree on their relative sizes. Write them down - these become your calibration anchor for every future session.
Step 3: Run Your First Session
For Planning Poker:
- Start with the reference stories visible on a board or shared screen
- Present each new story, allow questions about scope and acceptance criteria
- Have each Developer select a card privately, then reveal simultaneously
- Discuss divergence, then re-vote
- Aim for 2-5 minutes per story - if you can't converge, go with the higher estimate and move on
For Affinity Estimation:
- Lay out the scale columns
- Place one reference story per column
- Have team members silently place remaining stories
- Walk through the groupings, discuss and adjust
- Aim for 10-20 seconds per story
Step 4: Track Velocity (If Using Story Points)
After each Sprint, record the total story points completed (only stories meeting the Definition of Done). After 4-6 Sprints, you'll have enough data for reliable velocity-based forecasting.
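Tracking velocity needs nothing more than the sum of Done points per Sprint. A minimal sketch, with illustrative Sprint records:

```python
# Velocity = total story points of stories meeting the Definition of Done.
sprints = [  # hypothetical Sprint records
    {"sprint": 12, "done_points": [3, 5, 8, 5, 2]},
    {"sprint": 13, "done_points": [5, 5, 3, 8]},
    {"sprint": 14, "done_points": [8, 3, 5, 5, 5]},
]

velocities = [sum(s["done_points"]) for s in sprints]
average_velocity = sum(velocities) / len(velocities)

print(velocities)                                         # [23, 21, 26]
print(f"average ~{average_velocity:.0f} points/Sprint")   # usable after 4-6 Sprints of data
```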
Step 5: Calibrate in Retrospectives
In each Sprint Retrospective, spend 5 minutes reviewing estimation accuracy:
- Were any stories significantly larger or smaller than estimated?
- What caused the surprise?
- Should any reference stories be updated?
- Are there systematic patterns (e.g., "we always under-estimate integration work")?
Step 6: Refine Your Scale
After 6-10 Sprints, evaluate whether your scale still works:
- If everything clusters at 3-5, your reference stories may be too coarse
- If you're regularly using 13+ values, your team may need to split more aggressively
- If velocity is unstable, investigate whether estimation consistency or external factors are the cause
Step 7: Integrate with Planning
Once velocity is stable (coefficient of variation below 25%), use it for:
- Sprint Planning: Select roughly one Sprint's worth of velocity in story points
- Release Planning: Divide remaining backlog points by average velocity to forecast completion
- Capacity planning: Use velocity ranges (best/average/worst) for probabilistic forecasting
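A minimal sketch of these checks - the coefficient-of-variation stability test and a best/average/worst Sprint forecast - is shown below. The velocity history and the 130-point remaining backlog are illustrative numbers.

```python
import math
import statistics

# Hypothetical velocity history and remaining backlog size.
velocity_history = [23, 21, 26, 24, 22, 25]
remaining_backlog_points = 130

mean = statistics.mean(velocity_history)
cv = statistics.stdev(velocity_history) / mean
print(f"coefficient of variation: {cv:.0%}")  # stable when below ~25%

# Best/average/worst forecast: divide remaining points by each velocity.
for label, velocity in [("best", max(velocity_history)),
                        ("average", mean),
                        ("worst", min(velocity_history))]:
    print(f"{label}: ~{math.ceil(remaining_backlog_points / velocity)} Sprints")
```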
Relative vs Absolute Estimation: Detailed Comparison
| Dimension | Relative Estimation | Absolute Estimation |
|---|---|---|
| Cognitive load | Low - pattern matching and comparison | High - requires simulating future execution |
| Speed | Fast - most items estimated in 1-3 minutes | Slow - detailed task decomposition required |
| Accuracy (individual item) | Low - any single estimate may be off by a Fibonacci level | Medium - hour estimates can be close for familiar work |
| Accuracy (aggregate) | High - over/under estimates cancel out across a Sprint | Low - errors compound rather than cancel |
| Team vs individual | Team-based - reduces individual bias | Often individual - one person's guess |
| Handles uncertainty | Well - coarse scale absorbs unknowns | Poorly - pressure for false precision |
| Learning curve | Moderate - requires 4-6 Sprints to calibrate | Low - everyone understands hours |
| Maintenance | Requires reference stories and calibration | Requires re-estimation when scope changes |
| Cross-team comparison | Not possible (team-specific units) | Possible but misleading (different capabilities) |
| Stakeholder communication | Requires translation to dates via velocity | Direct but often inaccurate |
When to Use Relative vs Absolute Estimation
Use relative estimation when:
- You need Sprint-level or release-level forecasting
- The team is sizing work in the Product Backlog during refinement
- Work items vary significantly in size
- You want to surface risks and misunderstandings through team discussion
- You need aggregate accuracy across many items
Use absolute estimation when:
- You're breaking a story into implementation tasks during Sprint Planning
- The work is highly familiar and predictable (e.g., "this migration script takes 2 hours")
- Contractual obligations require time-based estimates
- You're tracking time spent for billing or compliance purposes
- Individual task assignments need time boundaries
Many teams use both: relative estimation for story-level sizing (story points for Sprint planning and forecasting) and absolute estimation for task-level planning (hours for individual work organization within a Sprint).
Industry Examples
SaaS Product Development
A SaaS team with 7 developers uses story points on a Fibonacci scale with Planning Poker for Sprint refinement. Their reference stories: 1-point config changes, 3-point feature tweaks, 5-point new features with API integration, 8-point integrations with third-party services. They run 2-week Sprints with stable velocity of 34 points, allowing them to forecast quarterly releases within 1 Sprint of accuracy.
Healthcare Software
A healthcare team building EHR integration software includes regulatory compliance in their relative estimates. Stories involving PHI (Protected Health Information) automatically get sized 1-2 Fibonacci levels higher than equivalent non-PHI stories because of required HIPAA documentation, audit logging, encryption verification, and security review. Their velocity (22 points per Sprint) is lower than non-regulated teams, but forecasts are accurate because compliance effort is embedded in the relative sizes.
Financial Services
A fintech team estimating payment processing features uses T-shirt sizing for initial roadmap discussions with stakeholders (S/M/L maps to 1-month/1-quarter/multi-quarter delivery) and converts to story points for Sprint-level work. PCI-DSS compliance requirements are captured in their reference stories - a "5-point payment feature" inherently includes the compliance testing that the team has learned accompanies every payment-related change.
E-commerce Platform
An e-commerce team tracks relative estimates across three work types: customer-facing features, performance optimization, and infrastructure. They maintain separate reference catalogs for each type because the effort/complexity/uncertainty profiles differ significantly. A 5-point customer feature involves UI work and API integration, while a 5-point infrastructure change involves Terraform modules and monitoring setup. Separate catalogs prevent cross-type estimation drift.
Government Software
A government contractor team uses relative estimation within the constraints of fixed-price contracts. They estimate the initial backlog using affinity estimation to produce a total story point count, divide by projected velocity to estimate the number of Sprints, and present the Sprint count (with confidence range) to the contracting officer. Internally, they track velocity and use it for Sprint Planning. The relative estimates allow them to reorder and re-scope within the fixed budget without re-estimating in hours.
EdTech Platform
An EdTech team building a learning management system uses relative estimation with a twist: they size accessibility work separately. Every feature gets two estimates - base functionality and accessibility compliance (WCAG 2.1 AA). A feature might be 5 points for base functionality and 3 points for accessibility, producing an 8-point total. This visibility helps the Product Owner understand the cost of accessibility compliance and plan accordingly, rather than treating it as invisible overhead.
Relative Estimation Maturity Model
Stage 1: Getting Started (Sprints 1-4)
Characteristics:
- Team is new to relative estimation or transitioning from hours
- Estimates feel arbitrary - "Is this a 3 or a 5? I have no idea"
- No velocity data exists yet
- Reference stories are being established
- Estimation sessions run long (30+ minutes for 5-10 items)
What to focus on:
- Pick 3-5 reference stories and physically display them during every session
- Don't worry about accuracy - focus on consistency (always compare to the same references)
- Track velocity but don't rely on it for planning yet
- Use T-shirt sizing if story points feel too abstract initially
Expected outcome: By Sprint 4, the team should converge on estimates faster and feel comfortable with the relative scale.
Stage 2: Calibrating (Sprints 5-10)
Characteristics:
- Velocity data exists but is noisy (high variance between Sprints)
- Team agrees on estimates faster - most items converge in 1-2 rounds
- Some stories still surprise (larger or smaller than estimated)
- Reference catalog is being refined based on actual completion data
- Estimation sessions take 15-20 minutes for 5-10 items
What to focus on:
- Compare estimates to actuals in every Retrospective
- Identify systematic patterns: "We always under-estimate stories that require [X]"
- Begin using velocity for Sprint capacity planning (with buffer)
- Update reference stories based on what you've learned
Expected outcome: By Sprint 10, velocity variance should be decreasing and Sprint completion rates improving.
Stage 3: Reliable (Sprints 11-20)
Characteristics:
- Velocity is predictable within a 15-20% range
- Estimation sessions are efficient - 10-15 minutes for 5-10 items
- Carry-over is rare (fewer than 1 story per Sprint)
- Team has an intuitive sense of relative sizing
- Reference catalog is stable and updated quarterly
What to focus on:
- Use velocity ranges (best/average/worst) for release forecasting
- Coach new team members using the reference catalog
- Refine your split threshold based on completion patterns
- Track throughput alongside velocity for cross-validation
Expected outcome: Reliable release date forecasts within 1-2 Sprints of accuracy.
Stage 4: Optimized (Sprint 20+)
Characteristics:
- Velocity coefficient of variation is under 15%
- Estimation takes minimal time - team often agrees without discussion
- Forecasts are accurate within 10-15%
- The team may start questioning whether formal estimation adds enough value
- Reference stories are rarely needed - the scale is internalized
What to focus on:
- Consider lightweight alternatives: quick consensus without Planning Poker cards
- Evaluate whether throughput-based forecasting (story count rather than points) works for your team
- Focus estimation time only on high-uncertainty or high-risk stories
- Use Monte Carlo simulation for probabilistic release planning
Expected outcome: Estimation becomes a lightweight, low-overhead practice that reliably supports planning.
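The Monte Carlo simulation mentioned above can be sketched in a few lines: sample past velocities at random until the remaining backlog is burned down, repeat many times, and read off the percentiles. The velocity history, backlog size, and percentile choices below are illustrative assumptions.

```python
import random

# Hypothetical inputs: recent velocities and the remaining backlog.
velocity_history = [23, 21, 26, 24, 22, 25, 19, 27]
remaining_backlog_points = 200
RUNS = 10_000

def simulate_once() -> int:
    """Burn down the backlog using randomly sampled historical velocities."""
    remaining, sprints = remaining_backlog_points, 0
    while remaining > 0:
        remaining -= random.choice(velocity_history)
        sprints += 1
    return sprints

outcomes = sorted(simulate_once() for _ in range(RUNS))
print(f"50% confidence: {outcomes[RUNS // 2]} Sprints")
print(f"85% confidence: {outcomes[int(RUNS * 0.85)]} Sprints")
```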
10 Common Relative Estimation Mistakes
Mistake #1: Converting Relative Estimates to Hours
What happens: Management or the team establishes a conversion: "1 story point = 4 hours." A 5-point story is expected to take 20 hours.
Why it's harmful: This destroys the entire purpose of relative estimation. Points become a unit of time, reintroducing person-dependency, false precision, and schedule pressure. If you're going to convert to hours anyway, you might as well just estimate in hours.
Fix: Never define or allow a point-to-hour conversion. Use velocity for time-based planning: "Our average velocity is 28 points per Sprint. The remaining backlog is 84 points. That's about 3 Sprints."
Mistake #2: Estimating Without Reference Stories
What happens: Each estimation session starts from scratch with no shared anchor. Team members estimate based on their own private mental models.
Why it's harmful: Without reference stories, estimates drift over time. What was a 5 three months ago is now a 3, making velocity trends meaningless. Team members also diverge more because they're anchoring on different personal baselines.
Fix: Maintain a catalog of 5-7 reference stories at key scale values. Display them during every estimation session. Review and update the catalog quarterly.
Mistake #3: Having One Person Dominate Estimates
What happens: The senior developer or tech lead speaks first, and everyone else adjusts their estimate to match. Or the Product Owner suggests a size before the team estimates.
Why it's harmful: This introduces anchoring bias - the first number spoken becomes the gravitational center. It also silences junior team members who might have valuable perspective on testing complexity or uncertainty.
Fix: Use simultaneous reveal (Planning Poker) for every estimation. No one speaks their estimate before the reveal. The Product Owner presents the story and answers questions but never suggests a size.
Mistake #4: Spending Too Long on a Single Estimate
What happens: The team debates whether a story is a 5 or an 8 for 15-20 minutes, often going around in circles.
Why it's harmful: Across a whole Sprint, the difference between adjacent Fibonacci values barely affects the forecast, so 15 minutes of debate buys essentially no additional accuracy. It also drains energy that should go toward surfacing risks and misunderstandings.
Fix: Set a time limit of 3-5 minutes per item. If the team can't converge after two rounds, go with the higher estimate and move on. If the debate reveals that the story is poorly understood, send it back for refinement rather than continuing to estimate.
Mistake #5: Comparing Estimates Across Teams
What happens: Management notices that Team A completes 40 points per Sprint and Team B completes 25, and concludes that Team B is underperforming.
Why it's harmful: Story points are team-specific. Team A's "5 points" and Team B's "5 points" represent different amounts of work - they calibrated against different reference stories with different team compositions. Comparing them is like comparing test scores from different exams.
Fix: If cross-team comparison is needed, use objective measures: throughput (stories completed per Sprint), cycle time (days from start to completion), or business value delivered. Never compare raw story point velocity.
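The two objective measures named in this fix are simple to compute from ticket data. A minimal sketch, with hypothetical records and field names:

```python
from datetime import date

# Hypothetical completed-story records with start and finish dates.
finished = [
    {"sprint": 14, "started": date(2024, 3, 4), "done": date(2024, 3, 7)},
    {"sprint": 14, "started": date(2024, 3, 5), "done": date(2024, 3, 12)},
    {"sprint": 15, "started": date(2024, 3, 18), "done": date(2024, 3, 20)},
]

# Throughput: stories completed per Sprint.
throughput: dict[int, int] = {}
for story in finished:
    throughput[story["sprint"]] = throughput.get(story["sprint"], 0) + 1

# Cycle time: days from start to completion.
cycle_times = [(s["done"] - s["started"]).days for s in finished]

print(throughput)                                       # {14: 2, 15: 1}
print(sum(cycle_times) / len(cycle_times), "days avg")  # 4.0 days avg
```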
Mistake #6: Using Relative Estimation for Individual Performance
What happens: Individual "velocity" is tracked: "Sarah completed 18 points, Carlos completed 12."
Why it's harmful: It creates perverse incentives. Developers inflate estimates to look more productive. Collaboration drops because helping a teammate doesn't increase your personal score. Pairing and mentoring become "velocity drains." Team trust erodes.
Fix: Relative estimation produces team-level data only. Individual performance should be assessed through qualitative measures: code review quality, knowledge sharing, mentoring, and contribution to team outcomes.
Mistake #7: Skipping Calibration
What happens: The team estimates every Sprint but never reviews whether their estimates were accurate. They never update reference stories or identify systematic biases.
Why it's harmful: Without calibration, estimation accuracy plateaus or degrades. The team misses learning opportunities. Velocity data becomes unreliable for forecasting because the relationship between points and actual work drifts.
Fix: Spend 5 minutes in each Sprint Retrospective reviewing estimation accuracy. Identify the biggest surprise (most over- or under-estimated story), discuss why, and update reference stories or estimation practices accordingly.
Mistake #8: Estimating Everything
What happens: The team estimates every type of work: features, bugs, technical debt, spikes, documentation, and meetings. Everything gets story points.
Why it's harmful: Bugs and spikes are inherently unpredictable - their "size" is unknowable until you start the work. Estimating them creates false precision and clutters velocity data. And if bug fixes earn points toward velocity, the metric starts rewarding rework rather than new, Done functionality.
Fix: Estimate stories (features with defined acceptance criteria). Track bugs by count, not points. Timebox spikes (e.g., "spend 2 days researching") rather than estimating them. Reserve a capacity buffer for non-estimable work.
Mistake #9: Using the Wrong Scale
What happens: A team uses a linear scale (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), allowing debates about the difference between 6 and 7.
Why it's harmful: Linear scales encourage false precision. The difference between a 6 and a 7 is not meaningfully distinguishable for most stories, but the scale's existence invites the debate. Time is wasted on precision that doesn't improve forecasting.
Fix: Use Fibonacci (1, 2, 3, 5, 8, 13, 21) or a similarly non-linear scale. The growing gaps force estimators to make meaningful, detectable distinctions while preventing meaningless precision at larger sizes.
Mistake #10: Abandoning Relative Estimation Too Early
What happens: After 3-4 Sprints of noisy velocity data, the team or management declares "story points don't work" and switches back to hours.
Why it's harmful: Relative estimation needs 4-6 Sprints of calibration to produce reliable velocity data. Judging it after 3 Sprints is like judging a diet after 3 days. The early noise is the calibration process working - the team is learning to estimate consistently.
Fix: Commit to at least 8 Sprints before evaluating whether relative estimation works for your team. Track velocity variance over time - it should decrease. If it doesn't decrease after 8 Sprints, investigate root causes (team instability, scope changes, poor refinement) rather than blaming the estimation approach.
Scaling Relative Estimation Across Teams
When multiple teams work on the same product, relative estimation needs coordination:
Shared reference stories: If teams need to compare or aggregate estimates (e.g., for PI planning), establish 3-5 shared reference stories that all teams calibrate against. This creates "normalized" story points that are roughly comparable across teams.
Independent velocity: Even with shared reference stories, each team maintains its own velocity. A 5-point story may take Team A one day and Team B three days - that's fine because each team's velocity reflects their specific pace.
Portfolio-level estimation: For roadmap and portfolio planning, use T-shirt sizing or affinity estimation rather than story points. These techniques are faster and don't require cross-team point normalization.
Feature-level aggregation: When a feature spans multiple teams, each team estimates their portion independently using their own scale. The total estimate is the sum of team-level estimates, converted to Sprints using each team's velocity. Don't add raw points across teams - add projected Sprint counts.
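A minimal sketch of that roll-up, with illustrative team names, point totals, and velocities:

```python
import math

# Each team's portion, estimated on its own scale against its own velocity.
feature_portions = [
    {"team": "Team A", "points": 34, "velocity": 40},
    {"team": "Team B", "points": 21, "velocity": 25},
]

total_sprints = 0
for portion in feature_portions:
    sprints = math.ceil(portion["points"] / portion["velocity"])
    total_sprints += sprints
    print(f'{portion["team"]}: {portion["points"]} points -> ~{sprints} Sprint(s)')

# Add projected Sprint counts, never raw points across teams.
print(f"feature total: ~{total_sprints} Sprint(s) of team effort")
```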
Coordination ceremonies: During PI planning or big-room planning, use affinity estimation to create a shared view of relative feature sizes. Then each team breaks their portion into stories and estimates with their own story point scale during Sprint-level planning.
Conclusion
Relative estimation works because it aligns with how human cognition actually operates. We're wired to compare, not to predict. We perceive proportional differences, not absolute ones. We make faster and more accurate judgments when we have concrete reference points.
Key takeaways:
- Relative estimation sizes work by comparison ("Is this bigger or smaller than X?") rather than prediction ("How many hours will this take?")
- The Weber-Fechner law explains why Fibonacci scales work - our brains perceive proportional differences, and the scale's growing gaps mirror this
- Reference stories are the foundation - without them, estimates drift and velocity becomes meaningless
- Story points, T-shirt sizing, affinity estimation, and Planning Poker are all relative estimation techniques - choose based on your situation
- Relative and absolute estimation can coexist: story points for Sprint planning, hours for task breakdown
- Never convert story points to hours, compare velocity across teams, or use relative estimates for individual performance
- Calibration is essential - review estimation accuracy in every Retrospective and update reference stories quarterly
- Relative estimation needs 4-6 Sprints of practice to produce reliable velocity data - don't abandon it prematurely
Related Topics
- Story Points in Agile: Deep dive into story points - the most popular unit for relative estimation - covering scales, velocity, and common mistakes.
- Planning Poker: Learn the consensus-driven estimation technique that uses simultaneous reveal to prevent anchoring bias in relative estimation.
- T-Shirt Sizing Estimation: Explore T-shirt sizing as a lightweight relative estimation technique for roadmap-level planning and large backlog sizing.
- Affinity Estimation: Discover the rapid relative estimation technique for sizing 50-200 backlog items in under an hour.
- Fibonacci Sequence Scale: Understand why the Fibonacci sequence is the standard scale for relative estimation and how its growing gaps reflect cognitive perception.
- Release Planning: Learn how relative estimates and velocity data drive release date forecasting and multi-Sprint capacity planning.
- Sprint Planning: Understand how relative estimation feeds into Sprint Planning for selecting the right amount of work each Sprint.
- Product Backlog: Learn about the Product Backlog where relative estimates are assigned during refinement to enable effective Sprint and release planning.
Frequently Asked Questions (FAQs)
- How does relative estimation work with the #NoEstimates movement?
- How does relative estimation work in SAFe (Scaled Agile Framework) at the Program level?
- How do you facilitate relative estimation with remote or distributed teams?
- How do you explain relative estimation to management that expects hour-based estimates?
- Can AI or machine learning tools replace manual relative estimation?
- How does relative estimation interact with DevOps and continuous delivery practices?
- What estimation games or exercises can help teams learn relative estimation?
- How do you handle relative estimation when team composition changes frequently?
- Is relative estimation compatible with fixed-price or fixed-scope contracts?
- How does relative estimation work for non-software work like marketing, design, or content creation?
- What's the relationship between relative estimation and Scrum's empiricism?
- How do you prevent story point inflation when using relative estimation long-term?
- How does relative estimation address the needs of diverse teams with mixed experience levels?
- Can relative estimation be used alongside OKRs (Objectives and Key Results)?
- What data privacy considerations apply to sharing relative estimation data across organizations?