Relative Estimation in Agile: The Complete Guide to Sizing Work Without Hours

Relative estimation is a technique where teams size work items by comparing them to each other rather than estimating in absolute units like hours or days. Instead of asking "How long will this take?", teams ask "Is this bigger or smaller than that other thing we did?" This shift in thinking is one of the most powerful - and most misunderstood - concepts in agile. Teams that embrace relative estimation consistently produce more accurate forecasts, spend less time estimating, and surface hidden risks earlier. This guide covers why relative estimation works, the techniques available, how to implement it, and the mistakes that derail teams.

Quick Answer: Relative vs Absolute Estimation

Aspect | Relative Estimation | Absolute Estimation
What you estimate | Size compared to other work items | Hours, days, or calendar time
Core question | "Is this bigger or smaller than X?" | "How many hours will this take?"
Unit of measure | Story points, T-shirt sizes, or buckets | Hours, days, or person-days
Precision | Deliberately coarse (e.g., Fibonacci scale) | Falsely precise ("exactly 14 hours")
Person-dependent | No - the team estimates together | Yes - depends on who does the work
Accuracy over time | Improves as team calibrates velocity | Stays inconsistent regardless of practice
Best for | Sprint planning, release forecasting, backlog sizing | Task breakdown within a story, time tracking

What Is Relative Estimation?

The Core Insight

Relative estimation is the practice of sizing work by comparing items against each other rather than assigning absolute time values. When a team uses relative estimation, they don't ask "How many hours will this story take?" They ask "How does this story compare to other stories we've already sized or completed?"

The output is a relative size - a number, a label, or a category that positions the item on a scale compared to everything else in the Product Backlog. A story estimated at 8 points isn't "8 hours" or "8 days" - it means "this story is roughly 8/5ths the size of our 5-point reference story."

Relative estimation is NOT in the Scrum Guide. The Scrum Guide doesn't prescribe any specific estimation technique. Relative estimation is a complementary practice widely used by Scrum teams because it works exceptionally well with velocity-based planning and empirical process control. On the PSM-1 exam, you need to understand the concept of sizing work relative to other work - not any specific estimation technique.

A Simple Analogy

Imagine sorting a stack of rocks by weight. You have two options:

  1. Absolute approach: Weigh each rock on a scale, write down the grams, and sort by number.
  2. Relative approach: Pick up two rocks - one in each hand - and feel which is heavier. Repeat with other rocks until you have them sorted light to heavy.

The relative approach is faster, requires no tools, and produces a useful ranking. You don't know the exact weight of each rock, but you know with confidence that Rock C is about twice as heavy as Rock A. For planning purposes - "Can I carry these rocks in one trip?" - relative comparison is usually sufficient.

Software estimation works the same way. You rarely need to know exact hours. You need to know relative size so you can answer: "Can we fit this work into the next Sprint?"

Why Relative Estimation Works Better Than Absolute

The Psychology: Weber-Fechner Law

The Weber-Fechner law, established in the 19th century, states that humans perceive differences in stimuli proportionally rather than absolutely. You can easily tell the difference between lifting a 1 kg weight and a 2 kg weight (100% difference). But telling the difference between a 50 kg weight and a 51 kg weight (2% difference) is much harder, even though both differences are exactly 1 kg.

This law explains why the Fibonacci sequence works so well for estimation. The gaps between values grow proportionally: 1-2-3-5-8-13-21. Each number is roughly 60% larger than the previous one. This mirrors how our brains actually process magnitude - we distinguish proportional differences, not absolute ones.

When teams estimate in hours, they're forced to make absolute judgments that their brains aren't wired for. When they estimate in relative sizes with growing intervals, they're working with their cognitive strengths rather than against them.
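
The proportional spacing is easy to check for yourself. This small sketch (plain Python, nothing agile-specific assumed) computes the ratio between consecutive values of the Fibonacci estimation scale:

```python
# Consecutive ratios of the Fibonacci estimation scale: each step is a
# roughly constant proportional jump, not a fixed increment.
scale = [1, 2, 3, 5, 8, 13, 21]

ratios = [round(b / a, 2) for a, b in zip(scale, scale[1:])]
print(ratios)  # after the first couple of steps, the jumps settle near ~1.6
```

From 3 upward, every step is about 60% larger than the one before it, which is exactly the kind of proportional difference the Weber-Fechner law says we can reliably perceive.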

Cognitive Science: Comparison vs Prediction

Research in cognitive psychology consistently shows that humans are much better at comparing than predicting:

  • Comparison (relative): "Is building this API endpoint bigger or smaller than the login feature we built last Sprint?" Your brain retrieves a concrete memory and makes a quick comparison. This activates recognition memory, which is fast and reliable.
  • Prediction (absolute): "How many hours will this API endpoint take?" Your brain has to simulate the entire future execution of the task - every edge case, every interruption, every unknown. This activates constructive imagination, which is slow and unreliable.

Teams that switch from hours to relative estimation typically see their forecasting accuracy improve within 4-6 Sprints because they stop fighting their own cognitive architecture.

The Anchoring Advantage

Relative estimation gives teams a concrete anchor - a reference story - that makes estimation faster and more consistent. Without an anchor, each estimation session starts from scratch: "So... is this 16 hours?" With a reference story, the conversation is grounded: "Our 5-point reference story was the user profile API. This new story is similar in complexity but with more uncertainty from the third-party integration, so it's probably an 8."

Anchoring also reduces the spread of estimates within a team. When everyone compares against the same reference, their estimates naturally converge. Without a reference, each person anchors on their own private mental model, producing wider divergence.

Absorbing Uncertainty

Absolute estimates create pressure for false precision. Saying "14 hours" implies you know the task duration to a level of accuracy that software development rarely supports. When that 14-hour estimate becomes 22 hours, it feels like a failure.

Relative estimates embrace uncertainty by design. Saying "this is about the same size as that 8-point story" acknowledges that you don't know the exact hours - and you don't need to. The Fibonacci scale's growing gaps deliberately prevent false precision: you can't say "this is a 6" when your choices are 5 or 8, and that coarseness is a feature, not a bug.

The Reference Story Approach

Choosing Your Baseline

A reference story is a well-understood, completed piece of work that the team uses as a benchmark for all future estimates. Choosing the right reference story is critical - it becomes the ruler against which everything else is measured.

Good reference story characteristics:

  • The team has completed it recently enough to remember the details
  • It was a medium-sized piece of work (not the smallest thing ever, not the biggest)
  • The effort, complexity, and uncertainty were all moderate
  • Most team members worked on or are familiar with it
  • It represents a typical type of work for the team

Assign your reference story a value in the middle of your scale - typically 3 or 5 on a Fibonacci scale.

Building a Reference Catalog

A single reference story isn't enough. Build a catalog of 5-7 reference stories that span your estimation scale:

Points | Reference Story | Why This Size
1 | Add a tooltip to existing button | Trivial effort, no complexity, no uncertainty
2 | Add input validation to existing form field | Small effort, low complexity, no uncertainty
3 | Create new API endpoint with standard CRUD | Moderate effort, low complexity, minimal uncertainty
5 | Build user profile page with API integration | Significant effort, moderate complexity, some uncertainty
8 | Integrate third-party payment gateway | Large effort, high complexity, notable uncertainty
13 | Redesign notification system with real-time push | Very large effort, high complexity, significant uncertainty

Review and update this catalog every quarter or when team composition changes significantly. New team members should study these reference stories before their first estimation session.

Calibration Over Time

Relative estimation improves through calibration - the process of comparing estimates to actual outcomes and adjusting:

  1. Sprint Retrospective review: "We estimated this at 5 points but it was clearly an 8 - what did we miss?"
  2. Pattern identification: "We consistently under-estimate stories involving database migrations by about one Fibonacci level."
  3. Reference update: "Our 5-point reference story no longer reflects what a 5 feels like. Let's pick a better one."

This calibration loop is the engine that makes relative estimation increasingly accurate over time. Teams that skip calibration get stuck with noisy estimates that never improve.

Relative Estimation Techniques

Story Points

Story points are the most widely used relative estimation unit. Each story point represents a blend of effort, complexity, and uncertainty - expressed as a single number on a relative scale.

Key characteristics:

  • Team-specific: a 5-point story on Team A is not the same as a 5-point story on Team B
  • Scale-based: typically uses Fibonacci (1, 2, 3, 5, 8, 13, 21) or Modified Fibonacci
  • Velocity-connected: total story points completed per Sprint = velocity
  • Not convertible to hours: there is no valid story-point-to-hour conversion

Story points are ideal for Sprint-level planning and release forecasting. They require an investment in reference stories, calibration, and team consistency - but the payoff is reliable velocity data within 4-6 Sprints.

T-Shirt Sizing

T-shirt sizing uses labels instead of numbers: XS, S, M, L, XL, XXL. It's the most accessible form of relative estimation because everyone intuitively understands that a "Large" is bigger than a "Small."

Best for:

  • Initial backlog sizing when you have 50-200 items to estimate quickly
  • Roadmap-level planning where precision isn't needed
  • Teams new to relative estimation who find numbers intimidating
  • Stakeholder communication (easier to explain than story points)

Limitation: T-shirt sizes don't aggregate into velocity. You can't add up "2 Mediums and 1 Large" to get a capacity number. Many teams start with T-shirt sizing and transition to story points once they're comfortable with relative thinking.

Fibonacci Sequence Scale

The Fibonacci sequence (1, 2, 3, 5, 8, 13, 21) is the most common scale for relative estimation because its growing gaps mirror how human perception works. The sequence forces estimators to make meaningful distinctions at the small end (is this a 2 or a 3?) while preventing false precision at the large end (there's no option between 13 and 21).

Why Fibonacci works for estimation:

  • Gap growth matches the Weber-Fechner law of diminishing perception
  • Prevents debates about meaningless differences ("is this a 14 or a 15?")
  • Forces stories above 13 to be split - large items carry too much uncertainty
  • Each value is roughly 60% larger than the previous, creating consistent proportional jumps

Some teams use Modified Fibonacci (1, 2, 3, 5, 8, 13, 20, 40, 100) which replaces 21 with 20 for easier mental math and adds 40 and 100 for backlog items that need splitting.

Affinity Estimation

Affinity estimation is a rapid technique for sizing large numbers of items. The team physically or virtually groups items into relative size categories by comparing them to each other - not by discussing each one in detail.

How it works:

  1. Lay out the scale (columns labeled 1, 2, 3, 5, 8, 13)
  2. Place the first item in the middle column as a starting reference
  3. Team members silently place remaining items into columns based on relative size
  4. Review the groupings, discuss disagreements, and adjust

Speed advantage: Affinity estimation can size 50-100 items in 30-60 minutes - 10x faster than Planning Poker for the same number of items.

Best for: Initial sizing of a large backlog, PI planning in scaled frameworks, and any situation where you need rough estimates for many items quickly.

Planning Poker

Planning Poker is the gold standard for detailed relative estimation. Each Developer selects a card with their estimate simultaneously, preventing anchoring bias. When estimates diverge, the team discusses - and these discussions are often the most valuable part of the process.

How it works:

  1. Product Owner presents the story and answers questions
  2. Each Developer privately selects a Fibonacci card
  3. All cards are revealed simultaneously
  4. If estimates converge (e.g., all show 5 or 8), consensus is reached quickly
  5. If estimates diverge (e.g., one shows 3 and another shows 13), outliers explain their reasoning
  6. Re-vote after discussion, typically converging within 2-3 rounds

Best for: Sprint-level refinement of 5-15 items where detailed discussion and risk surfacing matter.

How to Implement Relative Estimation: Step-by-Step

Step 1: Choose Your Technique

Select the technique that matches your current need:

Situation | Recommended Technique
New team, first time estimating | T-shirt sizing (low barrier to entry)
Large backlog needs initial sizing (50+ items) | Affinity estimation
Sprint-level refinement (5-15 items) | Planning Poker with story points
Roadmap or PI planning | T-shirt sizing or affinity estimation
Mature team, stable backlog | Story points with quick consensus

Step 2: Establish Reference Stories

Before your first estimation session, identify 3-5 completed stories that span your scale. Present them to the team and agree on their relative sizes. Write them down - these become your calibration anchor for every future session.

Step 3: Run Your First Session

For Planning Poker:

  • Start with the reference stories visible on a board or shared screen
  • Present each new story, allow questions about scope and acceptance criteria
  • Have each Developer select a card privately, then reveal simultaneously
  • Discuss divergence, then re-vote
  • Aim for 2-5 minutes per story - if you can't converge, go with the higher estimate and move on

For Affinity Estimation:

  • Lay out the scale columns
  • Place one reference story per column
  • Have team members silently place remaining stories
  • Walk through the groupings, discuss and adjust
  • Aim for 10-20 seconds per story

Step 4: Track Velocity (If Using Story Points)

After each Sprint, record the total story points completed (only stories meeting the Definition of Done). After 4-6 Sprints, you'll have enough data for reliable velocity-based forecasting.
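
The bookkeeping is minimal. As a sketch (the Sprint velocities below are invented), Python's statistics module gives you the average velocity and the coefficient of variation mentioned in Step 7:

```python
from statistics import mean, stdev

# Hypothetical completed-points totals for the last six Sprints
velocities = [26, 30, 28, 32, 25, 27]

avg = mean(velocities)        # average velocity
cv = stdev(velocities) / avg  # coefficient of variation: spread relative to the mean

print(f"average velocity: {avg:.1f} points/Sprint")
print(f"coefficient of variation: {cv:.0%}")  # lower = more predictable
```

A coefficient of variation under 25% is the stability threshold this guide uses before relying on velocity for planning.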

Step 5: Calibrate in Retrospectives

In each Sprint Retrospective, spend 5 minutes reviewing estimation accuracy:

  • Were any stories significantly larger or smaller than estimated?
  • What caused the surprise?
  • Should any reference stories be updated?
  • Are there systematic patterns (e.g., "we always under-estimate integration work")?

Step 6: Refine Your Scale

After 6-10 Sprints, evaluate whether your scale still works:

  • If everything clusters at 3-5, your reference stories may be too coarse
  • If you're regularly using 13+ values, your team may need to split more aggressively
  • If velocity is unstable, investigate whether estimation consistency or external factors are the cause

Step 7: Integrate with Planning

Once velocity is stable (coefficient of variation below 25%), use it for:

  • Sprint Planning: Select roughly one Sprint's worth of velocity in story points
  • Release Planning: Divide remaining backlog points by average velocity to forecast completion
  • Capacity planning: Use velocity ranges (best/average/worst) for probabilistic forecasting
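
All three planning uses above reduce to simple division. A sketch with made-up numbers (the backlog size and velocity range are assumptions, not prescriptions):

```python
import math

remaining_points = 120             # unfinished Product Backlog (hypothetical)
best, average, worst = 34, 28, 22  # velocity range observed in Sprint history

def sprints_needed(points: int, velocity: int) -> int:
    """Whole Sprints needed to burn down the remaining points."""
    return math.ceil(points / velocity)

forecast = {label: sprints_needed(remaining_points, v)
            for label, v in [("best", best), ("average", average), ("worst", worst)]}
print(forecast)  # → {'best': 4, 'average': 5, 'worst': 6}
```

Reporting the range ("4 to 6 Sprints, most likely 5") communicates honest uncertainty instead of a single falsely precise date.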

Relative vs Absolute Estimation: Detailed Comparison

Dimension | Relative Estimation | Absolute Estimation
Cognitive load | Low - pattern matching and comparison | High - requires simulating future execution
Speed | Fast - most items estimated in 1-3 minutes | Slow - detailed task decomposition required
Accuracy (individual item) | Low - any single estimate may be off by a Fibonacci level | Medium - hour estimates can be close for familiar work
Accuracy (aggregate) | High - over/under estimates cancel out across a Sprint | Low - errors compound rather than cancel
Team vs individual | Team-based - reduces individual bias | Often individual - one person's guess
Handles uncertainty | Well - coarse scale absorbs unknowns | Poorly - pressure for false precision
Learning curve | Moderate - requires 4-6 Sprints to calibrate | Low - everyone understands hours
Maintenance | Requires reference stories and calibration | Requires re-estimation when scope changes
Cross-team comparison | Not possible (team-specific units) | Possible but misleading (different capabilities)
Stakeholder communication | Requires translation to dates via velocity | Direct but often inaccurate

When to Use Relative vs Absolute Estimation

Use relative estimation when:

  • You need Sprint-level or release-level forecasting
  • The team is sizing work in the Product Backlog during refinement
  • Work items vary significantly in size
  • You want to surface risks and misunderstandings through team discussion
  • You need aggregate accuracy across many items

Use absolute estimation when:

  • You're breaking a story into implementation tasks during Sprint Planning
  • The work is highly familiar and predictable (e.g., "this migration script takes 2 hours")
  • Contractual obligations require time-based estimates
  • You're tracking time spent for billing or compliance purposes
  • Individual task assignments need time boundaries

Many teams use both: relative estimation for story-level sizing (story points for Sprint planning and forecasting) and absolute estimation for task-level planning (hours for individual work organization within a Sprint).

Industry Examples

SaaS Product Development

A SaaS team with 7 developers uses story points on a Fibonacci scale with Planning Poker for Sprint refinement. Their reference stories: 1-point config changes, 3-point feature tweaks, 5-point new features with API integration, 8-point integrations with third-party services. They run 2-week Sprints with stable velocity of 34 points, allowing them to forecast quarterly releases within 1 Sprint of accuracy.

Healthcare Software

A healthcare team building EHR integration software includes regulatory compliance in their relative estimates. Stories involving PHI (Protected Health Information) automatically get sized 1-2 Fibonacci levels higher than equivalent non-PHI stories because of required HIPAA documentation, audit logging, encryption verification, and security review. Their velocity (22 points per Sprint) is lower than non-regulated teams, but forecasts are accurate because compliance effort is embedded in the relative sizes.

Financial Services

A fintech team estimating payment processing features uses T-shirt sizing for initial roadmap discussions with stakeholders (S/M/L maps to 1-month/1-quarter/multi-quarter delivery) and converts to story points for Sprint-level work. PCI-DSS compliance requirements are captured in their reference stories - a "5-point payment feature" inherently includes the compliance testing that the team has learned accompanies every payment-related change.

E-commerce Platform

An e-commerce team tracks relative estimates across three work types: customer-facing features, performance optimization, and infrastructure. They maintain separate reference catalogs for each type because the effort/complexity/uncertainty profiles differ significantly. A 5-point customer feature involves UI work and API integration, while a 5-point infrastructure change involves Terraform modules and monitoring setup. Separate catalogs prevent cross-type estimation drift.

Government Software

A government contractor team uses relative estimation within the constraints of fixed-price contracts. They estimate the initial backlog using affinity estimation to produce a total story point count, divide by projected velocity to estimate the number of Sprints, and present the Sprint count (with confidence range) to the contracting officer. Internally, they track velocity and use it for Sprint Planning. The relative estimates allow them to reorder and re-scope within the fixed budget without re-estimating in hours.

EdTech Platform

An EdTech team building a learning management system uses relative estimation with a twist: they size accessibility work separately. Every feature gets two estimates - base functionality and accessibility compliance (WCAG 2.1 AA). A feature might be 5 points for base functionality and 3 points for accessibility, producing an 8-point total. This visibility helps the Product Owner understand the cost of accessibility compliance and plan accordingly, rather than treating it as invisible overhead.

Relative Estimation Maturity Model

Stage 1: Getting Started (Sprints 1-4)

Characteristics:

  • Team is new to relative estimation or transitioning from hours
  • Estimates feel arbitrary - "Is this a 3 or a 5? I have no idea"
  • No velocity data exists yet
  • Reference stories are being established
  • Estimation sessions run long (30+ minutes for 5-10 items)

What to focus on:

  • Pick 3-5 reference stories and physically display them during every session
  • Don't worry about accuracy - focus on consistency (always compare to the same references)
  • Track velocity but don't rely on it for planning yet
  • Use T-shirt sizing if story points feel too abstract initially

Expected outcome: By Sprint 4, the team should converge on estimates faster and feel comfortable with the relative scale.

Stage 2: Calibrating (Sprints 5-10)

Characteristics:

  • Velocity data exists but is noisy (high variance between Sprints)
  • Team agrees on estimates faster - most items converge in 1-2 rounds
  • Some stories still surprise (larger or smaller than estimated)
  • Reference catalog is being refined based on actual completion data
  • Estimation sessions take 15-20 minutes for 5-10 items

What to focus on:

  • Compare estimates to actuals in every Retrospective
  • Identify systematic patterns: "We always under-estimate stories that require [X]"
  • Begin using velocity for Sprint capacity planning (with buffer)
  • Update reference stories based on what you've learned

Expected outcome: By Sprint 10, velocity variance should be decreasing and Sprint completion rates improving.

Stage 3: Reliable (Sprints 11-20)

Characteristics:

  • Velocity is predictable within a 15-20% range
  • Estimation sessions are efficient - 10-15 minutes for 5-10 items
  • Carry-over is rare (fewer than 1 story per Sprint)
  • Team has an intuitive sense of relative sizing
  • Reference catalog is stable and updated quarterly

What to focus on:

  • Use velocity ranges (best/average/worst) for release forecasting
  • Coach new team members using the reference catalog
  • Refine your split threshold based on completion patterns
  • Track throughput alongside velocity for cross-validation

Expected outcome: Reliable release date forecasts within 1-2 Sprints of accuracy.

Stage 4: Optimized (Sprint 20+)

Characteristics:

  • Velocity coefficient of variation is under 15%
  • Estimation takes minimal time - team often agrees without discussion
  • Forecasts are accurate within 10-15%
  • The team may start questioning whether formal estimation adds enough value
  • Reference stories are rarely needed - the scale is internalized

What to focus on:

  • Consider lightweight alternatives: quick consensus without Planning Poker cards
  • Evaluate whether throughput-based forecasting (story count rather than points) works for your team
  • Focus estimation time only on high-uncertainty or high-risk stories
  • Use Monte Carlo simulation for probabilistic release planning
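
The Monte Carlo idea in the last bullet can be sketched in a few lines: resample historical Sprint velocities until a (hypothetical) backlog is burned down, repeat many times, and read percentiles off the resulting distribution of Sprint counts. The history and backlog numbers here are invented:

```python
import random

history = [22, 25, 27, 28, 30, 34]  # actual velocities from past Sprints (made up)
backlog = 120                       # remaining story points (made up)
random.seed(42)                     # fixed seed so the example is reproducible

def simulate_once() -> int:
    """One simulated future: draw a plausible velocity per Sprint until done."""
    remaining, sprints = backlog, 0
    while remaining > 0:
        remaining -= random.choice(history)
        sprints += 1
    return sprints

runs = sorted(simulate_once() for _ in range(10_000))
p50, p85 = runs[len(runs) // 2], runs[int(len(runs) * 0.85)]
print(f"50% confident in {p50} Sprints, 85% confident in {p85} Sprints")
```

The 85th percentile, not the average, is what you quote as a commitment-grade forecast.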

Expected outcome: Estimation becomes a lightweight, low-overhead practice that reliably supports planning.

10 Common Relative Estimation Mistakes

Mistake #1: Converting Relative Estimates to Hours

What happens: Management or the team establishes a conversion: "1 story point = 4 hours." A 5-point story is expected to take 20 hours.

Why it's harmful: This destroys the entire purpose of relative estimation. Points become a unit of time, reintroducing person-dependency, false precision, and schedule pressure. If you're going to convert to hours anyway, you might as well just estimate in hours.

Fix: Never define or allow a point-to-hour conversion. Use velocity for time-based planning: "Our average velocity is 28 points per Sprint. The remaining backlog is 84 points. That's about 3 Sprints."

Mistake #2: Estimating Without Reference Stories

What happens: Each estimation session starts from scratch with no shared anchor. Team members estimate based on their own private mental models.

Why it's harmful: Without reference stories, estimates drift over time. What was a 5 three months ago is now a 3, making velocity trends meaningless. Team members also diverge more because they're anchoring on different personal baselines.

Fix: Maintain a catalog of 5-7 reference stories at key scale values. Display them during every estimation session. Review and update the catalog quarterly.

Mistake #3: Having One Person Dominate Estimates

What happens: The senior developer or tech lead speaks first, and everyone else adjusts their estimate to match. Or the Product Owner suggests a size before the team estimates.

Why it's harmful: This introduces anchoring bias - the first number spoken becomes the gravitational center. It also silences junior team members who might have valuable perspective on testing complexity or uncertainty.

Fix: Use simultaneous reveal (Planning Poker) for every estimation. No one speaks their estimate before the reveal. The Product Owner presents the story and answers questions but never suggests a size.

Mistake #4: Spending Too Long on a Single Estimate

What happens: The team debates whether a story is a 5 or an 8 for 15-20 minutes, often going around in circles.

Why it's harmful: The precision difference between adjacent Fibonacci values is tiny across a Sprint. Spending 15 minutes debating saves zero forecasting accuracy. It also drains energy that should go toward surfacing risks and misunderstandings.

Fix: Set a time limit of 3-5 minutes per item. If the team can't converge after two rounds, go with the higher estimate and move on. If the debate reveals that the story is poorly understood, send it back for refinement rather than continuing to estimate.

Mistake #5: Comparing Estimates Across Teams

What happens: Management notices that Team A completes 40 points per Sprint and Team B completes 25, and concludes that Team B is underperforming.

Why it's harmful: Story points are team-specific. Team A's "5 points" and Team B's "5 points" represent different amounts of work - they calibrated against different reference stories with different team compositions. Comparing them is like comparing test scores from different exams.

Fix: If cross-team comparison is needed, use objective measures: throughput (stories completed per Sprint), cycle time (days from start to completion), or business value delivered. Never compare raw story point velocity.

Mistake #6: Using Relative Estimation for Individual Performance

What happens: Individual "velocity" is tracked: "Sarah completed 18 points, Carlos completed 12."

Why it's harmful: It creates perverse incentives. Developers inflate estimates to look more productive. Collaboration drops because helping a teammate doesn't increase your personal score. Pairing and mentoring become "velocity drains." Team trust erodes.

Fix: Relative estimation produces team-level data only. Individual performance should be assessed through qualitative measures: code review quality, knowledge sharing, mentoring, and contribution to team outcomes.

Mistake #7: Skipping Calibration

What happens: The team estimates every Sprint but never reviews whether their estimates were accurate. They never update reference stories or identify systematic biases.

Why it's harmful: Without calibration, estimation accuracy plateaus or degrades. The team misses learning opportunities. Velocity data becomes unreliable for forecasting because the relationship between points and actual work drifts.

Fix: Spend 5 minutes in each Sprint Retrospective reviewing estimation accuracy. Identify the biggest surprise (most over- or under-estimated story), discuss why, and update reference stories or estimation practices accordingly.

Mistake #8: Estimating Everything

What happens: The team estimates every type of work: features, bugs, technical debt, spikes, documentation, and meetings. Everything gets story points.

Why it's harmful: Bugs and spikes are inherently unpredictable - their "size" is unknowable until you start the work. Estimating them creates false precision and clutters velocity data. If bug points count toward velocity, teams are incentivized to create more bugs.

Fix: Estimate stories (features with defined acceptance criteria). Track bugs by count, not points. Timebox spikes (e.g., "spend 2 days researching") rather than estimating them. Reserve a capacity buffer for non-estimable work.

Mistake #9: Using the Wrong Scale

What happens: A team uses a linear scale (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), allowing debates about the difference between 6 and 7.

Why it's harmful: Linear scales encourage false precision. The difference between a 6 and a 7 is not meaningfully distinguishable for most stories, but the scale's existence invites the debate. Time is wasted on precision that doesn't improve forecasting.

Fix: Use Fibonacci (1, 2, 3, 5, 8, 13, 21) or a similarly non-linear scale. The growing gaps force estimators to make meaningful, detectable distinctions while preventing meaningless precision at larger sizes.

Mistake #10: Abandoning Relative Estimation Too Early

What happens: After 3-4 Sprints of noisy velocity data, the team or management declares "story points don't work" and switches back to hours.

Why it's harmful: Relative estimation needs 4-6 Sprints of calibration to produce reliable velocity data. Judging it after 3 Sprints is like judging a diet after 3 days. The early noise is the calibration process working - the team is learning to estimate consistently.

Fix: Commit to at least 8 Sprints before evaluating whether relative estimation works for your team. Track velocity variance over time - it should decrease. If it doesn't decrease after 8 Sprints, investigate root causes (team instability, scope changes, poor refinement) rather than blaming the estimation approach.

Scaling Relative Estimation Across Teams

When multiple teams work on the same product, relative estimation needs coordination:

Shared reference stories: If teams need to compare or aggregate estimates (e.g., for PI planning), establish 3-5 shared reference stories that all teams calibrate against. This creates "normalized" story points that are roughly comparable across teams.

Independent velocity: Even with shared reference stories, each team maintains its own velocity. A 5-point story may take Team A one day and Team B three days - that's fine because each team's velocity reflects their specific pace.

Portfolio-level estimation: For roadmap and portfolio planning, use T-shirt sizing or affinity estimation rather than story points. These techniques are faster and don't require cross-team point normalization.

Feature-level aggregation: When a feature spans multiple teams, each team estimates their portion independently using their own scale. The total estimate is the sum of team-level estimates, converted to Sprints using each team's velocity. Don't add raw points across teams - add projected Sprint counts.
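
The "add Sprint counts, not raw points" rule looks like this in miniature (team names, point totals, and velocities are all invented for illustration):

```python
import math

# Each team's share of the feature, sized in its OWN point scale,
# alongside that team's own velocity.
teams = {
    "Team A": {"points": 40, "velocity": 20},
    "Team B": {"points": 30, "velocity": 15},
}

# Convert each portion to Sprints first, then combine - never sum raw points,
# because a point means something different on each team.
sprints = {name: math.ceil(t["points"] / t["velocity"]) for name, t in teams.items()}
total = sum(sprints.values())
print(sprints, "-> total projected Sprint count:", total)
```

Summing the raw points (70) would be meaningless; summing the projected Sprint counts gives a figure grounded in each team's actual pace.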

Coordination ceremonies: During PI planning or big-room planning, use affinity estimation to create a shared view of relative feature sizes. Then each team breaks their portion into stories and estimates with their own story point scale during Sprint-level planning.

Conclusion

Relative estimation works because it aligns with how human cognition actually operates. We're wired to compare, not to predict. We perceive proportional differences, not absolute ones. We make faster and more accurate judgments when we have concrete reference points.

Key takeaways:

  1. Relative estimation sizes work by comparison ("Is this bigger or smaller than X?") rather than prediction ("How many hours will this take?")
  2. The Weber-Fechner law explains why Fibonacci scales work - our brains perceive proportional differences, and the scale's growing gaps mirror this
  3. Reference stories are the foundation - without them, estimates drift and velocity becomes meaningless
  4. Story points, T-shirt sizing, affinity estimation, and Planning Poker are all relative estimation techniques - choose based on your situation
  5. Relative and absolute estimation can coexist: story points for Sprint planning, hours for task breakdown
  6. Never convert story points to hours, compare velocity across teams, or use relative estimates for individual performance
  7. Calibration is essential - review estimation accuracy in every Retrospective and update reference stories quarterly
  8. Relative estimation needs 4-6 Sprints of practice to produce reliable velocity data - don't abandon it prematurely

