Amazon’s Software Developer Performance Management Ecosystem


Amazon’s approach to managing and motivating its software development engineers has become one of the most discussed and debated systems in the tech industry. Behind its reputation for innovation lies a performance management model designed not just to measure results  –  but to continuously “raise the bar.”

At its best, this model cultivates elite engineering performance and cultural alignment. At its worst, it creates sustained internal pressure and competition. Understanding how this system works offers valuable insight for any organization striving to balance high standards with long-term team health.

This article breaks down how Amazon structures performance management for its software engineers  –  from formal review frameworks and leadership principles to the controversial calibration mechanisms that underpin its “bar-raiser” ethos. More importantly, we’ll look at what tech leaders and engineering managers can learn from Amazon’s model  –  both what to emulate and what to approach with caution.

1. The Formal Evaluation Framework: Forte, OLR, and the Primacy of Principles

    Amazon’s performance management framework is a layered and deeply data-driven system that blends developmental feedback with organizational calibration. On paper, it’s a model of structure and accountability. In practice, it’s a finely tuned machine for ranking and differentiating talent  –  one that reflects Amazon’s cultural DNA.

    1.1 The Forte and OLR Process: A Two-Part System

    Amazon’s evaluation process is split between two core mechanisms:

    • Forte Review: The visible, employee-facing feedback cycle.
    • Organizational Leadership Review (OLR): The closed-door calibration session where real decisions are made.

    The Forte Review: Building the Narrative

    Each year, software engineers at Amazon undergo the Forte process  –  an extensive peer feedback cycle that runs from November to March. Engineers collect input from colleagues, managers, and stakeholders across projects. While employees don’t see who wrote which comments, managers have full transparency into all feedback sources.

    Officially, Forte exists to promote growth and self-awareness. It encourages engineers to “level up” by highlighting their strengths and developmental areas. However, for many employees, Forte functions more as a communication tool than a decision-making one  –  a narrative built to support conclusions already formed elsewhere.

    The OLR: Where Ratings Are Decided

    Behind the scenes, the Organizational Leadership Review (OLR) drives the actual evaluation outcome. Senior leaders meet to discuss and “calibrate” performance ratings across teams  –  ensuring they fit into a forced distribution curve (often referred to as stack ranking).

    This process isn’t merely administrative; it’s a negotiation. Managers advocate for their team members, and ratings are often adjusted to meet quotas  –  both for top performers and those at the bottom. Ultimately, an engineer’s fate can depend as much on their manager’s advocacy skills as on their individual contributions.
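    To make the calibration mechanics concrete, here is a minimal sketch of a forced-distribution assignment using the tier names and rough percentages reported by employees. The single numeric "calibration score" and the function itself are simplifying assumptions for illustration; the real OLR is a negotiated discussion among leaders, not a deterministic sort.

```python
# Illustrative only: assign forced-distribution tiers from a single
# numeric "calibration score" per engineer. Real OLR calibration is a
# negotiation between managers, not a sort over one number.

def assign_tiers(scores: dict[str, float],
                 top_pct: float = 0.20,      # ~20% "Top Tier" in large teams
                 bottom_pct: float = 0.075   # 5-10% "Least Effective"
                 ) -> dict[str, str]:
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    top_cut = max(1, round(n * top_pct))
    bottom_cut = max(1, round(n * bottom_pct))

    tiers = {}
    for i, name in enumerate(ranked):
        if i < top_cut:
            tiers[name] = "TT"   # Top Tier
        elif i >= n - bottom_cut:
            tiers[name] = "LE"   # Least Effective
        else:
            tiers[name] = "HV"   # High Value (HV1-HV3 in the real system)
    return tiers

team = {"alice": 91, "bob": 89, "carol": 88, "dave": 87, "erin": 86,
        "frank": 85, "grace": 84, "heidi": 83, "ivan": 82, "judy": 81}
print(assign_tiers(team))
# "judy" lands in LE even though she scores close to the median:
# the quota, not absolute performance, forces someone into the bottom bucket.
```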

    1.2 Leadership Principles as a Scored Metric

    Amazon’s 16 Leadership Principles (LPs)  –  such as Customer Obsession, Dive Deep, and Deliver Results  –  have long guided internal decision-making. In 2025, Amazon formally integrated these principles into its employee evaluation process, turning them into quantifiable performance dimensions.

    Engineers are now rated across three axes:

    1. Job Performance – What was delivered.
    2. Leadership Principles – How it was delivered.
    3. Potential – Readiness for future roles.

    The LPs, once philosophical guidelines, are now part of a competitive scoring system. Only a small fraction of employees – around 5% – can achieve a “Role Model” rating, regardless of overall team performance. For senior developers, specific principles like Think Big and Customer Obsession carry additional weight, reinforcing a culture where alignment with Amazon’s values becomes a measurable – and limiting – differentiator.

    1.3 The Overall Value (OV) Score: Synthesizing Performance into a Single Metric

    The three pillars of evaluation – performance, LPs, and potential – are synthesized into a single, comprehensive “Overall Value” (OV) score. This score is the ultimate determinant of an employee’s fate, directly influencing salary increases, promotion decisions, and placement on performance improvement plans.

    The system is characterized by significant opacity, with two distinct layers of ratings. The employee receives a qualitative, summary rating in their Forte review, such as “Exceeds High Bar,” “Meets High Bar,” or “Needs Improvement.” However, the internal rating used during the OLR calibration is far more granular and consequential, consisting of tiers like “Top Tier” (TT), “High Value” (HV3, HV2, HV1), and “Least Effective” (LE).

    The final OV rating is derived from a complex matrix combining the performance and potential scores. This can lead to counterintuitive outcomes that are confusing for employees. For example, an engineer who receives an “Exceeds” rating on their Forte summary (based on a high performance score) may ultimately have a lower OV rating than a colleague who received “Meets” but was assigned a higher potential score. This opaque, 28-combination matrix effectively disconnects the feedback an employee sees from the internal calculus that truly drives their career progression, as detailed in the table below.
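    Because the actual matrix is internal, the sketch below uses invented values purely to illustrate how a performance-by-potential lookup can produce the inversion described above; only the tier labels come from the accounts cited in this article.

```python
# Hypothetical illustration of an "Overall Value" lookup that combines a
# performance rating with a potential rating. The real 28-combination
# matrix is internal to Amazon; these values are invented to show how an
# "Exceeds" performer can land below a "Meets" performer with higher
# assessed potential.

OV_MATRIX = {
    "Exceeds High Bar":  {"Limited": "HV2", "Moderate": "HV3", "High": "TT",  "Very High": "TT"},
    "Meets High Bar":    {"Limited": "HV1", "Moderate": "HV2", "High": "HV3", "Very High": "HV3"},
    "Needs Improvement": {"Limited": "LE",  "Moderate": "LE",  "High": "HV1", "Very High": "HV1"},
}

def overall_value(performance: str, potential: str) -> str:
    return OV_MATRIX[performance][potential]

# An "Exceeds" engineer with limited assessed potential...
print(overall_value("Exceeds High Bar", "Limited"))  # -> HV2
# ...ranks below a "Meets" engineer judged to have high potential.
print(overall_value("Meets High Bar", "High"))       # -> HV3
```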

    Table 1: The Amazon Performance Rating System: OLR vs. Forte

    Internal OLR Rating | Employee-Facing Forte Rating | Target Distribution (%) | Implication
    TT (Top Tier) | Exceeds High Bar | ~20% (in large teams) | Rockstar performer, exceeding expectations. Generally required for promotion.
    HV3 (High Value 3) | Exceeds High Bar | N/A (part of the middle 75%) | Doing extremely well. Strong candidate for promotion.
    HV2 (High Value 2) | Meets High Bar | N/A (part of the middle 75%) | Doing well, exceeding some expectations. Solid performer.
    HV1 (High Value 1) | Meets High Bar | 35-40% | Average rating, meeting expectations. The largest bucket of employees.
    LE (Least Effective) | Needs Improvement | 5-10% | Not meeting expectations. At high risk for a Performance Improvement Plan (PIP).

    2. Measuring the “What”: How Amazon Quantifies Engineering Performance

    Amazon’s engineering culture is famously data-obsessed  –  and that extends to how it measures performance. Yet contrary to popular belief, Amazon’s best metrics aren’t about counting lines of code or commits. Instead, they’re designed to capture something deeper: the reliability, velocity, and business impact of the systems engineers build.

    For leaders managing large-scale software teams, Amazon’s measurement philosophy offers a window into how metrics can drive excellence without reducing people to numbers  –  when used wisely.

    2.1 Code-Level Metrics: Separating Myths from Meaning

    Among developers, there’s long been a fear of simplistic, output-based metrics  –  lines of code written, commits made, or reviews completed. These numbers are easy to track but dangerously misleading.

    Amazon’s internal documentation and management training explicitly discourage using these raw figures as performance indicators. As one engineer noted, when a manager began tracking “average revisions per code review,” developers started avoiding necessary edits just to optimize the number  –  an example of metrics creating the wrong incentives.

    Instead, Amazon treats these indicators as diagnostic signals, not performance scores. A drop in commit frequency, for instance, isn’t automatically a red flag  –  it’s a cue for a conversation. It might indicate that an engineer is stuck on a complex issue, mentoring a teammate, or deep in design work rather than code output. The point is not to measure volume, but to understand context.
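    As a small illustration of the “diagnostic signal, not a score” idea, the sketch below flags weeks where commit activity falls well below an engineer’s own recent baseline and phrases the result as a question for a one-on-one rather than a rating. The data shape and threshold are assumptions made for the example.

```python
# Illustrative only: turn a dip in commit activity into a conversation
# prompt rather than a performance score.

from statistics import mean

def conversation_prompts(weekly_commits: list[int],
                         window: int = 6,
                         dip_ratio: float = 0.5) -> list[str]:
    prompts = []
    for week in range(window, len(weekly_commits)):
        baseline = mean(weekly_commits[week - window:week])
        current = weekly_commits[week]
        if baseline > 0 and current < baseline * dip_ratio:
            prompts.append(
                f"Week {week}: commits dropped to {current} (baseline ~{baseline:.0f}). "
                "Worth asking: stuck on a hard problem, mentoring, or deep in design work?"
            )
    return prompts

history = [12, 10, 11, 9, 13, 12, 4, 11]
for prompt in conversation_prompts(history):
    print(prompt)
```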

    Key Takeaway: Metrics should start conversations, not end them.

    2.2 Operational Excellence: The Real Performance Metric

    At the heart of Amazon’s engineering philosophy is the concept of Operational Excellence  –  a focus on how well systems perform, not just how fast developers code.

    Instead of tracking individual output, teams are measured on the health, reliability, and velocity of the services they own. Amazon’s engineering teams often adopt the DORA metrics framework (popularized in DevOps research) as their primary performance compass:

    • Deployment Frequency: How often changes reach production.
    • Lead Time for Changes: How quickly code moves from commit to deploy.
    • Change Failure Rate: How often a deployment introduces a defect.
    • Time to Restore Service: How fast the team recovers from production issues.

    These metrics shift focus from activity to outcomes  –  rewarding developers who make a measurable impact on quality, stability, and efficiency. For instance, an engineer who deploys a small code change that cuts latency by 15% adds far more value than one who commits thousands of lines that increase error rates.
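    For teams that want to adopt the same lens, here is a minimal sketch of how the four DORA metrics can be computed from a simple deployment and incident log. The record fields are assumptions for illustration, not an Amazon-internal schema.

```python
# Illustrative DORA calculation over simple deployment/incident records.
# Field names (committed_at, deployed_at, caused_failure, ...) are assumed.

from datetime import datetime, timedelta
from statistics import mean

def dora_metrics(deployments: list[dict], incidents: list[dict],
                 period_days: int = 30) -> dict:
    lead_times = [(d["deployed_at"] - d["committed_at"]).total_seconds() / 3600
                  for d in deployments]
    failures = sum(1 for d in deployments if d["caused_failure"])
    restore_times = [(i["resolved_at"] - i["opened_at"]).total_seconds() / 3600
                     for i in incidents]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "lead_time_hours": mean(lead_times) if lead_times else None,
        "change_failure_rate": failures / len(deployments) if deployments else None,
        "time_to_restore_hours": mean(restore_times) if restore_times else None,
    }

now = datetime(2025, 1, 31)
deployments = [
    {"committed_at": now - timedelta(hours=30), "deployed_at": now - timedelta(hours=26), "caused_failure": False},
    {"committed_at": now - timedelta(hours=10), "deployed_at": now - timedelta(hours=2), "caused_failure": True},
]
incidents = [{"opened_at": now - timedelta(hours=2), "resolved_at": now - timedelta(hours=1)}]
print(dora_metrics(deployments, incidents))
```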

    Amazon complements these team-level measures with Service Level Objectives (SLOs), tracked through systems like Amazon CloudWatch Application Signals. This enables real-time visibility into whether services meet performance and business goals  –  and ties individual work back to measurable customer outcomes.
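    The arithmetic behind such an objective is simple enough to sketch. The example below compares observed availability against a target and reports how much of the error budget has been consumed; it is generic SLO math, not the CloudWatch Application Signals API.

```python
# Generic SLO arithmetic (not the CloudWatch Application Signals API):
# compare observed availability against a target and report how much of
# the error budget has been spent.

def slo_report(total_requests: int, failed_requests: int,
               slo_target: float = 0.999) -> dict:
    availability = 1 - failed_requests / total_requests
    error_budget = 1 - slo_target                        # allowed failure ratio
    budget_used = (failed_requests / total_requests) / error_budget
    return {
        "availability": round(availability, 5),
        "slo_met": availability >= slo_target,
        "error_budget_consumed": round(budget_used, 2),  # 1.0 == fully spent
    }

# 1,000,000 requests this month, 450 failures, against a 99.9% target:
print(slo_report(1_000_000, 450))
# -> {'availability': 0.99955, 'slo_met': True, 'error_budget_consumed': 0.45}
```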

    Even the most qualitative aspects of engineering, such as on-call performance, are viewed through this operational lens. How an engineer responds to critical incidents, learns from outages, and contributes to preventing recurrence is a defining marker of maturity and reliability.

    Key Takeaway: The highest-performing teams measure success not by how much code they write  –  but by how effectively their systems run.

    2.3 The Next Frontier: AI-Powered Developer Analytics

    The rise of AI-driven engineering tools is adding a new dimension to performance tracking  –  and Amazon is leading the charge.

    Through services like Amazon CodeGuru and Amazon Q Developer (formerly CodeWhisperer), the company now has access to an unprecedented layer of granular, real-time developer data. These tools don’t just support developers  –  they observe them, generating insights on efficiency, code quality, and even cost impact.

    • Amazon CodeGuru automatically reviews code and identifies performance bottlenecks  –  flagging the “most expensive lines of code” in terms of CPU or latency, with estimated cost savings in dollars. It also records metrics like MeteredLinesOfCodeCount (lines of non-comment code analyzed), creating a structured performance dataset.
    • Amazon Q Developer goes even further. It tracks how much code is AI-generated, providing visibility into AI adoption and productivity gains across teams. Dashboards show metrics like “lines of code written by CodeWhisperer” and even daily CSV exports detailing individual developer interactions  –  effectively building a real-time, per-user productivity map.

    While these tools are currently positioned for tracking adoption and improving productivity, the capability to generate per-user activity reports creates a direct pathway to incorporating “AI utilization effectiveness” as a formal, individualized performance metric in the future. This aligns with Amazon’s core philosophy of using data-driven approaches to evaluate employee performance.
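    To show what such a per-user view can look like, the sketch below aggregates a hypothetical daily CSV export into an “AI-generated share of code” per developer. The column names and file layout are invented for the example and may differ from what Amazon Q Developer actually exports.

```python
# Illustrative only: aggregate a hypothetical per-user AI usage export.
# Column names (user, ai_generated_lines, total_lines) are assumptions;
# the real Amazon Q Developer export schema may differ.

import csv
from collections import defaultdict
from io import StringIO

SAMPLE_EXPORT = """user,ai_generated_lines,total_lines
alice,120,400
bob,30,350
alice,80,200
"""

def ai_share_by_user(csv_text: str) -> dict[str, float]:
    totals = defaultdict(lambda: [0, 0])   # user -> [ai_lines, total_lines]
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row["user"]][0] += int(row["ai_generated_lines"])
        totals[row["user"]][1] += int(row["total_lines"])
    return {user: ai / total for user, (ai, total) in totals.items()}

print(ai_share_by_user(SAMPLE_EXPORT))
# -> {'alice': 0.333..., 'bob': 0.0857...}
```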

    Table 2: Key Developer Metrics at Amazon: Official vs. Anecdotal

    Metric Category | Specific Metric | Role in Evaluation | Supporting Evidence
    Code Production | Lines of Code (LoC), Commits, CRs | Anecdotal/Indicator: Not a primary metric. Used by managers as a signal to investigate anomalies. | Source
    System Performance | Deployment Frequency, Lead Time, Change Failure Rate, Time to Restore (DORA) | Official/Team-Level: Core metrics for measuring team and service health. | AWS Prescriptive Guidance
    System Performance | Latency (e.g., P90, TM95), Success Rate, SLO/SLI Adherence | Official/Team-Level: Critical metrics for service operational excellence. | Operational Excellence – AWS Well-Architected Framework
    Operational Health | Time to Resolve Vulnerabilities, Ticket Queue Depth | Official/Team-Level: Measures the team’s effectiveness in maintaining a healthy service. | Metrics for software component management – DevOps Guidance – AWS Documentation
    AI Productivity | Lines of Code Generated by Q, Percentage of Code Written by Q | Emerging/Organizational: Currently used to measure AI adoption and impact at a high level. | Introducing Amazon CodeWhisperer Dashboard and CloudWatch Metrics
    AI Productivity | Per-User AI Usage Reports (Active Users, Interaction Trends) | Emerging/Individual: Granular data on individual developer interaction with AI tools. | Unlocking the power of Amazon Q Developer: Metrics-driven strategies for better AI coding

    Engineering performance is entering the era of continuous, AI-assisted measurement. Used ethically, this data can illuminate productivity patterns and skill gaps; used carelessly, it risks creating new kinds of pressure and surveillance. The balance between empowerment and oversight will define the next phase of developer performance management.

    Key Takeaways for Software Leaders

    • Don’t measure activity  –  measure outcomes. Code volume says little about value. Focus on reliability, quality, and delivery velocity.
    • Build context into your metrics. Use data to inform discussions, not dictate judgments.
    • Adopt DORA metrics as team-level KPIs. They’re industry-proven and align engineering effort with business performance.
    • Be cautious with AI analytics. As developer monitoring becomes more granular, ethical use and transparency will be essential to maintain trust.

    3. Measuring the “How”: The Unquantifiable, Yet Critical Contributions

    While Amazon excels at quantifying engineering outcomes, it is equally rigorous in evaluating the intangibles  –  the behaviors, influence, and leadership qualities that define a high-impact software engineer. Technical output alone is not enough; the system rewards engineers who can think broadly, communicate effectively, and scale their impact across teams.

    3.1 Writing as a Core Engineering Skill

    Amazon’s famously document-driven culture elevates writing to a hard skill. Meetings often begin not with slides, but with silent reading of a detailed narrative  –  usually a six-pager or Press Release/FAQ (PR/FAQ). These documents are more than administrative artifacts: they are performance proxies.

    For developers, the design document is a central measure of impact. A strong document demonstrates:

    • Clear problem definition and goal alignment
    • Thorough consideration of alternatives
    • Transparent assumptions
    • Operational and observability plans

    Defending the document requires data, foresight, and confidence. Poorly structured or under-defended documents signal weak thought processes, directly affecting credibility and influence. Tools like Amazon’s Capital Project Collaboration Platform (CP2) streamline document exchange and review, emphasizing the importance placed on written communication as a core competency for engineering success.

    3.2 Evaluating Mentorship, On-Call, and Influence

    The path to career progression at Amazon, particularly to senior levels, is gated by contributions that extend beyond individual coding tasks. The system is intentionally designed to filter for individuals who can scale their impact through leadership and influence.

    Mentorship is an explicit expectation and a key differentiator for promotion. While SDE Is are expected to learn, SDE IIs are formally tasked with mentoring junior engineers. This responsibility grows with seniority, becoming a core function for Principal and Distinguished Engineers. An engineer’s promotion packet is heavily scrutinized for evidence of this kind of influence. A common reason for an SDE II’s promotion to SDE III to be blocked is not a lack of technical execution, but a failure to demonstrate sufficient scope and influence, which are often proven through activities like mentorship and leading smaller teams.

    On-call performance is evaluated through the lens of the “Operational Excellence” pillar. Success is not merely defined by fixing an outage quickly. It encompasses a broader set of behaviors: learning from the failure, conducting a thorough root cause analysis, improving operational procedures to prevent recurrence, and communicating effectively with stakeholders throughout the incident. The ultimate goal is to contribute to the team’s ability to reduce the “Time to Restore Service” DORA metric, turning a reactive event into a proactive improvement.

    Influence and Scope are the currency of senior engineering roles. To be promoted, an engineer must demonstrate impact that transcends their immediate team. This is assessed through contributions to cross-team projects, shaping the technical direction of larger components, and the overall complexity and ambiguity of the problems they solve. The promotion process requires extensive documentation – a packet of 5+ pages for an SDE II promotion and a staggering 15+ pages for an SDE III – that explicitly details the candidate’s scope, influence, and impact on the business.

    3.3 Peer Feedback as a Cultural Mirror

    Amazon’s Forte review is the formal channel for capturing these intangible contributions. Peer feedback evaluates qualities such as:

    • Communication effectiveness
    • Teamwork and collaboration
    • Responsiveness under pressure
    • Alignment with Leadership Principles

    While feedback is anonymous to the recipient, managers can see who wrote what. This design encourages honesty, but it also creates tension: peers may hesitate to offer blunt criticism, producing overly polite feedback of limited practical use.

    Nevertheless, peer commentary remains a critical cultural signal. Being respected, collaborative, and “easy to work with” can significantly affect how engineers are perceived and rated  –  often as much as their technical accomplishments. For managers, this underscores the importance of actively fostering a culture of constructive, candid feedback.

    4. The Engine of Attrition: Stack Ranking, URA, and the PIP Gauntlet

    Beneath Amazon’s structured evaluation framework lies a powerful  –  and controversial  –  mechanism designed to enforce high performance and manage employee churn. This system combines stack ranking, Unregretted Attrition (URA) targets, and the Performance Improvement Plan (PIP) process into a tightly interwoven machine that shapes both careers and culture.

    4.1 Stack Ranking: Calibration at a Cost

    Despite official denials, extensive employee accounts confirm that the Organizational Leadership Review (OLR) functions as a stack ranking system. Employees are evaluated not only against role expectations, but relative to their peers.

    Managers receive explicit forced distribution targets:

    • Top Tier (TT): ~20% of employees in larger teams
    • Least Effective (LE): 5–10%, depending on team size and context

    This creates a paradox: competent engineers can receive low ratings simply to satisfy the quota. One manager described the dilemma vividly:

    “Whomever is the least best has to go. It’s like firing someone who got an A- when the rest of the class got A’s.”

    The system’s critics warn that this model fosters a cutthroat, predatory culture, undermining collaboration and knowledge sharing  –  key components of high-performing software teams. Yet, from Amazon’s perspective, stack ranking ensures the bar continually rises, maintaining a workforce composed of top-tier talent.

    4.2 Unregretted Attrition (URA): Institutionalizing Churn

    Stack ranking is fueled by a corporate mandate known as Unregretted Attrition (URA)  –  a target percentage of employees that the company expects to leave each year, voluntarily or otherwise. Reports suggest this target hovers around 6% annually.

    URA transforms attrition from a byproduct into a strategic lever. Managers are incentivized to identify underperforming employees or those deemed misaligned with cultural expectations. To preserve their strongest team members while meeting URA goals, managers may intentionally hire candidates likely to underperform  –  creating a “buffer” group that fills the Least Effective bucket and is eventually managed out.

    In essence, URA answers the “why” behind stack ranking. It formalizes churn as a business process, ensuring that perceived bottom-tier performers are regularly cycled out. This approach aligns with Amazon’s core philosophy: the bar is always rising, requiring constant adjustment of the workforce to make room for new talent.

    4.3 The Performance Improvement Plan (PIP): The Exit Pathway

    Stack ranking and URA culminate in Amazon’s Performance Improvement Plan (PIP) process, a structured  –  and notoriously difficult  –  program designed to manage underperforming employees.

    Phase 1: “Focus” or the “Dev List”

    The first stage is often clandestine, known internally as “Focus” or the “dev list.” Employees flagged as Least Effective during the OLR may be added to it without formal notification. While officially framed as coaching or support, this stage is widely understood as the first step toward an eventual exit.

    Secrecy is a key feature: employees may continue receiving positive feedback through Forte, even as OLR ratings suggest otherwise. This discrepancy creates continuous pressure, forcing employees to consistently prove their worth. Internally, this tension aligns with Amazon’s philosophy of “purposeful Darwinism”  –  driving high performance through calculated uncertainty.

    Phase 2: The Formal PIP or “Pivot” Program

    If an employee’s performance does not “improve” while on Focus, they are escalated to a formal PIP, a program officially known as “Pivot”. Upon entering Pivot, the employee is presented with a stark choice:

    1. Option 1: Leave Immediately. The employee can accept a severance package (the “Tier 1” offer) and resign.
    2. Option 2: Attempt the PIP. The employee can choose to stay and attempt to meet a set of demanding, often unrealistic, performance goals within a short timeframe, typically 30 to 90 days.

    The success rate for those who choose to fight the PIP is exceedingly low. It is widely regarded as a formality designed to document cause for termination. Estimates suggest that while about 10% of Amazon’s corporate employees may be put on a PIP, the vast majority do not successfully complete it. Should an employee fail the PIP, they may be offered a smaller severance package or have the option to appeal the decision to a panel of peers, though this appeal process also has a low success rate. The severance offered systematically decreases at each stage, incentivizing an early exit.

    This entire sequence – from the URA target to the Pivot program – functions as a single, self-perpetuating attrition machine. The business objective for churn (URA) drives the evaluation method (stack ranking), which in turn creates a “need” for low performers (fueling “hire to fire”). These identified employees are then funneled into a PIP process that is designed to result in their departure, thus fulfilling the original URA quota. It is a closed-loop system of human capital logistics.

    Table 3: The Amazon PIP Process: From Focus to Pivot

    Stage | Description | Typical Timeline | Key Outcomes & Options
    1. Identification | Manager labels an employee as an underperformer, typically as a result of an “LE” rating in the OLR stack ranking. | Annual (post-OLR) | Employee is designated for performance management.
    2. Focus Program | Employee is enrolled in the “Focus” or “dev list” program for coaching, often without their knowledge. | Ongoing | Employee is under scrutiny. Failure to “improve” leads to the next stage. Internal transfers are often blocked.
    3. Pivot Program Entry | If performance is deemed unsatisfactory, the employee is formally placed into the “Pivot” program. | N/A | Employee is presented with a formal choice.
    4. The Pivot Choice | Employee must choose between two options. | 1-2 months | Option A: Leave Amazon immediately with a Tier 1 severance package. Option B: Stay and attempt the formal Performance Improvement Plan (PIP).
    5. PIP Execution | If Option B is chosen, the employee works to complete difficult tasks and meet specific goals outlined in the PIP document. | 30-90 days | High probability of failure. The plan’s goals are often considered unrealistic.
    6. PIP Outcome | Manager determines whether the employee passed or failed the PIP. | End of PIP period | Pass: Employee exits the Pivot program and keeps their job. Fail: Employee moves to the final stage before termination.
    7. Post-Failure Options | An employee who fails the PIP has a final set of choices. | N/A | Option A: Leave with a smaller, Tier 2 severance package. Option B: Appeal the failure to a jury of peers. If the appeal is lost, the employee is fired with minimal severance.

    Conclusion: High Performance Meets Purposeful Darwinism

    Amazon’s performance management system is a carefully engineered ecosystem combining a formal evaluation framework, metrics focused on system health, and a quota-driven attrition engine. This dual approach produces elite performers while maintaining intense pressure, embodying what insiders call “purposeful Darwinism.”

    While Forte reviews and Leadership Principles create a developmental narrative, career outcomes are largely determined by OLR calibrations and forced rankings tied to corporate attrition targets. Success requires not only technical excellence but also strong writing, mentorship, and cross-team influence.

    Ultimately, Amazon’s system is a human capital strategy: it drives operational excellence and innovation, but at the cost of a high-pressure, opaque culture. For engineering leaders, the key lesson is that metrics, culture, and incentives profoundly shape behavior, and balancing rigor with human-centric practices is essential to sustain high-performing teams.

    At Developex, we help tech companies build high-performing engineering teams by combining rigorous processes with a people-first approach. If you want to create a culture that drives results without sacrificing collaboration and long-term retention, let’s connect.
