The Problem with Measuring Engineering Productivity
Engineering productivity is one of the most important and least well-understood aspects of software organisations. Leaders know intuitively that some teams deliver more value than others, but quantifying why, and intervening effectively, remains elusive.
of engineering leaders say they can effectively measure developer productivity.
McKinsey Digital, 2023
The difficulty stems from a fundamental characteristic: software engineering productivity is multidimensional. It cannot be captured by any single metric, any single framework, or any single perspective. A team that ships fast but breaks production constantly is not productive. A team that writes perfect code but never delivers is not productive. A team that delivers well today but burns out its engineers is not sustainably productive.
Despite this, organisations repeatedly default to simplistic measures: lines of code, commit counts, story points completed, tickets closed. These are easy to extract from tools, easy to put on dashboards, and easy to report upward. They are also deeply misleading.
Productivity itself is one of the hardest things to measure in software engineering. The most common mistake is to treat a complex, multidimensional concept as if it were simple and one-dimensional.
The industry needs a model that captures the full dimensionality of engineering performance without creating perverse incentives. That model must be grounded in observable behaviours, backed by established research, and practical enough to drive real improvement.
Why Activity Metrics Are Dangerous
Activity metrics measure motion, not progress. They count how much a person or team did, without assessing whether what they did was valuable, sustainable, or well-executed.
You can't measure the productivity of a developer by lines of code any more than you can measure the productivity of an aircraft factory by weight.
The dangers are well-documented in research and widely observed in practice:
- They confuse output with outcome. A developer who commits 50 times per day may be thrashing. A developer who commits twice may have solved a complex architectural problem that unblocks the entire team.
- They penalise senior work. Design, architecture, mentoring, code review, and technical strategy produce enormous value but generate few countable artefacts. Activity metrics systematically undervalue the most experienced engineers.
- They incentivise gaming. When commits are counted, people split commits. When PRs are counted, people submit trivial PRs. When story points are counted, estimates inflate. The metric becomes the target, and the underlying goal is lost.
- They create anxiety without improvement. Teams measured by activity feel watched, not supported. The result is presenteeism in code: people optimise for visible activity rather than meaningful contribution.
of developers say productivity metrics make them feel surveilled rather than supported.
Stack Overflow Developer Survey, 2024
When a measure becomes a target, it ceases to be a good measure.
The Elevate Framework deliberately avoids pure activity metrics. Instead, it focuses on behavioural patterns that correlate with genuine team health and delivery performance. These patterns are difficult to game because gaming them would actually require doing the right thing.
Why DORA and SPACE Are Useful but Incomplete Alone
The DORA framework (DevOps Research and Assessment) represents the most rigorously validated approach to measuring software delivery performance. Its four metrics (deployment frequency, lead time for changes, change failure rate, and mean time to recovery) have been shown through years of research to predict both organisational performance and team wellbeing.
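As a rough illustration, the four metrics can be computed from a list of deployment records. This is a minimal sketch, not DORA's official method: the `Deployment` record and its field names are hypothetical, lead time is taken as commit-to-deploy, and recovery time is assumed to run from the failing deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tooling.
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when the change reached production
    failed: bool = False                      # whether it caused a production failure
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four delivery metrics over one reporting period."""
    if not deployments:
        raise ValueError("no deployments in the reporting period")
    failures = [d for d in deployments if d.failed]
    # Assumption: the failure starts at deploy time and ends at restored_time.
    recoveries = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time": median(d.deploy_time - d.commit_time for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Illustrative data only: four deployments over a two-day period.
t0 = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_time=t0 + timedelta(hours=9)),
    Deployment(t0, t0 + timedelta(hours=12)),
    Deployment(t0, t0 + timedelta(hours=24)),
]
metrics = dora_metrics(deployments, period_days=2)
```

The median is used for lead time here because a single slow change would otherwise dominate the figure; a team could reasonably choose a mean or a percentile instead.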
The SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow) broadened the conversation by arguing that developer productivity must be understood across multiple dimensions, and that no single metric or framework suffices.
more frequently: that is how often elite performers deploy compared to low performers.
DORA State of DevOps Report, 2023
Both are valuable. Neither is complete for operational use:
- DORA focuses on the delivery pipeline. It tells you how fast and safely code moves from commit to production. It does not measure code quality, team collaboration patterns, onboarding effectiveness, or individual growth. A team can have elite DORA metrics while building features nobody uses, accumulating technical debt, or losing its best engineers.
- SPACE is conceptual rather than operational. It describes the dimensions that matter but does not prescribe specific metrics, thresholds, or improvement mechanisms. It is a thinking framework, not an implementation framework.
- Neither drives daily action. Knowing your lead time is 3 days tells you where you are. It does not tell you what specific behaviour to change tomorrow to improve it. The gap between measurement and improvement remains unfilled.
No single metric or framework can capture developer productivity. You need to look at multiple dimensions.
Elevate builds on both frameworks. It takes DORA's rigour and SPACE's breadth, then adds three things: additional dimensions they don't cover, a direct connection from metrics to behaviours, and an operational model for turning measurement into improvement.
The Six Elevate Pillars
The Elevate Framework measures engineering performance across six interconnected pillars. Each captures a distinct dimension of team health. Together, they provide a comprehensive view that no single framework achieves alone.
- Velocity & Throughput. Speed and efficiency of software delivery. How fast teams move from idea to production while maintaining sustainable pace.
- Code Quality. Maintainability, reliability, and technical excellence of the codebase. The leading indicator of future delivery performance.
- Operational Resilience. System reliability, failure handling, and recovery speed. How well the team handles the unexpected.
- Collaboration & Flow. Team effectiveness and process smoothness. How well people work together and how work moves through the system.
- Onboarding & Enablement. How quickly new team members become productive and how well the team supports continuous learning.
- Progression by Craft. Individual growth, skill development, and engineering excellence over time. Sustainable performance through career development.
Mapping Pillars to Observable Behaviours
A framework is only useful if it connects to things teams can actually observe and change. Each Elevate pillar decomposes into specific, measurable behaviours that can be tracked through data signals, primarily from version control systems like GitHub.
The distinction between behaviours and outcomes is critical:
- Behaviours are what people do: submit small PRs, review within 4 hours, maintain CI green, distribute review load.
- Outcomes are what results: faster delivery, fewer defects, better resilience, shorter onboarding.
You do not rise to the level of your goals. You fall to the level of your systems.
Elevate measures behaviours because they are actionable. You cannot directly “improve lead time”. It is an outcome. But you can “review PRs within 4 hours” or “keep PRs under 400 lines”. Those are behaviours that, when adopted consistently, produce the outcome.
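The two example behaviours above can be expressed as simple checks over pull request data. The following is a sketch under stated assumptions: the `PullRequest` record and its fields are hypothetical, the thresholds are the ones quoted in the text, and latency is measured in wall-clock time rather than working hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the examples in the text; a team would tune these.
MAX_REVIEW_LATENCY = timedelta(hours=4)
MAX_PR_LINES = 400


@dataclass
class PullRequest:
    # Hypothetical shape; fields would be populated from version control data.
    number: int
    lines_changed: int
    opened_at: datetime
    first_review_at: datetime


def behaviour_adherence(prs: list[PullRequest]) -> dict[str, float]:
    """Return the share of PRs (0.0 to 1.0) that followed each behaviour."""
    if not prs:
        return {"reviewed_within_4h": 0.0, "under_400_lines": 0.0}
    fast = sum(pr.first_review_at - pr.opened_at <= MAX_REVIEW_LATENCY for pr in prs)
    small = sum(pr.lines_changed < MAX_PR_LINES for pr in prs)
    return {
        "reviewed_within_4h": fast / len(prs),
        "under_400_lines": small / len(prs),
    }


# Illustrative data only.
opened = datetime(2024, 3, 4, 10, 0)
prs = [
    PullRequest(1, 120, opened, opened + timedelta(hours=2)),
    PullRequest(2, 650, opened, opened + timedelta(hours=3)),
    PullRequest(3, 90, opened, opened + timedelta(hours=30)),
    PullRequest(4, 399, opened, opened + timedelta(hours=4)),
]
adherence = behaviour_adherence(prs)
```

Reporting adherence as a share of PRs, rather than an average latency or size, keeps the number tied to the behaviour itself: it answers "how often did we do the thing?" instead of describing an outcome.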
improvement in delivery metrics within 3 months when teams adopt small, specific behaviour changes.
Industry research aggregate
This is the core innovation of the Elevate approach: converting abstract performance dimensions into concrete, daily actions that teams can adopt incrementally.
How Teams Improve Through Small Habit Changes
Large transformation programmes fail because they demand too much change at once. They overwhelm teams, create resistance, and rarely sustain beyond the initial push. The research on habit formation is clear: lasting change comes from small, consistent adjustments that compound over time.
The most dangerous kind of waste is the waste we do not recognise.
The Elevate model applies this principle to engineering improvement:
- Identify the highest-leverage behaviour. Of all the things a team could improve, which one change would produce the most benefit right now? For a team with 48-hour review latency, the answer might be “reduce time to first review to under 4 hours.”
- Make it specific and time-bound. Not “improve code review” but “this week, respond to every PR within 4 hours during working hours.”
- Track it automatically. Improvement goals must be measured without manual effort. If tracking requires overhead, it won't sustain.
- Compound over iterations. Once a behaviour is established, move to the next highest-leverage change. Over weeks and months, these small shifts produce large cumulative improvement.
This approach works because each change is small enough to be achievable within a single sprint, meaningful enough to move the needle, and measurable enough to know whether it happened.
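One way to make the first step concrete is to rank candidate behaviours by how far each sits from its target. The sketch below is an illustration only: the text does not define how leverage is calculated, so the ratio of current value to target is an assumed stand-in, and the `BehaviourSignal` record, the signal names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BehaviourSignal:
    name: str
    current: float  # observed value; lower is better in this sketch
    target: float   # value the team is aiming for (must be greater than zero)


def highest_leverage(signals: list[BehaviourSignal]) -> Optional[BehaviourSignal]:
    """Pick the off-target signal with the largest current-to-target ratio.

    Assumption: distance from target approximates leverage. A real model
    would also weigh team context and the cost of making the change.
    """
    off_target = [s for s in signals if s.current > s.target]
    if not off_target:
        return None  # every behaviour is already at or better than target
    return max(off_target, key=lambda s: s.current / s.target)


# Illustrative data only, echoing the 48-hour review latency example.
signals = [
    BehaviourSignal("median hours to first review", current=48, target=4),
    BehaviourSignal("median PR size in lines", current=450, target=400),
    BehaviourSignal("CI red hours per week", current=3, target=6),
]
pick = highest_leverage(signals)
```

With these sample numbers the review latency signal is selected, since it is twelve times its target while PR size is only slightly over and CI health is already within target.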
How Poggle Operationalises the Elevate Framework
Poggle is the AI product that turns the Elevate Framework into daily practice. It connects to GitHub, analyses team behaviour against the six pillars, and generates personalised goals for individuals and teams.
- Connect and baseline. Poggle analyses historical GitHub activity to establish where the team currently sits across all six pillars.
- Identify leverage points. AI identifies the specific behaviours where improvement would produce the most benefit, given the team's current profile and context.
- Generate goals. Specific, measurable goals are created for individuals and teams. Each goal targets a single behaviour change, is achievable within a sprint, and maps to a specific Elevate pillar.
- Track automatically. Progress toward goals is measured continuously through ongoing GitHub analysis. No manual reporting required.
- Adapt and progress. As behaviours improve, goals evolve. The system continuously identifies the next highest-leverage improvement, creating a compounding cycle of incremental betterment.
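To illustrate what a goal that "targets a single behaviour change, is achievable within a sprint, and maps to a specific Elevate pillar" might look like as data, here is a minimal sketch. It is not Poggle's actual data model or API: the `Goal` record, the `goal_progress` function, the pillar assignment, and the sample events are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Goal:
    # Hypothetical shape, not taken from any real product.
    pillar: str         # the Elevate pillar the goal maps to
    behaviour: str      # the single behaviour the goal targets
    target_rate: float  # share of events that should follow the behaviour
    starts: date        # first day of the sprint
    ends: date          # last day of the sprint (inclusive)


def goal_progress(goal: Goal, events: list[tuple[date, bool]]) -> dict:
    """Summarise adherence for events that fall inside the goal window.

    Each event is (day, followed_behaviour), for example one entry per PR.
    """
    in_window = [ok for day, ok in events if goal.starts <= day <= goal.ends]
    rate = sum(in_window) / len(in_window) if in_window else 0.0
    return {
        "observed": len(in_window),
        "rate": rate,
        "met": bool(in_window) and rate >= goal.target_rate,
    }


# Illustrative data only; the pillar chosen for this behaviour is an assumption.
goal = Goal(
    pillar="Collaboration & Flow",
    behaviour="first review within 4 hours",
    target_rate=0.8,
    starts=date(2024, 3, 4),
    ends=date(2024, 3, 15),
)
events = [
    (date(2024, 3, 4), True),
    (date(2024, 3, 5), True),
    (date(2024, 3, 6), False),
    (date(2024, 3, 7), True),
    (date(2024, 3, 8), True),
    (date(2024, 3, 20), False),  # outside the sprint window, so ignored
]
progress = goal_progress(goal, events)
```

Because progress is derived entirely from events that already exist in the version control history, nothing here requires manual reporting, which is the property the steps above depend on.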
of engineering improvement programmes fail because they rely on manual tracking and one-off initiatives rather than continuous, automated feedback.
McKinsey Technology Trends Outlook, 2024
This is not a dashboard. It is not a reporting tool. It is a system that actively drives improvement by translating framework-level principles into individual-level actions.
Limitations and Responsible Use
No measurement system is perfect. The Elevate Framework has known limitations that users should understand:
It is wrong to suppose that if you can't measure it, you can't manage it. The most important things cannot be measured.
- GitHub signals are proxies, not ground truth. PR size does not directly measure code quality. Review latency does not directly measure collaboration health. These signals correlate with the underlying dimensions, but the correlation is imperfect.
- Correlation is not causation. Improved metrics do not guarantee improved outcomes. A team that reduces PR merge time might do so by rubber-stamping reviews, which would hurt quality despite improving a velocity signal.
- Context matters enormously. What constitutes “good” varies by team size, product domain, regulatory environment, growth stage, and dozens of other factors. Benchmarks are guides, not absolutes.
- This is not a performance management tool. The framework measures team and individual behaviours to support improvement. It should never be used for punitive comparison, stack ranking, or performance evaluation.
- Known blind spots exist. The framework cannot see architecture decisions, product quality, user satisfaction, market fit, or interpersonal dynamics that don't manifest in code patterns. It measures one important dimension of engineering health, not all of it.
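The rubber-stamping risk described above suggests pairing a velocity signal with a counter-signal. The sketch below flags approvals of large changes that arrived very quickly and without comments. It is an illustration only: the `Review` record is hypothetical, the thresholds are arbitrary defaults, and a flagged review is a prompt for a conversation, not evidence of poor practice.

```python
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical shape; fields would be derived from review events.
    pr_number: int
    lines_changed: int
    minutes_to_approval: float  # time from review request to approval
    comments: int               # review comments left before approval


def possible_rubber_stamps(
    reviews: list[Review],
    min_lines: int = 200,       # assumed cut-off for a "large" change
    max_minutes: float = 5.0,   # assumed cut-off for a "very quick" approval
) -> list[Review]:
    """Return approvals of large changes given quickly and without comments."""
    return [
        r for r in reviews
        if r.lines_changed >= min_lines
        and r.minutes_to_approval <= max_minutes
        and r.comments == 0
    ]


# Illustrative data only.
reviews = [
    Review(101, lines_changed=40, minutes_to_approval=3, comments=0),     # small change
    Review(102, lines_changed=800, minutes_to_approval=2, comments=0),    # flagged
    Review(103, lines_changed=600, minutes_to_approval=90, comments=7),   # thorough
]
flagged = possible_rubber_stamps(reviews)
```

Reading merge time alongside a check like this helps show whether a faster number reflects a better habit or a skipped one, although it remains a proxy with the same limits as every other signal listed here.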
Responsible use means: interpreting signals in context, combining quantitative data with qualitative understanding, never using the framework as a weapon, and always acknowledging what it cannot tell you.
References
- Forsgren, N., Humble, J., Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
- Forsgren, N., Storey, M.A., Maddila, C., Zimmermann, T., Houck, B., Butler, J. (2021). “The SPACE of Developer Productivity.” Queue, 19(1).
- DORA Team, Google Cloud. Accelerate State of DevOps Reports (2018-2024). Available at dora.dev.
- Murphy-Hill, E., et al. (2019). “What Predicts Software Developers' Productivity?” IEEE Transactions on Software Engineering.
- Graziotin, D., Fagerholm, F., Wang, X., Abrahamsson, P. (2018). “What happens when software developers are (un)happy.” Journal of Systems and Software, 140, 32-47.
- Sadowski, C., Zimmermann, T. (Eds.) (2019). Rethinking Productivity in Software Engineering. Apress.
- Clear, J. (2018). Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones. Avery.
- McKinsey & Company (2023). “Yes, you can measure software developer productivity.” McKinsey Digital.