Engineering Metrics for Beginners
Get an understanding of table-stakes engineering metrics.
I see far too much misuse of metrics in the industry, from people using Story Points as a performance evaluation tool to people treating lines of code as a productivity metric. Much of it is innocent misuse by misinformed people who follow frameworks like Scrum or DORA without taking the additional step of understanding their root principles. Others misuse metrics with a more unethical tilt, to mislead others.
If you’re an engineering leader, you owe it to your team to have a framework for the metrics you are tracking.
Start with why
The first thing to look at isn’t the metrics themselves, but why you are looking at them. It generally boils down to a few reasons:
To follow a leadership or management directive
To inform a narrative or decision
To evaluate and judge
To satisfy a curiosity
To improve over time
What metrics you look at and how you use them depends on why you’re looking at them.
If you’re following a leadership or management directive, follow your leadership’s rules first. Then dig into why they are asking and how they are using the metrics. The reason is likely one of those below.
If you’re collecting metrics to inform a narrative or decision, first ask yourself - do the deciders actually care about the metrics? There’s a lot of lip service to “data-driven” decisions, but the fact is data doesn’t matter in many decisions at many companies. Don’t waste your time if the deciders don’t care - find out what they do care about and do that instead.
If you’re using them to evaluate and judge - you’re probably using metrics incorrectly. You can use them to inform decisions, but engineering is often highly dependent on context, and it’s quite inaccurate to judge an individual engineer or even a team’s performance purely using metrics.
If you’re just trying to satisfy a curiosity, I’d say metrics are a waste of your time. Only collect metrics if you actually intend to do something with them - otherwise they’re a waste of attention and busywork for the people collecting them. It’s nice to be able to say you have data backing up your decisions, but if the data wasn’t a true factor, you’re just playing data-theater.
Finally, if you’re using metrics to inform improvements over time, then that’s pretty much exactly what you should be using metrics for - read on.
What are you trying to improve?
Metrics indicate attributes of the thing they measure. If you’re an engineering leader, you probably care about the following:
Delivery - speed, quality, quantity, efficiency, predictability
Service - uptime, quality
Capability - resourcing needs, tech capability, team capability
People - happiness, growth, retention
Impact - operational cost
If you’re trying to improve Delivery, you’ll look at a different set of metrics than if you were trying to improve Service. At some point, after the low-hanging fruit is gone, metrics start becoming tradeoffs - improving one degrades another. For example, you might increase speed only to start suffering quality issues, or increase uptime only to suffer morale issues from longer on-call shifts.
Tradeoffs are real - and it’s your job to articulate them while creating new capabilities that reduce or remove them.
Where to begin with metrics?
The easiest things first:
How healthy is your delivery pipeline?
How sustainable are your system operations?
How observable is your product?
How healthy is your delivery pipeline?
The most basic view of engineers is that they build things. To build things, you need a pipeline from Idea to Production. That pipeline has stages that can be measured, each with nuanced areas that can become bottlenecks.
A delivery pipeline should be able to sustainably take a code change to production.
Sustainably means:
It can be repeated indefinitely.
It processes changes at the rate those changes are being made.
It completes successfully within an acceptable failure rate.
The metrics that indicate the health of the delivery pipeline are:
Deployment Frequency
Deploy Failure Rate
Change Volume Rate
Change Failure Rate
Cycle Time
Deployment Frequency is the number of deploys the team makes per day.
Deploy Failure Rate is the share of the team’s deploys that had an issue.
Change Volume Rate is the average number of changes delivered per deploy, where a change is often a Pull Request.
Change Failure Rate is the share of changes that had an issue (eg. a defect).
Cycle Time is the amount of time a change spends moving through the engineering phases, end-to-end.
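If you keep even a simple log of deploys and the changes each one shipped, you can compute these definitions directly, with both failure rates expressed as fractions. The Python below is a minimal sketch, not a standard schema. The `Change` and `Deploy` records, the `had_issue` flags, and measuring cycle time from `started_at` to `deployed_at` are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Change:
    """Hypothetical record for one change unit, eg. a merged Pull Request."""
    started_at: datetime   # when engineering work on the change began (assumed)
    deployed_at: datetime  # when the change reached production
    had_issue: bool        # eg. a defect was traced back to this change


@dataclass
class Deploy:
    """Hypothetical record for one production deploy."""
    deployed_at: datetime
    changes: list[Change]
    had_issue: bool        # eg. the deploy failed or was rolled back


def delivery_metrics(deploys: list[Deploy], days: int) -> dict[str, float]:
    """Summarize delivery pipeline health over a window of `days` days."""
    changes = [c for d in deploys for c in d.changes]
    if not deploys or not changes:
        raise ValueError("need at least one deploy that shipped a change")
    return {
        "deployment_frequency_per_day": len(deploys) / days,
        "deploy_failure_rate": sum(d.had_issue for d in deploys) / len(deploys),
        "change_volume_rate": len(changes) / len(deploys),
        "change_failure_rate": sum(c.had_issue for c in changes) / len(changes),
        "mean_cycle_time_hours": mean(
            (c.deployed_at - c.started_at).total_seconds() / 3600 for c in changes
        ),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9)
    pr_a = Change(start, start + timedelta(hours=4), had_issue=False)
    pr_b = Change(start, start + timedelta(hours=8), had_issue=True)
    deploys = [
        Deploy(start + timedelta(hours=4), [pr_a], had_issue=False),
        Deploy(start + timedelta(hours=8), [pr_b], had_issue=True),
    ]
    print(delivery_metrics(deploys, days=1))
```

Where a change "starts" is a team decision, such as the first commit or a ticket moving to In Progress. Whatever you pick, keep it consistent so the numbers stay comparable over time.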
Balancing the metrics
You might think “Oh, we should just work to increase deployment frequency and change volume rate” but that’s not necessarily the case. While higher numbers are better and generally indicate a healthier pipeline, they aren’t the goal.
The goal is sustainable delivery. Your pipeline should be able to deploy changes at the frequency and rate at which they are needed. That’s your target. You should aim to have a ceiling on Change Failure Rate and a floor on Deployment Frequency that is proportional to your Change Volume.
Put another way - you need to deploy changes as often and in as small increments as needed to deliver all your changes while within your target change failure rate. Any more and it’s waste. Any less and it’s a bottleneck.
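One way to make the ceiling and the floor concrete is to derive a minimum deploy frequency from your daily change volume and a target batch size. This is a rough sketch. It assumes you choose two targets yourself, `max_change_failure_rate` and `max_changes_per_deploy`, which are hypothetical parameters and not industry standards. It then classifies the pipeline against them.

```python
import math


def check_pipeline(
    changes_per_day: float,
    deploys_per_day: float,
    change_failure_rate: float,
    max_change_failure_rate: float,  # your ceiling, eg. 0.10 (hypothetical target)
    max_changes_per_deploy: float,   # your target batch size (hypothetical target)
) -> str:
    """Classify a delivery pipeline against a failure ceiling and a deploy floor."""
    if change_failure_rate > max_change_failure_rate:
        # Shipping more often won't help; the problem is upstream.
        return "fix upstream: change failure rate is above the ceiling"
    # Floor: enough deploys to ship every change without exceeding the batch size.
    min_deploys = math.ceil(changes_per_day / max_changes_per_deploy)
    if deploys_per_day < min_deploys:
        return "bottleneck: deploy more often"
    if deploys_per_day > math.ceil(changes_per_day):
        # More deploys than there are changes to ship adds nothing.
        return "waste: more deploys than changes"
    return "sustainable"


print(check_pipeline(3, 4, 0.05, 0.10, 1))  # 3 changes/day, 4 deploys/day
print(check_pipeline(4, 3, 0.05, 0.10, 1))  # 4 changes/day, 3 deploys/day
```

The two example calls mirror the 3-versus-4 scenario in the rules of thumb. With one change per deploy as the target, 4 deploys for 3 daily changes is waste, and 3 deploys for 4 daily changes is a bottleneck.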
Indicators
Your first measurements become your baseline. Most companies without metrics are shocked when they first calculate them and see change failure rates of 20%+.
Over time, you can use changes to the baseline as indicators of whether a particular process or change has helped or not. Implement a QA process? You can see how that affects your cycle time. Increase automated test coverage? See how your change failure rate is impacted.
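Baselines are noisy, so a single good week does not prove a process change worked. Here is a minimal sketch, assuming you have a few periods of data on each side of the change. It compares the averages and treats only moves larger than a couple of baseline standard deviations as meaningful. The two-standard-deviation `threshold` is a rule-of-thumb assumption rather than a statistical test, and the sample rates are made up.

```python
from statistics import mean, stdev


def compare_to_baseline(
    baseline: list[float], after: list[float], threshold: float = 2.0
) -> tuple[float, bool]:
    """Compare a metric's values after a process change against its baseline.

    Returns the change in the average, and whether that change is larger than
    `threshold` standard deviations of the baseline (a rough noise filter).
    """
    if len(baseline) < 2 or not after:
        raise ValueError("need at least two baseline points and one new point")
    delta = mean(after) - mean(baseline)
    return delta, abs(delta) > threshold * stdev(baseline)


# Hypothetical weekly change failure rates before and after adding automated tests.
delta, significant = compare_to_baseline([0.22, 0.20, 0.24], [0.15, 0.13, 0.14])
print(f"change failure rate moved by {delta:+.2f}; beyond normal variation: {significant}")
```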
General rules of thumb
Increase the number of deploys to reduce your change volume rate. The more deploys you do, the fewer changes you need per deployment, which also decreases your change failure rate and makes it easier to identify the source of issues due to the smaller batch size.
If your change failure rate significantly increases - slow down your deployment frequency and change volume. You need to fix the problem upstream.
Ensure you benefit from extra effort. Going from 3 deploys per day to 4 may mean absolutely nothing if you only create 3 change units (eg. Pull Requests) per day; that extra deploy is meaningless. However, if you create 4 changes per day, going from 3 deploys to 4 can mean the difference between a bottlenecked process and smooth delivery.
Larger batch sizes carry more risk. They’re harder to debug when there are issues, they often have to be rolled back as a single unit, and they increase the monitoring surface area post-deploy.
How do you measure it? Tools like Jellyfish are excellent for this, but you can also be scrappy about it. Use basic GitHub PR counts, or have an LLM write a script to count merges and cycle times from your git history. For deployments, create a simple webhook that your deployment script calls and that stores each deploy in a table.
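As an example of the scrappy route, here is a minimal Python sketch that shells out to `git log` to count merges per day and estimate cycle time for each merge. It assumes PRs land as merge commits on the branch you run it from. Squash and rebase merges leave no merge commit, so they would need another source, such as GitHub PR counts. It also approximates the start of work with the earliest author date on the merged branch. The helper names are hypothetical.

```python
import subprocess
from collections import Counter


def git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_by_day(iso_dates: str) -> Counter:
    """Count `git log --format=%cI` lines by calendar day (the YYYY-MM-DD prefix)."""
    return Counter(line[:10] for line in iso_dates.splitlines() if line.strip())


def cycle_time_hours(merged_at: int, authored_at: list[int]) -> float:
    """Hours from the earliest authored commit to the merge (Unix timestamps)."""
    return (merged_at - min(authored_at)) / 3600


def merges_per_day(repo: str = ".") -> Counter:
    # --first-parent keeps only merges made onto the checked-out branch (eg. main).
    return count_by_day(git(repo, "log", "--merges", "--first-parent", "--format=%cI"))


def merge_cycle_times_hours(repo: str = ".") -> list[tuple[str, float]]:
    """(merge sha, cycle time in hours) for each merge on the checked-out branch."""
    results = []
    log = git(repo, "log", "--merges", "--first-parent", "--format=%H %ct %P")
    for line in log.splitlines():
        sha, merged_at, first_parent, branch_tip = line.split()[:4]
        # Author times of commits on the merged branch but not already on the target branch.
        authored = git(repo, "log", "--format=%at", f"{first_parent}..{branch_tip}").split()
        if authored:
            results.append((sha, cycle_time_hours(int(merged_at), [int(t) for t in authored])))
    return results
```

Run `merges_per_day()` or `merge_cycle_times_hours()` from inside a repository, or pass a path as `repo`.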
Lines of code isn’t a delivery metric. Your most impactful engineers might write 10 lines of code. Your worst engineers might write 100,000. Lines of code mean nothing.
How sustainable are your system operations?
Uptime
In most cases, the goal is not 100% uptime.
99% uptime is ~7 hours of monthly downtime.
99.9% uptime is ~43 minutes of monthly downtime.
99.99% uptime is ~4 minutes of monthly downtime.
99.999% uptime is ~26 seconds of monthly downtime.
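The figures above follow from simple arithmetic on a 30-day month (43,200 minutes). Here is a small Python sketch, assuming that month length. A 31-day month gives slightly larger budgets.

```python
def monthly_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Downtime allowed per month by an uptime target (30-day month by default)."""
    minutes_in_month = days_in_month * 24 * 60  # 43,200 for 30 days
    return minutes_in_month * (1 - uptime_percent / 100)


for target in (99, 99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {monthly_downtime_minutes(target):.1f} minutes of downtime")
```

It prints roughly 432, 43.2, 4.3 and 0.4 minutes, which is about 7 hours, 43 minutes, 4 minutes and 26 seconds.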
If your application is used only in a specific region during 9-to-5 business hours, you can have completely successful operations with abysmal uptime.
Time ranges of the downtime matter. 99% uptime can be sufficient as long as you have 100% uptime during business hours.
Securing that fifth nine is quite difficult, requiring exponential increases in cost through investments in system resilience - whether that’s additional availability zones, replicas, failovers, etc. For many companies, that fifth nine is not worth it.
How do you measure it? Set up a tool like Pingdom - just a few bucks a month. Point it at your system, make it run once a minute.
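Once a monitor is checking every minute, uptime is just the share of successful checks, and you can scope it to the hours that matter. Here is a minimal Python sketch. It assumes you can export check results as `(timestamp, is_up)` pairs in your users' local time, which is a hypothetical format and not Pingdom's API. It also assumes business hours are 9 to 5 on weekdays.

```python
from datetime import datetime


def uptime_percent(
    checks: list[tuple[datetime, bool]],
    business_hours_only: bool = False,
    start_hour: int = 9,
    end_hour: int = 17,
) -> float:
    """Percent of successful checks, optionally counting only weekday business hours.

    `checks` is a list of (timestamp, is_up) pairs, eg. one per minute from an
    uptime monitor's export (hypothetical format). Timestamps are assumed to be
    in your users' local time zone.
    """
    def in_scope(ts: datetime) -> bool:
        if not business_hours_only:
            return True
        return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

    relevant = [is_up for ts, is_up in checks if in_scope(ts)]
    if not relevant:
        raise ValueError("no checks in scope")
    return 100 * sum(relevant) / len(relevant)


# One check per minute on a Monday, with an outage from midnight to 2am.
checks = [(datetime(2024, 1, 1, m // 60, m % 60), m >= 120) for m in range(1440)]
print(f"{uptime_percent(checks):.1f}% overall")
print(f"{uptime_percent(checks, business_hours_only=True):.1f}% during business hours")
```

A two-hour overnight outage drops the full-day figure to about 91.7%, while business-hours uptime stays at 100%.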
Mean Time to Restore
A lot of leaders optimize for not having incidents, but it’s more effective to optimize for fast recovery from incidents.
Not having incidents can come down to luck and over-caution. Incidents are actually fine for many business cases, provided they don’t cause lasting damage. The cost of having no incidents can outweigh the benefit (unless you’re working on mission-critical or life-critical projects).
Your time to restore is dependent on:
How fast you detect an issue
How fast your team or systems respond
How fast that response can restore service
These are impacted by:
Your level of observability and alerting
Your level of standard practice and process
Your level of team education and training
Your level of technical controls and capabilities
Your goal as an engineering leader should be:
Detect incidents quickly through automation, not people.
Have robust damage mitigation (eg. backups, staged releases) and rollback levers (eg. feature flags).
Educate your team on incident response, including playbooks, runbooks, and tools.
How do you measure it? Ensure every single incident has a postmortem report with exact timestamps of incident start and resolution. Then add up the restore durations and divide by the number of incidents. Categorize the incidents - outage, workflow break, data loss, etc. A manual process can go a long way.
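With start and resolution timestamps in every postmortem, the averages take only a few lines of code. This Python sketch assumes a hypothetical `Incident` record. It adds an optional `detected_at` timestamp, which the process above does not require, so you can also see how much of your restore time is spent just noticing the problem.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    """Timestamps copied from one postmortem report (hypothetical structure)."""
    category: str          # eg. "outage", "workflow break", "data loss"
    started_at: datetime
    detected_at: datetime  # extra timestamp (assumed), used for time to detect
    resolved_at: datetime


def _minutes(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 60


def restore_stats(incidents: list[Incident]) -> dict[str, dict[str, float]]:
    """Mean time to detect and to restore, in minutes, per category and overall."""
    groups: dict[str, list[Incident]] = defaultdict(list)
    for incident in incidents:
        groups[incident.category].append(incident)
        groups["all"].append(incident)
    return {
        category: {
            "mean_time_to_detect": mean(_minutes(i.started_at, i.detected_at) for i in group),
            "mean_time_to_restore": mean(_minutes(i.started_at, i.resolved_at) for i in group),
        }
        for category, group in groups.items()
    }
```

Grouping by category shows whether a high average comes from one kind of incident, such as slow data restores, rather than from your response process as a whole.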
How observable is your product?
If your company makes money through a business event, you need to track that business event. For example, if you’re an eCommerce company, your event might be the Sale of an Item.
You should be able to tell, in real-time:
How many of those business events have occurred in the past minute, hour, day, and week.
How that trend has changed over a period of time, including the past 5 minutes.
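In production these numbers would come from your analytics tool or log pipeline, but the logic is simple enough to sketch in memory. This Python sketch assumes you have a list of event timestamps, such as sales. The window names and the 5-minute comparison are illustrative choices.

```python
from datetime import datetime, timedelta

# Trailing windows to report on; names and sizes are illustrative.
WINDOWS = {
    "past_minute": timedelta(minutes=1),
    "past_hour": timedelta(hours=1),
    "past_day": timedelta(days=1),
    "past_week": timedelta(weeks=1),
}


def count_in_window(timestamps: list[datetime], end: datetime, size: timedelta) -> int:
    """Number of events with end - size < timestamp <= end."""
    return sum(end - size < ts <= end for ts in timestamps)


def event_counts(timestamps: list[datetime], now: datetime) -> dict[str, int]:
    """Business events (eg. sales) in each trailing window ending at `now`."""
    return {name: count_in_window(timestamps, now, size) for name, size in WINDOWS.items()}


def five_minute_trend(timestamps: list[datetime], now: datetime) -> tuple[int, int]:
    """Events in the last 5 minutes versus the 5 minutes before that."""
    five = timedelta(minutes=5)
    return count_in_window(timestamps, now, five), count_in_window(timestamps, now - five, five)
```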
From there, you can add properties to those events to identify more granular segments, such as:
What specific user types are performing those events
Where those users are located
How valuable those events are over time
This doesn’t have to be complicated. You can attach a product analytics tool, write a record into your database, or log the event and create a dashboard off of your log stream.
From there, you can monitor user interaction events - clicks, views, etc. - to see which funnels are performing well. You can add events for user signups, acquisition, and churn - all things you can track day-to-day to get a heartbeat for your product.
Ultimately, the important part is to know it’s happening and watch it for changes. If your success rate suddenly drops, that’s an incident you should look into. Your system might be running fine, but there’s an outcome issue that needs to be resolved.
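You can turn a success-rate drop into an automated alert by comparing the current rate against a baseline. Here is a minimal Python sketch. It assumes a success rate such as completed checkouts divided by started checkouts. The 20% relative drop and the 50-attempt minimum are hypothetical defaults that you would tune to your traffic.

```python
def outcome_alert(
    baseline_rate: float,
    successes: int,
    attempts: int,
    max_relative_drop: float = 0.2,
    min_attempts: int = 50,
) -> bool:
    """Return True when the current success rate has fallen well below baseline.

    eg. baseline_rate=0.9 and max_relative_drop=0.2 alert below a 72% success rate.
    """
    if attempts < min_attempts:
        return False  # too little traffic to judge; avoids noisy alerts
    return successes / attempts < baseline_rate * (1 - max_relative_drop)
```

The minimum-attempts guard keeps a quiet period, such as three checkouts at 2am, from paging someone.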
Moving forward
Once you have the basics, you can start tracking other important metrics:
Team morale - you can use surveys, retention rate, etc.
Costs - $ spent on vendors, headcount, etc.
Just remember: metrics are indicators. They tell you something has happened, but not why it happened or whether it’s good or bad. The goal is not ‘numbers go up’ - the goal is ‘understand what has changed, and why’.
It might be a perfectly acceptable outcome to see a doubling in change failure rate if it results in a tripling of delivery throughput. It might be perfectly reasonable to see a halving of delivery throughput if half the team was at an offsite.
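A quick back-of-the-envelope check makes the first tradeoff concrete. Take a hypothetical team that moves from 10 changes per week at a 10% change failure rate to 30 changes per week at 20%. It goes from about 9 successful and 1 failed change per week to about 24 successful and 6 failed.

```python
def change_outcomes(changes_per_week: float, change_failure_rate: float) -> tuple[float, float]:
    """Return (successful, failed) changes per week for a given throughput."""
    failed = changes_per_week * change_failure_rate
    return changes_per_week - failed, failed


# Hypothetical team: 10 changes/week at 10% failure, then 30/week at 20%.
print(change_outcomes(10, 0.10))
print(change_outcomes(30, 0.20))
```

Whether six failures a week is acceptable depends on how cheaply you recover from them. That is why Mean Time to Restore matters as much as the failure rate itself.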
Don’t judge purely off of numbers - use them to indicate.


