Metrics for first-time managers (and beyond)

Márcio Azevedo
7 min read · Jun 7, 2020

When I was responsible for managing a team for the first time, I had a lot of doubts about how to measure the team's performance and what I should value. I soon found out that it is actually very common for first-time managers (and even experienced ones) to share those same doubts.

Deciding which metrics to measure is very important because, as in physics (the observer effect), the act of measuring influences the way the team improves (metrics will, most likely, be used for setting goals) and sends a clear message about what is really important for the company, so you should know why you are measuring something. Take this (bad) example: if you start measuring the number of lines of code pushed into a repository per week, and you set a goal to increase that number, you'll end up with programs that have more lines of code than they really need. More lines of code means more complexity (the code becomes harder to read, there are more points of failure, etc.), which makes the software harder to maintain, and so on and so forth.

So, before discussing and setting metrics with the team, you should know what matters most for the company and for the team. Should you value complex features that strictly follow long, detailed requirements, or take a leaner approach and iterate on each feature until it achieves a business goal? Is quality important? In my personal experience, I tend to gather these metrics into 3 categories:

  • Application metrics (or service level indicators) — metrics that measure the performance of your application.
  • Business metrics — metrics that measure the performance of the business.
  • Team metrics — metrics that measure the team’s performance.

Application Metrics (or service level indicators)

These metrics (ideally) allow real-time monitoring and automatic alerts. For example, consider measuring the average response time of a specific endpoint of your REST API. If it degrades above a certain threshold, you might have a performance issue that translates into a bad experience for your users/customers, and you want your team to act as soon as possible, so alerts should be in place (there are a lot of tools that enable this; take a look at New Relic, for example). Acting on the alert might mean scaling up your infrastructure or even rethinking the way your application is built. Having this kind of metric also allows you to do some forecasting and anticipate future problems, like knowing what the normal behaviour (average response time) is and what the impact can be when the website's load increases 50x. There are a lot of service level indicators used across the industry, and they are important for defining the SLAs and SLOs of the application (SLA vs SLO vs SLI). Here are a few examples:

  • Request Count
  • Request Error Ratio
  • 95th Percentile Request Latency (milliseconds)
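As a minimal sketch, the three indicators above could be computed from per-request records (the log format and numbers here are made up for illustration; in practice a monitoring tool would aggregate these for you):

```python
import math

# Hypothetical request log: (status_code, latency_ms) for each request.
requests = [
    (200, 45), (200, 52), (500, 130), (200, 48), (404, 40),
    (200, 60), (200, 55), (503, 210), (200, 47), (200, 50),
]

# Request count: total number of requests served.
request_count = len(requests)

# Request error ratio: fraction of responses with a 5xx status.
errors = sum(1 for status, _ in requests if status >= 500)
error_ratio = errors / request_count

# 95th percentile latency (nearest-rank method): 95% of requests
# completed at or below this latency.
latencies = sorted(ms for _, ms in requests)
p95_index = math.ceil(0.95 * request_count) - 1
p95_latency = latencies[p95_index]
```

An SLO would then be a target over a window, e.g. "error ratio below 1% and p95 latency below 200 ms over 30 days", with alerts firing when the indicator threatens the target.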

Business Metrics

Business metrics are particularly important to measure the impact of new features and to assess existing ones. For example, you may want to measure the number of users that start the registration process every day, how much time (on average) they take to complete it, or how many of them quit before completing it. This allows you to draw conclusions and iterate on the registration process: make it shorter, for instance, or improve the performance of the underlying APIs, and see how that reflects on the percentage of users that successfully complete registration. It also lets you establish important relations between business and application metrics. For instance, an increase in the average response time of some APIs can directly impact the business, and knowing that helps the team (and the company) prioritise improvements to existing services/processes over a new feature.
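The registration-funnel example can be sketched as a simple completion-rate calculation (event names and data here are invented for illustration; a real pipeline would read from an analytics store):

```python
from collections import defaultdict

# Hypothetical funnel events: (user_id, step) pairs from one day.
events = [
    ("u1", "start"), ("u1", "complete"),
    ("u2", "start"),
    ("u3", "start"), ("u3", "complete"),
    ("u4", "start"),
    ("u5", "start"), ("u5", "complete"),
]

# Group the steps each user reached.
steps = defaultdict(set)
for user, step in events:
    steps[user].add(step)

started = sum(1 for s in steps.values() if "start" in s)
completed = sum(1 for s in steps.values() if "complete" in s)
completion_rate = completed / started  # 3 of 5 users finished
```

Tracking this rate before and after a change (a shorter form, faster APIs) is what lets you tie an application-level improvement to a business outcome.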

One very important thing is to make sure that the business metrics monitored by the team are aligned with the Product and/or Business areas they relate to, to avoid any conflict over the potential impact of the changes being made on the application/product side and on the business side.

Team metrics

In this last category, I tend to include not only the metrics that are specific to the team and its processes, but also the metrics that establish the relation between the scope of the team (its services, applications, etc.) and its performance (I'll get to that in a moment). So, first, let's start with the team's processes.

Today it is hard to find a team that isn't doing agile, but the reality is that true agility is hard: it depends on a lot of things (both technical and team/organisation-wise), and most teams struggle to achieve it. The lead indicators to measure this are lead time and cycle time (check this for detailed definitions). If a team can release code to production in a couple of days instead of a couple of weeks, that's a very good sign of agility, especially in the long run (where small apps tend to grow into monoliths, or when startups scale their teams), so this is something you can measure directly with cycle time. Additionally, lead time lets you see whether the backlog is growing faster than the team's throughput. That can lead to a situation where the team becomes a bottleneck for the product/business teams, and it's a sign to start thinking about ways of scaling: growing the team, splitting it and growing, or simply hiring another team from scratch. These are also some of the few indicators that can be used to compare teams, namely their maturity in terms of processes. For example, team A may have a cycle time of 4 days (on average) because it invested a lot in automation (tests, deployments, etc.) or in its refinement process, while team B still has an average cycle time of 12 days.
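Under the common definitions (lead time from item creation to release, cycle time from work start to release), both can be computed from three timestamps per work item; the dates below are made up:

```python
from datetime import date

# Hypothetical work items: (created, work_started, released).
items = [
    (date(2020, 5, 1), date(2020, 5, 4), date(2020, 5, 8)),
    (date(2020, 5, 2), date(2020, 5, 10), date(2020, 5, 12)),
    (date(2020, 5, 3), date(2020, 5, 5), date(2020, 5, 11)),
]

# Lead time: from creation (entering the backlog) to production.
lead_times = [(released - created).days for created, _, released in items]
avg_lead = sum(lead_times) / len(lead_times)

# Cycle time: from the moment work starts to production.
cycle_times = [(released - started).days for _, started, released in items]
avg_cycle = sum(cycle_times) / len(cycle_times)
```

A growing gap between average lead time and cycle time is the backlog-outpacing-throughput signal described above: items wait longer and longer before anyone picks them up.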

Adding throughput (the number of work items released to production per week if you're doing Kanban, or per sprint if you're using Scrum) and deployment frequency allows the team to measure its improvements week over week (typically, cycle time can be decreased by breaking work items into smaller ones, which leads to fewer changes in production and makes each release easier to manage). It also helps with forecasting, providing an easy way to estimate and plan future initiatives (bear in mind that throughput, unlike lead and cycle time, shouldn't be used to compare teams). Tracking deployment frequency and promoting more and more deployments creates the right mindset by pushing smaller and smaller changes into production.
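For a Kanban-style team, throughput is just a count of releases bucketed by week; a minimal sketch with invented release dates:

```python
from collections import Counter
from datetime import date

# Hypothetical release dates of completed work items.
releases = [
    date(2020, 5, 4), date(2020, 5, 5), date(2020, 5, 7),
    date(2020, 5, 11), date(2020, 5, 14),
    date(2020, 5, 18), date(2020, 5, 19), date(2020, 5, 20), date(2020, 5, 22),
]

# Throughput: items released per ISO calendar week.
throughput = Counter(d.isocalendar()[1] for d in releases)

# Average weekly throughput, usable for rough forecasting
# ("a 12-item initiative takes about 4 weeks at this pace").
avg_throughput = sum(throughput.values()) / len(throughput)
```

This is the number that supports planning; comparing it across teams is exactly what the caveat above warns against, since work items are sized differently in every team.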

So, if on one hand cycle time, lead time and throughput promote pace and velocity, you may want to balance that with metrics that ensure the quality of what's being delivered, like the number of open bugs and/or the number of issues/incidents happening every week, so that quality is treated as being as important as pace and is built into every new feature from the beginning (the same line of thinking can and should be applied to security, etc.). On top of that, every team should measure how much of its time/backlog is related to maintenance (bugs, support, or simply minor repeating tasks like cleaning up a database table or changing the configuration of a marketing campaign in production) versus product development. The maintenance percentage should be as small as possible and shouldn't be higher than 20%. Tech initiatives (including refactors and re-architectures) should be part of product development, but that's a topic for another post. :-)
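The maintenance share is easy to derive if closed work items carry a type tag (the tag names and the 20% budget below follow the text; the data is made up):

```python
# Hypothetical closed work items for one period, tagged by type.
items = ["feature", "bug", "feature", "support", "feature",
         "feature", "bug", "feature", "feature", "feature"]

# Maintenance: bugs, support and repeating operational chores.
maintenance = sum(1 for t in items if t in ("bug", "support"))
maintenance_pct = 100 * maintenance / len(items)

# Flag when maintenance eats more than the suggested 20% budget.
over_budget = maintenance_pct > 20
```

A team consistently over that budget is a signal to invest in quality and automation rather than to simply push more features.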

Another set of metrics that I've found particularly interesting, and that relates to this last point about the maintenance percentage, are the failure metrics (MTTF, MTTR, MTBF). An engineering team should invest enough to ensure that MTTR is as short as possible, for obvious reasons (if you have an incident affecting millions of customers, you want it solved as soon as possible; just think of downtime in the middle of Black Friday :-/ ), and that MTTF and MTBF are as high as possible. Not only should the team invest upfront in quality (especially in automated tests), it should also leverage well-known release practices, like canary deployments: releasing a new feature/version to a small set of customers before expanding it to the whole customer base.
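Using the usual definitions (MTTR as the mean duration of incidents, MTBF as the mean interval between consecutive failures), both fall out of a list of incident start/end timestamps; the incidents below are invented:

```python
from datetime import datetime

# Hypothetical incidents: (started, resolved).
incidents = [
    (datetime(2020, 5, 1, 10, 0), datetime(2020, 5, 1, 10, 30)),
    (datetime(2020, 5, 10, 14, 0), datetime(2020, 5, 10, 15, 30)),
    (datetime(2020, 5, 20, 9, 0), datetime(2020, 5, 20, 9, 45)),
]

# MTTR: mean time to repair, in minutes.
repair_minutes = [(end - start).total_seconds() / 60 for start, end in incidents]
mttr = sum(repair_minutes) / len(repair_minutes)

# MTBF: mean time between the starts of consecutive incidents, in hours.
starts = [start for start, _ in incidents]
gaps = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
mtbf = sum(gaps) / len(gaps)
```

Investments in automated tests and canary releases show up here directly: better test coverage pushes MTBF up, while better observability and rollback tooling pull MTTR down.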

Most of these metrics are objective and straightforward to get, but there are also some subjective things you should measure that are really important and influence the team's performance, namely the motivation levels of the team and of each individual, and the likelihood of people staying within the team and the company (because in software, teams tend to improve over time and stability is key). These are hard to measure, but some things can help, and they should complement your regular one-on-ones: the Employee Net Promoter Score (eNPS), or existing tools that provide surveys and insights for you to act on, like Humu, which measures mostly Happiness and Retention (likelihood to stay in the team/company). These are particularly helpful if you're a manager of managers, and a good trigger for your skip-level one-on-ones to give you more insight (quoting Saving Private Ryan: “There’s a chain of command: gripes go up, not down.” ;-) ). With these metrics you can balance the team's processes and practices with their well-being and motivation, promoting a healthy and challenging environment.
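eNPS follows the standard Net Promoter arithmetic applied to a 0–10 "would you recommend working here?" survey: percent promoters (scores 9–10) minus percent detractors (0–6). A sketch with made-up answers:

```python
# Hypothetical anonymous survey answers (0-10): "How likely are you to
# recommend working on this team to a friend?"
scores = [9, 10, 8, 7, 6, 10, 9, 3, 8, 10]

promoters = sum(1 for s in scores if s >= 9)   # scores 9-10
detractors = sum(1 for s in scores if s <= 6)  # scores 0-6
# Scores 7-8 are passives and only dilute the percentage.

enps = 100 * (promoters - detractors) / len(scores)
```

The absolute number matters less than its trend: a falling eNPS is the kind of early warning that should feed into one-on-ones and skip-levels before it shows up as attrition.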

Finally, the missing piece to ensure that every team will leverage these metrics and improve over time is choosing a goal-setting framework. Such a framework should enable continuous improvement and promote alignment across the company, so that an improvement of one metric does not conflict with an improvement of another. For example, say a new feature requires scaling up the infrastructure, doubling its cost, while the same team has set a goal to decrease that very cost. In my particular experience, OKRs are a good framework for this, as long as they are used properly and the organisation is effectively aligned.

Depending on what type of organisation you're in (tech-driven, sales-driven, product-driven, etc.), you'll find different metrics being used across different companies and, because metrics are incentives, the best way to improve something is to measure it. In these particular times (at the time I'm writing this blog post, humanity is facing the Covid-19 global pandemic), I see a lot of managers struggling with this, especially with team metrics and how to follow up on their team's performance, creating more and more meetings and increasing video-conference fatigue.
