A lot of companies experiment with ways of measuring and visualizing how their teams are doing. They’re usually called “maturity models”, and involve some sort of progression through different levels.
The intent behind these kinds of models is usually benign – for example, managers or coaches in larger organizations want to get a sense of where to focus their improvement efforts, spot systemic problems, and help teams become more self-aware so they can drive their own improvements too.
We prefer to use other terms like “health check model”, because “maturity” sounds a bit… well… patronizing. Plus, most of our models don’t involve progressing through different levels, and the primary audience is the team itself rather than management.
Organizational improvement work is very much a guessing game (how do you know what needs to be improved, and how will you know if it’s improving?). A systemic approach with clear visualization can reduce some of the guessiness.
It varies. Sometimes a model like this can be really helpful. Sometimes it’s more like “meh” – people dutifully do the workshops or surveys or whatever, and the data is then ignored.
Beware though. At some companies we’ve seen models like this become a monster, a systemic tool of oppression causing suboptimization and fear as managers use the “maturity model” to judge the teams and pit them against each other, and teams hide their problems to look good. Not a pretty picture!
Here’s a radically generalized “chance of success” pie-chart based on what we’ve seen so far at various companies:
However, although the potential damage is worse than the potential gain, there IS a potential gain, and there are ways to avoid the disaster scenario.
At Spotify we’ve done careful experimentation with this for several years, and found some ways that work fairly OK (as in more gain than pain). At best Helpful, at worst Meh, and so far never a Disaster. We’ve introduced this health check model to several other companies as well and heard similar results, so we figured it’s time to write an article :o)
When checking the health of a squad (our term for a small, cross-functional, self-organizing development team) there are really two stakeholders: the squad itself, and the people trying to support the squad (such as managers and coaches).
The first step in solving a problem is to be aware of it. And this type of visualization makes it harder for everyone to ignore the problem.
We do basically three things:
We have several variants: one is simply called the “squad health check model”, others are called things like “fluent@agile game” and “quarterly reflection” (maybe later articles on those). The health check model is an improved version of the old “autonomous squads” quarterly survey mentioned in the 2012 Scaling Agile @ Spotify article.
Here’s a real-life example of health check output for one tribe:
It shows how 7 different squads in a tribe see their own situation. Color is current state (green = good, yellow = some problems, red = really bad). Arrow is the trend (is this generally improving or getting worse?).
Stare at it for a minute, and you start seeing some interesting things:
This is, of course, just an approximation of reality (“all models are wrong, but some are useful” – George Box). So it’s worth double checking things before taking action.
Is Squad 4 really in such great shape, or are they just optimistic and not seeing their own problems? Most squads think they are delivering good value to their customers – but how do they know? Is that based on wishful thinking or real customer feedback?
In this particular case, squad 4 was actually formed just a week before the health check and they were definitely in the forming phase, or “on honeymoon”. So both squad 2 and squad 4 needed a lot of support.
“Easy to release” was clearly a major issue, so this led to a bigger focus on things like continuous delivery, and we’ve seen some good progress there.
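By the way, if you want to capture this kind of output digitally (for example to compare trends across quarters), the underlying data model is tiny: each squad gives each indicator a color and a trend. Here’s a minimal sketch in Python – the squad names and values below are made up for illustration, and only “Easy to release” and “Delivering value” are indicators actually mentioned in this article:

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    GREEN = "green"    # good - no major need for improvement right now
    YELLOW = "yellow"  # some important problems
    RED = "red"        # really bad, needs to change

class Trend(Enum):
    UP = "up"          # improving
    STABLE = "stable"  # roughly the same
    DOWN = "down"      # getting worse

@dataclass
class Assessment:
    state: State
    trend: Trend

# One health check result: squad -> indicator -> assessment.
# Squad names and values are purely illustrative.
health_check = {
    "Squad A": {
        "Easy to release": Assessment(State.RED, Trend.DOWN),
        "Delivering value": Assessment(State.YELLOW, Trend.UP),
    },
    "Squad B": {
        "Easy to release": Assessment(State.YELLOW, Trend.UP),
        "Delivering value": Assessment(State.GREEN, Trend.STABLE),
    },
}

def print_grid(check):
    """Print a rough text version of the color/arrow grid."""
    indicators = sorted({i for results in check.values() for i in results})
    for indicator in indicators:
        cells = [
            f"{squad}: {results[indicator].state.value}/{results[indicator].trend.value}"
            for squad, results in check.items()
            if indicator in results
        ]
        print(f"{indicator:20s}  " + "   ".join(cells))

print_grid(health_check)
```

Nothing fancy – the point is just that the output boils down to one color and one arrow per squad per indicator, which is easy to store and compare over time.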
Note that this is a self-assessment model, all based on the honesty and subjective opinions of the people in the squads. So it only works in a high-trust environment, where people trust their managers and colleagues to act in their best interest. The data is easy to game, so the key is to make sure there is no incentive to do so.
Fortunately Spotify is a pretty high-trust environment and the managers and coaches are very careful to show that this is a support tool, not a judgement tool.
We’ve found that online surveys suck for this type of thing, mainly because they cut out the conversation – and the conversation is the biggest part of the value. The squad members gain insights while having the discussion, and the coach gains insights on how to effectively help the squads. The data alone gives you only a small part of the story, which could be misleading.
So we (usually agile coaches) organize workshops with the squads, facilitating a face-2-face conversation around the different health indicators. One or two hours is usually enough.
To facilitate this we have a physical deck of “Awesome Cards”; each card is one health indicator with an “Example of Awesome” and an “Example of Crappy”.
(Download the cards – thx Martin Österberg for designing the card layout)
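In data form, a card is nothing fancy – just an indicator name plus the two anchor descriptions. A minimal sketch (the wording below is paraphrased for illustration, not the actual card text):

```python
from dataclasses import dataclass

@dataclass
class Card:
    indicator: str
    example_of_awesome: str
    example_of_crappy: str

# Illustrative card - paraphrased wording, not the official card text.
easy_to_release = Card(
    indicator="Easy to release",
    example_of_awesome="Releasing is simple, safe, painless, and mostly automated.",
    example_of_crappy="Releasing is risky, painful, and slow, with lots of manual work.",
)
```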
The deck typically has around 10 cards, here is an example of a complete deck:
For each question, the squad is asked to discuss if they are closer to “awesome” or closer to “crappy”, and we use basic workshop techniques (dot voting, etc) to help them reach consensus about which color to choose for that indicator, and what the trend is (stable, improving, or getting worse).
We stick to three levels (green/yellow/red) to keep things simple. The exact definition of the colors varies, but it’s something like this:
Yes, this is subjective data. In theory the squad may choose to involve hard data (cycle time, defect count, velocity, etc), but few do so. Because, even with hard data, the squad needs to interpret the data and decide if it means we have a problem or not. So at the end of the day, everything is subjective anyway. If something feels like a problem, that in itself is a problem.
Sometimes we combine this with retrospectives, for example vote on one card and decide on actions to improve that area.
If you look at the various examples above, you’ll see that the actual questions vary a little. Guiding principles:
We make sure the questions are about the environment in which the squad operates, and not about hard output (such as velocity). That makes the survey less threatening, and reinforces the notion that the survey is about support and improvement, not judgement.
Our assumption (true or not) is that a squad intrinsically wants to succeed, and will perform as well as it can under given circumstances.
As mentioned, we have a number of different models in play so it varies a lot. We haven’t really converged on any specific “perfect time-interval” for these things (and probably never will).
Quarterly seems to be a good starting point though. Every month seems too often (people get fed up with it, and the data doesn’t change fast enough to warrant it). Every six months seems too seldom (too much happens within that period). But, again, it varies.
A model like this CAN help boost and focus your improvement efforts. But it can also totally screw up your culture if used inappropriately! So tread with care.
Here are some guidelines to improve your likelihood of success:
OK, that’s it. Hope this article was useful! Have you also been experimenting with this kind of stuff? Please share your experiences (positive or negative) in the blog comments!
- Henrik Kniberg & Kristian Lindwall, Sep 2014.