Rachel Kroll

More than five whys and "layer eight" problems

I saw a post about a year ago talking about the "five whys" technique for figuring out what caused something to fail. It used a car scenario as an example, and it went something like this:

The car didn't start... because the battery is dead... because the alternator wasn't charging it... because the alternator belt broke... because the belt was beyond its useful life but wasn't replaced... because it wasn't maintained according to recommended schedule.

That's about five levels, and it pretty much stopped there. I figure, well, you can go beyond that, and in the case of the infra stuff at a big enough company, you probably need to if you intend to actually try to fix something.

So, that's been my life: trying to roll back through the series of actions (or lack of actions) to see how things happened, and then trying to do something about it. The problem is that if you do this long enough, eventually the problems start leaving the tech realm and entering the squishy human realm.

Perhaps you've heard of the OSI model of networking, where you have seven layers as a way to talk about what's going on in the "stack". I've seen some brilliantly snarky T-shirts that talk about "layer eight" and sometimes beyond as things like "corporate politics" and "management" and all of that good stuff.

It turns out that when you start doing this root-cause analysis and really keep after it, the "squishy human realm" is actually the no-longer-hypothetical "layer eight" from those T-shirts.

In our "car" example, you might discover that management is forcing people to ignore the maintenance schedule while saying things like "it'll work, trust me". Or, they're doing even worse things, like ignoring safety codes that have been written in blood.

Those of us in tech tend to get off much more lightly than people who do Actual Stuff in the Real World (like cars). Chasing down our problems means you start getting into things like "empire-building manager is hiring anyone with a pulse in order to look more important by having more direct reports". Maybe you chase that one down and you get to "manager of manager is also into this whole thing, and benefits from the equation".

That might lead into "the entire company is obsessed with hiring even though the tech equivalent of the Drake equation says there is no way they can find anywhere near that many qualified people in the entire world".

What does that look like? Well, some people have no business working on certain kinds of systems, whether as a transient situation or a permanent one. Transient situations come from a lack of training. Permanent ones might come from attitudes or a genuine lack of ability for whatever reason. Having the wrong person on the job is supposed to be noticed and handled by the manager. If the manager doesn't do that, that's a failure.

Now, the team's manager (M1) also has a manager (M2) of some kind, and M2 is supposed to be making sure M1 can actually, well, manage! If M2 can't tell whether that's happening or not, that too is a failure.

In some situations, you come to realize that a whole bunch of bad things happen due to non-technical causes, and they are some of the hardest things that you might ever need to remove from an organization. Unlike the line workers, management is in a whole different world in which the "reality distortion field" matters most. You either generate a big enough one yourself, or you slot into someone else's. If you are opposed to it, you are rejected.

I guess this is my way of warning anyone who fancies themselves a troubleshooter and who really, truly, wants to get to the bottom of things. If you do this long enough, expect to start discovering truly unsatisfying situations that cannot be resolved.

Also, I will remind anyone who wants to try to tilt at such a windmill that if you are given responsibility without the power to make any changes, then you have just become the scapegoat. I said this in a post way back in February 2013, and I *still* fell into that damn trap in 2017 within a particularly broken organization.

Finally, in this same vein, I wanted to share something that a reader sent to me a while back, and that I found to be brilliant and amazing (I still do, but I did then, too): "People can read their manager's mind".

In particular, pay attention to where it says corollary 1 and starts talking about the "insane employee". The whole "personal offense" thing? Yeah, if you have the ability to not become that person, try to avoid it. Alternatively, if you're cursed with the tendency to fall into those things, try not to give yourself a hard time when someone terrible takes advantage of you for the nth time.

Hang in there.