October 27, 2014
The Bureaucratic Failure Mode Pattern
When we try to take purposeful action within an organization (or even in our lives more generally), we often find ourselves blocked or slowed by various bits of seemingly unrelated process that must first be satisfied before we are allowed to move forward. Some of these were put in place very deliberately, while others just grew more or less organically, but what they often have in common, aside from increasing the friction of activity, is that they seem disconnected from our ultimate purpose. If I want to drive my car to work, having to register my car with the DMV seems like a mechanically unnecessary step (regardless of what the real underlying reason for it may be).
Note that I’m not talking about the intrinsic difficulty or inconvenience of the process itself (car registration might entail waiting around for several hours in the DMV office or it might be 30 seconds online with a web page, for example), but the cost imposed by the mere existence of the need to report information or get permission or put things in some particular way just so or align or coordinate with some other thing (and the concomitant need to know that you are supposed to do whatever it is, and the need to know or find out how). Each of these is a friction factor; the competence or user-friendliness of whatever necessary procedure is involved may influence the magnitude of the inconvenience, but not the fact of it. (Other recursive friction factors embedded in the organizations or processes behind these things may well figure into why many of them are in fact incompetently executed or needlessly complex or time consuming, but that is a separate matter.)
Over time, organizations tend to acquire these bits of process, the way ships accumulate barnacles, with the accompanying increase in drag that makes forward progress increasingly difficult and expensive. However, barnacles are purely parasitic. They attach themselves to the hull for their own benefit, while the ship gains nothing of value in return. But even though organizational cynics enjoy characterizing these bits of process as also being purely parasitic, each of those bits of operational friction was usually put there for some purpose, presumably a purpose of some value. It may be that the cost-benefit analysis involved was flawed, but the intent was generally positive. (I’m ignoring here for a moment those things that were put in place for malicious reasons or to deliberately impede one person’s actions for the benefit of someone else. These kinds of counter-productive interventions do happen from time to time, and while they tend to loom large in people’s institutional mythologies, I believe such evil behavior is actually comparatively rare – perhaps not that uncommon in absolute terms, but still dwarfed by the truly vast number of ordinary, well-intentioned process elements that slow us down every day.)
Because I’m analyzing this from a premise of benign intent, I’m going to avoid characterizing these things with a loaded word like “barnacles”, even though they often have a similar effect. Instead, let’s refer to them as “checkpoints” – gates or control points or tests that you have to pass in order to move forward. They are annoying and progress-impeding but not necessarily valueless.
We are forced to pass through checkpoints all the time – having to swipe your badge past a reader to get into the office (or having to unlock the door to your own home, for that matter), entering a user name and password dozens of times per day to access various network services, getting approval from your boss to take a vacation day, having to fill out an expense report form (with receipts!) to get reimbursed for expenses you have incurred, all of the various layers of review and approval to push a software change into production, having to get approval from someone in the legal department before you can adopt a new piece of open source software; the list is potentially endless.
Note that while these vary wildly in terms of how much drag they introduce, for many of them the actual amount is very little, and this is a key point. The vast majority of these were motivated by some real world problem that called for some tiny addition to the process flow to prevent (or at least inhibit) whatever the problem was from happening again. No doubt some were the result of bad dealing or of an underemployed lawyer or administrator trying to preempt something purely hypothetical, but I think these latter kinds of checkpoint are the exception, and we weaken our campaign to reduce friction by paying too much attention to them – that is, by focusing too much on the unjustified bureaucracy, we distract attention from the far larger (and therefore far more problematic) volume of justified bureaucracy.
Let’s just presume, for the purpose of argument, that each of the checkpoints that we encounter is actually well motivated: that it exists for a reason, that the reason can be clearly articulated, that the reason is real, that it is more or less objective, that people, when presented with the argument for the checkpoint, will find it basically convincing. Let’s further presume that the friction imposed by the checkpoint is relatively modest – that the friction that results is not because the checkpoint is badly implemented but simply because it is there. And yes, I am trying, for purposes of argument, to cast things in a light that is as favorable to the checkpoints as possible. The reason I’m being so kind hearted towards them is because I think that, even given the most generous concessions to process, we still have a problem: the “death of a thousand cuts” phenomenon.
Checkpoints tend to accumulate over time. Organizations usually start out simple and only introduce new checkpoints as problems are encountered – most checkpoints are the product of actual experience. Checkpoints tend to accumulate with scale. As an organization grows, it finds itself doing each particular operation it does more often, which means that the frequency of actually encountering any particular low probability problem goes up. As an organization grows, it finds itself doing a greater variety of things, and this variety in turn implies greater variety of opportunities to encounter whole new species of problems. Both of these kinds of scale-driven problem sources motivate the introduction of additional checkpoints. What’s more, the greater variety of activities also means a greater number of permutations and combinations of activities that can be problematic when they interact with each other.
Checkpoints, once in place, tend to be sticky – they tend not to go away. Partly this is because if the checkpoint is successful at addressing its motivating problem, it’s hard to tell if the problem later ceases to exist – either way you don’t see it. In general, it is much easier for organizations to start doing things than it is for them to stop doing things.
The problem with checkpoints is their cumulative cost. In part, this is because the small cost of each makes them seductive. If the cost of checkpoint A is close to zero, it is not too painful, and there is little motivation or, really, little actual reason to do anything about it. Unfortunately, this same logic applies to checkpoint B, and to checkpoint C, and indeed to all of them. But the sum of a large number of values near zero is not necessarily itself a value near zero. It can, instead, be very large indeed. However, as we stipulated in our premises above, each one of them is individually justified and defensible. It is merely their aggregate that is indefensible – there is nothing to tell you, “here, this one, this is the problem” because there isn’t any one which is the problem. The problem is an emergent phenomenon.
Any specific checkpoint may be one that you encounter only rarely, or perhaps only once. Consider, for example, all the various procedures we make new hires go through. When you hit such a checkpoint, it may be tedious and annoying, but once you’ve passed it it’s done with. Thereafter you really have no incentive at all to do anything about it, because you’ll never encounter it again. But if we make a large number of people each go through it once, there’s still a large multiplier, and we’ve still burdened our organization with the cumulative cost.
A problem of particular note is that, because checkpoints tend to be specialized, they are often individually not well known. Plus, a larger total number of checkpoints increases the odds in general that you will encounter checkpoints that are unknown or mysterious to you, even if they are well known to others. Thus it becomes easy for somebody without the relevant specialized knowledge to get into trouble by violating a rule that they didn’t even know to exist.
Unknown or poorly understood checkpoints increase friction disproportionately. They trigger various kinds of remedial responses from the organization, in the form of compliance monitoring, mandatory training sessions, emailed warning messages and other notices that everyone has to read, and so on. Each such checkpoint thus generates a whole new set of additional checkpoints, meaning that the cumulative frictions multiply instead of just adding.
Violation of a checkpoint may visit sanctions or punishment on the transgressor, even if the transgression was inadvertent. The threat of this makes the environment more hostile. It trains people to be become more timid and risk averse. It encourages them to limit their actions to those areas where they are confident they know all the rules, lest they step on some unfamiliar procedural landmine, thus making the organization more insular and inflexible. It gives people incentives to spend their time and effort on defensive measures at the expense of forward progress.
When I worked at Electric Communities, we had (as most companies do) a bulletin board in our break room where we displayed all the various mandatory notices required by a profusion of different government agencies, including arms of the federal government, three states (though we were a California company, we had employees who commuted from Arizona and Oregon and so we were subject to some of those states’ rules too), a couple of different regional agencies, and the City of Cupertino. I called it The Wall Of Bureaucracy. At one point I counted 34 different such notices (and employees, of course, were expected to read them, hence the requirement that they be posted in a prominent, common location, though of course I suspect few people actually bothered). If you are required to post one notice, it’s pretty easy to know that you are in compliance: either you posted it or you didn’t. But if you are required to post 34 different notices, it’s nearly impossible to know that the number shouldn’t be 35 or 36 or that some of the ones you have are out of date or otherwise mistaken. Until, of course, some government inspector from some agency you never heard from before happens to wander in and issue you a citation and a fine (and often accuse you of being a bad person while they’re at it). As Alan Perlis once said, talking about programming, “If you have a procedure with ten parameters, you probably missed some.”
In the extreme case, the cumulative costs of all the checkpoints within an organization can exceed the working resources the organization has available, and forward progress becomes impossible. When this happens, the organization generally dies. From an external perspective – from the outside, or even from one part of the organization looking at another – this appears insane and self-destructive, but from the local perspective governing any particular piece of it, it all makes sense and so nothing is done to fix it until the inexorable laws of arithmetic put a stop to the whole thing. A famous example of this was Atari, where by 1984 the combined scleroses effecting the product development process became so extreme that no significant new products were able to make it out the door because the decision making and approval process managed to kill them all before they could ship, even though a vast quantity of time and money and effort was spent on developing products, many of them with great potential. Few organizations manage to achieve this kind of epic self-absorption, though some do seem to approach it as an asymptote (e.g., General Motors). In practice, however, what seems to keep the problem under control, here in Silicon Valley anyway, is that the organization reaches a level of dysfunction where it is no longer able to compete effectively and it is supplanted in the marketplace by nimbler and generally younger rivals whose sclerosis is not as advanced.
The challenge, of course, is how to deal with this problem. The most common pathway, as alluded to above, is for a newer organization to supplant the older one. This works, not because the one organization is intrinsically more immune to the phenomenon than the other but simply due to the fact that because it is younger and smaller it has not yet developed as many internal checkpoints. From the perspective of society, this is a fine way of handling things; this is Schumpeter’s “creative destruction” at work. It is less fine from the perspective of the people whose money or lives are invested in the organization being creatively destroyed.
Another path out of the dilemma is strong leadership that is prepared to ride roughshod over the sound justifications supporting all these checkpoints and simply do away with them by fiat. Leaders like this will disregard the relevant constituencies and just cut, even if crudely. Such leaders also tend to be authoritarian, megalomaniacal, visionary, insensitive, and arguably insane – and, disturbingly often, right – i.e., they are Steve Jobs. They also tend to be a bit rough on their subordinates. This kind of willingness to disrespect procedure can also sometimes be engendered by dire necessity, enabling even the most hidebound bureaucracies to manifest surprising bursts of speed and effectiveness. A well known and much studied example of this phenomenon is the military, ordinarily among the stuffiest and most procedure bound of institutions, which can become radically more effective in times of actual war. In the first three weeks of American involvement in World War II, when we weren’t yet really doing anything serious, Army Chief of Staff George Marshall merely started carefully asking people questions and half the generals in the US Army found themselves retired or otherwise displaced.
A more user-friendly way to approach the problem is to foster an institutional culture that sees the avoidance of checkpoints as a value unto itself. This is very hard to do, and I am hard pressed to think of any examples of organizations that have managed to do this consistently over the long term. Even in the short term, examples are few, and tend to be smaller organizations embedded within much larger, more traditional ones. Examples might include Bell Labs during AT&T’s pre-breakup years, Xerox PARC during its heyday, the Lucasfilm Computer Division during the early 1980s, or the early years of the Apollo program. Each of these examples, by the way, benefited from a generous surplus of externally provided resources, which allowed them to trade a substantial amount of resource inefficiency for effective productivity. Surplus resources, however, tend also to engender actual parasitism, which ultimately ends the golden age, as all these examples attest.
The foregoing was expressed in terms of people and organizations, but essentially the same analysis applies almost without modification to software systems. Each of the myriad little inefficiencies, rough edges, performance draining extra steps, needless added layers of indirection, and bits of accumulated cruft that plague mature software is like an organizational checkpoint.