Posts from October, 2014

October 27, 2014

The Bureaucratic Failure Mode Pattern

When we try to take purposeful action within an organization (or even in our lives more generally), we often find ourselves blocked or slowed by various bits of seemingly unrelated process that must first be satisfied before we are allowed to move forward. Some of these were put in place very deliberately, while others just grew more or less organically, but what they often have in common, aside from increasing the friction of activity, is that they seem disconnected from our ultimate purpose. If I want to drive my car to work, having to register my car with the DMV seems like a mechanically unnecessary step (regardless of what the real underlying reason for it may be).

Note that I’m not talking about the intrinsic difficulty or inconvenience of the process itself (car registration might entail waiting around for several hours in the DMV office or it might be 30 seconds online with a web page, for example), but the cost imposed by the mere existence of the need to report information or get permission or put things in some particular way just so or align or coordinate with some other thing (and the concomitant need to know that you are supposed to do whatever it is, and the need to know or find out how). Each of these is a friction factor; the competence or user-friendliness of whatever necessary procedure is involved may influence the magnitude of the inconvenience, but not the fact of it. (Other recursive friction factors embedded in the organizations or processes behind these things may well figure into why many of them are in fact incompetently executed or needlessly complex or time consuming, but that is a separate matter.)

Over time, organizations tend to acquire these bits of process, the way ships accumulate barnacles, with the accompanying increase in drag that makes forward progress increasingly difficult and expensive. However, barnacles are purely parasitic. They attach themselves to the hull for their own benefit, while the ship gains nothing of value in return. But even though organizational cynics enjoy characterizing these bits of process as also being purely parasitic, each of those bits of operational friction was usually put there for some purpose, presumably a purpose of some value. It may be that the cost-benefit analysis involved was flawed, but the intent was generally positive. (I’m ignoring here for a moment those things that were put in place for malicious reasons or to deliberately impede one person’s actions for the benefit of someone else. These kinds of counter-productive interventions do happen from time to time, and while they tend to loom large in people’s institutional mythologies, I believe such evil behavior is actually comparatively rare – perhaps not that uncommon in absolute terms, but still dwarfed by the truly vast number of ordinary, well-intentioned process elements that slow us down every day.)

Because I’m analyzing this from a premise of benign intent, I’m going to avoid characterizing these things with a loaded word like “barnacles”, even though they often have a similar effect. Instead, let’s refer to them as “checkpoints” – gates or control points or tests that you have to pass in order to move forward. They are annoying and progress-impeding but not necessarily valueless.

We are forced to pass through checkpoints all the time – having to swipe your badge past a reader to get into the office (or having to unlock the door to your own home, for that matter), entering a user name and password dozens of times per day to access various network services, getting approval from your boss to take a vacation day, having to fill out an expense report form (with receipts!) to get reimbursed for expenses you have incurred, all of the various layers of review and approval to push a software change into production, having to get approval from someone in the legal department before you can adopt a new piece of open source software; the list is potentially endless.

Note that while these vary wildly in terms of how much drag they introduce, for many of them the actual amount is very little, and this is a key point. The vast majority of these were motivated by some real world problem that called for some tiny addition to the process flow to prevent (or at least inhibit) whatever the problem was from happening again. No doubt some were the result of bad dealing or of an underemployed lawyer or administrator trying to preempt something purely hypothetical, but I think these latter kinds of checkpoint are the exception, and we weaken our campaign to reduce friction by paying too much attention to them – that is, by focusing too much on the unjustified bureaucracy, we distract attention from the far larger (and therefore far more problematic) volume of justified bureaucracy.

Let’s just presume, for the purpose of argument, that each of the checkpoints that we encounter is actually well motivated: that it exists for a reason, that the reason can be clearly articulated, that the reason is real, that it is more or less objective, that people, when presented with the argument for the checkpoint, will find it basically convincing. Let’s further presume that the friction imposed by the checkpoint is relatively modest – that the friction arises not because the checkpoint is badly implemented but simply because it is there. And yes, I am trying, for purposes of argument, to cast things in a light that is as favorable to the checkpoints as possible. I’m being so kind-hearted towards them because I think that, even given the most generous concessions to process, we still have a problem: the “death of a thousand cuts” phenomenon.

Checkpoints tend to accumulate over time. Organizations usually start out simple and only introduce new checkpoints as problems are encountered – most checkpoints are the product of actual experience. Checkpoints tend to accumulate with scale. As an organization grows, it finds itself doing each particular operation it does more often, which means that the frequency of actually encountering any particular low probability problem goes up. As an organization grows, it finds itself doing a greater variety of things, and this variety in turn implies greater variety of opportunities to encounter whole new species of problems. Both of these kinds of scale-driven problem sources motivate the introduction of additional checkpoints. What’s more, the greater variety of activities also means a greater number of permutations and combinations of activities that can be problematic when they interact with each other.
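A little arithmetic shows why scale matters, using made-up numbers. If a problem has probability p of occurring on any single operation, the chance of seeing it at least once in n independent operations is 1 - (1 - p)^n, and k kinds of activity can interact in k(k - 1)/2 distinct pairs. The sketch below just evaluates those two formulas; the 1-in-1,000 rate and the operation counts are arbitrary assumptions, not measurements.

```python
def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability that a problem with per-operation probability p
    shows up at least once in n independent operations."""
    return 1.0 - (1.0 - p) ** n


def pairwise_interactions(k: int) -> int:
    """Number of distinct pairs that can be formed from k kinds of activity."""
    return k * (k - 1) // 2


p = 0.001  # hypothetical: a 1-in-1,000 problem per operation
for n in (10, 1_000, 10_000):
    print(f"{n:>6} operations: P(at least one) = {chance_of_at_least_one(p, n):.3f}")

for k in (5, 50):
    print(f"{k:>3} kinds of activity: {pairwise_interactions(k)} possible pairings")
```

With these invented numbers the chance of hitting the rare problem goes from about 1% to about 63% to near certainty as volume grows, and going from 5 to 50 kinds of activity takes the possible pairings from 10 to 1,225.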

Checkpoints, once in place, tend to be sticky – they tend not to go away. Partly this is because if the checkpoint is successful at addressing its motivating problem, it’s hard to tell if the problem later ceases to exist – either way you don’t see it. In general, it is much easier for organizations to start doing things than it is for them to stop doing things.

The problem with checkpoints is their cumulative cost. In part, this is because the small cost of each makes them seductive. If the cost of checkpoint A is close to zero, it is not too painful, and there is little motivation or, really, little actual reason to do anything about it. Unfortunately, this same logic applies to checkpoint B, and to checkpoint C, and indeed to all of them. But the sum of a large number of values near zero is not necessarily itself a value near zero. It can, instead, be very large indeed. However, as we stipulated in our premises above, each one of them is individually justified and defensible. It is merely their aggregate that is indefensible – there is nothing to tell you, “here, this one, this is the problem” because there isn’t any one which is the problem. The problem is an emergent phenomenon.

Any specific checkpoint may be one that you encounter only rarely, or perhaps only once. Consider, for example, all the various procedures we make new hires go through. When you hit such a checkpoint, it may be tedious and annoying, but once you’ve passed it, it’s done with. Thereafter you really have no incentive at all to do anything about it, because you’ll never encounter it again. But if we make a large number of people each go through it once, there’s still a large multiplier, and we’ve still burdened our organization with the cumulative cost.
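The aggregate is easy to underestimate, so here is a back-of-the-envelope sketch. Every figure in it (a hundred checkpoints costing five minutes apiece once a month for 500 people, plus one 30-minute onboarding step for 200 new hires) is invented for illustration, and the `Checkpoint` class is a hypothetical model, not a real accounting tool.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    name: str
    minutes_per_encounter: float  # cost each time one person passes through it
    encounters_per_year: float    # per person; use 1 for one-time hurdles like onboarding
    people: int                   # how many people hit it

    def cost_minutes(self) -> float:
        """Annual cost of this one checkpoint, in person-minutes."""
        return self.minutes_per_encounter * self.encounters_per_year * self.people


def total_cost_hours(checkpoints: list[Checkpoint]) -> float:
    """Aggregate annual cost of all checkpoints, in person-hours."""
    return sum(c.cost_minutes() for c in checkpoints) / 60


# Hypothetical numbers, chosen only for illustration.
monthly = [Checkpoint(f"monthly form #{i}", 5, 12, 500) for i in range(100)]
onboarding = Checkpoint("new-hire paperwork", 30, 1, 200)

print(monthly[0].cost_minutes() / 60)            # 500.0 person-hours for one checkpoint
print(total_cost_hours(monthly + [onboarding]))  # 50100.0 person-hours in aggregate
```

No single checkpoint here costs more than 500 person-hours a year, and the one-time onboarding step costs only 100, yet the total is 50,100 person-hours. At an assumed 2,000 working hours per person-year, that is roughly 25 people doing nothing but passing checkpoints.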

A problem of particular note is that, because checkpoints tend to be specialized, they are often individually not well known. Plus, a larger total number of checkpoints increases the odds in general that you will encounter checkpoints that are unknown or mysterious to you, even if they are well known to others. Thus it becomes easy for somebody without the relevant specialized knowledge to get into trouble by violating a rule that they didn’t even know existed.

Unknown or poorly understood checkpoints increase friction disproportionately. They trigger various kinds of remedial responses from the organization, in the form of compliance monitoring, mandatory training sessions, emailed warning messages and other notices that everyone has to read, and so on. Each such checkpoint thus generates a whole new set of additional checkpoints, meaning that the cumulative frictions multiply instead of just adding.

Violation of a checkpoint may visit sanctions or punishment on the transgressor, even if the transgression was inadvertent. The threat of this makes the environment more hostile. It trains people to become more timid and risk averse. It encourages them to limit their actions to those areas where they are confident they know all the rules, lest they step on some unfamiliar procedural landmine, thus making the organization more insular and inflexible. It gives people incentives to spend their time and effort on defensive measures at the expense of forward progress.

When I worked at Electric Communities, we had (as most companies do) a bulletin board in our break room where we displayed all the various mandatory notices required by a profusion of different government agencies, including arms of the federal government, three states (though we were a California company, we had employees who commuted from Arizona and Oregon and so we were subject to some of those states’ rules too), a couple of different regional agencies, and the City of Cupertino. I called it The Wall Of Bureaucracy. At one point I counted 34 different such notices (employees, of course, were expected to read them, hence the requirement that they be posted in a prominent, common location, though I suspect few people actually bothered). If you are required to post one notice, it’s pretty easy to know that you are in compliance: either you posted it or you didn’t. But if you are required to post 34 different notices, it’s nearly impossible to know that the number shouldn’t be 35 or 36 or that some of the ones you have are out of date or otherwise mistaken. Until, of course, some government inspector from some agency you never heard from before happens to wander in and issue you a citation and a fine (and often accuse you of being a bad person while they’re at it). As Alan Perlis once said, talking about programming, “If you have a procedure with ten parameters, you probably missed some.”

In the extreme case, the cumulative costs of all the checkpoints within an organization can exceed the working resources the organization has available, and forward progress becomes impossible. When this happens, the organization generally dies. From an external perspective – from the outside, or even from one part of the organization looking at another – this appears insane and self-destructive, but from the local perspective governing any particular piece of it, it all makes sense and so nothing is done to fix it until the inexorable laws of arithmetic put a stop to the whole thing. A famous example of this was Atari, where by 1984 the combined scleroses affecting the product development process became so extreme that no significant new products were able to make it out the door because the decision-making and approval process managed to kill them all before they could ship, even though a vast quantity of time and money and effort was spent on developing products, many of them with great potential. Few organizations manage to achieve this kind of epic self-absorption, though some do seem to approach it as an asymptote (e.g., General Motors). In practice, however, what seems to keep the problem under control, here in Silicon Valley anyway, is that the organization reaches a level of dysfunction where it is no longer able to compete effectively and it is supplanted in the marketplace by nimbler and generally younger rivals whose sclerosis is not as advanced.

The challenge, of course, is how to deal with this problem. The most common pathway, as alluded to above, is for a newer organization to supplant the older one. This works not because the newer organization is intrinsically more immune to the phenomenon than the older one, but simply because, being younger and smaller, it has not yet developed as many internal checkpoints. From the perspective of society, this is a fine way of handling things; this is Schumpeter’s “creative destruction” at work. It is less fine from the perspective of the people whose money or lives are invested in the organization being creatively destroyed.

Another path out of the dilemma is strong leadership that is prepared to ride roughshod over the sound justifications supporting all these checkpoints and simply do away with them by fiat. Leaders like this will disregard the relevant constituencies and just cut, even if crudely. Such leaders also tend to be authoritarian, megalomaniacal, visionary, insensitive, and arguably insane – and, disturbingly often, right – i.e., they are Steve Jobs. They also tend to be a bit rough on their subordinates. This kind of willingness to disrespect procedure can also sometimes be engendered by dire necessity, enabling even the most hidebound bureaucracies to manifest surprising bursts of speed and effectiveness. A well known and much studied example of this phenomenon is the military, ordinarily among the stuffiest and most procedure bound of institutions, which can become radically more effective in times of actual war. In the first three weeks of American involvement in World War II, when we weren’t yet really doing anything serious, Army Chief of Staff George Marshall merely started carefully asking people questions and half the generals in the US Army found themselves retired or otherwise displaced.

A more user-friendly way to approach the problem is to foster an institutional culture that sees the avoidance of checkpoints as a value unto itself. This is very hard to do, and I am hard pressed to think of any examples of organizations that have managed to do this consistently over the long term. Even in the short term, examples are few, and tend to be smaller organizations embedded within much larger, more traditional ones. Examples might include Bell Labs during AT&T’s pre-breakup years, Xerox PARC during its heyday, the Lucasfilm Computer Division during the early 1980s, or the early years of the Apollo program. Each of these examples, by the way, benefited from a generous surplus of externally provided resources, which allowed them to trade a substantial amount of resource inefficiency for effective productivity. Surplus resources, however, tend also to engender actual parasitism, which ultimately ends the golden age, as all these examples attest.

The foregoing was expressed in terms of people and organizations, but essentially the same analysis applies almost without modification to software systems. Each of the myriad little inefficiencies, rough edges, performance-draining extra steps, needless added layers of indirection, and bits of accumulated cruft that plague mature software is like an organizational checkpoint.

October 19, 2014

Map of The Habitat World

By now a lot of you may have heard about the initiative at Oakland’s Museum of Digital Arts & Entertainment to resurrect Habitat on the web using C64 emulators and vintage server hardware. If not, you can read more about it here (there’s also been a modest bit of coverage in the game press, for example at Wired, Joystiq, and Gamasutra).

Part of this effort has had me digging through my archives, looking at old source files to answer questions that people had and to refresh my own memory of how things worked. It’s been pretty nostalgic, actually. One of the cooler things I stumbled across was the Habitat world map, something which relatively few people have ever seen because when Habitat was finally released to the public it got rebranded (as “Club Caribe”) with an entirely different set of publicity materials. I had a big printout of this decorating my office at Skywalker Ranch and later at American Information Exchange, but not very many people will have been in either of those places. Now, however, thanks to the web, I can share it publicly for the first time.

We wanted to have a map because we thought we would need a plan for enlarging the world as the user population grew. The idea was to have a framework into which we could plug new population centers and new places for stories and adventures.

The specific map we ended up with came about because I was playing around writing code to generate plausible topographic surfaces using fractal techniques (and, of course, lots and lots and LOTS of random numbers). The little program I wrote to do this was quite a CPU hog, but I could run it on a bunch of different computers in parallel and combine the results (sort of like modern MapReduce techniques, only by hand!). One night I grabbed every Unix machine on the Lucasfilm network that I could lay my hands on (two or three Vax minicomputers and six or eight Sun workstations) and let the thing cook for an epic all-nighter of virtual die rolling. In the morning I was left with this awesome height field, in the form of a file containing a big matrix of altitude numbers. Then, of course, the question was what to do with it, and in particular, how to look at it. Remember that in those days, computers didn’t have much in the way of image display capability; everything was either low resolution or low color fidelity or both (the Pixar graphics guys had some high end display hardware, but I didn’t have access to it and anyway I’d have to write more code to do something with the file I had, which wasn’t in any kind of standard image format). Then I realized that we had these new Apple LaserWriter printers. Although they were 1-bit per pixel monochrome devices, they printed at 300 DPI, which meant you could get away with dithering for grayscale. And you fed stuff to them using PostScript, a newfangled Forth-like programming language. So I ordered Adobe’s book on PostScript and went to work.
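The post doesn’t say which fractal technique that program used. One classic approach to generating plausible terrain is diamond-square midpoint displacement, and the sketch below is a minimal Python version of it, offered only to illustrate the general idea, not as a reconstruction of the original code. The function name, the `roughness` parameter, and the seed are hypothetical.

```python
import random
from typing import List, Optional


# Hypothetical illustration of diamond-square; not the original Lucasfilm program.
def diamond_square(n: int, roughness: float = 0.5,
                   seed: Optional[int] = None) -> List[List[float]]:
    """Return a (2**n + 1) x (2**n + 1) fractal height field.

    roughness in (0, 1) scales down the random displacement at each level;
    smaller values give smoother terrain.
    """
    size = 2 ** n + 1
    rng = random.Random(seed)
    h = [[0.0] * size for _ in range(size)]

    # Start with random heights at the four corners.
    for y in (0, size - 1):
        for x in (0, size - 1):
            h[y][x] = rng.uniform(-1.0, 1.0)

    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2

        # Diamond step: the center of each square gets the mean of its
        # four corners plus a random offset.
        for y in range(half, size, step):
            for x in range(half, size, step):
                mean = (h[y - half][x - half] + h[y - half][x + half] +
                        h[y + half][x - half] + h[y + half][x + half]) / 4
                h[y][x] = mean + rng.uniform(-scale, scale)

        # Square step: the midpoint of each edge gets the mean of its
        # (up to four) orthogonal neighbors plus a random offset.
        for y in range(0, size, half):
            for x in range((y + half) % step, size, step):
                neighbors = [h[y + dy][x + dx]
                             for dy, dx in ((-half, 0), (half, 0), (0, -half), (0, half))
                             if 0 <= y + dy < size and 0 <= x + dx < size]
                h[y][x] = sum(neighbors) / len(neighbors) + rng.uniform(-scale, scale)

        step, scale = half, scale * roughness
    return h


terrain = diamond_square(6, seed=42)  # a 65 x 65 grid of altitudes
```

Each increment of `n` roughly quadruples the number of cells (and the work), so a large field gets expensive quickly.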

I wrote a little C program that took my big height field and reduced it to a 500×100 image at 4 bits per pixel, and converted this to a file full of ASCII hexadecimal values. I then wrapped this in a little bit of custom PostScript that would interpret the hex dump as an image and print it, and voilà, out of the printer comes a lovely grayscale topographic map. With another quick little filter I clipped all the topography below a reasonable altitude to establish “sea level”, and I had some pretty sweet-looking landscape. At this point, you could make out a bunch of obvious geographic features, so we picked locations for cities, and drew some lines for roads between them, and suddenly it was a world. A little bit more PostScript hacking and I was able to actually draw nicely rendered roads and city labels directly on the map. Then I blew it up to a much larger size and printed it over several pages which I trimmed and taped together to yield a six and a half foot wide map suitable for posting on the wall.
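Here is a minimal Python sketch of that pipeline (downsample, quantize to 4 bits, clip to sea level, emit PostScript), assuming the height field is a list of rows of floats. The post doesn’t include the original C program or PostScript wrapper, so the function names, page placement (`36 300 translate`), and sea-level threshold are hypothetical. The PostScript side uses the standard `image` operator with a `readhexstring` procedure, the usual idiom for inline hex image data; conveniently, each 4-bit sample is exactly one hex digit.

```python
from typing import List

Grid = List[List[float]]


# Hypothetical sketch; not the original C program or PostScript.
def downsample(height: Grid, out_w: int, out_h: int) -> Grid:
    """Shrink a height field by averaging rectangular blocks of cells."""
    in_h, in_w = len(height), len(height[0])
    out = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max((oy + 1) * in_h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max((ox + 1) * in_w // out_w, x0 + 1)
            block = [height[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def quantize(height: Grid, levels: int = 16) -> List[List[int]]:
    """Map altitudes linearly onto 0..levels-1 (4 bits for 16 levels)."""
    lo = min(min(row) for row in height)
    hi = max(max(row) for row in height)
    span = (hi - lo) or 1.0
    return [[min(levels - 1, int((v - lo) / span * levels)) for v in row]
            for row in height]


def clip_to_sea_level(grid: List[List[int]], sea_level: int) -> List[List[int]]:
    """Flatten everything below sea_level into a uniform 'ocean'."""
    return [[max(v, sea_level) for v in row] for row in grid]


def to_postscript(grid: List[List[int]], width_pt: float = 540.0) -> str:
    """Wrap 4-bit gray samples (0 = black, 15 = white) in a PostScript program."""
    rows, cols = len(grid), len(grid[0])
    bytes_per_row = (cols + 1) // 2  # two 4-bit samples per byte; rows pad to a byte
    height_pt = width_pt * rows / cols
    lines = [
        "%!PS",
        f"36 300 translate {width_pt:g} {height_pt:g} scale",
        f"/rowstr {bytes_per_row} string def",
        f"{cols} {rows} 4 [{cols} 0 0 -{rows} 0 {rows}]",
        "{ currentfile rowstr readhexstring pop } image",
    ]
    for row in grid:
        hex_row = "".join(f"{v:x}" for v in row) + ("0" if cols % 2 else "")
        lines.extend(hex_row[i:i + 72] for i in range(0, len(hex_row), 72))
    lines.append("showpage")
    return "\n".join(lines) + "\n"


# Stand-in data: a smooth 1000 x 200 ramp instead of a real height field.
heights = [[(x + y) / 10 for x in range(1000)] for y in range(200)]
levels = clip_to_sea_level(quantize(downsample(heights, 500, 100)), sea_level=4)
program = to_postscript(levels)
```

Sent to a PostScript printer (or rendered with Ghostscript), `program` should produce a 500×100 grayscale image about 7.5 inches wide, with everything below level 4 flattened into a single “sea” gray.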

As I was going through my archives in conjunction with the project to reboot Habitat, I encountered the original PostScript source for the map. I ran it through Ghostscript and rendered it into a 22,800×4,560 pixel TIFF image which I could open in Photoshop and wallow around in. This immediately tempted me to do a bit more embellishment with Photoshop, so a little bit more hacking on the PostScript and I could split the various components of the image (the topographic relief, the roads, the city labels, etc.) into separate images which could then be individually manipulated as layers. I colorized the topography, put it through a Gaussian blur to reduce the chunkiness, and did a few other little bits of cosmetic tweaking, and the result is the image you see here (clicking on the picture will take you to a much larger version):

Habitat map

(Also, if you care to fiddle with this in other formats, the PostScript for the raw map can be gotten here. Beware that, depending on how your browser is configured, it may just attempt to render the PostScript, which might not have exactly the results you want or expect. Have fun.)
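For the curious, here is roughly what the Gaussian blur used to smooth out the 4-bit chunkiness computes: a minimal pure-Python sketch of a separable Gaussian blur over a grid of altitudes. It is not how Photoshop implements its filter, and the `wrap_horizontally` option is a hypothetical extra suited to a map whose left and right edges meet, as they do on Habitat’s cylindrical world.

```python
import math
from typing import List


# Illustrative sketch; not Photoshop's implementation.
def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights out to about three standard deviations."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values: List[float], kernel: List[float], wrap: bool = False) -> List[float]:
    """Convolve one row or column; edges either wrap around or are clamped."""
    n, r = len(values), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            j = j % n if wrap else min(max(j, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(grid: List[List[float]], sigma: float,
                  wrap_horizontally: bool = False) -> List[List[float]]:
    """Separable 2-D Gaussian blur: blur every row, then every column.

    wrap_horizontally is a hypothetical option for a cylindrical map.
    """
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel, wrap_horizontally) for row in grid]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Because a 2-D Gaussian factors into a horizontal pass and a vertical pass, the separable form costs work proportional to the kernel length per pixel rather than to its square.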

There are a number of interesting details here worth mentioning. Note that the Habitat world is cylindrical. This lets us encompass several different interesting storytelling possibilities: Going around the cylinder lets you circumnavigate the world; obviously, the first avatar to do this would be famous. The top edge is bounded by a wall, the bottom edge by a cliff. This means that you can fall off the edge of the world, or explore the wall for mysterious openings. By the way, the top edge is West. Habitat compasses point towards the West Pole, which was endlessly confusing for nearly everyone.
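To make the geometry concrete, here is a minimal sketch of movement on a cylindrical map, assuming a simple grid: horizontal motion wraps around modulo the circumference, the top (west) edge blocks you, and the bottom edge lets you fall off. All names and dimensions are hypothetical; this is not Habitat’s actual movement code.

```python
from dataclasses import dataclass

# Hypothetical grid dimensions, borrowed from the 500x100 printout;
# not Habitat's actual world geometry.
WORLD_WIDTH = 500    # columns around the circumference of the cylinder
WORLD_HEIGHT = 100   # rows from the top (west) wall to the bottom cliff


@dataclass
class Position:
    x: int
    y: int
    blocked_by_wall: bool = False
    fell_off_edge: bool = False


def move(x: int, y: int, dx: int, dy: int) -> Position:
    """Apply one step on a cylindrical map: x wraps around, y is bounded."""
    new_x = (x + dx) % WORLD_WIDTH  # Python's % keeps this in 0..WORLD_WIDTH-1
    new_y = y + dy
    if new_y < 0:                   # top edge: the west wall stops you
        return Position(new_x, 0, blocked_by_wall=True)
    if new_y >= WORLD_HEIGHT:       # bottom edge: over the cliff you go
        return Position(new_x, WORLD_HEIGHT - 1, fell_off_edge=True)
    return Position(new_x, new_y)


def has_circumnavigated(net_dx: int) -> bool:
    """True once an avatar's net horizontal travel spans the whole circumference."""
    return abs(net_dx) >= WORLD_WIDTH
```

For example, `move(499, 10, 1, 0)` lands back at column 0, while `move(5, 99, 0, 1)` reports a fall off the edge of the world.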

We had all kinds of plans for what to do with this, which obviously we never had a chance to follow through on. One of my favorites was the notion that if you walked along the top (west) wall enough, eventually you’d find a door, and if you went through this door you’d find yourself in a control room of some kind, with all kinds of control panels and switches and whatnot. What these switches would do would not be obvious, but in fact they’d control things like the lights and the day/night cycle in different parts of the world, the color palette in various places, the prices of things, etc. Also, each of the cities had a little backstory that explained its name and what kinds of things you might expect to find there. If I run across that document I’ll post it here too.