Things have failure modes. A failure mode is the typical way in which a thing breaks down. Some parts of a thing are usually not as well engineered as others and, planned obsolescence aside, those parts tend to break first, producing a common failure mode. For example, old Sony Trinitron TVs were known for their excellent picture. The Trinitron picture tube was over-engineered and pretty much never broke. Consequently, the old Sonys would blow their sound card instead, and you ended up with a $100 repair bill anyway. Thus, the failure mode of old Sony Trinitrons was to blow their sound card. This does not mean that every Sony TV blew its sound card. It just means that if your TV was a Sony and it was going to break down, it was far more likely that the sound card would go first. And that is the essence of failure modes.
Interestingly, it is possible to make a design that takes the inevitability of an eventual breakdown into account and to engineer in a preferred failure mode. The well-known example is indeed planned obsolescence. Less well known is the possibility of engineering for graceful degradation of the system. Graceful degradation is when a complex system tends to fail in such a way that its breakdown is gentle rather than catastrophic. Hence, with graceful degradation the breakdown of the system can be managed and, more importantly, usually survived. Graceful degradation matters most in complex systems where people’s lives are on the line. No one dies if their television suddenly stops working. But if both engines on an airplane fail at the same time, and the plane has not been engineered to make a pretty good glider as well as a great powered aircraft, then you have a problem and people die. A plane falling out of the sky like a rock is decidedly not an example of graceful degradation. An airplane engineered to make a pretty good unpowered glider, even though that makes it slightly less efficient in powered flight, is an example of graceful degradation. Aerospace engineers know this and care about it; CEOs of airline companies do not.
Regardless, I have been thinking about graceful degradation ever since my second year at university. What made me aware of it as an important issue was something I experienced two years in a row while trying to register for fall classes: both times, my registration was delayed by technical problems with the university’s computerized registration system. The first year, we had to go to the university to register in person, lining up in the gymnasium where they had set up a half dozen or so registration booths, each with a computer-equipped registrar. The procedure was that you waited in line until your turn with a real human being, to whom you handed your registration paperwork and made your payment. As part of processing each registration, the registrar would enter your course choices into the computer while you waited and provide you with your timetable right then and there. That day was memorable because it was a sweltering hot summer day and the heat in the gymnasium was stifling. Perhaps it was the heat that caused it, but not long after I joined the long lineup, all the computers at the temporary workstations went down. Nevertheless, an announcement was quickly made to the frustrated students in line, who were rapidly becoming unruly, that registrations could still proceed without the computers. Because the payment terminals still worked and the registrations could be done in person with the paperwork, the registrars could still register us for our classes and then enter our class choices later, when the system came back up. What made this experience memorable at the time was my realization that even though registrations continued, the line, which was quite slow before, now slowed to a snail’s pace, all in nearly unbearable heat and close air.
Abandoning the process and returning later was ruled out by the registration system itself, under which you were told to come on a certain day and time, and not before, based on your name or student number or some such arbitrary grouping. If you were worried about getting into popular classes with limited enrolment, which we all apparently were, then it was unthinkable to abandon the line and try again later. So, for over an hour I stood in the gym, inching imperceptibly forward toward my goal and thinking that this was my big welcome to university life: hurry up and get in line. A memorable day indeed.
But it was the experience I had the next year, when trying to register for fall classes again, that really made me think. In the intervening year, the university had made a computerized, telephone-based touch-tone registration system mandatory for everyone. Now, this was interesting for me from the start because, even though it was still the late 80s, I already owned my own telephone. I had purchased the phone of my dreams on sale that year and was extremely happy to be the proud owner of a bright red touch-tone Contempra telephone. I loved the sleek design with the angled handset and the touch-tone buttons built into it. And, as an aside, I’ll mention that I had the privilege of meeting the designers of that cool phone years later when I did my ethnographic fieldwork at Nortel’s Corporate Design Group. The very stuff of phone-geek dreams, it was great. But, alas, there was a problem with my new phone, one well known to today’s cell-phone-toting legions: with the buttons built into the handset, how was I to listen to the computer-generated voice instructions while, monkey-like, punching in my choices on the coded alphanumeric keypad? I was quite likely one of the first people in the world to experience this particular dilemma.
So, the task of registering for school forced me up out of my basement lair to the upstairs dining room table, where a normal touch-tone phone could be pressed into service. I still remember getting all my paperwork and the phone set up just right on the allotted day, well before the now exact time at which I would first be able to register for my class choices. Phew. Breathlessly, as the big moment arrived, I picked up the handset and punched in the number for the computerized registration system. Well, it was not sweltering hot there in the dining room, but I did have to endure delay after delay as the clearly overloaded system was swamped with calls from legions of students also concerned about getting into the classes they wanted. Still, things were at least progressing slowly toward my goal until, at some point, the entire system went down, just as it had the year before.
Well, multiple calls later, it was obvious even to me that no one was getting through to the new computerized phone registration system for a while. And that recognition sparked an epiphany. I suddenly realized that the failure mode of digital devices like computers is, by its very nature, non-graceful and catastrophic. In my first year, when there were still analog humans and paper forms in the registration loop, the whole system did not cease working when the computer part of it crashed. The lineups got a lot longer and moved a lot slower, but eventually we all did get registered. The next year, sitting in the comfort of my home at the dining room table and trying to register through the computerized telephone system, when the system crashed no one got registered at all. The entire process simply stopped for the duration of the crash. On or off, that’s it. No slowing down. No slightly longer lines. Everything ground instantly to a halt. And that realization has stayed with me ever since. Digital devices don’t do graceful degradation. But digital does catastrophic failure really well.
Sure, it is possible to engineer something that looks like graceful degradation into digital systems using multiple redundancies. But that makes them more expensive, and no one wants to pay for that. There is also another little factor that pops into my mind. It is the rule of thumb that as the mean time between failures increases, the severity of the resulting failure also tends to increase. When things break down all the time, it is often the easily repairable or non-critical parts that break, and the repair is fast or easy, or you can MacGyver a temporary fix and still limp home. But when things become super reliable, last a long time, and don’t break very often, then when they do break it is often a major breakdown that stops everything and takes a long time or a lot of money to fix. This is one reason why we don’t even try to repair many items today and simply replace the whole unit with a new one. DVD player stops working? Buy a new one. Coffee maker won’t make coffee? Buy a new one.
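For the programmers in the audience, here is a rough sketch of the difference between my two registration years, written as a toy Python model. Everything in it is made up for illustration (the `RegistrationDesk` class, the `SystemDown` exception, the method names); it is not how the university's system worked, just one way to picture a fallback path. One method stops dead when the computer is down, the way the phone system did. The other takes the form on paper and keys it in later, the way the registrars in the gym did.

```python
from dataclasses import dataclass, field


class SystemDown(Exception):
    """Hypothetical error: the central registration computer is unreachable."""


@dataclass
class RegistrationDesk:
    """Toy model of a registration desk (all names are illustrative)."""
    computer_up: bool = True
    enrolled: dict = field(default_factory=dict)       # student -> courses in the computer
    paper_backlog: list = field(default_factory=list)  # forms waiting to be keyed in

    def _enter_in_computer(self, student, courses):
        if not self.computer_up:
            raise SystemDown("registration computer is not responding")
        self.enrolled[student] = list(courses)

    def register_all_or_nothing(self, student, courses):
        """Year two (phone system): no computer, no registration."""
        self._enter_in_computer(student, courses)
        return "confirmed"

    def register_with_fallback(self, student, courses):
        """Year one (gym lineup): accept the form on paper if the computer is down."""
        try:
            self._enter_in_computer(student, courses)
            return "confirmed"
        except SystemDown:
            self.paper_backlog.append((student, list(courses)))
            return "accepted on paper"

    def recover(self):
        """The computer is back: key in the paper forms in arrival order."""
        self.computer_up = True
        while self.paper_backlog:
            student, courses = self.paper_backlog.pop(0)
            self._enter_in_computer(student, courses)


if __name__ == "__main__":
    desk = RegistrationDesk(computer_up=False)
    print(desk.register_with_fallback("student A", ["ANTH 101"]))  # accepted on paper
    desk.recover()
    print(desk.enrolled)  # {'student A': ['ANTH 101']}
```

Note that the fallback is not free: somebody has to keep the paper forms around and do the data entry afterwards, which is exactly the extra cost nobody wants to pay for.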
The way increased reliability leads to catastrophic failures has an interesting consequence when it comes to digital devices, though. When they were unreliable and broke down a lot, we all had good habits and systems in place to minimize the damage and data loss. Now that they are super fast and reliable and do not break down often at all, we have gotten slack about maintaining those good habits. I used to manually save text in WordPerfect 5.1 every five minutes or so if I was working on a school paper to a deadline I could not miss. And I’d save it to a floppy (remember those?) every hour or so. But now, well… I still do manual saves, but nowhere near as often. Although, I have to admit I am typing this blog entry in MS Word so I can cut and paste it later into the cheesy online editor for the blog program. That was an old lesson painfully relearned when I lost half of my first post while trying to place a picture into it.
So, who actually does a backup of half a terabyte of data once a week? Who copies important folders to a USB key every few days? To two keys? Yeah right, who has time for that? Offsite backups? “What the hell are those,” people now say to me. And that is why graceful degradation is important now. So many of civilization’s mission-critical complex systems (and there are a lot of them) now rely on computers that a serious cascading system failure at some point in the future is a near certainty. More worrisome still are the telltale signs of smaller-scale failures in these systems, cropping up with increasing frequency and severity. Items used to be rarely out of stock at the grocery store. Municipal water systems used to deliver clean tap water without fail. Electric grids used to not lose power to a whole coast just because one component in a minor substation failed. And nuclear reactors did not have “events” on a semi-regular basis.
Analog systems, by contrast, tend toward graceful degradation. An analog phone used to get static and crackly, but you could still talk through it easily, and the natural language repair reflexes we all have still worked without thought to deal with that kind of noise. Now, with digital radio-telephones (sorry, cell phones), when they crap out it’s like trying to talk to your girlfriend from the Arctic Circle on a satellite phone: agonizing. TV used to get a bit of snow when the solar flares acted up. But you could still watch the action pretty easily. Now the picture freezes, pixelates, and then jumps forward when that kind of interference disturbs the free flow of digital packets. And I don’t know about you, but it bugs me each and every time because it breaks the frame. Maybe I’m just being a crank, but I’d like to see a brainwave study comparing the response to analog static vs. digital pixelation noise artefacts. Oh right, those studies never get done.
Finally, there is one more realization I made long ago that is relevant to this discussion. It was the realization that computer-based digital systems, compared to their analog counterparts, are essentially antithetical to capitalism. And the more painful realization I made almost immediately, as a graduate student trying to warn people about this, was that no one wants to hear about it. I’ve lived with that knowledge, pretty much in silence, for, let’s say, about 25 years, but I no longer care to censor myself based on what other people might think. Now that we are well into the process of collapse of our computerized, financialized, globalized world, I will remain silent no more. If you think ones and zeros stored in a computer someplace are actually money, then I can’t help you.
Why is all this important? Because I am concerned about engineering – social engineering – a failure mode for the project of civilization that exhibits graceful degradation rather than catastrophic collapse. And, my answer to that problem is, of course, The Cloister Initiative.