Wednesday, November 25, 2009

And Today's Word Is...

kludge or kluge (klo͞oj)
  1. a piece of computer hardware or software, or a computer system, that is clumsily designed or improvised from mismatched parts
  2. any poorly designed device or system
I've been watching the Climategate saga with a certain amount of amusement, and I have to confess to some schadenfreude over the whole thing. It's not that I think anthropogenic global warming isn't real, and it's not even that I think it's no cause for concern. But it always warms the cockles of my heart to see any group whose fundamental identity is based on being a bunch of self-righteous, holier-than-thou--worse, smarter-than-thou--scolds get their comeuppance.

But there is a real lesson in this for both scientists and non-scientists alike, and today I'd like to concentrate on the psychology of computer simulation.

From CBS's Declan McCullagh, who seems to have a much stronger tolerance for wading through the HARRY_READ_ME.txt file than I do, we learn:
In addition to e-mail messages, the roughly 3,600 leaked documents posted online include computer code and a description of how an unfortunate programmer named "Harry" -- possibly the CRU's Ian "Harry" Harris -- was tasked with resuscitating and updating a key temperature database that proved to be problematic. Some excerpts from what appear to be his notes, emphasis added:
I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation - apparently linear as well. As far as I can see, this renders the station counts totally meaningless. It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that. Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too. Meh.

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh! There truly is no end in sight... So, we can have a proper result, but only by including a load of garbage!

One thing that's unsettling is that many of the assigned WMO codes for Canadian stations do not return any hits with a web search. Usually the country's met office, or at least the Weather Underground, show up – but for these stations, nothing at all. Makes me wonder if these are long-discontinued, or were even invented somewhere other than Canada!

Knowing how long it takes to debug this suite - the experiment endeth here. The option (like all the anomdtb options) is totally undocumented so we'll never know what we lost. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software suites - let's have a go at producing CRU TS 3.0! since failing to do that will be the definitive failure of the entire project.

Ulp! I am seriously close to giving up, again. The history of this is so complex that I can't get far enough into it before my head hurts and I have to stop. Each parameter has a tortuous history of manual and semi-automated interventions that I simply cannot just go back to early versions and run the update prog. I could be throwing away all kinds of corrections - to lat/lons, to WMOs (yes!), and more. So what the hell can I do about all these duplicate stations?...

As the leaked messages, and especially the HARRY_READ_ME.txt file, found their way around technical circles, two things happened: first, programmers unaffiliated with East Anglia started taking a close look at the quality of the CRU's code, and second, they began to sympathize with anyone who had to spend three years (including working weekends) trying to make sense of code that appeared to be undocumented and buggy, while representing the core of CRU's climate model.

One programmer highlighted the error of relying on computer code that, if it generates an error message, continues as if nothing untoward ever occurred. Another debugged the code by pointing out why the output of a calculation that should always generate a positive number was incorrectly generating a negative one. A third concluded: "I feel for this guy. He's obviously spent years trying to get data from undocumented and completely messy sources."
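The first of those complaints is worth making concrete. Here is a hypothetical Python sketch (not the actual CRU Fortran; all names here are illustrative) of the anti-pattern--an error that gets reported and then ignored--next to the fail-fast alternative:

```python
# Anti-pattern: an error message is emitted, then execution continues as if
# nothing untoward ever occurred, quietly feeding garbage into the output.

def grid_cell_mean(values):
    """Average a list of station readings; legitimately fails on empty input."""
    if not values:
        raise ValueError("no station data for this grid cell")
    return sum(values) / len(values)

def process_swallowing_errors(cells):
    """Report the error, then keep going with a fabricated value."""
    results = []
    for cell in cells:
        try:
            results.append(grid_cell_mean(cell))
        except ValueError as e:
            print("ERROR:", e)      # the message is printed...
            results.append(0.0)     # ...but a made-up 0.0 enters the results
    return results

def process_failing_fast(cells):
    """Safer: let the failure propagate so it cannot silently corrupt output."""
    return [grid_cell_mean(cell) for cell in cells]

cells = [[1.0, 2.0], [], [3.0]]
print(process_swallowing_errors(cells))  # [1.5, 0.0, 3.0] -- the 0.0 is garbage
```

In the first version, downstream code has no way to tell the fabricated 0.0 from a real measurement; in the second, the bad cell stops the run and gets fixed at the source.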

Programmer-written comments inserted into CRU's Fortran code have drawn fire as well. The file says: "Apply a VERY ARTIFICAL correction for decline!!" and "APPLY ARTIFICIAL CORRECTION." Another says: "Low pass filtering at century and longer time scales never gets rid of the trend - so eventually I start to scale down the 120-yr low pass time series to mimic the effect of removing/adding longer time scales!"
Disclaimer: I am not a scientific modeler, nor do I play one on TV. However, I have cobbled together the odd simulation to explain various software behaviors, and I've done the occasional spreadsheet to help me understand various physical processes. In my experience, the basic methodology for producing such a model goes something like this:
  1. Write the model.
  2. Run it.
  3. Stare uncomprehendingly at the results, trying to figure out why they bear no relationship to what you perceive reality to be.
  4. Debug it and modify it.
  5. If it doesn't look sorta-kinda right yet, go to step 2.
  6. Once you believe it's getting close, start tweaking the various constants you defined for your model in an effort to get it in closer and closer agreement with whatever your observed data set is.
  7. Run it.
  8. If it doesn't agree with your data set, go to step 6.
  9. Hopefully gain some insight into what you were modeling.
  10. Breathe a sigh of relief that it's finally over.
  11. (and this is the most important step of all) Forget all about how you constructed the model as quickly as possible.
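Steps 2 through 8 can be sketched as a loop. Everything below--the toy one-parameter model, the pretend observations, the crude tuning rule--is illustrative, not anyone's actual methodology:

```python
# A toy version of the tune-until-it-fits loop: a one-parameter "model"
# is nudged until its output agrees with the observed data set.

def model(x, k):
    """Toy model: a single tunable constant k scales the input."""
    return k * x

def disagreement(k, inputs, observed):
    """Steps 7/8: run the model and measure disagreement with the data."""
    return sum((model(x, k) - y) ** 2 for x, y in zip(inputs, observed))

inputs = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 3.9, 6.2, 7.8]     # pretend these are measurements

k = 0.0
step = 0.01
error = disagreement(k, inputs, observed)
while True:                          # step 8: loop back to step 6
    new_error = disagreement(k + step, inputs, observed)
    if new_error >= error:           # no further improvement: "finally over"
        break
    k, error = k + step, new_error   # step 6: tweak the constant

print(round(k, 2))   # settles near 2.0 -- and step 11 kicks in: you forget
                     # how crude the fitting procedure was
```

The point of the sketch is what it omits: nothing in the loop records *why* k landed where it did, or how sensitive the fit is to the choice of step size or data--which is exactly the knowledge that evaporates in step 11.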
My experience is that as soon as you have the model debugged, you tend to treat its results as gospel. You rely much too heavily on the model for any future inferences that you make about the system.

This is, in my experience, an essential part of the psychology of computer geeks. Computer programming is an exercise in abstraction: you break a problem down into more basic parts that you think you understand. Once you get the behavior you understand from the underlying parts, you can forget about them and concentrate on the behavior generated when you hook those parts together, and so on, until you've created the whole system. It takes a certain amount of discipline to remember that you programmed the basic parts on a day when you had just had a fight with your family, or were interrupted halfway through a key chunk of code, or were sick. They aren't necessarily correct.

I'm sure that the good folks at CRU were conscientious in verifying their models to the best of their ability. I am also sure that they were not software engineers, and our friend Harry's notes make it crystal-clear that he was dealing with a bag of poorly matched components produced by people who probably knew a lot about thermodynamics but not so much about object-oriented design (or even structured programming), to say nothing of rigorous source-code control. It's no wonder that they promptly forgot everything about their models as soon as they got them working.

Why is this important? Several reasons:
  • When a model isn't understandable, it isn't capable of being critiqued. Criticism is the foundation of all scientific inquiry.

  • In my experience, lack of elegance comes from lack of understanding of the underlying problem. You write bad code when you're blundering about. Blundering is all well and good--it's another of the unspoken foundations of scientific inquiry--but it's not so good when your results are being handed over to the UN as a basis for policy recommendations.

  • Bad code implies that you have no way of quantifying how sensitive your model is to its data. We know that a non-linear system like the atmosphere is incredibly sensitive to initial conditions (cf. the "butterfly effect"). My guess is that CRU not only doesn't know how noisy its measurements are, it doesn't know how much variations in those measurements affect the output.

  • There is absolutely no way that you can make cogent engineering recommendations on how to modify atmospheric behavior when you don't understand your model.
Let me harp on this last point a bit more. Climate science has come a long way, and it's been able to generate some results to which we all ought to pay attention. However, there is nothing like a reliable theory of the atmosphere. We can barely reproduce historical climate behavior, to say nothing of predicting future behavior with any accuracy. The big lesson from Climategate is that, while it's a great achievement to be able to tweak a model until it fits your known data sets (and I'll leave the simply despicable behavior with regard to the acquisition, maintenance, manipulation, and obfuscation of those data sets to those with a greater store of moral outrage than I), it's quite another thing to have a model that's so well understood that a set of geo-engineering principles drops out of it.

And make no mistake: What we're embarking upon is a feat of geo-engineering. When we endeavor to reduce emissions, we're basically modifying the forcing functions on the atmosphere. (It's entirely possible that there's something that we don't understand that will cause non-linear behavior to emerge when the second derivative of the historical CO2 level goes strongly negative.)
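That parenthetical about non-linear behavior is not idle: even the simplest non-linear systems amplify tiny perturbations enormously. A standard classroom illustration (nothing to do with CRU's actual code) is the logistic map:

```python
# Sensitivity to initial conditions in a toy non-linear system: two runs of
# the logistic map, started one part in a million apart, rapidly diverge.

def max_divergence(x0a, x0b, r=3.9, steps=50):
    """Iterate x -> r*x*(1-x) from two starting points; return the
    largest gap between the two trajectories over the run."""
    xa, xb = x0a, x0b
    worst = 0.0
    for _ in range(steps):
        xa = r * xa * (1.0 - xa)
        xb = r * xb * (1.0 - xb)
        worst = max(worst, abs(xa - xb))
    return worst

print(max_divergence(0.200000, 0.200001))  # order 0.1 or more, not 1e-6
```

At r = 3.9 the map is in its chaotic regime, so an initial difference of one part in a million grows to the same order as the values themselves within a few dozen iterations. The atmosphere is vastly more complicated than one quadratic recurrence, which is the point: if the toy case does this, the real one surely can.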

Engineering has consequences, and it's really nice to understand them with extremely well-understood science. This is how we avoid producing the atmospheric equivalent of Galloping Gertie.
