Something Is Bugging Me
Posted by Pete Hague on 14 Jun 2012
I've had a bit of a screw up this week. It's not the end of the world, but I wasted a great deal of time tracking down a very silly mistake in the code that I am using.
One part of my code has to fit a curve (a radial dark matter density profile) to some data (the observed rotation curve of an artificial galaxy I generated), and it has been producing some strange results. The curves that it deems ideal fit quite nicely, but the parameters of these curves don't seem right. Worse, when you compute the difference between the dark matter profile the optimisation code proposes and the one used to generate the data in the first place, it turns out that the new curve is substantially smaller than the original one.
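For the curious, the fit itself is nothing exotic - something along the lines of this toy sketch in R (a made-up pseudo-isothermal model with synthetic data, not my actual code or profile):

```r
# Toy illustration only: least-squares fit of a two-parameter rotation curve
# model using optim(). The model, numbers and noise here are invented.
model_v <- function(r, v0, rs) v0 * sqrt(1 - (rs / r) * atan(r / rs))

set.seed(1)
r_obs <- seq(1, 20, by = 1)                                   # radii (arbitrary units)
v_obs <- model_v(r_obs, 200, 3) + rnorm(length(r_obs), 0, 5)  # synthetic "observed" curve

chisq <- function(p) sum((v_obs - model_v(r_obs, p[1], p[2]))^2)
fit <- optim(c(100, 1), chisq)   # Nelder-Mead by default
fit$par                          # best-fit (v0, rs)
```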
I tried many, many things - but eventually traced the error to something irritatingly simple. In order to perform this calculation, I had to convert between SI units and the units I was using in the data, so I defined a bunch of conversion factors at the start of my code. The unit my data used for the density is thousandths-of-a-solar-mass-per-cubic-parsec. I already had the mass of the Sun in kilograms and the length of a parsec in metres defined at this point, so I simply multiplied the solar mass by 0.001, and divided by the parsec length cubed. Or at least, I thought I did.
The language I am using for this is R, which is perfectly suited to data analysis and plotting. It offers multiple ways of entering the number "0.001", for example "10^-3" or "1e-3". Presumably, when I typed in this line of code, I couldn't pick which one of those I liked best and went with "10e-3" as a compromise. This isn't equal to 0.001; it is equal to 0.01 - and so by introducing it as a factor my halos got 10 times as massive, and thus the density parameter required to make them fit the data had to be 10 times smaller. I'm happy to admit this is an Epic Fail on my part.
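To make the mistake concrete, the offending line was doing something like this (the variable names here are illustrative, not the ones in my script):

```r
Msun   <- 1.989e30    # mass of the Sun in kilograms
parsec <- 3.086e16    # one parsec in metres

# Intended factor: thousandths of a solar mass per cubic parsec, in kg m^-3
density_unit <- 1e-3 * Msun / parsec^3

# What I actually typed: 10e-3 means 10 * 10^-3 = 0.01, not 0.001
density_unit_wrong <- 10e-3 * Msun / parsec^3

10e-3 == 1e-3    # FALSE - the wrong factor is ten times too large
```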
But why did it happen? I knew there was likely a problem with the conversion factors. I had looked at the offending line tens, or perhaps hundreds, of times in the course of trying to correct this problem. Why did it take me so long to pick out this bug?
I sometimes watch a program entitled "Air Crash Investigation", whose name is a fairly accurate description of its content. It isn't as ghoulish as some TV critics, such as Charlie Brooker, like to make it out to be; it tends to focus on technical issues and human interface factors rather than dwelling on the fate of the passengers. One issue mentioned in the program is accidents caused when a pilot fails to 'scan' his instruments. Crashes can be caused by a pilot simply ignoring a critical instrument (such as the ADI, which shows the horizon) because their focus is elsewhere. Human factors experts brought on as talking heads in the program complain that people naturally suck at monitoring a large number of things at once.
So I'm wondering: could the same phenomenon that caused three pilots to ignore the instruments telling them their aircraft was nose up and stalled, until it was too late, also cause me to miss simple bugs in lines of code? Maybe humans just plain suck at this sort of thing?
I'd be interested to know if anyone has looked into a possible common cause for missing bugs/flying perfectly functional planes into the ground. As that is likely a long shot, I also wouldn't mind some suggestions as to how to improve my bug-finding ability.