Thursday, December 30, 2010

I hate "fixing" something when I can neither fully explain why it was broken nor why the "fix" works.

On my technical blog, I recently went through how I worked around what seemed to be brokenness in Linux's routing. I'd done a fair amount of digging around to try to solve my problem. I even asked questions on some forum sites (to no avail). Eventually, with enough persistence, I cobbled together a solution.

Unfortunately, that wasn't quite enough for me. I mean, I had an observed behavior I was trying to surmount, but I hadn't had the tools to really tear it apart. So, I was treating symptoms rather than finding a cure. I hate that approach to problem solving.

So, today, since I had some time, I broke down and tried to see what was really going on under the covers. I wanted to sort out the root cause of the symptoms I'd addressed. So, I downloaded tcpdump and tried to watch my packet flows both with and without the "fix" in place.

Even with what should be the right analysis tools in place to see the problem, things weren't any more enlightening. The error I expected to see wasn't there. Worse, there wasn't really any alternate error in its place. Fucking. Maddening.

I still have the urge to track it down further (yeah, OCD!), even though I know that it's mostly wasted effort. I mean, the issue I ran into shouldn't happen in production situations, and, even if it did, I've documented the "fix" for it. I just don't like not knowing why it was broken or why my "fix" works.

Gah... I'd be ill-suited for work in medicine or other theoretical scientific endeavors.
