How do you troubleshoot completely random problems?
My home desktop machine has been suffering from a Linux kernel "Oops"
approximately once every two days for the last few weeks. I would really
like it to stop doing that. When I get a stack trace in my logs, it's
consistently in the "kswapd" process, even though I disabled all
swap weeks ago.
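To check the "consistently in kswapd" pattern across weeks of logs rather than by eye, the Oops reports can be tallied by process name. This is only a minimal sketch under stated assumptions: it assumes each report contains a line with the word "Oops" followed soon after by a line carrying a "comm: <name>" field, which is how kernels of this era print the faulting process. The function name tally_oops_processes is made up for illustration.

```python
import re
from collections import Counter

# Assumption: an Oops report has a line containing "Oops", followed by a
# line with a "comm: <process>" field naming the faulting process.
OOPS_MARKER = re.compile(r"\bOops\b")
COMM_FIELD = re.compile(r"\bcomm:\s*(\S+)", re.IGNORECASE)


def tally_oops_processes(lines):
    """Count how often each process name appears as the subject of an Oops."""
    counts = Counter()
    in_oops = False
    for line in lines:
        if OOPS_MARKER.search(line):
            in_oops = True  # start looking for the process name
        if in_oops:
            match = COMM_FIELD.search(line)
            if match:
                counts[match.group(1)] += 1
                in_oops = False  # one process name per report
    return counts
```

It takes any iterable of lines, so an open log file can be passed directly, for example tally_oops_processes(open("/var/log/kern.log", errors="replace")); that path is an assumption about where the kernel messages end up on this system. If every entry comes back as a kswapd thread, that at least confirms the pattern before digging further.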
I'm running Ubuntu Edgy on this machine, just as I did on my laptop and
still do on my work desktop. Both of those machines have been completely
stable (modulo occasional ndiswrapper issues) running the exact same
kernel.
It doesn't seem like it's a hardware issue. At least, the same machine has
never exhibited any problems under Windows.
It isn't deterministically reproducible. It always seems to be in response
to a click or some kind of user-input event during heavy disk I/O, but
flogging the disks and mashing the keyboard, even for hours at a time,
doesn't cause it to happen.
I am considering a fresh reinstall to try to fix this, but besides the
inelegance of that solution, it seems likely to leave me in the same
place.
Does anyone have a suggestion for tracking this down so that I'll actually
know that it's fixed?