The Ghost(busters) in the Machine

How do you troubleshoot completely random problems?


My home desktop machine has been suffering from a Linux kernel "Oops" approximately once every two days for the last few weeks. I would really like it to stop doing that. When I get a stack trace in my logs, it's consistently in the "kswapd" process, even though I disabled all swap weeks ago.


I'm running Edgy on this machine, just like I was running it on my laptop and am running it on my work desktop. Those machines were both completely stable (modulo occasional ndiswrapper issues) running the exact same kernel.


It doesn't seem like it's a hardware issue. At least, the same machine has never exhibited any problems under Windows.


It isn't deterministically reproducible. It always seems to be in response to a click or some kind of user-input event during heavy disk I/O, but flogging the disks and mashing the keyboard, even for hours at a time, doesn't cause it to happen.


I am considering a fresh re-install to attempt a fix for this, but besides the inelegance of that solution, it seems likely that it will leave me in the same place.


Does anyone have a suggestion for tracking this down so that I'll actually know that it's fixed?
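One low-tech way to turn "approximately once every two days" into something you can compare before and after a change is to count oops reports in the kernel log. This is a minimal sketch. The function name `count_oopses` is hypothetical, and the default path /var/log/kern.log is an assumption about where syslog writes kernel messages on this machine.

```shell
# Hypothetical helper: count kernel "Oops" reports in a log file so the
# rate can be compared before and after a change (new kernel, reinstall, etc.).
# Assumption: kernel messages are logged to /var/log/kern.log by default.
count_oopses() {
  log="${1:-/var/log/kern.log}"
  # grep -c prints the number of matching lines ("0" if there are none).
  grep -c 'Oops' "$log"
}
```

For example, run `count_oopses` after a few days on one configuration and again after a few days on another. Rotated logs (kern.log.0 and so on) would have to be checked separately.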

Grim Prophecy

You all thought I'd forget, didn't you?  About the dire consequences, that is?
  • Christopher has not responded yet.
  • Jonathan has responded.
  • Travis has not responded yet.
  • Raffi has not responded yet, but the Synthesis blog seems to be offline, and it isn't really a personal thing anyway, so maybe I was out of line to tag him there.
  • Jason responded, but in a "friends only" post.  Kind of borderline, there, but he's doing the best so far (except Jonathan) so I'm not going to give him a hard time.
I'm just saying: it's not my fault that the cold hand of destiny will not brook a chain-letter dodger.

The Legend Continues

My saga of ndiswrapper on the macbook continues.


In fact, the dw102 drivers do cause crashes when associating with certain access points. Unfortunately, the dw101 drivers don't work with certain (still other) access points, and the Lenovo "abgn" drivers have some very peculiar problems with extremely bad UDP performance on the access points that the dw101 drivers don't work with.


It occurred to me after a few hours of trolling for better drivers that, in fact, there is a better way. Apple ships Windows drivers specifically for this exact card! You don't need to use drivers for some other card with the same (or vaguely similar) chipset.


The Macintosh Drivers For Windows CD included with Boot Camp contains the driver. Obnoxiously, it's encapsulated within an MSI file, within an EXE installer, within a DMG image, inside the Boot Camp application. The only way I could discover to get at the wireless driver was to install the whole thing on a Real, Actual Windows Computer.


To access the driver CD if Boot Camp won't let you burn it, ctrl-click on "/Applications/Utilities/Boot Camp Assistant.app" and click on "Show Package Contents". Then, double-click on "Contents/Resources/DiskImage.dmg". Copy the files from the volume that shows up on your desktop onto a USB key (or some other means of conveyance to a Windows machine), and run the installer there.


Obnoxious as this process is, it thankfully doesn't make you install to a Windows installation on a Real Actual MacBook Core 2 Duo laptop. Any old Windows machine will do. The installer helpfully puts all the drivers into "C:\Program Files\Macintosh Drivers for Windows XP 1.1.2". The files for the network card are in the "net5416" folder.


Of course, they're also in a file called "net5416.tar.bz2" on my hard disk. I think Apple might take a dim view of me providing a public download site bereft of their unethical and legally dubious EULAs, but if you can't get at the drivers for some reason maybe I could let you have a copy.

come on people we can't lose this one

Twisted is the goddamn engine of your Internet.

Don't let anyone say otherwise.

Update: We're over the line now - 33 to 32!  Keep on voting though, make that bar bright green.

MacBookBuntu Take 2

So, it seems that it is possible to get the last feature that I really, really wanted to use on my MacBook, NetworkManager, to work under Ubuntu Edgy without a custom kernel.  The steps involved are not difficult, but they are tricky, due to an interlocking matrix of bugs.  Thanks to some insomnia and procrastination I've decided to write it up.

Reader beware!  These steps are designed to work around a particular set of bugs on a particular revision of Ubuntu for a particular piece of hardware.  If you're reading this at some point in the future, chances are that the madwifi project has already produced a driver.  To find out, check to see if madwifi ticket 1001 has been resolved before you do any of this.

If you are running Ubuntu Edgy, with the default kernel, and you have a black rev.2 MacBook and you want to get NetworkManager working without screwing around with Feisty kernels, read on.
  • First, you will have to build your own ndiswrapper; the version packaged with edgy is far too old. I originally said you need at least version 1.29 (I chose 1.31: not too old, not too new), but you actually need at least version 1.43. (Earlier versions seemed to work, but 1.44 was the first version I installed which could suspend and resume reliably and did not very occasionally produce random crashes.) These instructions may very well work with older or newer revisions, but I am not going to build a big revision matrix; the whole point of this is a temporary workaround. Install it with "make uninstall; make install" so that Ubuntu's packaged ndiswrapper driver is removed first. Keep in mind that if you upgrade your kernel via apt, you may need to repeat this step, so keep the sources around.
  • Next, you will want to get the driver from D-Link.  Go to this page, and get version 1.02.  Some discussion on the Madwifi page says to get version 1.01 instead because 1.02 crashes.  Ignore it.  As far as I can tell, it's just wrong.  I originally followed this advice, which is why I thought that it crashed ndiswrapper; I also had other problems with version 1.01 such as not being able to associate with various public access points.  See my later post about which driver to get.  Anyhow, install the driver into ndiswrapper by unzipping the downloaded archive and running "sudo ndiswrapper -i net5416.inf".
  • Test to make sure the driver works.  If you are already running NetworkManager and nm-applet, simply running 'sudo modprobe ndiswrapper' ought to set it up nicely, and you should get immediate visual feedback that it is working.
  • Configure the module to un-load itself when you suspend or sleep and re-load itself when you resume.  This is important because otherwise ndiswrapper will not allow anything   This is accomplished by editing the file /etc/default/acpi-support and changing the 'MODULES' line to say 'MODULES="ndiswrapper"'.  While you're in there, you might want to also change the 'STOP_SERVICES' line to say 'STOP_SERVICES="mysql bluetooth "' instead of just mysql, since bluetooth is notoriously unreliable in the face of power management, and bluetooth connections, like wifi connections, will not survive a suspend/resume cycle anyway.
    • Make sure that ndiswrapper does not create a 'wlan0' alias for itself; you probably don't need to do anything, but if you're using a different version of the ndiswrapper script, it may create a file called "/etc/modprobe.d/ndiswrapper" with an alias for "wlan0" in it.  If you see this, remove it.  NetworkManager knows about the module name 'wlan0' and will constantly try to load it if it becomes unloaded for some reason.  This results in a particularly nasty race condition where the suspend machinery politely removes ndiswrapper in preparation for suspending and then NetworkManager loads it again, resulting in a hang from which it is impossible to recover without a hard reboot.  I managed to create this situation for myself through experimentation, so it probably won't happen to you, but just in case make sure that file doesn't contain any reference to 'wlan0'.
  • Test suspending and resuming and make sure that the driver loads as expected.
  • Configure the driver to load at boot.  This consists of editing the file /etc/modules and adding a line that says "ndiswrapper". DO NOT do this until you are sure the driver works with your machine; if it causes a crash, this might make your machine unbootable.
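The file edits and checks in the steps above can be sketched as a handful of shell functions. This is a minimal sketch, not a tested installer: the function names are hypothetical, it assumes GNU grep and sed (as shipped with Ubuntu), and each function takes the target file as an argument so it can be tried on a copy before touching the real files in /etc (which needs root). It does not cover building ndiswrapper or installing the driver.

```shell
# Hypothetical helpers mirroring the manual steps; run as root against /etc.

# "Test to make sure the driver works": is a module currently loaded?
# /proc/modules lists one loaded module per line, with the name first.
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

# acpi-support: rewrite an existing NAME="value" line (MODULES, STOP_SERVICES).
# Assumes the line already exists and the value has no sed-special
# characters such as '/' or '&'.
set_default_var() {
  file="$1"
  name="$2"
  value="$3"
  sed -i "s/^${name}=.*/${name}=\"${value}\"/" "$file"
}

# Print any line in the modprobe configuration that mentions wlan0;
# succeeds (exit 0) only if at least one such line exists.
find_wlan0_refs() {
  grep -rn 'wlan0' "${1:-/etc/modprobe.d}"
}

# Load a module at boot by appending it to /etc/modules, only once.
add_boot_module() {
  file="${2:-/etc/modules}"
  grep -qx "$1" "$file" || echo "$1" >> "$file"
}

# Example usage from a root shell, in the same order as the steps above:
#   modprobe ndiswrapper && module_loaded ndiswrapper && echo loaded
#   set_default_var /etc/default/acpi-support MODULES ndiswrapper
#   set_default_var /etc/default/acpi-support STOP_SERVICES "mysql bluetooth"
#   find_wlan0_refs && echo "remove the wlan0 reference above"
#   add_boot_module ndiswrapper   # only once the driver is known to work
```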
There you have it.  It's an unfortunately jury-rigged situation, but now that I've gone through these steps, I can effectively (mostly) forget about ndiswrapper and compiling kernel modules and so on, and avoid the tedious and unpleasant business of typing "iwlist wlan0 scan | less"; instead I just click on the access point that I want.