This article is written in a story-like retrospective fashion. (Update, 20100128, morning: Now with a happy ending! See below…) It’s a chain of events that took me from a happy Openfiler user, to an extremely angry Openfiler user, to a cautious Openfiler user. So let’s begin. “It was a dark and stormy night…”
It had been almost a year since I patched my Openfiler 2.3 server, and some significant updates had been released in that time. So after planning an outage window, I set out to apply the queued batch of updates (including mkinitrd and a kernel or two, more on that later). It seemed such a simple task…
After the patches were applied, the Openfiler server would not boot. Instead, it spewed out more errors than I could read to the console, ending with a kernel panic.
So I booted to the Openfiler install cdrom, and entered rescue mode with “linux rescue”. Fortunately, it discovered and mounted all my partitions, and I was able to run with networking in order to reach the package sources or other sources/destinations if I should need them.
I noticed that the errors I had seen on the console pointed me toward missing drivers in the initrd, so I started looking there. While poking around, I think I noticed that the latest update bundle had re-compiled all my /boot. I’m not actually 100% sure about that now in hindsight, but it’s not important now. Anyway, I de-constructed one of the initrd’s this way to see what was up:
    mkdir /tmp/temp
    cd /tmp/temp
    cp /boot/initrd-18.104.22.168-0.15.smp.gcc3.4.x86_64.img initrd-22.214.171.124-0.15.smp.gcc3.4.x86_64.img.gz
    gunzip initrd-126.96.36.199-0.15.smp.gcc3.4.x86_64.img.gz
    cpio -i --make-directories < initrd-188.8.131.52-0.15.smp.gcc3.4.x86_64.img
    ls -l
    ls -l lib/
Hmm. No drivers there. Should be a bunch of *.ko files… Hmm…
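The eyeball check with ls can also be done as a count. The following is only a minimal sketch; the function name count_modules is made up for illustration and is not part of Openfiler or mkinitrd. It assumes the initrd has already been unpacked into a directory, as in the commands above.

```shell
#!/bin/sh
# count_modules DIR: print how many *.ko files exist anywhere under DIR.
# (Hypothetical helper name; assumes DIR holds an already unpacked initrd.)
count_modules() {
    # wc -l pads its output with spaces on some systems, so strip them.
    find "$1" -type f -name '*.ko' | wc -l | tr -d ' '
}
```

Running `count_modules /tmp/temp` against the unpacked image would print 0 for an initrd like the broken one described here, and a positive number for a healthy one.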
I decided to roll back, so I ran conary to revert to the state before the update. (Run “conary rblist” to figure out the number of the point-in-time to which to roll back, then “conary rollback (number)” to actually do it. See this wiki page for details.)
For some reason, this did not repair all my initrd’s, but it did seem to give me a working mkinitrd script (in hindsight, I did not think to analyze it or keep a copy of it… drat). So I was now able to hand-build a working initrd for one of the kernels (the most recent before the update) with this command:
    mv /boot/initrd-184.108.40.206-0.3.smp.gcc3.4.x86_64.img initrd-220.127.116.11-0.3.smp.gcc3.4.x86_64.img.bad
    mkinitrd /boot/initrd-18.104.22.168-0.3.smp.gcc3.4.x86_64.img 22.214.171.124-0.3.smp.gcc3.4.x86_64
And now, I was able to reboot into this
kernel with my new working
initrd…. and this time I made backups of it in my
So I proceeded to re-apply the patches bit by bit. I quickly realized that the dreaded kernel update was a requirement as part of the major Openfiler update, so it came back with the re-run of the last few updates anyway. But this time, I was ready. I modified the /boot/grub/menu.lst to use my old trusty kernel (default=1), and made sure my hand-made initrd was in place. Ok, good. Reboot. Right.
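Before a reboot like that, it is worth double-checking which entry the default line really points at, since GRUB legacy counts title entries from 0. Here is a rough sketch of such a check; the function name default_title is invented for this example, and it assumes a plain menu.lst with a "default=N" (or "default N") line and one "title" line per entry.

```shell
#!/bin/sh
# default_title FILE: print the title of the entry that "default" selects
# in a GRUB legacy menu.lst. Entries are numbered from 0 in file order.
# (Hypothetical helper; does not handle "default saved" or fallback lines.)
default_title() {
    awk '
        BEGIN { want = 0; n = 0 }
        # Matches "default=1" as well as "default 1".
        /^default[= \t]/ { sub(/^default[= \t]+/, ""); want = $1 + 0; next }
        # Remember each entry title in the order it appears.
        /^title[ \t]/    { sub(/^title[ \t]+/, ""); titles[n++] = $0 }
        END { if (want in titles) print titles[want] }
    ' "$1"
}
```

Calling `default_title /boot/grub/menu.lst` prints the title GRUB will boot by default, which makes it easy to confirm that default=1 really is the older kernel and not something else further down the list.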
Back up in my older kernel (phew!), but with my newer kernel installed, I troubleshot… I tried to build my own initrd against the new kernel, but I kept getting errors like this for each needed module:
/usr/bin/strip: /lib/modules/126.96.36.199-0.15.smp.gcc3.4.x86_64/./kernel/drivers/rtc/rtc-lib.ko: File format not recognized
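“File format not recognized” usually means the tool was not built with support for that kind of object file, which fits the explanation in the update at the end of this post. One way to see what strip is being handed is to look at the ELF header of the module: byte 5 of an ELF file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The sketch below is my own illustration, not something from the Openfiler tooling, and the function name elf_class is made up.

```shell
#!/bin/sh
# elf_class FILE: report whether FILE is a 32-bit or 64-bit ELF object.
# (Hypothetical helper; reads only the first five bytes of the file.)
elf_class() {
    # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}
```

If this reports 64-bit for the module while the installed strip only understands 32-bit targets, the error above is the expected result. The file command gives the same information where it is installed.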
I dove into /sbin/mkinitrd and found the cause. There are some lines in an “if” block that optionally call “strip” if present, and that is where it seems to err out. If “strip” is not present, it just copies the module, like I want. Hmm. I don’t know if strip was there before or not, or if the mkinitrd suddenly has this new “if” block, and I don’t care. I commented it out to look like this:
    for MODULE in $MODULES; do
    #    if [ -x /usr/bin/strip ]; then
    #        /usr/bin/strip -g $verbose /lib/modules/$kernel/$MODULE -o $MNTIMAGE/lib/$(basename $MODULE)
    #    else
            cp $verbose -a /lib/modules/$kernel/$MODULE $MNTIMAGE/lib
    #    fi
    done
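Commenting the block out is the blunt fix. A gentler variant, which I did not try on the server, would be to keep the strip attempt and fall back to a plain copy whenever strip fails. The following is only a sketch of that idea as a standalone function; copy_module and the STRIP override variable are names I made up, not anything in the real /sbin/mkinitrd.

```shell
#!/bin/sh
# copy_module SRC DESTDIR: put a stripped copy of SRC into DESTDIR if strip
# works on it, otherwise fall back to copying the file unchanged.
# STRIP may be set to override the strip binary (hypothetical knob).
copy_module() {
    src=$1
    destdir=$2
    strip_cmd=${STRIP:-/usr/bin/strip}
    dest="$destdir/$(basename "$src")"
    # Try strip only if the command exists; its failure is not fatal.
    if command -v "$strip_cmd" >/dev/null 2>&1 &&
       "$strip_cmd" -g "$src" -o "$dest" 2>/dev/null; then
        return 0
    fi
    # strip is missing or rejected the file: copy the module as-is.
    cp -a "$src" "$dest"
}
```

With this shape, a strip that cannot read the module format costs a slightly larger initrd instead of one with no drivers in it.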
…and re-built my initrd for the latest kernel like this:
    mv /boot/initrd-188.8.131.52-0.15.smp.gcc3.4.x86_64.img initrd-184.108.40.206-0.15.smp.gcc3.4.x86_64.img.bad
    mkinitrd -f -v /boot/initrd-220.127.116.11-0.15.smp.gcc3.4.x86_64.img 18.104.22.168-0.15.smp.gcc3.4.x86_64
…and no more errors! Yay! I modified the /boot/grub/menu.lst back to use my new kernel (default=0), and I re-booted into it with no kernel panic! Yay!
And that’s my story. I really hope this helps someone out there…
UPDATE, 20100127, evening: On the Openfiler forums, there is talk of a fix. To quote Rafiu, “This was due to a strange situation where 64-bit version of binutils was not built and the group update succeeded regardless. We have now resolved this issue. Apologies to all that were affected by it.”
I have not yet tested this, but I do see that the latest binutils is in the update list now. More to come…
UPDATE, 20100128, morning: Yes. The latest set of updates at this time, *including* the correct set of binutils (22.214.171.124.6-7-0.0.2) for 64-bit, worked perfectly, and my server is still running as smoothly as ever after rebooting. Hooray. I will never get back the Saturday night I spent chasing my initrd all over the place, but there you go…