This article is written in a story-like, retrospective fashion. (Update, 20100128, morning: Now with a happy ending! See below…) It’s a chain of events that took me from a happy Openfiler user, to an extremely angry Openfiler user, to a cautious Openfiler user. So let’s begin. “It was a dark and stormy night…”
It had been almost a year since I had last patched my Openfiler 2.3 server, and some significant updates had been released in that time. So, after planning an outage window, I set out to apply the queued batch of updates (including mkinitrd and a kernel or two; more on that later). It seemed such a simple task…
After the patches were applied, the Openfiler server would not boot. Instead, it spewed more errors to the console than I could read, ending in a kernel panic.
So I booted from the Openfiler install CD-ROM and entered rescue mode with “linux rescue”. Fortunately, it discovered and mounted all my partitions, and I was able to bring up networking so I could reach the package sources (or anything else) if I needed them.
I noticed that the errors I had seen on the console pointed toward missing drivers in the initrd, so I started looking there. While poking around, I think I noticed that the latest update bundle had rebuilt all the initrds in /boot. In hindsight I’m not 100% sure about that, but it doesn’t matter now. Anyway, I unpacked one of the initrds like this to see what was up:
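```shell
mkdir /tmp/temp
cd /tmp/temp
# the initrd image is a gzip-compressed cpio archive
cp /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.gz
gunzip initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.gz
# extract the archive into the current directory
cpio -i --make-directories < initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img
ls -l
# the kernel modules (*.ko) should live here
ls -l lib/
```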
Hmm. No drivers there. There should be a bunch of *.ko files… Hmm…
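If you’d rather not eyeball the ls output, a tiny helper can count the kernel modules in an unpacked initrd tree. This is just my own sketch (count_ko is a made-up name, not part of Openfiler or mkinitrd); on an image like the broken one above, it would report 0.

```shell
# Hypothetical helper, not part of Openfiler or mkinitrd: count the kernel
# modules (*.ko files) under an unpacked initrd tree (default: current dir).
count_ko() {
    find "${1:-.}" -type f -name '*.ko' | wc -l | tr -d ' '
}
```

For example, after the cpio step above, run count_ko /tmp/temp.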
I decided to roll back, so I used conary to revert to the state before the update (run “conary rblist” to find the number of the point-in-time to roll back to, then “conary rollback (number)” to actually do it; see this wiki page for details).
For some reason, this did not repair all my initrds, but it did seem to give me a working mkinitrd script (in hindsight, I did not think to analyze it or keep a copy of it… drat). So I was now able to hand-build a working initrd for one of the kernels (the most recent one before the update) with these commands:
```shell
# set the broken image aside, then build a fresh one for this kernel version
mv /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img.bad
mkinitrd /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img 2.6.29.3-0.3.smp.gcc3.4.x86_64
```
And now I was able to reboot into this kernel with my new working initrd… and this time I made backups of it in my /home directory!
So I proceeded to re-apply the patches bit by bit. I quickly realized that the dreaded kernel update was a requirement of the major Openfiler update, so it came back anyway when I re-ran the last few updates. But this time, I was ready. I modified /boot/grub/menu.lst to boot my trusty old kernel and initrd (default=1), and made sure my hand-made initrd was in place. OK, good. Reboot. Right.
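Flipping the default entry can also be done from a rescue shell without opening an editor. Here’s a rough sketch, assuming GRUB legacy’s menu.lst with boot entries numbered from 0; set_grub_default is a made-up helper, and it keeps a .bak copy before touching anything.

```shell
# Hypothetical helper: set GRUB legacy's "default" entry (0-based index)
# in a menu.lst file, keeping a backup copy as <file>.bak first.
# Assumes a line such as "default=0" or "default 0" already exists;
# if none does, the file is rewritten unchanged.
set_grub_default() {
    entry=$1
    menu=${2:-/boot/grub/menu.lst}
    cp -p "$menu" "$menu.bak" || return 1
    # Read from the backup and write over the original in place.
    sed "s/^default[= ].*/default=$entry/" "$menu.bak" > "$menu"
}
```

In my case, set_grub_default 1 would select the older kernel, and set_grub_default 0 would switch back to the newest one.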
Back in my older kernel (phew!), but with the newer kernel installed, I troubleshot… I tried to build my own initrd against the new kernel, but I kept getting errors like this for each needed module:
/usr/bin/strip: /lib/modules/2.6.29.6-0.15.smp.gcc3.4.x86_64/./kernel/drivers/rtc/rtc-lib.ko: File format not recognized
I dove into /sbin/mkinitrd and found the cause. There are some lines in an “if” block that call “strip” if it is present, and that is where it seems to error out. If “strip” is not present, it just copies the module, which is what I want. Hmm. I don’t know whether strip was there before, or whether mkinitrd suddenly has this new “if” block, and I don’t care. I commented it out so it looks like this:
```shell
for MODULE in $MODULES; do
#    if [ -x /usr/bin/strip ]; then
#        /usr/bin/strip -g $verbose /lib/modules/$kernel/$MODULE -o $MNTIMAGE/lib/$(basename $MODULE)
#    else
        cp $verbose -a /lib/modules/$kernel/$MODULE $MNTIMAGE/lib
#    fi
done
```
…and rebuilt my initrd for the latest kernel like this:
```shell
# set the broken image aside, then force (-f) a verbose (-v) rebuild
mv /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.bad
mkinitrd -f -v /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img 2.6.29.6-0.15.smp.gcc3.4.x86_64
```
…and no more errors! Yay! I modified /boot/grub/menu.lst back to use my new kernel and initrd (default=0), and I rebooted into it with no kernel panic! Yay!
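Commenting out the strip branch works, but it leaves mkinitrd hand-edited. A slightly less invasive variant (my own sketch, not something from the stock mkinitrd; install_module is a hypothetical name) keeps the strip attempt and simply falls back to a plain copy when strip is missing or fails:

```shell
# Hypothetical variant of the module-copy step in mkinitrd's loop: try to
# strip debug symbols into the image, but fall back to a plain copy when
# strip is missing or fails (e.g. "File format not recognized").
install_module() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src")"
    if command -v strip >/dev/null 2>&1 &&
        strip -g "$src" -o "$dest" 2>/dev/null; then
        return 0
    fi
    cp -a "$src" "$dest_dir"
}
```

Inside the loop, the if/else block would then collapse to something like install_module /lib/modules/$kernel/$MODULE $MNTIMAGE/lib.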
And that’s my story. I really hope this helps someone out there…
UPDATE, 20100127, evening: On the Openfiler forums, there is talk of a fix. To quote Rafiu: “This was due to a strange situation where 64-bit version of binutils was not built and the group update succeeded regardless. We have now resolved this issue. Apologies to all that were affected by it.”
I have not yet tested this, but I do see that the latest binutils is in the update list now. More to come…
UPDATE, 20100128, morning: Yes. The latest set of updates at this time, *including* the correct set of binutils (2.17.50.0.6-7-0.0.2) for 64-bit, worked perfectly, and my server is still running as smoothly as ever after rebooting. Hooray. I will never get back that Saturday night I spent chasing my initrd all over the place, but there you go…
😉
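For anyone hitting the same “File format not recognized” error: going by the forum explanation above, the strip that mkinitrd found apparently could not read the 64-bit modules. One rough way to check for that kind of mismatch is to compare the ELF class byte (1 = 32-bit, 2 = 64-bit) of the strip binary with that of a module. This sketch uses only od and tr, and the function names are made up; a matching class doesn’t prove strip was built for the right target, it only rules out the obvious mismatch.

```shell
# Hypothetical helpers (names are mine): compare the ELF class of the strip
# binary with that of a kernel module. Byte 4 of an ELF header (EI_CLASS)
# is 1 for 32-bit objects and 2 for 64-bit ones.
is_elf() {
    # ELF files start with the magic bytes 0x7f 'E' 'L' 'F'
    [ "$(od -An -tx1 -N4 "$1" | tr -d ' \n')" = "7f454c46" ]
}

elf_class() {
    # skip the 4 magic bytes and print byte 4 as an unsigned decimal
    od -An -tu1 -j4 -N1 "$1" | tr -d ' \n'
}

check_strip_vs_module() {
    for f in "$1" "$2"; do
        if ! is_elf "$f"; then
            echo "not an ELF file: $f"
            return 2
        fi
    done
    s_class=$(elf_class "$1")
    m_class=$(elf_class "$2")
    if [ "$s_class" = "$m_class" ]; then
        echo "same ELF class ($s_class)"
    else
        echo "MISMATCH: strip is class $s_class, module is class $m_class"
        return 1
    fi
}
```

For example: check_strip_vs_module /usr/bin/strip /lib/modules/2.6.29.6-0.15.smp.gcc3.4.x86_64/kernel/drivers/rtc/rtc-lib.ko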
Great job debugging the problem. I also encountered this Openfiler-conary issue with the initrd and started to google for the answer. Your blog was the only result that came up.
It seems that a new kernel (2.6.29.6-0.16-1) is up (just 3 hours ago).
@ Ambo
Ugh. Thanks for the heads-up, I checked and see you are right. I guess I’ll clear out my schedule for a day and give the updates a try…
😉
It seems there are still issues with kernel 2.6.29.6-0.16-1. My board’s Realtek NIC driver is not available and the boot never reaches the login console. Maybe something in /dev got broken…
The OF guys are fixing it now and it should be fixed shortly. Good writeup. I assume you are Mr. ElGato?
@ anders
Thanks for the tip; I’ll keep watch for it.
…Mr. ElGato? Um, no… But I like cats…
😉