How I fixed my Openfiler 2.3 server, after updates broke it…

This article is written in a story-like retrospective fashion.  (Update, 20100128, morning: Now with a happy ending! See below…)  It’s a chain of events that took me from a happy Openfiler user, to an extremely angry Openfiler user, to a cautious Openfiler user.  So let’s begin.  “It was a dark and stormy night…”

It had been almost a year since I patched my Openfiler 2.3 server, and some significant updates had been released in that time.  So after planning an outage window, I set out to apply the queued batch of updates (including mkinitrd and a kernel or two, more on that later).  It seemed such a simple task….

After the patches were applied, the Openfiler server would not boot.  Instead, it spewed out more errors than I could read to the console, ending with a kernel panic.

So I booted from the Openfiler install CD-ROM and entered rescue mode with “linux rescue”.  Fortunately, it discovered and mounted all my partitions, and I was able to bring up networking so that I could reach the package sources, or any other sources/destinations, if I needed them.

I noticed that the errors I had seen on the console pointed me toward missing drivers in the initrd, so I started looking there.  While poking around, I think I noticed that the latest update bundle had rebuilt all my initrds in /boot.  In hindsight I’m not 100% sure about that, but it’s not important now.  Anyway, I unpacked one of the initrds this way to see what was up:

mkdir /tmp/temp
cd /tmp/temp
cp /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.gz
gunzip initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.gz
cpio -i --make-directories < initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img
ls -l
ls -l lib/

Hmm.  No drivers there.  Should be a bunch of *.ko files…  Hmm…
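
The same check can be scripted instead of eyeballed.  This is only a minimal sketch, assuming the initrd has already been unpacked into a directory as above; the function name “count_modules” is made up for illustration and is not part of any Openfiler tool:

```shell
# count_modules DIR: print how many kernel modules (*.ko) exist under DIR.
# Hypothetical helper; assumes DIR is an already-unpacked initrd tree.
count_modules() {
    find "$1" -type f -name '*.ko' | wc -l | tr -d ' '
}
```

Run as “count_modules /tmp/temp”, a result of 0 would match the empty lib/ directory I was staring at.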

I decided to roll back;  so I ran conary to revert to the state before the update  (Run “conary rblist” to figure out the number of the point-in-time to which to roll back, then “conary rollback (number)” to actually do it.  See this wiki page for details).

For some reason, this did not repair all my initrds, but it did seem to give me a working mkinitrd script (in hindsight, I did not think to analyze it or keep a copy of it… drat).  So I was now able to hand-build a working initrd for one of the kernels (the most recent before the update) with these commands:

mv /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img.bad
mkinitrd /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img 2.6.29.3-0.3.smp.gcc3.4.x86_64

And now, I was able to reboot into this kernel with my new working initrd…. and this time I made backups of it in my /home directory!
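
For what it’s worth, the backup step could look something like the sketch below.  The function name “backup_initrd” and the date-suffix naming are my own assumptions, not anything Openfiler provides; it copies the image, checks that the copy is byte-identical, and prints where it put it:

```shell
# backup_initrd IMAGE DESTDIR: copy IMAGE into DESTDIR with a date suffix
# and confirm the copy is byte-identical. Hypothetical helper name.
backup_initrd() {
    src=$1
    dest="$2/$(basename "$src").$(date +%Y%m%d)"
    mkdir -p "$2" && cp -p "$src" "$dest" && cmp -s "$src" "$dest" && echo "$dest"
}
```

Something like “backup_initrd /boot/initrd-2.6.29.3-0.3.smp.gcc3.4.x86_64.img /home/backup” would do it, where /home/backup is just an example destination.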

So I proceeded to re-apply the patches bit by bit.  I quickly realized that the dreaded kernel update was a required part of the major Openfiler update, so it came back with the re-run of the last few updates anyway.  But this time, I was ready.  I modified /boot/grub/menu.lst to use my old trusty kernel and initrd (default=1), and made sure my hand-made initrd was in place.  Ok, good.  Reboot.  Right.
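
Switching the default entry is a one-line change, so it is easy to script.  A rough sketch, assuming the file uses the “default=N” form (entries are counted from 0); “set_grub_default” is a hypothetical helper name, and it keeps a .bak copy of the file before rewriting it:

```shell
# set_grub_default FILE N: rewrite the "default=" line of a GRUB legacy
# menu.lst so that entry N (counted from 0) boots by default.
# Hypothetical helper; keeps a .bak copy of the original file.
set_grub_default() {
    cp -p "$1" "$1.bak" &&
    sed "s/^default=.*/default=$2/" "$1.bak" > "$1"
}
```

With that, “set_grub_default /boot/grub/menu.lst 1” selects the second entry in the file, which in my case was the older kernel.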

Back up in my older kernel (phew!), but with my newer kernel installed, I troubleshot…  I tried to build my own initrd against the new kernel, but I kept getting errors like this for each needed module:

/usr/bin/strip: /lib/modules/2.6.29.6-0.15.smp.gcc3.4.x86_64/./kernel/drivers/rtc/rtc-lib.ko: File format not recognized

I dove into /sbin/mkinitrd and found the cause.  There are some lines in an “if” block that call “strip” if it is present, and that is where it seems to error out.  If “strip” is not present, it just copies the module, which is what I want.  Hmm.  I don’t know if strip was there before or not, or if mkinitrd suddenly gained this new “if” block, and I don’t care.  I commented it out to look like this:

for MODULE in $MODULES; do
#    if [ -x /usr/bin/strip ]; then
#        /usr/bin/strip -g $verbose /lib/modules/$kernel/$MODULE -o $MNTIMAGE/lib/$(basename $MODULE)
#    else
    cp $verbose -a /lib/modules/$kernel/$MODULE $MNTIMAGE/lib
#    fi
done
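
Commenting the block out is the blunt approach.  A gentler alternative, which I did not actually try, would be to keep the strip attempt but fall back to a plain copy whenever strip is missing or fails.  The sketch below is my own illustration and not the code shipped in /sbin/mkinitrd; “copy_module” is a made-up name:

```shell
# copy_module SRC DESTDIR: try to strip debug symbols from SRC into DESTDIR;
# if strip is missing or fails, fall back to a plain copy.
# Hypothetical helper, not the code shipped in /sbin/mkinitrd.
copy_module() {
    dest="$2/$(basename "$1")"
    if [ -x /usr/bin/strip ] && /usr/bin/strip -g "$1" -o "$dest" 2>/dev/null; then
        return 0
    fi
    cp -a "$1" "$dest"
}
```

The trade-off is that a failing strip is silenced, so the initrd may end up slightly larger without any warning.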

…and re-built my initrd for the latest kernel like this:

mv /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img.bad
mkinitrd -f -v /boot/initrd-2.6.29.6-0.15.smp.gcc3.4.x86_64.img 2.6.29.6-0.15.smp.gcc3.4.x86_64

…and no more errors!  Yay!  I modified the /boot/grub/menu.lst back to use my new kernel and initrd, (default=0), and I re-booted into it with no kernel panic! Yay!

And that’s my story.  I really hope this helps someone out there…

UPDATE, 20100127, evening:  On the Openfiler forums, there is talk of a fix.  To quote Rafiu: “This was due to a strange situation where 64-bit version of binutils was not built and the group update succeeded regardless.

We have now resolved this issue. Apologies to all that were affected by it.”
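
That explanation would be consistent with the “File format not recognized” errors from strip, although I have not confirmed exactly what was wrong with the strip binary I had.  One way to see whether a given module is a 32-bit or 64-bit ELF object is to read the fifth byte of its header (the EI_CLASS field: 1 means 32-bit, 2 means 64-bit).  A minimal sketch, assuming “od” is available; “elf_class” is a made-up helper name:

```shell
# elf_class FILE: print 32 or 64 according to the EI_CLASS byte of an
# ELF header; return 1 if FILE does not start with a recognized ELF header.
elf_class() {
    case $(od -An -tx1 -N5 "$1" | tr -d ' \n') in
        7f454c4601) echo 32 ;;
        7f454c4602) echo 64 ;;
        *) return 1 ;;
    esac
}
```

Pointing it at one of the modules under /lib/modules shows which class the binutils tools need to be able to handle.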

I have not yet tested this, but I do see that the latest binutils is in the update list now.  More to come…

UPDATE, 20100128, morning:  Yes.  The latest set of updates at this time, *including* the correct set of binutils (2.17.50.0.6-7-0.0.2) for 64-bit, worked perfectly, and my server is still running as smoothly as ever after rebooting.  Hooray.  I will never get back the Saturday night I spent chasing my initrd all over the place, but there you go…

😉

5 Comments

  1. Ambo

    Great job debugging the problem. I also encountered this Openfiler-conary issue with the initrd and started to google for the answer. Your blog was the only result that came up.

    It seems that a new kernel update (2.6.29.6-0.16-1) is up (just 3 hours ago).

  2. Jeremy Pavlov

    @ Ambo

    Ugh. Thanks for the heads-up, I checked and see you are right. I guess I’ll clear out my schedule for a day and give the updates a try…

    😉

  3. Ambo

    It seems there are still issues with kernel 2.6.29.6-0.16-1. My board’s Realtek NIC driver is not available and the boot cannot reach the login console. Maybe something in /dev got broken….

  4. anders

    the of guys are fixing it now and should be fixed shortly, good writeup, I assume you are mr elgato?

  5. Jeremy Pavlov

    @ anders

    Thanks for the tip; I’ll keep watch for it.

    …Mr. ElGato? Um, no… But I like cats…

    😉
