I spent the day recovering from a Gentoo upgrade, and thought I'd document
the experience in case it helps someone else.

I'm running a custom 2.6.25-gentoo-r7 kernel on amd64, though I don't think
the rarer hardware is relevant.

I tend to put off upgrading my Gentoo box because anytime I do, something
breaks. I'm afraid I haven't changed my opinion about that. Anyway, I did
"emerge --update --deep world" and plugged my ears. Some 600-odd packages
(and a few simpler problems) later, the system seemed to be doing okay. So
I thought I'd see if it could survive a reboot. No, it couldn't.

On boot, it failed while checking the root file system and dropped into the
repair shell. The fsck failed because the device node for the root file
system, /dev/md0, didn't exist. The root file system itself was actually
fine, though. Inside the repair shell, I could see all the files from my
root, but there wasn't much in /dev. (I have the md stuff compiled into the
kernel, and don't use an initrd, so it wasn't an initrd problem.)

*Short Solution*

The problem was with udev, the facility that automatically populates the
/dev directory. During the upgrade, emerge noted that my kernel version was
a bit old, but acceptable. What was missing, apparently, was the signalfd
syscall, which that kernel version either doesn't have or I hadn't
configured. udev seems to have started using signalfd only recently, so
the solution was to downgrade to an older version of udev (udev-141, to be
precise).
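To test the "hadn't configured" half of that guess, here is a rough sketch.
The signalfd syscall sits behind the CONFIG_SIGNALFD kernel option, so
grepping the kernel config shows whether it was built in. The path
/usr/src/linux/.config and the helper name signalfd_enabled are assumptions
for illustration. If the kernel was built with /proc/config.gz support,
running zgrep on that file would do the same job.

```shell
# Hypothetical helper: succeeds if the given kernel config file
# enables the signalfd syscall.
signalfd_enabled() {
    grep -q '^CONFIG_SIGNALFD=y' "$1"
}

# Assumed location of the config for the running kernel.
config=/usr/src/linux/.config

if [ -r "$config" ]; then
    if signalfd_enabled "$config"; then
        echo "signalfd is enabled in $config"
    else
        echo "signalfd is not enabled in $config"
    fi
else
    echo "no readable kernel config at $config"
fi
```

Even with the option set, a newer udev may want more than an old kernel
offers, so a clean result here doesn't rule out the kernel simply being too
old.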

*What I Actually Did To Get There*

Of course, I didn't know that at first. I just had a fun unbootable system.
I might have been able to simply emerge the downgrade from the repair shell
(the network did come up), but I didn't know to try that yet. I figured I
wanted to find some way to make the system boot. Since the failing file
system check is run from /etc/init.d/checkroot, I added a mknod command to
create the device node before the check runs, at the top of the start()
function:

    # Create the root device node by hand if udev hasn't (md is major 9).
    if [ ! -e /dev/md0 ] ; then
        mknod -m 0660 /dev/md0 b 9 0
    fi

It's a hack, not a solution, but it did make the system boot, albeit into a
rather crippled state. Since a lot of devices were missing, a lot of
services wouldn't start. (If you're using a more boring root partition, the
command might be something like "mknod -m 0660 /dev/sda1 b 8 1".)
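The major and minor numbers in those mknod commands (9 0 for md0, 8 1 for
sda1) don't have to be guessed. As a sketch, assuming /proc is mounted in
the repair shell and the kernel has already registered the device, they can
be read from /proc/partitions. The helper name dev_numbers is made up for
illustration.

```shell
# Hypothetical helper: print "major minor" for a block device name,
# read from a file in /proc/partitions format (default: the real one).
dev_numbers() {
    awk -v name="$1" '$4 == name { print $1, $2 }' "${2:-/proc/partitions}"
}

# Look up the root device from this post, if /proc is available.
if [ -r /proc/partitions ]; then
    dev_numbers md0
fi
```

The two numbers it prints go straight into mknod after the "b". If it
prints nothing, the kernel doesn't know the device under that name.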

So by now I had gathered that udev wasn't working, but I didn't know why.
My first thought was to try "/etc/init.d/udev start", to see if it would
start. But it told me that the script is written for baselayout-2, and that
I shouldn't use it on baselayout-1. After a bit of googling about what the
heck a baselayout is, I gathered that I was using baselayout-1, so the
service wasn't supposed to be started that way, and its refusal wasn't a
bug. Another page suggested running the daemon directly with
"/sbin/udevd --daemon", which gave the message "error getting signalfd".
That told me why it didn't start. This message was also in the logs, but
for some reason I didn't look there until later.
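Since the message was sitting in the logs all along, searching there would
have been the shortcut. A sketch, assuming the system logger writes to
/var/log/messages (that depends on which logger is installed). The helper
name udev_signalfd_errors is made up for illustration.

```shell
# Hypothetical helper: print log lines carrying the udevd error quoted above.
udev_signalfd_errors() {
    grep -F 'error getting signalfd' "$1"
}

# Assumed log location; adjust for the syslog daemon in use.
log=/var/log/messages

if [ -r "$log" ]; then
    udev_signalfd_errors "$log" || echo "no signalfd errors in $log"
fi
```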

So back to Google, where I found a message on a Debian board noting that
udev had started using signalfd recently. This suggested an older version
might do the trick. I tried one, and it did.
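For the record, a sketch of what the downgrade might look like with
Portage, using the udev-141 version mentioned earlier. It assumes
/etc/portage/package.mask is a plain file (it can also be a directory) and
that the commented-out commands are run as root. The helper name
udev_mask_atom is made up for illustration. The mask is there to keep the
next world update from pulling the newer udev straight back in.

```shell
# Hypothetical helper: print the package.mask entry that blocks every
# udev version newer than the one given.
udev_mask_atom() {
    printf '>sys-fs/udev-%s\n' "$1"
}

# Show the entry for the version that worked here.
udev_mask_atom 141

# As root, the downgrade itself would be along these lines:
#   udev_mask_atom 141 >> /etc/portage/package.mask
#   emerge --oneshot '=sys-fs/udev-141'
```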