On Wed, Dec 16, 2009 at 10:07 PM, Tom Bennet <twbennet@×××××.com> wrote:
> I spent the day recovering from a Gentoo upgrade, and thought I'd document
> the experience in case it helps someone else.
> |
> I'm running a custom kernel 2.6.25-gentoo-r7 on amd64, though I don't think
> the rarer hardware is relevant.
> |
> I tend to put off upgrading my Gentoo box because anytime I do, something
> breaks. I'm afraid I haven't changed my opinion about that. Anyway, I did
> "emerge --update --deep world" and plugged my ears. Some 600-odd packages
> (and a few simpler problems) later, the system seemed to be doing okay. So
> I thought I'd see if it could survive a reboot. No, it couldn't.
> |
> On boot it failed checking the root file system and dropped into the repair
> shell. The reason the fsck failed is that the root pseudo device file,
> /dev/md0, didn't exist. The root file system was actually fine, though.
> Inside the repair shell, I could see all the files from my root, but there
> wasn't much in /dev. (I have the md stuff compiled into the kernel, and
> don't use an initrd, so it wasn't an initrd problem.)
> |
> Short Solution
> |
> The problem was with udev, the facility which automatically populates the
> /dev directory. During the upgrade, emerge noted that my kernel version was
> a bit early, but acceptable. What was missing, apparently, was the signalfd
> syscall, which that kernel version either doesn't have or I hadn't
> configured. Apparently, udev has only started using signalfd recently, so
> the solution was to downgrade to an older version of udev (udev-141 to be
> precise).
> |
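One of the two possibilities raised above, that signalfd simply was not configured, can be checked from the kernel config. The following is a minimal sketch, assuming the config is available as a file (for example /proc/config.gz, if the kernel exports it, or /usr/src/linux/.config). CONFIG_SIGNALFD is the kernel option name; the has_signalfd helper is a hypothetical name for this example.

```shell
#!/bin/sh
# Sketch: report whether a kernel config enables the signalfd syscall.
# CONFIG_SIGNALFD is the real option name; the helper name and the
# choice of config file are assumptions made for this example.

has_signalfd() {
    config="$1"
    case "$config" in
        *.gz) zcat "$config" ;;   # e.g. /proc/config.gz
        *)    cat "$config" ;;    # e.g. /usr/src/linux/.config
    esac | grep -q '^CONFIG_SIGNALFD=y$'
}

# Example: inspect the running kernel's config if it is exported
if [ -r /proc/config.gz ]; then
    if has_signalfd /proc/config.gz; then
        echo "signalfd is enabled"
    else
        echo "signalfd is not enabled"
    fi
fi
```

A kernel too old to know the option has no such line in its config, so the helper reports "not enabled" in that case as well.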
> What I Actually Did To Get There
> |
> Of course, I didn't know that at first. Just had a fun unbootable system.
> I might have been able to simply emerge the downgrade from the repair shell
> (the network did come up), but I didn't know to try that yet. I figured I
> wanted to find some way to make the system boot. Since the failing file
> check is done from /etc/init.d/checkroot, I added a mknod command to create
> the device node before trying to run the file check. At the start of the
> start() function:
> |
> if [ ! -e /dev/md0 ] ; then
>     mknod -m 0660 /dev/md0 b 9 0
> fi
> |
> It's a hack, not a solution, but it did make the system boot, albeit into a
> rather crippled state. Since there were a lot of devices missing, a lot of
> services wouldn't start. (If you're using a more boring root partition, it
> might be something like "mknod -m 0660 /dev/sda1 b 8 1")
> |
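The major and minor numbers in the mknod call (9 and 0 for md0) do not have to be guessed. As a rough sketch, assuming sysfs is mounted, the kernel lists them as "major:minor" in /sys/block/NAME/dev for whole block devices; partitions sit one directory lower, under their parent disk. SYSFS_ROOT and mknod_cmd are hypothetical names, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build the mknod command for a block device from sysfs.
# SYSFS_ROOT and mknod_cmd are hypothetical names for this example.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

mknod_cmd() {
    name="$1"
    devfile="$SYSFS_ROOT/block/$name/dev"
    [ -r "$devfile" ] || return 1
    # The file holds "major:minor", e.g. "9:0" for md0
    IFS=: read -r major minor < "$devfile"
    printf 'mknod -m 0660 /dev/%s b %s %s\n' "$name" "$major" "$minor"
}

# Print (not run) the command for md0 if the kernel knows the device
mknod_cmd md0 || echo "md0 not found under $SYSFS_ROOT/block" >&2
```

Printing instead of executing leaves the decision to create the node with the person at the repair shell.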
> So I had managed by now to gather that udev wasn't working, but I didn't
> know why. My first thought was to try "/etc/init.d/udev start", to see if
> it would start. But it told me that the script is written for baselayout-2,
> and I shouldn't use it on baselayout-1. Following a bit of googling about
> what the heck a baselayout is, I gathered that I was using baselayout-1, and
> so the service wasn't supposed to be started that way. So it wasn't a bug
> that it wouldn't start that way. Another page suggested trying to run it
> directly, with "/sbin/udevd --daemon", which gave the message "error getting
> signalfd". That told me why it didn't start. This message was also in the
> logs, but for some reason I didn't look there until later.
> |
> So back to Google, and I found a message on a Debian board noting that udev
> had started using signalfd recently. This suggested an old version might do
> the trick. I tried one, and it did.
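The post does not show the commands used for the downgrade. A minimal sketch of how it might look on Gentoo, assuming the package atom is sys-fs/udev: mask everything newer than udev-141 so that the next world update does not pull the newer version back in, then install that exact version. The script only prints the steps; the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the steps for pinning udev at the version from the post.
# Nothing is executed; the helper names are hypothetical.

UDEV_ATOM="sys-fs/udev-141"

mask_line() {
    # ">cat/pkg-ver" in package.mask masks every newer version
    printf '>%s\n' "$1"
}

install_cmd() {
    # "=cat/pkg-ver" selects exactly one version; --oneshot keeps
    # the package out of the world file
    printf 'emerge --oneshot "=%s"\n' "$1"
}

echo "# add to /etc/portage/package.mask:"
mask_line "$UDEV_ATOM"
echo "# then run as root:"
install_cmd "$UDEV_ATOM"
```

The mask would need to be lifted again once the kernel is new enough for the current udev.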
|
I really only have two things to say after reading this. First, and this
really does outweigh the second: thank you for the excellently presented
writeup of problem *and* solution. Far more often than should be the case
(less so here, but across the net as a whole), problems are mentioned,
solutions are offered, and a good, clear "this worked" rarely follows.
Second, it's been my experience with Gentoo that things break far more
often when I allow long delays between updates than when I keep everything
up to date. That has held true for me on both x86 and ~x86 systems (as has
the headache when I've put updates off).
|
And... I reiterate a part of the "first": thank you for the writeup.
|
--
Poison [BLX]
Joshua M. Murphy