Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: RAID1 boot - no bootable media found
Date: Tue, 30 Mar 2010 19:03:18
Message-Id: pan.2010.03.30.18.08.47@cox.net
In Reply to: Re: [gentoo-amd64] Re: RAID1 boot - no bootable media found by Mark Knecht
Mark Knecht posted on Tue, 30 Mar 2010 06:56:14 -0700 as excerpted:

> 3) I LOVE your idea of managing 3 /boot partitions by hand instead of
> using RAID. Easy to do, completely testable ahead of time. If I ensure
> that every disk can boot then no matter what disk goes down the machine
> still works, at least a little. Not that much work and even if I don't
> do it for awhile it doesn't matter as I can do repairs without a CD.
> (well....)

That's one of those things you only tend to realize after running a RAID
for a while... and possibly after having grub die, for some reason I don't
quite understand, on just a kernel update... and realizing that had I
set up multiple independent /boot and boot-backup partitions instead of a
single RAID-1 /boot, I'd have had the backups to boot from if I'd needed
them.

So call it the voice of experience! =:^)
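
For what it's worth, here's a minimal sketch of what "managing the /boot
copies by hand" could look like if scripted. It's only an illustration, not
what I actually run: the function names and directory layout are
hypothetical, it assumes the backup /boot partitions are already mounted
somewhere, and it only copies files. It does not install grub to each
disk's boot sector, which still has to be done separately. Needs
Python 3.9 or newer.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_boot(primary: Path, backups: list[Path]) -> None:
    """Copy everything under `primary` into each backup directory.

    Hypothetical helper: files that exist only in a backup are left alone,
    so stale kernels there are not cleaned up.
    """
    for backup in backups:
        # dirs_exist_ok lets an existing copy be refreshed in place.
        shutil.copytree(primary, backup, dirs_exist_ok=True)


def differing_files(primary: Path, backup: Path) -> list[str]:
    """Return relative paths of files missing from, or different in, `backup`."""
    relative = [
        str(p.relative_to(primary)) for p in primary.rglob("*") if p.is_file()
    ]
    # shallow=False compares file contents rather than just size and mtime.
    _match, mismatch, errors = filecmp.cmpfiles(
        primary, backup, relative, shallow=False
    )
    return sorted(mismatch + errors)
```

Running differing_files() against each backup before a reboot is a cheap
way to notice a copy that was missed after a kernel update.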

Meanwhile, glad you figured the problem out. A BIOS that requires the
boot flag... that'd explain the problem for both the RAID and no-RAID
versions!

100% waits for long periods... I've seen a number of reasons for this.
One key to remember is that backed-up I/O has a way of stalling many other
things at times. Among the reasons I've seen:

1a) Dying disk. I've had old disks that would sometimes take some time to
respond, especially if they had gone to sleep. If you hear several clicks
(aka "the click of death") as the system resets the disk and tries
again... it's time to think about either replacing the old disk, or
sending in the new one for a replacement.

1b) Another form of that is hard-to-read data sectors. The drive will
typically try to read a bad sector quite a number of times, often for
several minutes at a time, before either giving up or reading it
correctly. Again, if you're seeing this, get a new disk and get your data
transferred before it's too late!

2) I think this one was fixed, and I only read of it; I didn't experience
it myself. Some time ago, if a network interface was active using DHCP
but couldn't get a response from a DHCP server, it could cause pretty
much the entire system to hang for some time, every time the fake/random
address normally assigned from the zero-conf reserved netblock expired.
The system would try to find a DHCP server again, and if one again didn't
answer, would eventually assign a fake/random address from the zero-conf
block again, but would hang the system for up to a minute (the default
timeout, AFAIK) before doing so. Again, this /should/ have been fixed
quite some time ago, but one can never be sure what bug with similar
symptoms may be lurking in some hardware or other.

3) Back to disks, but not the harbinger of doom that #1 is: perhaps your
system is simply set to suspend the disks after a period of inactivity,
and it takes them some time to spin back up. I've had this happen to me,
but it was years ago and back on MS. But because of the issues I've had
more recently with (1), I'm sure it'd still be an issue in some
configurations. (Fortunately, laptop mode on my netbook with its 120 gig
SATA hard drive seems to work very well and almost invisibly, to the point
I don't worry about disk sleep there at all, as the resume is smooth
enough I basically don't even notice -- save for the extra hour and a half
of runtime I normally get with laptop mode active! FWIW, the thing "just
works" in terms of both suspend2ram and hibernate/suspend2disk, as well.
=:^)

4) Kernels before... 2.6.30 I believe... could occasionally exhibit a
read/write I/O priority inversion on ext3. The problem had existed for
some time, but was attributed to the normal effects of the then-default
data=ordered as opposed to data=writeback journaling, until some massive
stability issues with ext4 prompted a reexamination of a number of related
previous assumptions. (Ubuntu had just deployed ext4 as a non-default
option for their new installs; the problem came in combining that with
stuff like the unstable black-box nVidia drivers, which crashed systems in
the middle of writes on occasion!) 2.6.30 had a quick-fix. 2.6.31 had
better fixes, and additionally and quite controversially, switched the
ext3 default to data=writeback, which with the new fixes was judged
sufficiently stable to be the default. (As a reiserfs user who lived thru
the period before it got proper data=ordered, I'll never trust
data=writeback again, so I disagree with Linus' decision to make it the
ext3 default, but at least I can change that on the systems I run.) So if
you're running a kernel older than 2.6.30 or .31, this could potentially
be an issue, tho it's unlikely to be /too/ bad under normal conditions.
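
If you want to check where a given box stands on (4), a rough sketch along
these lines might help. The helper names are made up for illustration. It
assumes the usual /proc/mounts layout (device, mount point, filesystem
type, options) and that a non-default data= mode shows up in the options
field, which may depend on the kernel version. The 2.6.30 threshold is
just the version I believe got the first fix.

```python
import re


def kernel_older_than(release: str, minimum: tuple[int, ...]) -> bool:
    """Compare a release string such as '2.6.29-gentoo-r5' to a version tuple."""
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    numbers = tuple(int(part) for part in match.group().split("."))
    return numbers < minimum


def ext3_data_modes(mounts_text: str) -> dict[str, str]:
    """Map each ext3 mount point to its data= option, or 'unspecified'."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mode = "unspecified"
        for option in fields[3].split(","):
            if option.startswith("data="):
                mode = option.split("=", 1)[1]
        modes[fields[1]] = mode
    return modes
```

On a live system you'd feed these platform.release() and the contents of
/proc/mounts respectively. An "unspecified" result means only that no
data= option was listed, so the kernel's default applies.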

Those are the possibilities I know of.
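
To put a number on the waits themselves, assuming the "100% waits" are the
iowait figure that top and friends report, here's a small sketch that
computes the iowait share from two snapshots of /proc/stat. The function
names are hypothetical; the field order (user, nice, system, idle, iowait,
irq, softirq, steal) is the one documented for the aggregate "cpu" line on
2.6 kernels.

```python
def cpu_times(stat_text: str) -> list[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two snapshots."""
    old, new = cpu_times(before), cpu_times(after)
    deltas = [b - a for a, b in zip(old, new)]
    # Sum user..steal only; guest time is already counted within user time.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total
```

Read /proc/stat, sleep a few seconds, read it again, and pass both texts
in. A figure that sits near 100 while the disk light is solid or the drive
is clicking points toward (1) or (3) rather than the network case in (2).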

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
