Gentoo Archives: gentoo-amd64

From:	Duncan <1i5t5.duncan@×××.net>
To:	gentoo-amd64@l.g.o
Subject:	[gentoo-amd64] Re: RAID1 boot - no bootable media found
Date:	Tue, 30 Mar 2010 19:03:18
Message-Id:	`pan.2010.03.30.18.08.47@cox.net`
In Reply to:	Re: [gentoo-amd64] Re: RAID1 boot - no bootable media found by Mark Knecht

Mark Knecht posted on Tue, 30 Mar 2010 06:56:14 -0700 as excerpted:

> 3) I LOVE your idea of managing 3 /boot partitions by hand instead of
> using RAID. Easy to do, completely testable ahead of time. If I ensure
> that every disk can boot then no matter what disk goes down the machine
> still works, at least a little. Not that much work and even if I don't
> do it for awhile it doesn't matter as I can do repairs without a CD.
> (well....)

That's one of those things you only tend to realize after running a RAID
for a while... and possibly after having grub die on nothing more than a
kernel update, for some reason I still don't quite understand... and
realizing that had I set up multiple independent /boot and boot-backup
partitions instead of a single RAID-1 /boot, I'd have had backups to boot
from when I needed them.
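
For anyone wanting to try the multiple-/boot approach, here's a minimal Python sketch of the "by hand" sync step: after a kernel update lands on the primary /boot, it mirrors the files into a backup boot partition. It assumes the backup partitions are already mounted; the function name and the mount points in the usage line are hypothetical. It recreates symlinks rather than following them, since some GRUB legacy setups keep a `boot -> .` link in /boot that would otherwise recurse forever, and it never deletes files that exist only in the backup, so an older known-good kernel stays bootable there.

```python
import os
import shutil
from pathlib import Path


def sync_boot(primary, backup):
    """Mirror regular files and symlinks from `primary` into `backup`.

    Files that exist only in `backup` are left untouched. Returns the number
    of regular files copied.
    """
    src, dst = Path(primary), Path(backup)
    if not src.is_dir():
        raise FileNotFoundError(f"primary boot directory missing: {src}")
    copied = 0
    # os.walk does not descend into symlinked directories by default, so a
    # self-referencing link such as "boot -> ." cannot cause endless recursion.
    for root, dirs, files in os.walk(src):
        target = dst / Path(root).relative_to(src)
        target.mkdir(parents=True, exist_ok=True)
        for name in dirs + files:
            s, d = Path(root) / name, target / name
            if s.is_symlink():
                # Recreate the link itself instead of copying what it points to.
                if d.is_symlink() or d.is_file():
                    d.unlink()
                os.symlink(os.readlink(s), d)
            elif s.is_file():
                shutil.copy2(s, d)  # overwrites and keeps timestamps
                copied += 1
    return copied
```

Usage, with hypothetical mount points: `for b in ("/mnt/boot2", "/mnt/boot3"): sync_boot("/boot", b)`. This only copies files; each disk still needs its own boot loader installed for it to boot independently.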

So call it the voice of experience! =:^)

Meanwhile, glad you figured the problem out. A BIOS that requires the
boot flag... that'd explain the problem with both the RAID and the
non-RAID versions!
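
On a classic MBR-partitioned disk, the boot flag is simply the status byte at the start of each of the four primary partition entries in the first sector: 0x80 marks the partition active, 0x00 inactive. The Python sketch below reports which entries carry the flag. The function names are made up for illustration, reading a real disk needs root, the device name in the docstring is only an example, and GPT-partitioned disks are outside its scope.

```python
MBR_SIZE = 512
PART_TABLE_OFFSET = 446  # the four 16-byte partition entries start here
ENTRY_SIZE = 16
ACTIVE = 0x80            # status byte value meaning "active"/bootable


def active_partitions(mbr):
    """Return the 1-based numbers of primary partitions flagged bootable."""
    if len(mbr) < MBR_SIZE:
        raise ValueError("need the full 512-byte sector")
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    flagged = []
    for i in range(4):
        # First byte of each entry is the status (boot flag) byte.
        if mbr[PART_TABLE_OFFSET + i * ENTRY_SIZE] == ACTIVE:
            flagged.append(i + 1)
    return flagged


def read_mbr(device):
    """Read the first sector of a disk, e.g. "/dev/sda" (needs root)."""
    with open(device, "rb") as f:
        return f.read(MBR_SIZE)
```

If a disk you expect to boot from shows no flagged partition, fdisk's `a` command toggles the bootable flag.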

100% waits for long periods... I've seen a number of reasons for this.
One key thing to remember is that an I/O backlog has a way of stalling
many other things at times. Among the reasons I've seen:

1a) Dying disk. I've had old disks that would sometimes take some time to
respond, especially if they had gone to sleep. If you hear several clicks
(aka "the click of death") as it resets the disk and tries again... it's
time to think about either replacing the old disk, or sending in the new
one for a replacement.

1b) Another form of that is hard-to-read data sectors. The drive will
typically try to read a bad sector quite a number of times, often for
several minutes at a time, before either giving up or reading it
correctly. Again, if you're seeing this, get a new disk and get your data
transferred before it's too late!

2) I think this one was fixed, and I only read about it; I didn't
experience it myself. Some time ago, if a network interface was active
using DHCP but couldn't get a response from a DHCP server, it could cause
pretty much the entire system to hang for some time, every time the
fake/random address normally assigned from the zero-conf reserved netblock
(169.254.0.0/16) expired. The system would try to find a DHCP server
again and, if none answered, would eventually assign another fake/random
zero-conf address, but would cause a system hang of up to a minute (the
default timeout, AFAIK) before doing so. Again, this /should/ have been
fixed quite some time ago, but one can never be sure what bug with a
similar symptom may be lurking in some hardware or other.

3) Back to disks, but not the harbinger of doom that #1 is: perhaps your
system is simply set to suspend the disks after a period of inactivity,
and it takes them some time to spin back up. I've had this happen to me,
but it was years ago and back on MS. Given the issues I've had more
recently with (1), though, I'm sure it could still be an issue in some
configurations. (Fortunately, laptop mode on my netbook with its 120 gig
SATA hard drive seems to work very well and almost invisibly, to the point
that I don't worry about disk sleep there at all; the resume is smooth
enough that I basically don't even notice, save for the extra hour and a
half of runtime I normally get with laptop mode active! FWIW, the thing
"just works" for both suspend2ram and hibernate/suspend2disk as well.)
=:^)

4) Kernels before... 2.6.30 I believe... could occasionally exhibit a
read/write I/O priority inversion on ext3. The problem had existed for
some time, but was attributed to the normal effects of the then-default
data=ordered (as opposed to data=writeback) journaling, until some massive
stability issues with ext4 prompted a reexamination of a number of
related assumptions. (Ubuntu had just deployed ext4 as a non-default
option for new installs; the trouble came from combining it with things
like the unstable black-box nVidia drivers, which occasionally crashed
systems in the middle of writes!) 2.6.30 had a quick fix. 2.6.31 had
better fixes and, additionally and quite controversially, switched the
ext3 default to data=writeback, which, with the new fixes, was judged
sufficiently stable to be the default. (As a reiserfs user who lived
through the period before it got proper data=ordered, I'll never trust
data=writeback again, so I disagree with Linus' decision to make it the
ext3 default, but at least I can change that on the systems I run.) So
if you're running a kernel older than 2.6.30 or .31, this could
potentially be an issue, though it's unlikely to be /too/ bad under
normal conditions.
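
Since the journaling mode can be pinned per filesystem with the data= mount option, here's a Python sketch that lists which ext3 filesystems in an fstab-style listing set data= explicitly, plus a helper that compares a kernel release string against a version threshold such as the 2.6.30 mentioned above. Both function names are made up for illustration. An ext3 entry with no data= option gets whatever default the kernel was built with, so the sketch reports it as None rather than guessing.

```python
import re


def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its explicit data= mode, or None if unset.

    Accepts fstab-style lines: device, mount point, fs type, options, ...
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or line.lstrip().startswith("#"):
            continue  # skip blanks, comments and short lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype != "ext3":
            continue
        mode = None
        for opt in options.split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes


def kernel_older_than(release, threshold):
    """True if a release like '2.6.29-gentoo-r5' is older than e.g. (2, 6, 30)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = tuple(int(g or 0) for g in m.groups())
    return version < tuple(threshold)
```

For example, `ext3_data_modes(open("/etc/fstab").read())` together with `kernel_older_than(os.uname().release, (2, 6, 30))` (after `import os`) would show which ext3 filesystems rely on the kernel default on an older kernel. Adding `data=ordered` to an entry's options pins the mode regardless of that default.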

Those are the possibilities I know of.
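
To check whether a stall really is the CPU sitting in I/O wait (the "wa" figure in top) rather than something else, here's a Python sketch that samples the aggregate cpu line of /proc/stat twice and computes the iowait share of the interval. It assumes Linux and the field order documented in proc(5) (user nice system idle iowait irq softirq steal ...); guest time is already counted inside user time, so only the first eight fields are summed. The function names are hypothetical, and proc(5) itself warns that the iowait figure isn't fully reliable, so treat the result as a hint.

```python
import time

IOWAIT = 4  # index of iowait on the cpu line: user nice system idle iowait ...


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Only user..steal; guest/guest_nice are already included in user/nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[IOWAIT] / total if total else 0.0


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two /proc/stat readings `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return iowait_percent(first, second)
```

A value near 100 during one of those long waits points at the disk side (items 1, 3 or 4 above) rather than at something like the DHCP hang.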

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

Replies

Subject	Author
Re: [gentoo-amd64] Re: RAID1 boot - no bootable media found	Mark Knecht <markknecht@×××××.com>
