Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: RAID1 boot - no bootable media found
Date: Tue, 30 Mar 2010 07:03:30
Message-Id: pan.2010.03.30.06.39.35@cox.net
In Reply to: [gentoo-amd64] RAID1 boot - no bootable media found by Mark Knecht
Mark Knecht posted on Sun, 28 Mar 2010 10:14:03 -0700 as excerpted:

> I brought up new hardware yesterday for my first RAID install. I
> followed this Gentoo page describing a software RAID1/LVM install:
>
> http://www.gentoo.org/doc/en/gentoo-x86+raid+lvm2-quickinstall.xml
>
> Note that I followed this page verbatim, even if it wasn't what I
> wanted, with exceptions:
>
> a) My RAID1 is 3 drives instead of 2
> b) I'm AMD64 Gentoo based.
> c) I used grub-static

Have you gotten anything off the other list? I see no other replies
here. Do you still have that install, or are you starting over, as you
mentioned you might?

That post was a bit long to quote in full, and somewhat disordered to
try to reply to point by point, so I just quoted the above and will
cover a few things as I go.

1) I'm running kernel/md RAID here, too (and was formerly running LVM2,
which is what I expect you mean by LVM, and I'll continue simply calling
it LVM), so I know a bit about it.

2) The Gentoo instructions don't say to, but just in case... you didn't
put /boot and / on LVM, only on the RAID-1, correct? LVM is only for
non-root, non-boot filesystems. (Actually, you can put / on LVM, if and
only if you run an initrd/initramfs, but it significantly complicates
things. Keeping / off of LVM simplifies things considerably, so I'd
recommend it.) This is because the kernel can auto-detect and configure
RAID, or the RAID config can be fed to it on the command line, but the
kernel cannot by itself figure out how to configure LVM -- only the LVM
userspace knows how to read and configure LVM, so an LVM userspace and
config must be available before it can be loaded. This can be
accomplished by using an initrd/initramfs with LVM loaded on it, but
things are MUCH less complex if / isn't LVM, so LVM can be loaded from
the normal /.

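To make that concrete: with / on a raw md RAID-1 (no LVM, no initramfs),
the kernel only needs to be told which array is root. A sketch of the
relevant grub.conf entry -- the device names, md number, and kernel
filename here are just examples, not from the guide:

```
# /boot/grub/grub.conf -- illustrative sketch only
title Gentoo Linux
root (hd0,0)
# With 0.90 superblocks and partition type 0xfd the kernel auto-assembles
# the array at boot; otherwise it can be spelled out with the md= option:
kernel /kernel-2.6.33 root=/dev/md3 md=3,/dev/sda3,/dev/sdb3,/dev/sdc3
```

Either way, no userspace is needed before / mounts, which is the whole
point of keeping / off LVM.
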
3) You mention not quite understanding how /boot works on md/RAID -- how
does grub know where to look? Well, it only works on md/kernel RAID-1,
and that's only because RAID-1 is basically the same as a non-RAID
setup, except that instead of one disk there are several, each a mirror
duplicate of the others (but for a bit of RAID metadata). Thus, grub
basically treats each disk as if it weren't in RAID, and it works,
because each disk is organized almost the same as if it weren't in RAID.
That's why you have to install grub separately to each disk: it's
treating them as separate disks, not RAID mirrors. But it doesn't work
with other RAID levels, because they mix up data stripes, and grub
doesn't know anything about that.

4) Due to personal experience recovering from a bad disk (pre-RAID;
that's why I switched to RAID), I'd actually recommend putting
everything portage touches or installs to on / as well. That way,
everything is kept in sync, and you don't get into a situation where /,
including /bin, /sbin and /etc, is a snapshot from one point in time,
while portage's database in /var/db is from a different one, and stuff
installed to /usr may be from an entirely different one. Not to mention
/opt, if you have anything installed there... If all that's on /, then
it should all remain in sync. Plus then you don't have to worry about
something boot-critical being installed to /usr, which isn't mounted
until about midway thru the boot cycle.

4 cont) What then goes on other partitions is subdirs of the above:
/usr/local, very likely, as you'll probably want to keep it if you
reinstall; /home, for the same reason; /var/log, so a runaway log can't
eat up all the space on / -- it's limited to eating up everything on the
log partition; likely /tmp, which I have as tmpfs here but which you may
otherwise want on RAID-0 for speed; /var/tmp, which here is a symlink to
my /tmp so it's on tmpfs too; and very possibly /usr/src and the Linux
kernel tree it contains, as RAID-0 is fine for that since it can simply
be redownloaded off the net if need be. The same goes for the portage
tree, /usr/portage by default, tho you can point that elsewhere (maybe
to the same partition holding /usr/src; but if you use
FEATURES=buildpkg, you probably want your package dir on something with
some redundancy, so not on that RAID-0). If you have a system-wide mail
setup with multiple users, you may want a separate mail partition as
well (if not, part of /home is fine). Desktop users may well find a
separate, likely BIG, partition for their media storage useful, etc.
FWIW, the / partition on my ~amd64 workstation with kde4 is 5 gigs
(according to df). On my slightly more space-constrained 32-bit
netbook, it's 4 gigs. Used space on both is ~2.4 gigs, with the various
other partitions as mentioned kept separate, but with everything portage
touches on /. (That compares to what appears to be a 1-gig / on md3 in
the guide, with /var and /usr on their own partitions/volumes; but they
have an 8-gig /usr, a 4-gig /var, and a 4-gig /opt, totaling 17 gigs,
that's mostly on that 4-5 gig /, here.)

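As a sketch, an fstab for that kind of split might look something like
this -- the md numbers and filesystems are purely illustrative, not the
guide's:

```
# /etc/fstab sketch -- everything portage touches stays on /
/dev/md1   /boot       ext2    noauto,noatime  1 2
/dev/md3   /           ext3    noatime         0 1
/dev/md5   /home       ext3    noatime         0 2
/dev/md6   /var/log    ext3    noatime         0 2
/dev/md7   /usr/src    ext3    noatime         0 2
tmpfs      /tmp        tmpfs   noatime         0 0
```

The point is simply that /bin, /sbin, /etc, /usr and /var/db all live on
the one md3 line, so a restore of that one filesystem restores a
consistent system.
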
5) The hexadecimal digits you mentioned seeing during POST do indicate,
as you guessed, BIOS POST and config process progress. I wasn't aware
that they're documented, but as your board is an Intel and the link you
mentioned appears to be Intel documentation for them, it seems in your
case they are, which is nice. =:^)

6) Your BIOS has slightly different SATA choices than mine. Here, I
have RAID or JBOD (plain SATA, "just a bunch of disks") as my two
choices. JBOD mode would compare to your AHCI, which is what I'd
recommend. (It seems Intel wants AHCI to be a standard, thus killing
the need for individual SATA controller drivers like the SATA_SIL driver
I run here. That'd be nice, but I don't know how well it's being
accepted by others.) Compatibility mode will likely slow things down,
and RAID mode would be firmware-based RAID, which on Linux would be
supported by the device-mapper (as LVM2 is). JBOD/SATA/AHCI mode with
md/kernel RAID is generally considered a better choice than firmware
RAID with device-mapper support -- well, unless you need MSWormOS RAID
compatibility, in which case the firmware/device-mapper mode is probably
better, as it's more compatible.

6 cont) So I'd recommend AHCI. However, the on-disk layout may be
different between compatibility and AHCI mode, so it's possible the
disks won't be readable after switching and you'd need to repartition
and reinstall -- which you were planning on doing anyway, so no big
deal.

OK, now that those are covered... what's wrong with your boot?

Well, there are two possibilities. Either the BIOS isn't finding grub
stage 1, or grub stage 1 is found and loaded but can't find stage 1.5 or
stage 2, depending on which your setup needs. Either way, that's a grub
problem. As long as you didn't make the mistake of putting /boot on
your LVM, which grub doesn't grok, and since grub can pretend md/kernel
RAID-1 is an ordinary disk, we really don't need to worry about the
md/RAID or LVM until you can at LEAST get to the grub menu/prompt.

So we have a grub problem. That's what we have to solve first, before
we deal with anything else.

Based on that, here's the relevant excerpt from your post (well, after a
bit of a detour I forgot to include in the above, so we'll call this
point 7):

> NOTE: THIS INSTALL PUTS EVERYTHING ON RAID1. (/, /boot, everything)
> I didn't start out thinking I wanted to do that.

7) Well, not quite. /boot and / are on RAID-1, yes. But the guide puts
the LVM2 physical volume on md4, which is created as RAID-0/striped. I
don't really agree with that, as striped is fast but has no redundancy.
Why you'd put stuff like /home, /usr (including stuff you may well want
to keep in /usr/local), /var (including portage's package database in
/var/db), and presumably additional partitions as you create them (media
and mail partitions were the examples I mentioned above) on a
non-redundant RAID-0, I don't know. That's the sort of thing I'd want
on RAID-1, here, to make sure I still had copies of it if any of the
disks died.

7 cont) Actually, given that md/RAID is now partitionable (years ago it
wasn't, with LVM traditionally layered on top to overcome that), and
after some experience of my own with LVM, I decided the extra LVM layer
wasn't worth the hassle here, and when I redid my system last year, I
killed the LVM and just use partitioned md/kernel RAID now. If you want
the flexibility of LVM, great, but here, I decided it simply wasn't
worth the extra hassle of maintaining it. So I'd recommend NOT using
LVM and thus not having to worry about it. But it's your choice.

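For what it's worth, a partitionable array can be created with mdadm's
--auto=part option. This is only a sketch (whole-disk members and
device names are examples; check mdadm(8) for your version before
running anything like it):

```
# Illustrative: one partitionable RAID-1 across three whole disks
mdadm --create /dev/md_d0 --auto=part --level=1 --raid-devices=3 \
      /dev/sda /dev/sdb /dev/sdc
# then partition /dev/md_d0 normally, yielding /dev/md_d0p1, p2, ...
```

You then put filesystems directly on the md partitions, with no LVM
layer to assemble at boot.
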
OK, now on to the grub issue...

> So, the first problem is that on the reboot to see if the install
> worked the Intel BIOS reports 'no bootable media found'. I am very
> unclear how any system boots software RAID1 before software is loaded,
> assuming I understand the Gentoo instructions. The instructions I used
> to install grub were
>
> root (hd0,0)
> setup (hd0)
> root (hd1,0)
> setup (hd1)
> root (hd2,0)
> setup (hd2)

That /looks/ correct. But particularly with RAID, grub's mapping
between BIOS drives, kernel drives, and grub drives sometimes gets mixed
up. That's one of the things I always hate touching, since I'm never
quite sure whether it's going to work, or whether I'm actually telling
it to set up where I think I'm telling it to, until I actually test it.

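The mapping grub has guessed is recorded in its device.map file, which
is worth checking whenever the drive order seems suspect (the path below
is the usual default; your install may differ):

```
# /boot/grub/device.map -- grub's idea of the BIOS drive order
(hd0)   /dev/sda
(hd1)   /dev/sdb
(hd2)   /dev/sdc
```

If the BIOS actually boots the disks in a different order than listed
here, setup lands on the wrong disk.
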
Do you happen to have a floppy on that machine? If so, probably the
most error-resistant way to handle it is to install grub to a floppy
disk, which, unlike thumb drives and possibly CD/DVD drives, has no
potential to interfere with the hard drive order as seen by the BIOS.
Then boot the floppy disk to the grub menu, and run the setup from
there.

One thing I discovered here is that I could only set up one disk at a
time, regardless of where I was doing it from (in Linux, from a floppy
grub menu, or from a bootable USB stick grub menu). Changing the root
would seem to work after the first setup, but the second setup would
fail with some weird error, and a test boot from that disk wouldn't
work, so obviously it didn't take.

But doing it a disk at a time -- root (hd0,0), setup (hd0), reboot (or
restart grub if doing it from Linux), root (hd1,0), setup (hd1),
reboot... the same for each additional disk (you have three, I have
four) -- THAT worked.

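In grub-console terms, the one-at-a-time sequence looks like this (same
device names as in your quoted commands; the comments mark where to
reboot between passes):

```
grub> root (hd0,0)
grub> setup (hd0)
    (reboot, or restart the grub shell, then:)
grub> root (hd1,0)
grub> setup (hd1)
    (reboot again, then:)
grub> root (hd2,0)
grub> setup (hd2)
```

Same commands as the guide's, just never more than one setup per grub
session.
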
However you do it, test them, both with all disks running and with only
one running (turn off or disconnect the others). Having a RAID-1 system
and installing grub to all the disks isn't going to do you a lot of good
if, when one dies, you find that it was the only one that had grub
installed correctly!

There's another alternative that I'd actually recommend instead,
however. The problem with a RAID-1 /boot is that if you somehow screw
something up while updating /boot, then since it's RAID-1, you've
screwed it up on all mirrors of that RAID-1. Since RAID-1 simply
mirrors the data across the multiple disks, it can be better not to RAID
that partition at all, but to give each disk its own un-RAIDed /boot
partition. That effectively gives you a working /boot plus one backup
per additional disk (two backups in your case of three disks, three in
my case of four, tho here I actually went with two separate RAID-1s
instead).

That solves a couple of problems at once. First of all, when you first
install, you install to just one, as an ordinary disk, and test it.
When it's working and booting, you can copy that install to the others
and do the boot-sector grub setup on each one separately, as its own
disk, having already tested that the first one works. Then you'd test
each of the others as well.

Second, when you upgrade -- especially when you upgrade grub, but also
when you upgrade the kernel -- you only upgrade the one. If it works,
great, you can then upgrade the others. If it fails, no big deal:
simply set your BIOS to boot from one of the others instead, and you're
back to a known working config, since you tested it after the /last/
upgrade, and you hadn't yet applied this upgrade to it -- you were still
testing this upgrade when it broke, before you copied it to your
backups.

So basically, the only difference here as opposed to the guide is that
you don't create /dev/md1; you configure and mount /dev/sda1 as /boot,
and when you have your system up and running, /then/ you go back and set
up /dev/sdb1 as your backup /boot (mounted at, say, /mnt/boot/). And
when you have that set up and tested working, you do the same thing for
/dev/sdc1, except that you can use the same /mnt/boot/ backup
mount-point when mounting it as well, since presumably you won't need
both backups mounted at once.

Everything else will be the same, and as it was RAID-1/mirrored, you'll
have about the same space in each /dev/sd[abc]1 partition as you did in
the combined md1.

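Refreshing a backup /boot is then just a mount and a copy. A sketch,
with example device names and the /mnt/boot mount-point from above:

```
# Illustrative: refresh the backup /boot on the second disk
mount /dev/sdb1 /mnt/boot
cp -a /boot/. /mnt/boot/
umount /mnt/boot
# repeat for /dev/sdc1, then verify each disk still boots on its own
```
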
As for upgrading the three separate /boots (the working one and its
backups): as I mentioned, when you upgrade grub, DEFINITELY only upgrade
one at a time, and test that the upgrade worked and that you can boot
from it before you touch the others. For kernel upgrades, it doesn't
matter too much if the backups are a bit behind, so you don't have to
update them for every kernel upgrade. If you run kernel rcs or git
kernels, as I do, I'd suggest updating the backups once per kernel
release (so from 2.6.32 to 2.6.33, for instance), so the test kernels
are only on the working /boot, not its backups, but the backups contain
at least one version of each of the last two release kernels. Pretty
much the same if you run upstream stable kernels (so 2.6.33, 2.6.33.1,
2.6.33.2...) or Gentoo -rX kernels: keep at least one of each of the
last two kernels on the backups, tested to boot of course, and only
update the working /boot for the stable or -rX bumps.

If you only upgrade kernels once per kernel release cycle or less often
(maybe you're still running 2.6.28.x or something), then you probably
want to upgrade and test the backups as soon as you've upgraded and
tested a new kernel on the working /boot.

Hope it helps...

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
