Gentoo Archives: gentoo-user

From: Kerin Millar <kerframil@×××××××××××.uk>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Software RAID-1
Date: Tue, 26 Aug 2014 13:21:35
Message-Id: 53FC89CF.3060902@fastmail.co.uk
In Reply to: Re: [gentoo-user] Software RAID-1 by Peter Humphrey
On 26/08/2014 10:38, Peter Humphrey wrote:
> On Monday 25 August 2014 18:46:23 Kerin Millar wrote:
>> On 25/08/2014 17:51, Peter Humphrey wrote:
>>> On Monday 25 August 2014 13:35:11 Kerin Millar wrote:
>>>> I now wonder if this is a race condition between the init script running
>>>> `mdadm -As` and the fact that the mdadm package installs udev rules that
>>>> allow for automatic incremental assembly?
>>>
>>> Isn't it just that, with the kernel auto-assembly of the root partition,
>>> and udev rules having assembled the rest, all the work's been done by the
>>> time the mdraid init script is called? I had wondered about the time that
>>> udev startup takes; assembling the raids would account for it.
>>
>> Yes, it's a possibility and would constitute a race condition - even
>> though it might ultimately be a harmless one.
>
> I thought a race involved the competitors setting off at more-or-less the same
> time, not one waiting until the other had finished. No matter.
The mdraid script can assemble arrays and runs at a particular point in
the boot sequence. The udev rules can also assemble arrays and, being
event-driven, I suspect that they are likely to prevail. The point is
that neither the sequence nor the timing of these two mechanisms is
deterministic. There is definitely the potential for a race condition; I
just don't yet know whether it is a harmful one.
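
For context, whenever the kernel announces a block device bearing RAID
metadata, the udev rule installed by mdadm ends up running roughly the
equivalent of the following - a sketch from memory, with the device name
standing in for whichever member disk udev has just seen:

  # Incremental assembly: add the newly seen member to whichever array
  # it belongs to, starting the array once enough members are present.
  mdadm --incremental /dev/sdb2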

>
>> As touched upon in the preceding post, I'd really like to know why mdadm
>> sees fit to return a non-zero exit code given that the arrays are actually
>> assembled successfully.
>
> I can see why a dev might think "I haven't managed to do my job" here.

It may be that mdadm returns different non-zero exit codes depending on
the exact circumstances. It does have this characteristic for certain
other operations (such as -t --detail).
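
That one is easy to probe, incidentally. From memory, -t with --detail
encodes array health in the exit status - 0 for clean, 1 for degraded,
2 for dead/not running, 4 on error - though mdadm(8) is the authority
rather than my recollection:

  # Probe the array and print the raw exit status:
  mdadm --detail --test /dev/md5 >/dev/null 2>&1
  echo "exit status: $?"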

>
>> After all, even if the arrays are assembled at the point that mdadm is
>> executed by the mdraid init script, partially or fully, it surely ought
>> not to matter. As long as the arrays are fully assembled by the time
>> mdadm exits, it should return 0 to signify success. Nothing else makes
>> sense, in my opinion. It's absurd that the mdraid script is drawn into
>> printing a blank error message where nothing has gone wrong.
>
> I agree, that is absurd.
>
>> Further, the mdadm ebuild still prints elog messages stating that mdraid
>> is a requirement for the boot runlevel but, with udev rules, I don't see
>> how that can be true. With udev being event-driven and calling mdadm
>> upon the introduction of a new device, the array should be up and
>> running as of the very moment that all the disks are seen, no matter
>> whether the mdraid init script is executed or not.
>
> We agree again. The question is what to do about it. Maybe a bug report
> against mdadm?

Definitely. Again, can you find out what the exit status is under the
circumstances in which mdadm produces a blank error? I am hoping it is
something other than 1. If so, solving this problem might be as simple
as having the mdraid script treat only one specific non-zero value as
indicating an intractable error.
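
In shell terms, something along these lines - purely illustrative, as
the particular exit status (2 here) is a guess that would first need
confirming against mdadm's actual behaviour:

  # Treat only one specific (hypothetical) exit status as fatal;
  # anything else is taken to mean the arrays came up regardless.
  output=$(mdadm --assemble --scan 2>&1)
  rc=$?
  if [ "${rc}" -eq 2 ]; then
      eerror "${output}"
      return 1
  fi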

There is also the matter of whether it makes sense for the script to
assemble the arrays explicitly when the udev rules are already doing the
job. However, I think that would require further investigation before
being raised as a bug.

>
> --->8
>
>>> Right. Here's the position:
>>> 1. I've left /etc/init.d/mdraid out of all run levels. I have nothing but
>>> comments in mdadm.conf, but then it's not likely to be read anyway if the
>>> init script isn't running.
>>> 2. I have empty /etc/udev rules files as above.
>>> 3. I have kernel auto-assembly of raid enabled.
>>> 4. I don't use an init ram disk.
>>> 5. The root partition is on /dev/md5 (0.90 metadata).
>>> 6. All other partitions except /boot are under /dev/vg7 which is built on
>>> top of /dev/md7 (1.x metadata).
>>> 7. The system boots normally.
>>
>> I must confess that this boggles my mind. Under these circumstances, I
>> cannot fathom how - or when - the 1.x arrays are being assembled.
>> Something has to be executing mdadm at some point.
>
> I think it's udev. I had a look at the rules, but I no grok. I do see
> references to mdadm though.

So would I, only you said in step 2 that you have "empty" rules, which I
take to mean that you had overridden the mdadm-provided udev rules with
empty files. If all of the conditions you describe were true, you would
have eliminated all three of the aforementioned contexts in which mdadm
can be invoked. Given that mdadm is needed to assemble your 1.x arrays
(see below), I would expect such conditions to result in mount errors on
account of the missing arrays.
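
For anyone following along: a file in /etc/udev/rules.d takes precedence
over an identically named one under /lib/udev/rules.d, so masking a rule
is just a matter of creating an empty file. I'm assuming here that the
mdadm rules file is named 64-md-raid.rules; the name varies between
versions:

  # Mask the packaged rule with an empty override; the file name must
  # match the one shipped under /lib/udev/rules.d exactly.
  touch /etc/udev/rules.d/64-md-raid.rules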

>
>>> Do I even need sys-fs/mdadm installed? Maybe
>>> I'll try removing it. I have a little rescue system in the same box, so
>>> it'd be easy to put it back if necessary.
>>
>> Yes, you need mdadm because 1.x metadata arrays must be assembled in
>> userspace.
>
> I realised after writing that that I may well need it for maintenance. I'd do
> that from my rescue system though, which does have it installed, so I think I
> can ditch it from the main system.

Again, 1.x arrays must be assembled in userspace. The kernel cannot
assemble them by itself as it can with 0.90 arrays. If you uninstall
mdadm, you will be removing the very userspace tool that is employed for
assembly. Neither udev nor mdraid will be able to execute it, which
cannot end well.
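
To make that concrete, bringing up a 1.x array always entails a
userspace invocation along these lines (the member partitions are
invented for the sake of the example):

  # Explicit userspace assembly of a 1.x metadata array - something
  # the kernel cannot do by itself:
  mdadm --assemble /dev/md7 /dev/sda7 /dev/sdb7

  # Or have mdadm scan mdadm.conf and device metadata for all arrays:
  mdadm --assemble --scan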

It's a different matter when using an initramfs, because it will bundle
and make use of its own copy of mdadm.
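
For instance, if memory serves, genkernel will include mdadm in the
initramfs that it builds when given the --mdadm option:

  # Build an initramfs with mdadm bundled (check genkernel(8) before
  # relying on this):
  genkernel --mdadm initramfs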

--Kerin
