I rebooted to upgrade to kernel 3.4.1. I accidentally had the combination of uvesafb, nouveau KMS and nvidia-drivers enabled, which caused my system to go blank after rebooting. I was not able to SSH into the machine, so I did the magic-sysrq REISUB to reboot into my previous kernel. When it booted into the previous kernel (3.3.5), I saw a whole bunch of "I/O error" messages scrolling by, for every disk in my RAID array. I had never seen these errors before. I hoped it was just some module confusion because I was booting a different kernel. I was able to boot into my root filesystem, but the RAID did not assemble. After blacklisting nouveau and rebooting into 3.4.1, the I/O errors were gone, but mdraid failed with this message:

 * Starting up RAID devices ...
 * mdadm main: failed to get exclusive lock on mapfile
mdadm: /dev/md2 is already in use.
mdadm: /dev/md1 is already in use.
 [ !! ]
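
A quick aside on REISUB: it is the magic-sysrq sequence, Alt+SysRq followed by R, E, I, S, U, B a couple of seconds apart (unRaw the keyboard, tErminate processes, kIll processes, Sync disks, remoUnt read-only, reBoot). If a shell is still responsive, the same actions can be triggered through procfs, assuming magic-sysrq support is enabled in the kernel:

echo 1 > /proc/sys/kernel/sysrq
for key in r e i s u b; do echo "$key" > /proc/sysrq-trigger; sleep 2; done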

Oh no! Heart beating quickly... terabytes of data... Google finds nothing useful for these messages.
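
The only concrete lead in the message itself is the mapfile: mdadm tracks running arrays in a small map file in its run directory. The location varies by version (I have seen /run/mdadm/, /var/run/mdadm/ and /dev/.mdadm/ used), so take the path here as an assumption for illustration:

ls -l /run/mdadm/

If an unclean shutdown leaves a stale map or lock file behind, mdadm can seemingly conclude the arrays are "already in use".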

My mdadm.conf has not changed at all, and no physical disks have been added or removed in over a year. I have, of course, updated hundreds of packages since my last reboot, including mdadm.

/proc/mdstat shows that not all of the member disks/partitions are being detected:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : inactive sdb1[0](S)
      1048575868 blocks super 1.1

md2 : inactive sdf2[5](S)
      904938415 blocks super 1.1

unused devices: <none>

Those arrays normally include all of the disks sdb through sdf, partitions 1 and 2 from each disk.
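
As a sanity check, mdadm can also be asked to scan for array superblocks on its own, independent of what mdadm.conf says; it prints an ARRAY line, with the UUID, for every array whose members it can locate:

mdadm --examine --scan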

My mdadm.conf has always had only two ARRAY lines (for /dev/md1 and /dev/md2) with the UUIDs of the arrays. Previously, the member disks were always detected and assembled automatically when I booted and started mdadm. Running mdadm --query --examine on the partitions showed they still contained valid RAID superblock information, so I felt confident in trying to reassemble it.
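
Concretely, checking every member at once looks something like this (a sketch using my device names; the fields shown come from the 1.x superblock format):

for part in /dev/sd[bcdef][12]; do
    echo "== $part"
    mdadm --query --examine "$part" | grep -E 'Array UUID|Raid Level|Device Role|Array State'
done

A matching Array UUID on every partition is the reassuring part.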

To fix it, I did:

/etc/init.d/mdraid stop

to stop the arrays (I could also have run "mdadm -Ss", short for --stop --scan, which is exactly what the stop script does).

Then I edited mdadm.conf and added a DEVICE line:

DEVICE /dev/sd[bcdef][12]
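
For context, the whole file now has just three meaningful lines. With placeholder UUIDs standing in for my real ones (use the values mdadm --examine reports), it looks roughly like this:

DEVICE /dev/sd[bcdef][12]
ARRAY /dev/md1 UUID=00000000:00000000:00000000:00000000
ARRAY /dev/md2 UUID=00000000:00000000:00000000:00000000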

So now I am telling it specifically where to look: the glob matches partitions 1 and 2 on each of sdb through sdf. I then restarted mdraid:

/etc/init.d/mdraid start

Et voilà! My RAID was back and functioning. I don't know whether this was the result of a change in kernel or mdadm behavior, or simply of my REISUB leaving the RAID in a strange state.
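
For anyone landing here with the same symptoms, these are the standard checks I would use to confirm the arrays are really healthy again:

cat /proc/mdstat
mdadm --detail /dev/md1
mdadm --detail /dev/md2

You want to see the arrays active with all members present, and "State : clean" in the --detail output.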