On Sat, Apr 17, 2010 at 3:01 PM, Neil Bothwick <neil@××××××××××.uk> wrote:
> On Sat, 17 Apr 2010 14:36:39 -0700, Mark Knecht wrote:
>
>> Empirically any way there doesn't seem to be a problem. I built the
>> new kernel and it booted normally so I think I'm misinterpreting what
>> was written in the Wiki or the Wiki is wrong.
>
> As long as /boot is not on RAID, or is on RAID1, you don't need an
> initrd. I've been booting this system for years with / on RAID1 and
> everything else on RAID5.
>
>
> --
> Neil Bothwick
|
Neil,
Completely agreed, and in fact it's the way I built my new system.
/boot is just a plain partition; / is RAID1 across three partitions
marked with the 0xfd partition type, using metadata=0.90 and assembled
by the kernel. I'm using WD RAID Edition drives and an Asus Rampage II
Extreme motherboard.
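
For reference, the root array would have been created with something
along these lines - this is a reconstruction from the description
above, not a transcript of what I actually typed:

c2stable ~ # mdadm --create /dev/md3 --level=1 --raid-devices=3 \
        --metadata=0.90 /dev/sda3 /dev/sdb3 /dev/sdc3

The 0.90 metadata is what lets the kernel's in-kernel autodetect
assemble it at boot without an initrd.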
22 |
|
23 |
It works, however I'm running into the sort of thing I ran into |
24 |
this morning when booting - both md5 and md6 have problems this |
25 |
morning. Random partitions get dropped out. It's never the same ones, |
26 |
and it's sometimes only 1 partition out of three on the same drive - |
27 |
sdc5 and sdc6 aren't found until I reboot, but sda3, sdb3 & sdc3 were. |
28 |
Flakey hardware? What? The motherboard? The drives? |

I've noticed that entering the BIOS setup screens before allowing grub
to take over seems to eliminate the problem. Timing?
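
If it really is a timing race, one knob I could try - just a guess on
my part, not something from the wiki - is telling the kernel to wait a
few seconds before touching the root device, via the kernel line in
grub.conf (the image name below is a placeholder):

kernel /boot/<kernel-image> root=/dev/md3 rootdelay=10

I haven't verified whether rootdelay also holds off the md autodetect
pass, so treat it as an untested idea.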
32 |
|
33 |
mark@c2stable ~ $ cat /proc/mdstat |
34 |
Personalities : [raid0] [raid1] |
35 |
md6 : active raid1 sda6[0] sdb6[1] |
36 |
247416933 blocks super 1.1 [3/2] [UU_] |
37 |
|
38 |
md11 : active raid0 sdd1[0] sde1[1] |
39 |
104871936 blocks super 1.1 512k chunks |
40 |
|
41 |
md3 : active raid1 sdc3[2] sdb3[1] sda3[0] |
42 |
52436096 blocks [3/3] [UUU] |
43 |
|
44 |
md5 : active raid1 sdb5[1] sda5[0] |
45 |
52436032 blocks [3/2] [UU_] |
46 |
|
47 |
unused devices: <none> |
48 |
mark@c2stable ~ $ |

For clarity, md3 is the only array needed to boot the system. The
other three RAIDs aren't required until I start running apps. However,
they are all assembled by the kernel at boot time, and I would prefer
not to do that, or at least learn how not to.
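
What I think I want instead is roughly this - a sketch, not something
I've run yet: record the non-root arrays in /etc/mdadm.conf and let
userspace mdadm assemble them once the root filesystem is up:

c2stable ~ # mdadm --detail --scan >> /etc/mdadm.conf
c2stable ~ # rc-update add mdraid boot

assuming the mdraid init script installed by the mdadm ebuild just
runs "mdadm --assemble --scan" against that file.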
54 |
|
55 |
Now, as to why they are being assembled I suspect it's because I |
56 |
marked them all with partition type 0xfd when possibly it's not the |
57 |
best thing to have done. The kernel won't bother with non-0xfd |
58 |
partitions and then mdadm could have done it later: |
59 |
|
60 |
c2stable ~ # fdisk -l /dev/sda |
61 |
|
62 |
Disk /dev/sda: 500.1 GB, 500107862016 bytes |
63 |
255 heads, 63 sectors/track, 60801 cylinders |
64 |
Units = cylinders of 16065 * 512 = 8225280 bytes |
65 |
Disk identifier: 0x8b45be24 |
66 |
|
67 |
Device Boot Start End Blocks Id System |
68 |
/dev/sda1 * 1 7 56196 83 Linux |
69 |
/dev/sda2 8 530 4200997+ 82 Linux swap / Solaris |
70 |
/dev/sda3 536 7063 52436160 fd Linux raid autodetect |
71 |
/dev/sda4 7064 60801 431650485 5 Extended |
72 |
/dev/sda5 7064 13591 52436128+ fd Linux raid autodetect |
73 |
/dev/sda6 30000 60801 247417065 fd Linux raid autodetect |
74 |
c2stable ~ # |
75 |
|
76 |
However the Gentoo Wiki says we are supposed to mark everything 0xfd: |
77 |
|
78 |
http://en.gentoo-wiki.com/wiki/RAID/Software#Setup_Partitions |
79 |
|
80 |
I'm not sure that we good advice or not for RAIDs that could be |
81 |
assembled later but that's what I did and it leads to the kernel |
82 |
trying to do everything before the system is totally up and mdadm is |
83 |
really running. |
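
If it's safe to do, I'm guessing the cleanest fix is to flip the
non-root members back to plain Linux partitions so the kernel's
autodetect pass ignores them, and let mdadm.conf handle those arrays
as sketched above. Untested, but something like this per drive:

c2stable ~ # sfdisk --change-id /dev/sda 5 83
c2stable ~ # sfdisk --change-id /dev/sda 6 83

and the same for sdb and sdc, leaving the root members (sda3, sdb3,
sdc3) as 0xfd so / still assembles in the kernel.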

Anyway, when the failures happen I can step through and fail, remove,
and re-add the partition to the array. (In this case the fail and
remove steps aren't necessary.)

c2stable ~ # mdadm /dev/md5 -f /dev/sdc5
mdadm: set device faulty failed for /dev/sdc5: No such device
c2stable ~ # mdadm /dev/md5 -r /dev/sdc5
mdadm: hot remove failed for /dev/sdc5: No such device or address
c2stable ~ # mdadm /dev/md5 -a /dev/sdc5
mdadm: re-added /dev/sdc5
c2stable ~ # mdadm /dev/md6 -a /dev/sdc6
mdadm: re-added /dev/sdc6
c2stable ~ #
|
At this point md5 is repaired and I'm waiting for md6 to finish
resyncing:

c2stable ~ # cat /proc/mdstat
Personalities : [raid0] [raid1]
md6 : active raid1 sdc6[2] sda6[0] sdb6[1]
      247416933 blocks super 1.1 [3/2] [UU_]
      [====>................]  recovery = 22.0% (54525440/247416933)
      finish=38.1min speed=84230K/sec

md11 : active raid0 sdd1[0] sde1[1]
      104871936 blocks super 1.1 512k chunks

md3 : active raid1 sdc3[2] sdb3[1] sda3[0]
      52436096 blocks [3/3] [UUU]

md5 : active raid1 sdc5[2] sdb5[1] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
c2stable ~ #

How do I get past this? It's happening 2-3 times a week! I figure that
if the kernel doesn't auto-assemble the RAIDs I don't need at boot,
then I can somehow check that all the partitions are ready to go
before I start them up. This morning's exercise will have cost me an
hour before I can start using the machine.
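
The sort of check I have in mind would be a small script run before
the non-root arrays are started - purely a sketch, the device names
are just my layout from above and I haven't tried it:

#!/bin/sh
# Wait up to 30 seconds for each md5/md6 member to appear,
# then assemble everything listed in /etc/mdadm.conf.
for dev in /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sda6 /dev/sdb6 /dev/sdc6; do
    i=0
    while [ ! -b "$dev" ] && [ $i -lt 30 ]; do
        sleep 1
        i=$((i + 1))
    done
done
mdadm --assemble --scan

That way a slow drive would just delay the boot a little instead of
leaving an array degraded.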

- Mark