Gentoo Archives: gentoo-amd64

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Re: RAID1 boot - no bootable media found
Date: Thu, 01 Apr 2010 19:02:40
Message-Id: o2q5bdc1c8b1004011157if9fb419ey3a777f4fd3743c46@mail.gmail.com
In Reply to: [gentoo-amd64] Re: RAID1 boot - no bootable media found by Duncan <1i5t5.duncan@cox.net>
A bit long in response. Sorry.

On Tue, Mar 30, 2010 at 11:56 PM, Duncan <1i5t5.duncan@×××.net> wrote:
> Mark Knecht posted on Tue, 30 Mar 2010 13:26:59 -0700 as excerpted:
>
>> I've set up a duplicate boot partition on sdb and it boots. However one
>> thing I'm unsure if when I change the hard drive boot does the old sdb
>> become the new sda because it's what got booted? Or is the order still
>> as it was? The answer determines what I do in grub.conf as to which
>> drive I'm trying to use. I can figure this out later by putting
>> something different on each drive and looking. Might be system/BIOS
>> dependent.
>
> That depends on your BIOS.  My current system (the workstation, now 6+
> years old but still going strong as it was a $400+ server grade mobo) will
> boot from whatever disk I tell it to, but keeps the same BIOS disk order
> regardless -- unless I physically turn one or more of them off, of
> course.  My previous system would always switch the chosen boot drive to
> be the first one.  (I suppose it could be IDE vs. SATA as well, as the
> switcher was IDE, the stable one is SATA-1.)
>
> So that's something I guess you figure out for yourself.  But it sounds
> like you're already well on your way...
>

The mapping seems to be constant, meaning (I guess) that I need to
change the drive specs in grub.conf on the second drive to actually
use the second drive.

I made the boot titles different in each grub.conf file to ensure I
was really getting grub from the second drive. My grub boot menu says
"2.6.33-gentoo booting from sda" on the first drive, "2.6.33-gentoo
booting from sdb" on the second drive, etc.

<SNIP>
>
> The point being... it /is/ actually possible to verify that they're
> working well before you fdisk/mkfs and load data.  Tho it does take
> awhile... days... on drives of modern size.
>

I'm trying badblocks right now on sdc, using

badblocks -v /dev/sdc

Maybe I need to do something more strenuous? It looks like it will be
done in an hour or two. (It's an i7-920 with SATA drives, so it should
be fast, as long as I'm not just reading the buffers or something like
that.)

Roughly speaking, 1TB read at 100MB/s should take 10,000 seconds, or
about 2.8 hours. I'm at 18% after 28 minutes, so that seems about
right. (With no errors so far, assuming I'm using the right command.)

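The timing estimate above can be checked with a few lines of Python. This is only a rough sketch: the 1TB size, the 100MB/s rate and the 18%-in-28-minutes progress are the round figures from this message, the function names are made up for illustration, and a constant read rate is assumed (real drives usually slow down toward the inner tracks).

```python
# Back-of-the-envelope check of the badblocks timing.
# All inputs are round figures from the message, not measurements.

DISK_BYTES = 1_000_000_000_000   # "1TB" drive, decimal terabyte
READ_RATE = 100_000_000          # assumed sustained 100 MB/s


def full_read_seconds(disk_bytes, bytes_per_second):
    """Time to read the whole disk once at a constant rate."""
    return disk_bytes / bytes_per_second


def projected_total_minutes(percent_done, minutes_elapsed):
    """Extrapolate the total run time from progress so far (steady rate assumed)."""
    return minutes_elapsed * 100.0 / percent_done


if __name__ == "__main__":
    seconds = full_read_seconds(DISK_BYTES, READ_RATE)
    print(f"estimate: {seconds:.0f} s = {seconds / 3600:.2f} h")
    total = projected_total_minutes(18, 28)
    print(f"projected from 18% in 28 min: {total:.0f} min = {total / 60:.2f} h")
```

Both numbers land between two and a half and three hours, so the observed progress is consistent with a real pass over the platters rather than reads served from a cache.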
>>> 3) suspend the disks after a period of inactivity
>>
>> This could be part of what's going on, but I don't think it's the whole
>> story. My drives (WD Green 1TB drives) apparently park the heads after 8
>> seconds (yes 8 seconds!) of inactivity to save power. Each time it parks
>> it increments the Load_Cycle_Count SMART parameter. I've been tracking
>> this on the three drives in the system. The one I'm currently using is
>> incrementing while the 2 that sit unused until I get RAID going again
>> are not. Possibly there is something about how these drives come out of
>> park that creates large delays once in awhile.
>
> You may wish to take a second look at that, for an entirely /different/
> reason.  If those are the ones I just googled on the WD site, they're
> rated 300K load/unload cycles.  Take a look at your BIOS spin-down
> settings, and use hdparm to get a look at the disk's powersaving and
> spindown settings.  You may wish to set the disks to something more
> reasonable, as with 8 second timeouts, that 300k cycles isn't going to
> last so long...

Very true. Here is the same drive model, which I put in a new machine
for my dad. It's been powered up and running Gentoo as a typical
desktop machine for about 50 days. He doesn't use it more than about
an hour a day on average. It's already hit 31K load/unload cycles. At
about 10% of 300K in 50 days, that's roughly 500 days, or about 1.4
years, of total life before it hits that spec. I've watched his system
a bit and it seems to add 1 to the count almost exactly every 2
minutes on average. Is that a common cron job maybe?

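For anyone who wants to redo the lifetime arithmetic with their own numbers, here is a minimal sketch. It uses only the round figures from the paragraph above (31K cycles in about 50 days, 300K rating), assumes the count keeps climbing at the same average rate, and the function names are hypothetical.

```python
RATED_CYCLES = 300_000   # load/unload rating quoted in this thread


def days_to_reach_rating(cycles_so_far, days_so_far, rated_cycles=RATED_CYCLES):
    """Total days from new until the count reaches the rating,
    assuming the average rate seen so far continues."""
    cycles_per_day = cycles_so_far / days_so_far
    return rated_cycles / cycles_per_day


def minutes_between_cycles(cycles_so_far, days_powered_on):
    """Average gap between load cycles while the machine is powered on."""
    return days_powered_on * 24 * 60 / cycles_so_far


if __name__ == "__main__":
    days = days_to_reach_rating(31_000, 50)
    print(f"rating reached after about {days:.0f} days ({days / 365:.1f} years)")
    print(f"one load cycle every {minutes_between_cycles(31_000, 50):.1f} minutes")
```

The result is in the same ballpark as the rough estimate in the text, and the average gap comes out a little over two minutes, which matches what I've been seeing on his machine.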
I looked up the spec on all three WD lines - Green, Blue and Black.
All three are rated at 300K cycles. This issue has come up on the RAID
list. It seems that some other people are seeing this and aren't
exactly sure what Linux is doing to cause it.

I'll study hdparm and BIOS when I can reboot.

My dad's current data:

gandalf ~ # smartctl -A /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   128   021    Pre-fail  Always       -       6525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1183
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31240
194 Temperature_Celsius     0x0022   121   116   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

gandalf ~ #


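To track Load_Cycle_Count over time without reading the table by eye, something like the sketch below could pull the raw values out of saved `smartctl -A` output. It assumes each attribute row sits on one line with ten whitespace-separated fields; the function name and the two-row sample are illustrative only, with the values copied from the output above.

```python
def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME to RAW_VALUE for each row of a `smartctl -A` table.

    Assumes one attribute per line with ten whitespace-separated fields.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a hex FLAG column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


# Two rows copied from the output above, plus the header line.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1183
193 Load_Cycle_Count 0x0032 190 190 000 Old_age Always - 31240
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    cycles = attrs["Load_Cycle_Count"]
    hours = attrs["Power_On_Hours"]
    print(f"{cycles} load cycles in {hours} power-on hours "
          f"= {cycles / hours:.1f} per hour")
```

Run from cron against each drive and appended to a file, this would show whether the count is still climbing after any hdparm or BIOS changes.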
>
> You may recall a couple years ago when Ubuntu accidentally shipped with
> laptop mode (or something, IDR the details) turned on by default, and
> people were watching their drives wear out before their eyes.  That's
> effectively what you're doing, with an 8-second idle timeout.  Most laptop
> drives (2.5" and 1.8") are designed for it.  Most 3.5" desktop/server
> drives are NOT designed for that tight an idle timeout spec, and in fact,
> may well last longer spinning at idle overnight, as opposed to shutting
> down every day even.
>
> I'd at least look into it, as there's no use wearing the things out
> unnecessarily.  Maybe you'll decide to let them run that way and save the
> power, but you'll know about the available choices then, at least.
>

Yeah, that's important. Thanks. If I can solve all these RAID problems
then maybe I'll look at adding RAID to his box with better drives or
something.

Note that only on my system am I seeing real problems in
/var/log/messages, even without RAID, like thousands of these:

Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567488 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567680 on sda3

or

Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555768 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555776 on sda3


However I see NONE of that on my dad's machine, which uses the same
drive but a different chipset.

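With thousands of these lines, a quick tally per process and operation makes it easier to see who is generating them. The sketch below assumes the exact line shape shown above; the regular expression and function name are illustrative. (Lines of this shape also resemble what the kernel logs when the vm.block_dump sysctl is switched on, in which case they would be informational rather than errors, but that is only a guess and worth checking.)

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
LINE_RE = re.compile(
    r"kernel: (?P<proc>\S+)\((?P<pid>\d+)\): "
    r"(?P<op>READ|WRITE) block (?P<block>\d+) on (?P<dev>\S+)"
)


def tally_block_io(lines):
    """Count matching log lines per (process, operation, device)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m["proc"], m["op"], m["dev"])] += 1
    return counts


# A few of the lines quoted above.
SAMPLE = [
    "Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3",
    "Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3",
    "Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on sda3",
    "Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on sda3",
]

if __name__ == "__main__":
    for (proc, op, dev), n in tally_block_io(SAMPLE).most_common():
        print(f"{n:6d}  {proc:12s} {op:5s} {dev}")
```

For the real log, passing an open file works the same way, e.g. `tally_block_io(open("/var/log/messages"))`.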
The above problems seem to result in this sort of problem when I try
going with RAID, as I tried again this morning:

INFO: task kjournald:5064 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028351580 0 5064 2 0x00000000
ffff8801ac91a190 0000000000000046 0000000000000000 ffffffff81067110
000000000000dcf8 ffff880180863fd8 0000000000011580 0000000000011580
ffff88014165ba20 ffff8801ac89a834 ffff8801af920150 ffff8801ac91a418
Call Trace:
[<ffffffff81067110>] ? __alloc_pages_nodemask+0xfa/0x58c
[<ffffffff8129174a>] ? md_make_request+0xde/0x119
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334305>] ? io_schedule+0x3e/0x54
[<ffffffff810a95b1>] ? sync_buffer+0x3b/0x40
[<ffffffff81334789>] ? __wait_on_bit+0x41/0x70
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334823>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff81040a66>] ? wake_bit_function+0x0/0x23
[<ffffffff8111f400>] ? journal_commit_transaction+0xb56/0x1112
[<ffffffff81334280>] ? schedule+0x8f4/0x93b
[<ffffffff81335e3d>] ? _raw_spin_lock_irqsave+0x18/0x34
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff81335bcc>] ? _raw_spin_unlock_irqrestore+0x12/0x2c
[<ffffffff8112278c>] ? kjournald+0xe2/0x20a
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff811226aa>] ? kjournald+0x0/0x20a
[<ffffffff81040665>] ? kthread+0x79/0x81
[<ffffffff81002c94>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810405ec>] ? kthread+0x0/0x81
[<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10

Thanks,
Mark
