A bit long in response. Sorry.

On Tue, Mar 30, 2010 at 11:56 PM, Duncan <1i5t5.duncan@×××.net> wrote:
> Mark Knecht posted on Tue, 30 Mar 2010 13:26:59 -0700 as excerpted:
>
>> I've set up a duplicate boot partition on sdb and it boots. However one
>> thing I'm unsure if when I change the hard drive boot does the old sdb
>> become the new sda because it's what got booted? Or is the order still
>> as it was? The answer determines what I do in grub.conf as to which
>> drive I'm trying to use. I can figure this out later by putting
>> something different on each drive and looking. Might be system/BIOS
>> dependent.
>
> That depends on your BIOS. My current system (the workstation, now 6+
> years old but still going strong as it was a $400+ server grade mobo) will
> boot from whatever disk I tell it to, but keeps the same BIOS disk order
> regardless -- unless I physically turn one or more of them off, of
> course. My previous system would always switch the chosen boot drive to
> be the first one. (I suppose it could be IDE vs. SATA as well, as the
> switcher was IDE, the stable one is SATA-1.)
>
> So that's something I guess you figure out for yourself. But it sounds
> like you're already well on your way...
>

It seems to be a constant mapping, meaning (I guess) that I need to
change the drive specs in grub.conf on the second drive to actually
use the second drive.

I made the boot titles different in each grub.conf file to ensure I
was really getting grub from the second drive. My grub boot menu says
"2.6.33-gentoo booting from sda" on the first drive, the same with sdb
on the second drive, etc.

<SNIP>
>
> The point being... it /is/ actually possible to verify that they're
> working well before you fdisk/mkfs and load data. Tho it does take
> awhile... days... on drives of modern size.
>

I'm trying badblocks right now on sdc, using:

badblocks -v /dev/sdc

Maybe I need to do something more strenuous? It looks like it will be
done in an hour or two. (It's an i7-920 with SATA drives so it should be
fast, as long as I'm not just reading the buffers or something like that.)

Roughly speaking, 1TB read at 100MB/s should take 10,000 seconds, or
about 2.8 hours. I'm at 18% after 28 minutes, so that seems about right.
(With no errors so far, assuming I'm using the right command.)
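As a rough sanity check on the arithmetic above, here is a small Python sketch. It assumes decimal units (1 TB = 10**12 bytes), a steady 100 MB/s sequential read rate, and constant progress; the function names are made up for illustration and are not part of badblocks.

```python
# Rough estimate of how long a full read-only badblocks pass should take.
# Assumptions (not measured): decimal units and a sustained 100 MB/s
# sequential read rate, as in the estimate above.

TB = 10**12
MB = 10**6


def full_read_seconds(capacity_bytes, bytes_per_second):
    """Time to read the whole device once at a constant rate."""
    return capacity_bytes / bytes_per_second


def projected_total_minutes(elapsed_minutes, fraction_done):
    """Extrapolate total run time from progress so far, assuming a steady rate."""
    return elapsed_minutes / fraction_done


estimate_s = full_read_seconds(1 * TB, 100 * MB)
projected_min = projected_total_minutes(28, 0.18)

print(f"theoretical: {estimate_s:.0f} s ({estimate_s / 3600:.2f} h)")
print(f"projected from progress: {projected_min:.0f} min ({projected_min / 60:.2f} h)")
```

Both figures come out between 2.5 and 3 hours, which is at least consistent with the read actually going to the disk rather than to a cache.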

>>> 3) suspend the disks after a period of inactivity
>>
>> This could be part of what's going on, but I don't think it's the whole
>> story. My drives (WD Green 1TB drives) apparently park the heads after 8
>> seconds (yes 8 seconds!) of inactivity to save power. Each time it parks
>> it increments the Load_Cycle_Count SMART parameter. I've been tracking
>> this on the three drives in the system. The one I'm currently using is
>> incrementing while the 2 that sit unused until I get RAID going again
>> are not. Possibly there is something about how these drives come out of
>> park that creates large delays once in awhile.
>
> You may wish to take a second look at that, for an entirely /different/
> reason. If those are the ones I just googled on the WD site, they're
> rated 300K load/unload cycles. Take a look at your BIOS spin-down
> settings, and use hdparm to get a look at the disk's powersaving and
> spindown settings. You may wish to set the disks to something more
> reasonable, as with 8 second timeouts, that 300k cycles isn't going to
> last so long...

Very true. Here is the same drive model, which I put in a new machine for
my dad. It's been powered up and running Gentoo as a typical desktop
machine for about 50 days. He doesn't use it more than about an hour a
day on average. It's already hit 31K load/unload cycles. At 10% of
300K in 50 days, that's roughly 500 days, or a bit under 1.5 years, of
life before the drive hits that spec. I've watched his system a bit and
it seems to add 1 to the count almost exactly every 2 minutes on
average. Is that a common cron job maybe?
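For what it's worth, the projection above can be written out as a short Python sketch. It assumes the load-cycle rate stays at its average so far and uses the raw numbers from the smartctl output later in this message (Load_Cycle_Count 31240, Power_On_Hours 1183); the helper names are hypothetical.

```python
# Project when Load_Cycle_Count reaches the rated figure, assuming the
# average rate observed so far simply continues (an assumption).

RATED_LOAD_CYCLES = 300_000  # figure quoted in this thread for the WD spec


def cycles_per_day(cycles, days_in_service):
    """Average load/unload cycles per day in service."""
    return cycles / days_in_service


def days_to_rated_count(cycles, days_in_service, rated=RATED_LOAD_CYCLES):
    """Total days in service at which the count reaches the rating."""
    return rated / cycles_per_day(cycles, days_in_service)


def minutes_per_cycle(cycles, power_on_hours):
    """Average spacing between load cycles while powered on."""
    return power_on_hours * 60 / cycles


total_days = days_to_rated_count(31240, 50)
interval = minutes_per_cycle(31240, 1183)

print(f"rated count reached after about {total_days:.0f} days "
      f"({total_days / 365:.2f} years)")
print(f"about one load cycle every {interval:.1f} minutes of power-on time")
```

The average spacing this gives is in the same ballpark as the "every 2 minutes" seen by watching the counter, so the two observations roughly agree.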

I looked up the spec on all three WD lines - Green, Blue and Black.
All three were 300K cycles. This issue has come up on the RAID list.
It seems that some other people are seeing this and aren't exactly
sure what Linux is doing to cause this.

I'll study hdparm and BIOS when I can reboot.

My dad's current data:

gandalf ~ # smartctl -A /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   128   021    Pre-fail  Always       -       6525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1183
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31240
194 Temperature_Celsius     0x0022   121   116   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

gandalf ~ #
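To track Load_Cycle_Count over time without reading the table by eye, something like the following Python sketch could pull the raw values out of `smartctl -A` style text. It assumes the ten-column layout shown above with a plain integer in the RAW_VALUE column; `parse_smart_attributes` is a made-up helper, not part of smartmontools, and the sample text is just two rows copied from the listing above.

```python
# Extract raw SMART attribute values from `smartctl -A` style text.
# Assumes the ten-column attribute table shown above; rows whose raw
# value is not a plain integer are skipped.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1183
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31240
"""


def parse_smart_attributes(text):
    """Map attribute name -> raw value (int) for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


attrs = parse_smart_attributes(SAMPLE)
cycles_per_hour = attrs["Load_Cycle_Count"] / attrs["Power_On_Hours"]
print(f"{cycles_per_hour:.1f} load cycles per power-on hour")
```

On a live system the text would come from actually running smartctl (which normally needs root) instead of the SAMPLE string; logging the result once a day would show whether a changed idle timeout really slows the counter.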

>
> You may recall a couple years ago when Ubuntu accidentally shipped with
> laptop mode (or something, IDR the details) turned on by default, and
> people were watching their drives wear out before their eyes. That's
> effectively what you're doing, with an 8-second idle timeout. Most laptop
> drives (2.5" and 1.8") are designed for it. Most 3.5" desktop/server
> drives are NOT designed for that tight an idle timeout spec, and in fact,
> may well last longer spinning at idle overnight, as opposed to shutting
> down every day even.
>
> I'd at least look into it, as there's no use wearing the things out
> unnecessarily. Maybe you'll decide to let them run that way and save the
> power, but you'll know about the available choices then, at least.
>

Yeah, that's important. Thanks. If I can solve all these RAID problems
then maybe I'll look at adding RAID to his box with better drives or
something.

Note that only on my system am I seeing real problems in
/var/log/messages, even without RAID, like thousands of these:

Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 45276264 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46309336 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567488 on sda3
Mar 29 14:06:33 keeper kernel: rsync(3368): READ block 46567680 on sda3

or

Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555752 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555760 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555768 on sda3
Mar 29 14:07:36 keeper kernel: flush-8:0(3365): WRITE block 33555776 on sda3

However, I see NONE of that on my dad's machine, which uses the same
drive but a different chipset.

The above problems seem to lead to this sort of failure when I try
going with RAID, as I tried again this morning:

INFO: task kjournald:5064 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880028351580 0 5064 2 0x00000000
ffff8801ac91a190 0000000000000046 0000000000000000 ffffffff81067110
000000000000dcf8 ffff880180863fd8 0000000000011580 0000000000011580
ffff88014165ba20 ffff8801ac89a834 ffff8801af920150 ffff8801ac91a418
Call Trace:
[<ffffffff81067110>] ? __alloc_pages_nodemask+0xfa/0x58c
[<ffffffff8129174a>] ? md_make_request+0xde/0x119
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334305>] ? io_schedule+0x3e/0x54
[<ffffffff810a95b1>] ? sync_buffer+0x3b/0x40
[<ffffffff81334789>] ? __wait_on_bit+0x41/0x70
[<ffffffff810a9576>] ? sync_buffer+0x0/0x40
[<ffffffff81334823>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff81040a66>] ? wake_bit_function+0x0/0x23
[<ffffffff8111f400>] ? journal_commit_transaction+0xb56/0x1112
[<ffffffff81334280>] ? schedule+0x8f4/0x93b
[<ffffffff81335e3d>] ? _raw_spin_lock_irqsave+0x18/0x34
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff81335bcc>] ? _raw_spin_unlock_irqrestore+0x12/0x2c
[<ffffffff8112278c>] ? kjournald+0xe2/0x20a
[<ffffffff81040a38>] ? autoremove_wake_function+0x0/0x2e
[<ffffffff811226aa>] ? kjournald+0x0/0x20a
[<ffffffff81040665>] ? kthread+0x79/0x81
[<ffffffff81002c94>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810405ec>] ? kthread+0x0/0x81
[<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10

Thanks,
Mark