On Tue, Mar 30, 2010 at 11:08 AM, Duncan <1i5t5.duncan@×××.net> wrote:
> Mark Knecht posted on Tue, 30 Mar 2010 06:56:14 -0700 as excerpted:
>
>> 3) I LOVE your idea of managing 3 /boot partitions by hand instead of
>> using RAID. Easy to do, completely testable ahead of time. If I ensure
>> that every disk can boot then no matter what disk goes down the machine
>> still works, at least a little. Not that much work and even if I don't
>> do it for awhile it doesn't matter as I can do repairs without a CD.
>> (well....)
>
> That's one of those things you only tend to realize after running a RAID
> for awhile... and possibly after having grub die, for some reason I don't
> quite understand, on just a kernel update... and realizing that had I
> setup multiple independent /boot and boot-backup partitions instead of a
> single RAID-1 /boot, I'd have had the backups to boot to if I'd have
> needed it.
>
> So call it the voice of experience! =:^)
>
> Meanwhile, glad you figured the problem out. A boot-flag-requiring-
> BIOS... that'd explain the problem for both the RAID and no-RAID version!
|
I've set up a duplicate boot partition on sdb and it boots. However,
one thing I'm unsure of: when I change which hard drive boots, does the
old sdb become the new sda because it's what got booted? Or is the order
still as it was? The answer determines which drive I point to in
grub.conf. I can figure this out later by putting something different
on each drive and looking. It might be system/BIOS dependent.
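
An alternative to marking each drive by hand, sketched here under some
assumptions: on a udev-based Linux system, the symlinks in
/dev/disk/by-id carry the model and serial number, so resolving them
shows which physical drive the kernel is currently calling sda. The
function name is made up for illustration. Note that this only shows
the kernel's naming; GRUB's (hdN) numbering comes from the BIOS and may
not match it.

```python
import os


def map_disks_by_id(by_id_dir="/dev/disk/by-id"):
    """Return {kernel device name: [persistent id names]} for whole disks.

    Hypothetical helper: assumes udev populates by_id_dir with symlinks.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        # Skip partition links such as "...-part1"; only whole disks matter here.
        if "-part" in name:
            continue
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping.setdefault(os.path.basename(target), []).append(name)
    return mapping


if __name__ == "__main__":
    for dev, ids in sorted(map_disks_by_id().items()):
        print(dev, "->", ", ".join(ids))
```

Running it once after booting from each drive and comparing the two
listings should show whether sda and sdb swap.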
|
>
> 100% waits for long periods... I've seen a number of reasons for this.
> One key to remember is that I/O backups have a way of stopping many other
> things at times. Among the reasons I've seen:
>
|
OK, so some new information: another person on the RAID list is
experiencing something very similar with different hardware.

As for your ideas:
|
> 1a) Dying disk.
> 1b) hard to read data sectors.

All new drives, smartctl says no problems reading anything and no
registered error correction has taken place.

>
> 2) DHCP

Not using it, at least not intentionally. Doesn't mean networking
isn't doing something strange.

>
> 3) suspend the disks after a period of inactivity
|
This could be part of what's going on, but I don't think it's the
whole story. My drives (WD Green 1TB drives) apparently park the heads
after 8 seconds (yes, 8 seconds!) of inactivity to save power. Each
time a drive parks, it increments the Load_Cycle_Count SMART parameter.
I've been tracking this on the three drives in the system. The one I'm
currently using is incrementing, while the two that sit unused until I
get RAID going again are not. Possibly there is something about how
these drives come out of park that creates large delays once in a
while.
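
For anyone wanting to track the same counter, here is a rough sketch of
how it could be scripted. It assumes smartctl from smartmontools is
installed and that `smartctl -A` prints the usual ten-column attribute
table with the raw value last. The function names are invented for this
example, and the device path passed in is whatever applies on your
system.

```python
import subprocess


def parse_load_cycle_count(smartctl_output):
    """Extract the raw Load_Cycle_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed row layout: ID# NAME FLAG VALUE WORST THRESH TYPE
        # UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


def read_load_cycle_count(device):
    """Run smartctl on one device (usually needs root) and return the count."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_load_cycle_count(result.stdout)


def cycles_per_hour(count_before, count_after, seconds_elapsed):
    """Average head-park rate between two readings of the counter."""
    return (count_after - count_before) * 3600.0 / seconds_elapsed
```

Taking two readings some time apart and feeding them to
cycles_per_hour() gives a park rate to compare between the active drive
and the idle ones. For scale, an 8-second idle timer cannot fire more
than 3600 / 8 = 450 times an hour.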
|
OK, now the only problem with that analysis is that the other guy
experiencing this problem doesn't use this drive, so for it to hold he
would need something similar happening in his drives. Additionally, I
just used one of these drives in my dad's new machine with a different
motherboard and didn't see this problem, or at least didn't notice it,
but I'll go study that and see what his system does.
|
>
> 4) I/O priority inversion on ext3

Now this one is an interesting idea. Maybe I should try a few
different file systems for no other reason than eliminating the file
system type as the cause. Good input.

Thanks for the ideas!

Cheers,
Mark