On Wed, Mar 3, 2010 at 6:26 AM, Stroller <stroller@××××××××××××××××××.uk> wrote:
>
> On 3 Mar 2010, at 14:00, Mark Knecht wrote:
>>
>> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@××××××××××××××××××.uk>
>> wrote:
>>>
>>> There seem to have been a few people posting with filesystem corruption in
>>> the last week or two. It seems to be my turn, so I hope it isn't
>>> contagious.
>>> The cause here is quite clear - whilst rummaging in the server cupboard
>>> yesterday, power to the machine was accidentally disconnected.
>>
>> ...
>> Sorry for your problems. I've had a rash of machine problems over
>> the last 6 weeks. No fun. I feel for you.
>>
>> In my most recent case what looked like a simple disk corruption
>> problem was really a prelude to the drive just plain going bad. Have
>> you tried smartctl to see what it says about the drive at this point?
>>
>> It would be even more frustrating to chroot in, do all the work,
>> think you had it fixed and then the underlying foundation of your
>> house crumbles beneath you 3 weeks from now.
>
> I don't think this is a problem. I would love to know what others think of
> the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME   FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds 0x0012 001   001   020    Old_age Always  FAILING_NOW 44803h+12m+16s
>
> root@sysresccd /root % smartctl -i /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Fujitsu MPA..MPG series
> Device Model: FUJITSU MPF3204AT
> Serial Number: 05030567
> Firmware Version: 0028
> User Capacity: 20,496,236,544 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 5
> ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
> Local Time is: Wed Mar 3 14:14:31 2010 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> root@sysresccd /root %
>
>
> This looks to me like smartctl is going "OMG! What an ancient drive!" - it's
> a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365),
> it's seen 5 years of active use - and that's the "marginal attribute"
> referred to.
>
> Like I said, the power plug was accidentally pulled on this drive, so I'm
> inclined to attribute the corruption only to that, not to the drive actually
> failing.
>
> The drive is in a computer that has rarely been turned off in the last
> couple of years, and is also in a warm environment, conditions which are
> ideal. I appreciate the latter seems unintuitive, but in fact studies have
> shown that drives in somewhat warm environments last longer than those that
> are cooled.
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I have taken
> nightly backups of the important data on this machine; however, I'd prefer it
> to run just a couple or a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.

I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old. I have no
experience with SMART. I'm just learning about it. However, it is
generated by the microcontroller in the hard drive, according to the
drive manufacturer's own criteria, so if the drive is telling you it's
failing then...
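
For anyone who wants to check the numbers in the quoted output, here is
a minimal Python sketch. It assumes the attribute row has the ten
whitespace-separated columns named in the smartctl 5.38 header quoted
above (with the wrapped row joined back onto one line) and that the raw
value uses the h+m+s form seen there; the function names are
hypothetical, and other drives or smartctl versions may format things
differently. As I understand the smartctl man page, WHEN_FAILED shows
FAILING_NOW when the normalized VALUE is at or below THRESH, and for an
Old_age attribute that points to age or wear rather than a predicted
imminent failure.

```python
import re

# Attribute row copied from the quoted `smartctl -H` output, rejoined onto one line.
ROW = "9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s"


def parse_attribute(row):
    """Split one attribute row into named fields (assumes ten columns)."""
    f = row.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),    # normalized value, here 001
        "worst": int(f[4]),
        "thresh": int(f[5]),   # vendor threshold, here 020
        "type": f[6],
        "when_failed": f[8],
        "raw": f[9],
    }


def at_or_below_threshold(attr):
    """True when the normalized value has reached the vendor threshold."""
    return attr["value"] <= attr["thresh"]


def raw_to_years(raw):
    """Convert a raw value like '44803h+12m+16s' to years of power-on time."""
    m = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw)
    if m is None:
        raise ValueError("unexpected raw value format: " + raw)
    hours, minutes, seconds = (int(g) for g in m.groups())
    total_hours = hours + minutes / 60 + seconds / 3600
    return total_hours / 24 / 365


attr = parse_attribute(ROW)
print(attr["name"], "at or below threshold:", at_or_below_threshold(attr))
print("power-on time: %.1f years" % raw_to_years(attr["raw"]))
```

On the quoted row this reports the value (1) as below the threshold (20)
and about 5.1 years of power-on time, which agrees with the
44803/24/365 figure in the quoted message.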

My 4.5-year-old drive actually stopped producing SMART output somewhere
along the way before it failed. I wasn't using SMART on the 6-year-old
drive at the time, so I had no data from it, but it was in an
environment where the UPS went through a lot of abuse.

It sounds like you have good backups, so just make sure they are good
and do what you want.

- Mark