On 3 Mar 2010, at 14:00, Mark Knecht wrote:
> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@××××××××××××××××××.uk> wrote:
>> There seem to have been a few people posting with filesystem
>> corruption in the last week or two. It seems to be my turn, so I
>> hope it isn't contagious.
>> The cause here is quite clear - whilst rummaging in the server
>> cupboard yesterday, power to the machine was accidentally
>> disconnected.
> ...
> Sorry for your problems. I've had a rash of machine problems over
> the last 6 weeks. No fun. I feel for you.
>
> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?
>
> It would be even more frustrating to chroot in, do all the work,
> think you had it fixed and then the underlying foundation of your
> house crumbles beneath you 3 weeks from now.

I don't think this is a problem. I would love to know what others
think of the `smartctl` output:


root@sysresccd /root % smartctl -H /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s

root@sysresccd /root % smartctl -i /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MPA..MPG series
Device Model:     FUJITSU MPF3204AT
Serial Number:    05030567
Firmware Version: 0028
User Capacity:    20,496,236,544 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Wed Mar 3 14:14:31 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sysresccd /root %


This looks to me like smartctl is going "OMG! What an ancient drive!"
- it's a 20gig EIDE drive and if my pocket calculator is correct
(44803/24/365), it's seen 5 years of active use - and that's the
"marginal attribute" referred to.

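To double-check the pocket-calculator sum, here is a minimal Python
sketch. It assumes the raw value always has the "NNNh+NNm+NNs" shape
seen in the output above; the helper name power_on_hours is made up
for illustration and is not part of smartmontools.

```python
import re


def power_on_hours(raw: str) -> float:
    """Convert a raw value such as '44803h+12m+16s' into hours.

    Hypothetical helper: assumes the 'NNNh+NNm+NNs' layout shown in
    the smartctl output quoted in this message.
    """
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw)
    if match is None:
        raise ValueError(f"unexpected raw value format: {raw!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours + minutes / 60 + seconds / 3600


hours = power_on_hours("44803h+12m+16s")
# Same sum as in the text: hours -> days -> years (365-day years).
years = hours / 24 / 365
print(f"{hours:.1f} hours is about {years:.2f} years")
# prints: 44803.2 hours is about 5.11 years
```

So "5 years" is a slight rounding down of roughly 5.1 years of
power-on time.
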
Like I said, the power plug was accidentally pulled on this drive, so
I'm inclined to attribute the corruption only to that, not to the
drive actually failing.

The drive is in a computer that has rarely been turned off in the last
couple of years, and is also in a warm environment, conditions which
are ideal. I appreciate the latter seems unintuitive, but in fact
studies have shown that drives in somewhat warm environments last
longer than those that are cooled.

That it passes the "SMART overall-health self-assessment test"
suggests to me that it is chugging away quite happily.

I would have dismissed your concerns were it not for the capitalised
"FAILING_NOW" in the output. Like I say, I think this is just smartctl
declaring "OMG! this drive is old!", but I open this matter to the
list for discussion (should you wish).

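For anyone who wants to poke at the attribute row, here is a rough
Python sketch of how I read it. It leans on my understanding of the
smartctl man page: an attribute counts as failed when its normalised
VALUE is less than or equal to THRESH, and an "Old_age" (usage)
attribute in that state signals age or wear, whereas a "Pre-fail" one
signals pending failure. The class and function names are made up for
illustration, and the parsing assumes the ten whitespace-separated
columns shown in the output above.

```python
from dataclasses import dataclass

# Attribute row from the smartctl -H output, rejoined onto one line.
ROW = ("9 Power_On_Seconds 0x0012 001 001 020 "
       "Old_age Always FAILING_NOW 44803h+12m+16s")


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int        # normalised value, not the raw counter
    worst: int
    thresh: int
    attr_type: str    # "Pre-fail" or "Old_age"
    when_failed: str
    raw: str

    def at_or_below_threshold(self) -> bool:
        # Assumption from the man page: failed means VALUE <= THRESH.
        return self.value <= self.thresh


def parse_row(row: str) -> SmartAttribute:
    """Split one attribute row into its ten columns (illustrative only)."""
    fields = row.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        attr_type=fields[6],
        when_failed=fields[8],
        raw=fields[9],
    )


def describe(attr: SmartAttribute) -> str:
    """Summarise what a threshold crossing means for this attribute type."""
    if not attr.at_or_below_threshold():
        return f"{attr.name}: above threshold"
    if attr.attr_type == "Pre-fail":
        return f"{attr.name}: at or below threshold, pending failure"
    return f"{attr.name}: at or below threshold, old age or wear"


attr = parse_row(ROW)
print(describe(attr))
```

On that reading, VALUE 001 against THRESH 020 on an Old_age attribute
fits the "this drive is old" interpretation rather than a failure
prediction, though it is no guarantee about the drive's health.
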
I think I'm actually nearly ready to migrate off this system. The
power was actually pulled as I installed 3 new (to me) rackmount
machines in the server cupboard - the plan is to have identical
machines running RAID, so that in the case of ANY problems I have
spares available. I take nightly backups of the important data on
this machine; however, I'd prefer it to run just a few weeks longer
to allow me to migrate at my own leisure.

Stroller.