Gentoo Archives: gentoo-user

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
Date: Wed, 03 Mar 2010 15:26:43
Message-Id: 5bdc1c8b1003030726u90f837fw16015f0e724251fb@mail.gmail.com
In Reply to: Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" by Stroller
On Wed, Mar 3, 2010 at 6:26 AM, Stroller <stroller@××××××××××××××××××.uk> wrote:
>
> On 3 Mar 2010, at 14:00, Mark Knecht wrote:
>>
>> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@××××××××××××××××××.uk>
>> wrote:
>>>
>>> There seem to have been a few people posting with filesystem corruption in
>>> the last week or two. It seems to be my turn, so I hope it isn't
>>> contagious.
>>> The cause here is quite clear - whilst rummaging in the server cupboard
>>> yesterday, power to the machine was accidentally disconnected.
>>
>> ...
>>  Sorry for your problems. I've had a rash of machine problems over
>> the last 6 weeks. No fun. I feel for you.
>>
>>  In my most recent case what looked like a simple disk corruption
>> problem was really a prelude to the drive just plain going bad. Have
>> you tried smartctl to see what it says about the drive at this point?
>>
>>  It would be even more frustrating to chroot in, do all the work,
>> think you had it fixed and then the underlying foundation of your
>> house crumbles beneath you 3 weeks from now.
>
> I don't think this is a problem. I would love to know what others think of
> the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s
>
> root@sysresccd /root % smartctl -i /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family:     Fujitsu MPA..MPG series
> Device Model:     FUJITSU MPF3204AT
> Serial Number:    05030567
> Firmware Version: 0028
> User Capacity:    20,496,236,544 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   5
> ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> Local Time is:    Wed Mar  3 14:14:31 2010 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> root@sysresccd /root %
>
>
> This looks to me like smartctl is going "OMG! What an ancient drive!" - it's
> a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365),
> it's seen 5 years of active use - and that's the "marginal attribute"
> referred to.
>
> Like I said, the power plug was accidentally pulled on this drive, so I'm
> inclined to attribute the corruption only to that, not to the drive actually
> failing.
>
> The drive is in a computer that has rarely been turned off in the last
> couple of years, and is also in a warm environment, conditions which are
> ideal. I appreciate the latter seems unintuitive, but in fact studies have
> shown that drives in somewhat warm environments last longer than those that
> are cooled.
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I take
> nightly backups of the important data on this machine, however I'd prefer it
> to run just a couple or a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.

I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old. I have no
experience with SMART; I'm just learning about it. However, the data is
generated by the microcontroller in the hard drive, according to the
drive manufacturer's own criteria, so if the drive is telling you it's
failing then...

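For what it's worth, here is a rough sketch of how I understand that
WHEN_FAILED column: smartctl flags an attribute as FAILING_NOW when its
normalized VALUE is at or below the vendor's THRESH, and a threshold of 0
never trips. The Python below is only an illustration under that
assumption. The helper names when_failed and hours_to_years are made up
for this mail and are not part of smartmontools. The numbers are the ones
from the quoted output.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Approximate smartctl's WHEN_FAILED column (hypothetical helper)."""
    if thresh == 0:
        # Assumption: a zero threshold means the attribute cannot fail.
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


def hours_to_years(hours: float) -> float:
    """Same pocket-calculator arithmetic as in the quoted mail."""
    return hours / 24 / 365


# Attribute 9 from the quoted output: VALUE=001, WORST=001, THRESH=020
status = when_failed(1, 1, 20)
years = hours_to_years(44803)
print(status, round(years, 2))
```

So the flag here comes purely from the power-on time counter having run
down to 001 against a threshold of 020, after roughly 5.1 years of
spinning. The attribute is typed Old_age rather than Pre-fail, which as
far as I can tell is why the overall assessment can still read PASSED.
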
My 4.5-year-old drive actually stopped producing SMART output somewhere
along the way before it failed. With the 6-year-old drive I wasn't using
SMART at the time, so I have no data from it, but it was in an
environment where the UPS went through a lot of abuse.

It sounds like you have good backups, so just make sure they are good
and do what you want.

- Mark