Gentoo Archives: gentoo-user

From: Stroller <stroller@××××××××××××××××××.uk>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
Date: Wed, 03 Mar 2010 14:27:11
Message-Id: F22E4619-5236-49FD-A9C8-4EDAC7DB9A2F@stellar.eclipse.co.uk
In Reply to: Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" by Mark Knecht
On 3 Mar 2010, at 14:00, Mark Knecht wrote:
> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@××××××××××××××××××.uk> wrote:
>> There seem to have been a few people posting with filesystem corruption in
>> the last week or two. It seems to be my turn, so I hope it isn't contagious.
>> The cause here is quite clear - whilst rummaging in the server cupboard
>> yesterday, power to the machine was accidentally disconnected.
> ...
> Sorry for your problems. I've had a rash of machine problems over
> the last 6 weeks. No fun. I feel for you.
>
> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?
>
> It would be even more frustrating to chroot in, do all the work,
> think you had it fixed and then the underlying foundation of your
> house crumbles beneath you 3 weeks from now.

I don't think this is a problem. I would love to know what others
think of the `smartctl` output:

root@sysresccd /root % smartctl -H /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME     FLAG    VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED  RAW_VALUE
  9 Power_On_Seconds   0x0012  001   001   020    Old_age  Always   FAILING_NOW  44803h+12m+16s

root@sysresccd /root % smartctl -i /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MPA..MPG series
Device Model:     FUJITSU MPF3204AT
Serial Number:    05030567
Firmware Version: 0028
User Capacity:    20,496,236,544 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Wed Mar 3 14:14:31 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sysresccd /root %

This looks to me like smartctl is going "OMG! What an ancient drive!"
- it's a 20 GB EIDE drive and, if my pocket calculator is correct
(44803/24/365), it has seen just over 5 years of powered-on use - and
that's the "marginal attribute" referred to.
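For anyone who wants to check the pocket-calculator figure, here is a minimal Python sketch. It assumes the raw value always has the `NNh+NNm+NNs` shape shown in the output above; the function name `power_on_years` is made up for illustration and is not part of smartmontools.

```python
import re


def power_on_years(raw: str) -> float:
    """Convert a raw value such as '44803h+12m+16s' into years (365-day years)."""
    # Assumption: the raw value is formatted exactly as hours, minutes, seconds.
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw)
    if match is None:
        raise ValueError(f"unexpected raw value format: {raw!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    total_hours = hours + minutes / 60 + seconds / 3600
    # Same arithmetic as in the message: hours / 24 / 365.
    return total_hours / 24 / 365


if __name__ == "__main__":
    print(f"{power_on_years('44803h+12m+16s'):.2f} years")
```

This only restates the arithmetic; it says nothing about the health of the drive.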

Like I said, the power plug was accidentally pulled on this drive, so
I'm inclined to attribute the corruption only to that, not to the
drive actually failing.

The drive is in a computer that has rarely been turned off in the last
couple of years, and is also in a warm environment, conditions which
are ideal. I appreciate the latter seems unintuitive, but in fact
studies have shown that drives in somewhat warm environments last
longer than those that are cooled.

That it passes the "SMART overall-health self-assessment test"
suggests to me that it is chugging away quite happily.

I would have dismissed your concerns were it not for the capitalised
"FAILING_NOW" in the output. Like I say, I think this is just smartctl
declaring "OMG! this drive is old!", but I open this matter to the
list for discussion (should you wish).
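To make the "FAILING_NOW" reading concrete, here is a rough Python sketch that splits the attribute row from the output above and compares the normalised VALUE with THRESH. As I understand the smartctl documentation, an attribute is flagged as failed when VALUE is less than or equal to THRESH, and the TYPE column (Pre-fail or Old_age) says which kind of failure that is. The names `SmartAttribute`, `parse_attribute` and `is_at_or_below_threshold` are hypothetical helpers, and the column positions are assumed from this one row rather than from any smartmontools API.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    attr_type: str
    when_failed: str
    raw_value: str


def parse_attribute(row: str) -> SmartAttribute:
    """Parse one whitespace-separated attribute row.

    Assumed columns:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    fields = row.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        attr_type=fields[6],
        when_failed=fields[8],
        raw_value=" ".join(fields[9:]),
    )


def is_at_or_below_threshold(attr: SmartAttribute) -> bool:
    """True when the normalised value has reached the vendor threshold."""
    return attr.value <= attr.thresh


if __name__ == "__main__":
    row = "9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s"
    attr = parse_attribute(row)
    print(attr.name, attr.attr_type, is_at_or_below_threshold(attr))
```

For this row the value (1) is below the threshold (20) and the type is Old_age, not Pre-fail, which fits the "this drive is old" interpretation.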

I think I'm actually nearly ready to migrate off this system. The
power was actually pulled as I installed 3 new (to me) rackmount
machines in the server cupboard - the plan is to have identical
machines running RAID, so that in the case of ANY problems I have
spares available. I take nightly backups of the important data on
this machine; however, I'd prefer it to run just a few weeks longer
to allow me to migrate at my own leisure.

Stroller.
