Gentoo Archives: gentoo-user

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] recovery from /var corruption?
Date: Fri, 26 Feb 2010 15:17:31
Message-Id: 5bdc1c8b1002260717qd783a59k6f78a57ed384c0a9@mail.gmail.com
In Reply to: Re: [gentoo-user] recovery from /var corruption? by Alex Schuster
1 On Fri, Feb 26, 2010 at 1:46 AM, Alex Schuster <wonko@×××××××××.org> wrote:
2 > Mark Knecht writes:
3 >
4 >> Do I just watch the logs looking for problems? I have no way of
5 >> knowing right now whether this was a disk problem that's going to come
6 >> back, a 1 time deal due to power, or something else entirely.
7 >>
8 >> As these cheap machines that don't use RAID what's the right way to
9 >> go? emerge -e @world and then wait for the next event? Do nothing and
10 >> wait?
11 >
12 > Emerge smartmontools, then:
13 >
14 > smartctl -H /dev/sda  # get overview of what the drive thinks about itself
15 >
16 > smartctl -t short /dev/sda     # start short self test
17 > Wait
18 > smartctl -l selftest /dev/sda  # see results
19 >
20 > smartctl -t long /dev/sda      # start long self test
21 > Wait a lot longer
22 > smartctl -l selftest /dev/sda  # see results
23 >
24 > You can continue working in the meanwhile, there will be no performance
25 > impact. You will see something like this in the log:
26 >
27 > === START OF READ SMART DATA SECTION ===
28 > SMART Self-test log structure revision number 1
29 > Num  Test_Description   Status              Remaining  LifeTime(hours)  LBA_of_first_error
31 > # 1  Short offline      Completed without error   00%    2275       -
32 > # 2  Extended offline   Completed without error   00%    2270       -
33 > # 3  Extended offline   Completed without error   00%    1799       -
34 > # 4  Extended offline   Completed without error   00%     197       -
35 > # 5  Extended offline   Completed without error   00%      26       -
36 >
37 > If you have a '-' in the right column, the disk has found no errors. If
38 > there is a number, then it's the position (LBA) of the first error.
39 >
40 > There's also badblocks, this will check every block and output the bad
41 > ones: badblocks -sv /dev/sda
42 >
43 > badblocks -svn /dev/sda will do a read-write test. In case of a bad block,
44 > the drive should exchange it with a spare one. Maybe this happens already
45 > in read-only mode, I am not sure.
46 >
47 > Also watch for errors in syslog or via dmesg, there should be some when
48 > bad blocks are being accessed.
49 >
50 >        Wonko
51 >
52 >
53
54 Hi Wonko,
55 Yes, I do use smartctl on some other machines although I'm not very
56 good about it and your write-up is helpful so thanks for that.
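For what it's worth, the '-' versus number check from the write-up above can be scripted. Below is a minimal sketch in Python, assuming the self-test log has the column layout shown in the quoted output. The names SelfTest and parse_selftest_log are hypothetical, and the regular expression is my own illustration, not something smartmontools provides.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the column shows '-'


# Assumed row layout: "# N  <description>  <status>  NN%  <hours>  <lba or ->"
# Description and status may contain single spaces, so columns are split
# on runs of two or more spaces.
_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse rows of 'smartctl -l selftest' output; other lines are skipped."""
    tests = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if not m:
            continue
        num, desc, status, remaining, hours, lba = m.groups()
        tests.append(
            SelfTest(
                int(num), desc, status, int(remaining), int(hours),
                None if lba == "-" else int(lba),
            )
        )
    return tests


# Rows copied from the log quoted in this thread.
SAMPLE_LOG = """\
# 1  Short offline      Completed without error   00%    2275       -
# 2  Extended offline   Completed without error   00%    2270       -
# 3  Extended offline   Completed without error   00%    1799       -
# 4  Extended offline   Completed without error   00%     197       -
# 5  Extended offline   Completed without error   00%      26       -
"""

if __name__ == "__main__":
    for t in parse_selftest_log(SAMPLE_LOG):
        verdict = "ok" if t.first_error_lba is None else f"first error at LBA {t.first_error_lba}"
        print(f"test {t.num} ({t.description}): {verdict}")
```

Any entry whose first_error_lba is not None is the one to look at more closely.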
57
58 My wife's machine is older and I don't think SMART is
59 supported on her drive. Note the lack of a * on the SMART line in
60 hdparm -I:
61
62 dragonfly ~ # hdparm -I /dev/hda
63
64 /dev/hda:
65
66 ATA device, with non-removable media
67 Model Number: WDC WD1600BB-00FTA0
68 Serial Number: WD-WMAES2091586
69 Firmware Revision: 15.05R15
70 Standards:
71 Supported: 6 5 4
72 Likely used: 6
73 Configuration:
74 Logical max current
75 cylinders 16383 16383
76 heads 16 16
77 sectors/track 63 63
78 --
79 CHS current addressable sectors: 16514064
80 LBA user addressable sectors: 268435455
81 LBA48 user addressable sectors: 312581808
82 Logical/Physical Sector size: 512 bytes
83 device size with M = 1024*1024: 152627 MBytes
84 device size with M = 1000*1000: 160041 MBytes (160 GB)
85 cache/buffer size = 2048 KBytes (type=DualPortCache)
86 Capabilities:
87 LBA, IORDY(can be disabled)
88 Standby timer values: spec'd by Standard, with device specific minimum
89 R/W multiple sector transfer: Max = 16 Current = 16
90 Recommended acoustic management value: 128, current value: 254
91 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
92 Cycle time: min=120ns recommended=120ns
93 PIO: pio0 pio1 pio2 pio3 pio4
94 Cycle time: no flow control=120ns IORDY flow control=120ns
95 Commands/features:
96 Enabled Supported:
97 SMART feature set
98 Security Mode feature set
99 * Power Management feature set
100 * Write cache
101 * Look-ahead
102 * Host Protected Area feature set
103 * WRITE_BUFFER command
104 * READ_BUFFER command
105 * DOWNLOAD_MICROCODE
106 SET_MAX security extension
107 Automatic Acoustic Management feature set
108 * 48-bit Address feature set
109 * Device Configuration Overlay feature set
110 * Mandatory FLUSH_CACHE
111 * FLUSH_CACHE_EXT
112 * SMART error logging
113 * SMART self-test
114 Security:
115 supported
116 not enabled
117 not locked
118 not frozen
119 not expired: security count
120 not supported: enhanced erase
121 HW reset results:
122 CBLID- above Vih
123 Device num = 0 determined by CSEL
124 Checksum: correct
125 dragonfly ~ #
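Looking for the * by eye is easy to get wrong, so here is a rough sketch in Python that does the same check. It assumes the Commands/features layout shown above, where a leading * marks an enabled feature and a line without it is listed but not enabled. The helper name feature_enabled is hypothetical.

```python
from typing import Optional


def feature_enabled(hdparm_output: str, feature: str = "SMART feature set") -> Optional[bool]:
    """Return True if `feature` is listed with a leading '*', False if it is
    listed without one, and None if it does not appear at all."""
    for raw in hdparm_output.splitlines():
        line = raw.strip()
        enabled = line.startswith("*")
        # Drop the marker (if any) and surrounding whitespace to get the name.
        name = line.lstrip("*").strip()
        if name == feature:
            return enabled
    return None


# A few lines copied from the 'hdparm -I /dev/hda' output in this message.
SAMPLE = """\
Commands/features:
Enabled Supported:
SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* SMART error logging
* SMART self-test
"""

if __name__ == "__main__":
    for name in ("SMART feature set", "SMART self-test", "Write cache"):
        print(name, "->", feature_enabled(SAMPLE, name))
```

On this drive the function returns False for "SMART feature set" and True for "SMART self-test", which matches what the listing shows.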
126
127 dragonfly ~ # smartctl -H /dev/hda
128 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
129 Home page is http://smartmontools.sourceforge.net/
130
131 SMART Disabled. Use option -s with argument 'on' to enable it.
132 dragonfly ~ # smartctl -s on /dev/hda
133 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
134 Home page is http://smartmontools.sourceforge.net/
135
136 === START OF ENABLE/DISABLE COMMANDS SECTION ===
137 Error SMART Enable failed: Input/output error
138 Smartctl: SMART Enable Failed.
139
140 A mandatory SMART command failed: exiting. To continue, add one or
141 more '-T permissive' options.
142 dragonfly ~ #
143
144 I've not tried the -T permissive options.
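If these checks end up in a script, smartctl's exit status may be easier to act on than its text output. A sketch in Python, assuming the exit-status bit meanings described in the RETURN VALUES section of the smartctl man page (worth verifying against the installed version). The helper decode_smartctl_status is hypothetical, and the value 4 used below is only an illustration, not the status I actually saw here.

```python
from typing import List

# Bit meanings paraphrased from the smartctl man page (assumption: unchanged
# in the version installed on the machine being checked).
SMARTCTL_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_smartctl_status(status: int) -> List[str]:
    """Return the messages for every bit set in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_STATUS_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # Illustrative value only: bit 2 set.
    for msg in decode_smartctl_status(4):
        print(msg)
```

The status itself could come from something like subprocess.run(["smartctl", "-H", "/dev/hda"]).returncode, run as root.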
145
146 I've never used badblocks as it seems I should only do that off-line.
147 This might be a good time to boot with a CD and try it out.
148
149 Maybe I should just get a new drive that supports SMART?
150
151 - Mark
