Gentoo Archives: gentoo-user

From: Daniel Frey <djqfrey@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Testing SSD? (Somewhat OT)
Date: Sat, 25 Jul 2015 15:50:14
Message-Id: 55B3B00F.9040506@gmail.com
In Reply to: Re: [gentoo-user] Testing SSD? (Somewhat OT) by lee
1 On 07/25/2015 05:12 AM, lee wrote:
2 > Daniel Frey <djqfrey@×××××.com> writes:
3 >
4 >> Well, I sure haven't had much luck with SSDs. This will be the third one
5 >> I've lost.
6 >
7 > + Buy good hardware.
8 > + Never store anything on only a single disk (with very few exceptions).
9 > + Do not put swap partitions on single disks, either.
10 > + Disks always come in pairs at least.
11 >
12 > SSDs, I'd currently buy Samsung 850 pro or evo, depending on how they
13 > are going to be used.
14
15 When I bought the replacement I read that the firmware in the Samsung
16 840/850 may have a bug that erases random blocks of valid data during
17 TRIM operations. (Reported June 2015.)
18
19 The store I went to had Samsungs in stock, and I was going to get one
20 until I read about this new firmware glitch. I originally wanted an
21 Intel SSD, but nobody here had stock.
22
23 >
24 >> yesterday bought a new SSD, this time a SanDisk model. It was cheap and
25 >> I hope I don't regret this in the future.
26 >
27 > Well, you get what you pay for. To me, all the hassle a failed disk (or
28 > other hardware) will give me isn't worth saving a bit of money on it.
29
30 I maintain full backup images (stage4, is that what they call them?),
31 so I just unpack, reinstall grub, and reboot. This is a remote mythtv
32 frontend, so there are no other files on it. The databases and
33 everything else it needs run on a server with RAID, and the database
34 on that server is backed up to a NAS nightly.
35
36 >
37 > Since you got it cheap, why not buy another one and use RAID-1 (and/or
38 > zfs)? When one fails, shutdown, replace the failed disk, restart --- no
39 > hassle involved.
40 >
41 >> That aside, the drive that failed is a Crucial m4. I have done some
42 >> searching as how to run diagnostics on an SSD.
43 >
44 > When you get sector errors reported in the log file when accessing the
45 > disk, the disk has failed (provided that the cabling and power supply
46 > are ok). This goes for hard disks --- are SSDs any different in that?
47
48 I wasn't getting sector errors from the disk itself. I was experiencing
49 random kernel panics and apparently scrambled data. From reviews,
50 Crucial seems to make decent SSDs. I've lost Crucial and Kingston SSDs
51 in the past, all in this machine, plus one in the server. I now find
52 them too unreliable for that use, so I don't use them in my server
53 anymore.
54
55 >
56 > Other than that, I don't need any more diagnostics. It would only tell
57 > me what I already know.
58
59 Well, after a lot of head scratching, I decided to:
60
61 - run smartctl tests (no real info)
62 - use shred on it, then fstrim, and check the results
63 - update the firmware, then do another shred and fstrim
64
65 I guess for SSDs there's really no actual check. The best you can do is
66 run shred/fstrim and use smartctl to check the drive stats before and
67 after the operations so you can compare them. With the new firmware, my
68 SSD didn't trip any more errors.
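
The before/after comparison described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used in this thread: it assumes the attribute table printed by `smartctl -A` has been saved to text twice (once before and once after the shred/fstrim run), and the function names and sample values are made up for the example.

```python
# Hypothetical helper: compare two saved `smartctl -A` attribute tables.
# The sample data below is invented for illustration only.

SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8123
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       412
"""

SAMPLE_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       8125
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       412
"""


def parse_smart_attributes(text):
    """Return {attribute name: raw value string} from an attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # everything from the 10th column on is the raw value.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def diff_attributes(before, after):
    """Return {name: (old, new)} for every attribute whose raw value differs."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


changes = diff_attributes(parse_smart_attributes(SAMPLE_BEFORE),
                          parse_smart_attributes(SAMPLE_AFTER))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Only the raw values are compared here; which attributes matter, and what their raw values mean, varies by vendor, so treat any difference as a hint rather than a diagnosis.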
69
70 >
71 >> I usually send them back for warranty, but this time I'm curious.
72 >
73 > Without physically destroying it, I won't give any disk out of hand
74 > which has had my data on it. Unfortunately, that probably means that
75 > there is no warranty on disks. I only take the duration of the warranty
76 > as some indicator of what the manufacturer entrusts the disk with, as in
77 > "5 years may be better than 3".
78
79 There's no data on it that I care about; as I said, it's a remote mythtv
80 frontend. I destroy the drives from my main workstation.
81
82 >
83 > In practice, hard disks either fail not long after new, or after about 3
84 > years, or virtually never because they are replaced for other reasons
85 > before they fail. SSDs might be different; I don't have much experience
86 > with them yet.
87 >
88
89 In my experience SSDs just randomly fail with no warning whatsoever:
90 random issues, crashes, segfaults, kernel panics. smartctl reports nothing.
91
92 I'd actually forgotten I'd posted this. What happened in the end is that
93 I called the manufacturer and they asked me to leave the SSD plugged in
94 without a SATA cable attached to do manual garbage collection. That got
95 me thinking, so I checked the machine, and the discard option wasn't set
96 in fstab. I recall reading that newer kernels are supposed to use TRIM
97 automatically, so I didn't set it explicitly, but maybe that's only with
98 other distros.
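
A quick way to spot a missing discard option is to scan fstab for it. The sketch below is only an illustration under some assumptions: the function name and the sample fstab entries are invented, and checking only ext4 by default is a choice made for this example rather than a rule.

```python
# Hypothetical helper: list mount points whose fstab options lack "discard".
# The sample entries are invented; device names will differ on a real system.

SAMPLE_FSTAB = """\
# <fs>      <mountpoint>  <type>  <opts>           <dump/pass>
/dev/sda1   /boot         ext2    noauto,noatime   1 2
/dev/sda3   /             ext4    noatime          0 1
/dev/sda4   /home         ext4    noatime,discard  0 2
"""


def mounts_without_discard(fstab_text, fstypes=("ext4",)):
    """Return mount points of the given filesystem types lacking 'discard'."""
    missing = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue  # malformed or incomplete entry
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype in fstypes and "discard" not in options.split(","):
            missing.append(mountpoint)
    return missing


print(mounts_without_discard(SAMPLE_FSTAB))
```

On a real machine the text would come from reading /etc/fstab instead of the sample string. Options are split on commas so that only an exact `discard` entry counts.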
99
100 I suspect it was a configuration error on my part. I have added the
101 discard option and am going to convert /boot to ext4 so I can use the
102 discard option there too, and install grub2 to take care of the booting.
103 I also set up anacron to run fstrim weekly, as I found recommended
104 elsewhere, to try to resolve the problem. I'm going to convert vixie-cron
105 and anacron to cronie with the anacron USE flag set as well, so I can set
106 the MAILFROM variable in crontab; when machines email me, I can't figure
107 out which machine the mail came from.
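
As an aside, a periodic trim job can also put the hostname into its own output, so any mail generated from it identifies the machine regardless of the sender address. This is a rough sketch and not the setup described above: the function names are hypothetical, and it assumes `fstrim -v` is available and that the job runs as root.

```python
# Hypothetical weekly job body: trim the given mount points and label the
# output with the hostname so a mailed report identifies the machine.
import socket
import subprocess


def run_fstrim(mountpoint):
    """Run `fstrim -v` on one mount point and return its combined output."""
    result = subprocess.run(["fstrim", "-v", mountpoint],
                            capture_output=True, text=True, check=False)
    return (result.stdout + result.stderr).strip()


def build_report(mountpoints, hostname=None, runner=run_fstrim):
    """Build a plain-text report; `runner` is injectable for dry runs."""
    hostname = hostname or socket.gethostname()
    lines = [f"fstrim report from {hostname}"]
    for mountpoint in mountpoints:
        lines.append(f"[{hostname}] {mountpoint}: {runner(mountpoint)}")
    return "\n".join(lines)
```

A script dropped into a weekly cron directory could then simply call `print(build_report(["/"]))` and let cron mail whatever gets printed.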
108
109 I don't have another machine to try this Crucial drive in yet, but I'll
110 find something. It'll probably be fine now.

Replies

Subject Author
Re: [gentoo-user] Testing SSD? (Somewhat OT) Peter Humphrey <peter@××××××××××××.uk>