On 07/25/2015 05:12 AM, lee wrote:
> Daniel Frey <djqfrey@×××××.com> writes:
>
>> Well, I sure haven't had much luck with SSDs. This will be the third one
>> I've lost.
>
> + Buy good hardware.
> + Never store anything on only a single disk (with very few exceptions).
> + Do not put swap partitions on single disks, either.
> + Disks always come in pairs at least.
>
> SSDs, I'd currently buy Samsung 850 pro or evo, depending on how they
> are going to be used.
|
Samsung firmware is glitchy. When I bought the replacement, I read that
the firmware in the 840/850 may cause random blocks of valid data to be
erased during TRIM operations (reported June 2015).
|
I was going to get a Samsung, as the store I went to had them in stock,
until I read about this new firmware glitch. I was originally going to
get an Intel SSD, but nobody here had any in stock.
|
>
>> yesterday bought a new SSD, this time a SanDisk model. It was cheap and
>> I hope I don't regret this in the future.
>
> Well, you get what you pay for. To me, all the hassle a failed disk (or
> other hardware) will give me isn't worth saving a bit of money on it.
|
I maintain full backup images (or stage4, is that what they call them?),
so I just unpack, reinstall grub, and reboot. This is for a remote
mythtv frontend, so there are no other files on it. The databases and
everything else it needs run on a server with RAID, and the database on
that server is backed up nightly to a NAS.
|
>
> Since you got it cheap, why not buy another one and use RAID-1 (and/or
> zfs)? When one fails, shutdown, replace the failed disk, restart --- no
> hassle involved.
>
>> That aside, the drive that failed is a Crucial m4. I have done some
>> searching as how to run diagnostics on an SSD.
>
> When you get sector errors reported in the log file when accessing the
> disk, the disk has failed (provided that the cabling and power supply
> are ok). This goes for hard disks --- are SSDs any different in that?
|
I wasn't getting sector errors from the disk itself. I was experiencing
random kernel panics and apparently scrambled data. From reviews,
Crucial seems to make decent SSDs. I've lost Crucial and Kingston SSDs
in the past, all in this machine, plus one in the server. I now find
them too unreliable for that use, so I no longer put them in my server.
|
>
> Other than that, I don't need any more diagnostics. It would only tell
> me what I already know.
|
Well, after a lot of head-scratching, I decided to:

- run smartctl tests (no useful info)
- shred the drive, then run fstrim and check the results
- update the firmware, then do another shred and fstrim

I guess for SSDs there's really no definitive check. The best you can do
is shred and fstrim the drive, using smartctl to check the drive stats
before and after so you can compare them. With the new firmware, my SSD
didn't trip any more errors.
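Comparing the drive stats before and after a shred/fstrim pass can be scripted rather than eyeballed. What follows is a minimal sketch, assuming Python 3.9+ and the usual ATA attribute table that `smartctl -A /dev/sdX` prints. The helper names `parse_attributes` and `diff_attributes` are hypothetical, and the sample tables are made up for illustration. They are not real output from the Crucial drive. On a real system you would save the smartctl output to files before and after the pass and feed those in.

```python
# Hypothetical sample data: made-up `smartctl -A` attribute tables in the usual
# ATA layout. Replace with saved output of `smartctl -A /dev/sdX`.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12000
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   065   050   000    Old_age   Always       -       35 (Min/Max 20/50)
"""

AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12001
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   065   050   000    Old_age   Always       -       35 (Min/Max 20/50)
"""


def parse_attributes(text: str) -> dict[str, str]:
    """Map attribute name -> raw value for every row of the attribute table."""
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        # Split into at most 10 columns so a RAW_VALUE containing spaces,
        # e.g. "35 (Min/Max 20/50)", stays in one piece.
        fields = line.split(None, 9)
        # Attribute rows start with a numeric ID; this skips the header line.
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


def diff_attributes(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Return {name: (old_raw, new_raw)} for attributes whose raw value changed.

    An attribute absent from the "before" snapshot is reported with "-" as its old value.
    """
    old, new = parse_attributes(before), parse_attributes(after)
    return {
        name: (old.get(name, "-"), raw)
        for name, raw in new.items()
        if old.get(name) != raw
    }


if __name__ == "__main__":
    for name, (was, now) in sorted(diff_attributes(BEFORE, AFTER).items()):
        print(f"{name}: {was} -> {now}")
```

Only raw values are compared here, since those are what usually move after a pass. The normalized VALUE/WORST columns could be diffed the same way if needed.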
|
>
>> I usually send them back for warranty, but this time I'm curious.
>
> Without physically destroying it, I won't give any disk out of hand
> which has had my data on it. Unfortunately, that probably means that
> there is no warranty on disks. I only take the duration of the warranty
> as some indicator of what the manufacturer entrusts the disk with, as in
> "5 years may be better than 3".
|
There's no data on it that I care about; as I said, it's a remote mythtv
frontend. The drives from my main workstation, however, I destroy.
|
>
> In practice, hard disks either fail not long after new, or after about 3
> years, or virtually never because they are replaced for other reasons
> before they fail. SSDs might be different; I don't have much experience
> with them yet.
>
|
In my experience, SSDs just fail randomly with no warning whatsoever:
random issues, crashes, segfaults, kernel panics. smartctl reports nothing.
|
I'd actually forgotten I'd posted this. What happened in the end is that
I called the manufacturer, and they asked me to leave the SSD connected
to power but without the SATA data cable attached, so it could run its
garbage collection while idle. That got me thinking, so I checked the
machine and found that the discard option wasn't set in fstab. I recall
reading that newer kernels are supposed to use TRIM automatically, so I
didn't set it explicitly, but maybe that applies to other distros.
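Checking fstab for a missing discard option is easy to automate across several machines. What follows is a minimal sketch, assuming Python 3.9+. The helper `missing_discard`, the sample fstab and the set of filesystem types treated as TRIM-capable are all hypothetical and should be adjusted to your own system. The sketch only reads the fstab text. It does not check whether the underlying device actually supports TRIM.

```python
# Hypothetical sample /etc/fstab; the devices and layout are made up.
FSTAB = """\
# <fs>      <mountpoint>  <type>  <opts>            <dump> <pass>
/dev/sda1   /boot         ext2    noauto,noatime    1 2
/dev/sda3   /             ext4    noatime           0 1
/dev/sda2   none          swap    sw                0 0
proc        /proc         proc    defaults          0 0
"""

# Assumption: only these filesystem types are checked; adjust as needed.
TRIM_CAPABLE = frozenset({"ext4", "btrfs", "xfs"})


def missing_discard(fstab_text: str, fs_types: frozenset[str] = TRIM_CAPABLE) -> list[str]:
    """Return mount points of the given fs types whose options lack "discard"."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry with no options column
        _device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; "nodiscard" must not count as "discard".
        if fstype in fs_types and "discard" not in options.split(","):
            missing.append(mountpoint)
    return missing


if __name__ == "__main__":
    print(missing_discard(FSTAB))
```

With the sample above, only `/` is reported. The ext2 `/boot` entry is skipped because ext2 is not in the assumed set. Passing a different `fs_types` set widens or narrows the check.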
|
I suspect it was a configuration error on my part. I have added the
discard option and am going to convert /boot to ext4 so I can use the
discard option there too, and install grub2 to take care of booting.
I also set up anacron to run fstrim weekly, as I found recommended
elsewhere, to try to resolve the problem. I'm also going to replace
vixie-cron and anacron with cronie (with the anacron USE flag set), so I
can set the MAILFROM variable in crontab; right now, when machines email
me, I can't figure out which machine the mail came from.
|
I don't have another machine to try this Crucial drive in yet, but I'll
find something. It'll probably be fine now.