On Thursday, 7 March 2019 14:45:31 GMT Rich Freeman wrote:
> On Thu, Mar 7, 2019 at 9:29 AM Grant Edwards <grant.b.edwards@×××××.com> wrote:
> > On 2019-03-07, Mick <michaelkintzios@×××××.com> wrote:
> > > I can think of 4 things, but more learned M/L contributors may add to
> > > these:
> > >
> > > 1. The SATA connection has come loose. With time and movement it can
> > > come (slightly) adrift. Pushing it back in fully fixes this problem - also
> > > see No. 2 below.
> > >
> > > 2. The physical connector's contacts are beginning to oxidise. Reseat
> > > the SATA cable connectors both on the drive and any ribbons on the MoBo.
> > > This usually cleans any oxidisation.
> > >
> > > 3. The AHCI driver is deploying energy saving measures (aka Aggressive
> > > Link Power Management - ALPM). Check the output of:
> > > cat /sys/class/scsi_host/host*/link_power_management_policy
> > >
> > > If it doesn't say 'max_performance' you'll need to revisit your BIOS
> > > settings and also PCIEASPM settings in the kernel.
> > >
> > > 4. Finally, there is a chance the PSU is playing up.
> >
> > Perhaps it's already been mentioned, but failing RAM can cause all
> > sorts of failures that might appear to be failing disks, failing network
> > cards, failing video cards, whatever. I'd run memtest86 for at least
> > 12 hours just to make sure...
>
> Failing RAM or failing power certainly can cause all manner of
> filesystem and other corruption. I've seen it firsthand and cleaning
> up from it is a total mess (usually best to restore from backup). I
> would definitely start with a memory test - if the motherboard is good
> then you can work outwards from there.
>
> From what I've heard SSDs can have bizarre failure modes since they
> interpose a logical layer between the physical storage media and the
> rest of the system. They're doing wear-leveling and so on behind the
> scenes, which means that if something goes wrong all kinds of bizarre
> problems can occur.
>
> I've also experienced a spinning hard drive exhibit lots of data
> corruption issues due to a faulty SATA interface (not sure where in
> the interface it was - chipset, port, or cable). ZFS saved me there with
> detection and resolution of errors, and when I moved the drive to a
> different HBA it worked fine after a scrub. I'd never seen anything
> like it before but it really made me appreciate ZFS (btrfs should have
> also worked) - I don't think mdadm would have had any way to resolve
> these errors easily, though maybe if I used a hex editor to figure out
> which drive was the bad one I might have been able to move it, wipe
> it, then re-add it to the mirror pair and let it rebuild. With ZFS I
> just got an email complaining about errors from zed and it just kept
> beating back the hordes until I fixed the connection. I forget if it
> dropped the drive or not - I didn't have any spares but if I did I
> suspect it would have swapped it in after enough problems.

Good points raised re. faulty memory. Oxidisation can also occur on RAM
modules' contacts and reseating them works well. However, I can't recall the
OP mentioning corrupt data, which is usually the first thing observed with
faulty memory.
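
For anyone who wants to script the ALPM check from point 3 of the quoted list, something along these lines might do. It is only a rough sketch: the report_alpm function name and the optional sysfs-root argument are made up for illustration (the argument merely allows trying the loop against a scratch directory), and it only reads the policy files, it changes nothing.

```shell
#!/bin/sh
# Print the link power management policy of every SATA host.
# report_alpm and its optional root argument are hypothetical helpers,
# not part of any existing tool; the default root is /sys.
report_alpm() {
    root="${1:-/sys}"
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        # Skip hosts without a readable policy file (also covers the
        # case where the glob matched nothing).
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        policy=$(cat "$f")
        if [ "$policy" = "max_performance" ]; then
            echo "$host: $policy (ok)"
        else
            echo "$host: $policy (power saving active)"
        fi
    done
}

report_alpm "$@"
```

It should print one line per host that exposes a policy file. As far as I know the same file accepts values such as min_power, medium_power and max_performance when written as root, but please check your kernel's documentation before changing anything.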

-- 
Regards,
Mick