Gentoo Archives: gentoo-user

From: Mick <michaelkintzios@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: Emerge --sync source
Date: Thu, 07 Mar 2019 15:11:35
Message-Id: 2811879.LfGSAUyNQX@dell_xps
In Reply to: Re: [gentoo-user] Re: Emerge --sync source by Rich Freeman
1 On Thursday, 7 March 2019 14:45:31 GMT Rich Freeman wrote:
2 > On Thu, Mar 7, 2019 at 9:29 AM Grant Edwards <grant.b.edwards@×××××.com>
3 wrote:
4 > > On 2019-03-07, Mick <michaelkintzios@×××××.com> wrote:
5 > > > I can think of 4 things, but more learned M/L contributors may add to
6 > > > these:
7 > > >
8 > > > 1. The SATA connection has come loose. With time and movement it can
9 > > > come
10 > > > (slightly) adrift. Pushing it back in fully fixes this problem - also
11 > > > see No. 2 below.
12 > > >
13 > > > 2. The physical connector's contacts are beginning to oxidise. Reseat
14 > > > the
15 > > > SATA cable connectors both on the drive and any ribbons on the MoBo.
16 > > > This
17 > > > usually cleans any oxidisation.
18 > > >
19 > > > 3. The AHCI driver is deploying energy saving measures (aka Aggressive
20 > > > Link Power Management - ALPM).
21 > > > Check the output of:
22 > > > cat /sys/class/scsi_host/host*/link_power_management_policy
23 > > >
24 > > > If it doesn't say 'max_performance' you'll need to revisit your BIOS
25 > > > settings and also PCIEASPM settings in the kernel.
26 > > >
27 > > > 4. Finally, there is a chance the PSU is playing up.
28 > >
29 > > Perhaps it's already been mentioned, but failing RAM can cause all
30 > > sorts of failures that might appear to be failing disks, failing network
31 > > cards, failing video cards whatever. I'd run memtest86 for at least
32 > > 12 hours just to make sure...
33 >
34 > Failing RAM or failing power certainly can cause all manner of
35 > filesystem and other corruption. I've seen it firsthand and cleaning
36 > up from it is a total mess (usually best to restore from backup). I
37 > would definitely start with a memory test - if the motherboard is good
38 > then you can work outwards from there.
39 >
40 > From what I've heard SSDs can have bizarre failure modes since they
41 > interpose a logical layer between the physical storage media and the
42 > rest of the system. They're doing wear-leveling and so on behind the
43 > scenes, which means that if something goes wrong all kinds of bizarre
44 > problems can occur.
45 >
46 > I've also experienced a spinning hard drive exhibit lots of data
47 > corruption issues due to a faulty SATA interface (not sure where in
48 > the interface it was - chipset, port, or cable). ZFS saved me there with
49 > detection and resolution of errors, and when I moved the drive to a
50 > different HBA it worked fine after a scrub. I'd never seen anything
51 > like it before but it really made me appreciate ZFS (btrfs should have
52 > also worked) - I don't think mdadm would have had any way to resolve
53 > these errors easily, though maybe if I used a hex editor to figure out
54 > which drive was the bad one I might have been able to move it, wipe
55 > it, then re-add it to the mirror pair and let it rebuild. With ZFS I
56 > just got an email complaining about errors from zed and it just kept
57 > beating back the hordes until I fixed the connection. I forget if it
58 > dropped the drive or not - I didn't have any spares but if I did I
59 > suspect it would have swapped it in after enough problems.
60
61 Good points raised re. faulty memory. Oxidisation can also occur on RAM
62 modules' contacts and reseating them works well. However, I can't recall the
63 OP mentioning corrupt data, which is usually the first thing observed with
64 faulty memory.
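
For the ALPM check in point 3 of my earlier message, here is a rough sketch of a script that does the same as the cat command but flags any host not set to 'max_performance'. The function names and the `base` parameter are my own invention for illustration; only the sysfs path and the 'max_performance' value come from the message above. Treat it as a starting point, not a tested tool.

```python
from pathlib import Path


def alpm_policies(base="/sys/class/scsi_host"):
    """Return {host_name: policy} for each host exposing the ALPM attribute.

    `base` is a parameter only so the function can be pointed at a test
    directory; on a real system the default sysfs path is used.
    """
    policies = {}
    for attr in sorted(Path(base).glob("host*/link_power_management_policy")):
        try:
            policies[attr.parent.name] = attr.read_text().strip()
        except OSError:
            # Attribute exists but cannot be read; skip this host.
            continue
    return policies


def hosts_not_max_performance(policies):
    """Return the sorted host names whose policy is not 'max_performance'."""
    return sorted(h for h, p in policies.items() if p != "max_performance")


if __name__ == "__main__":
    found = alpm_policies()
    for host, policy in found.items():
        print(f"{host}: {policy}")
    flagged = hosts_not_max_performance(found)
    if flagged:
        print("Not at max_performance:", ", ".join(flagged))
```

Hosts that do not expose the attribute at all are simply left out of the result, so an empty listing most likely means the controller is not driven by AHCI or the kernel does not provide the file.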
65
66 --
67 Regards,
68 Mick

Attachments

File name MIME type
signature.asc application/pgp-signature