On Thursday, 7 March 2019 10:10:53 GMT Peter Humphrey wrote:
> On Wednesday, 6 March 2019 16:31:27 GMT Laurence Perkins wrote:
> > On Fri, 2019-03-01 at 10:12 +0000, Peter Humphrey wrote:
> > > [OT]
> > > Evidence is mounting that the Atom box is in terminal decline. I get
> > > things like batches of files in the portage tree changing owner, and
> > > then when I correct that, long lists of supposedly locally changed
> > > ebuilds preventing syncing. And when I boot weekly into its little
> > > rescue system to backup the main system, the root filesystem remounts
> > > itself read-only while tar is running. Smartd recognises the SSD and
> > > runs daily tests, but reports no errors. No amount of wiping and
> > > reinstalling has helped so far.
> >
> > What filesystem are you running and how old is the SSD? That sounds
> > like some of the symptoms EXT4 had on early generation flash media
> > where its assumptions about what order writes would physically make it
> > to the disk in were wrong, leading to corruption.
>
> The disk is a 64GB SanDisk SDSSDP device, which I bought five years ago to
> replace a failed spinning disk. All partitions are ext4 except /boot, which
> is ext2.
>
> > So unless it was working correctly at some point in the past, try a
> > different filesystem. EXT3 or BTRFS didn't have the same problems.
>
> It was working just fine until recently.
>
> > If it's just that the SSD is failing, then get a new one before
> > something important gets damaged and you have to redo the whole thing.
>
> Everything on it is disposable.
>
> The box is getting a bit long in the tooth: I bought it in November 2010.
> It's a single-core, 32-bit Atom N270 (not N2700). It doesn't owe me
> anything now, in spite of having cost £450 at the time. I don't know
> whether it's worth throwing any more money at it. On the other hand, I see
> Amazon are only asking for £20 for a small SSD.
>
> The repeatability of some of the errors it throws makes me question whether
> the disk or something else is at fault. (What would cause a file system to
> be remounted read-only in the middle of its work?)

I can think of four things, but more learned M/L contributors may add to
these:

1. The SATA connection has come loose. With time and movement it can come
(slightly) adrift. Pushing it back in fully fixes this problem - also see No.
2 below.

2. The physical connector's contacts are beginning to oxidise. Reseat the
SATA cable connectors both on the drive and any ribbons on the MoBo. This
usually cleans off any oxidisation.

3. The AHCI driver is deploying energy saving measures (aka Aggressive Link
Power Management - ALPM). Check the output of:

cat /sys/class/scsi_host/host*/link_power_management_policy

If it doesn't say 'max_performance' you'll need to revisit your BIOS settings
and also the PCIe ASPM (CONFIG_PCIEASPM) settings in the kernel.

4. Finally, there is a chance the PSU is playing up.

Points 1 & 2 above are more noticeable on spinning disks, but it is only a
matter of degree before SSDs are affected too. If the BIOS, kernel settings
and drivers were not altered recently, then 1 & 2 merit attention in the
first instance.
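
For the ALPM check in No. 3, something along these lines would print the
policy per SATA host and flag any host that is not at 'max_performance'. It
is only a rough sketch: the function name alpm_report and its optional
directory argument are my own invention (the argument exists purely so it can
be pointed at a test tree), and the "power saving active" wording is just a
label I chose.

```shell
# Rough sketch: list the link power management policy of every SCSI/SATA
# host. alpm_report and its optional base-directory argument are
# hypothetical; with no argument the real sysfs location is read.
alpm_report() {
    base="${1:-/sys/class/scsi_host}"
    for f in "$base"/host*/link_power_management_policy; do
        # Skip hosts that do not expose the attribute (or an unmatched glob).
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        policy=$(cat "$f")
        if [ "$policy" = "max_performance" ]; then
            echo "$host: $policy"
        else
            echo "$host: $policy (power saving active)"
        fi
    done
}
```

Calling alpm_report with no arguments reads the live system. Any host flagged
by it is a candidate for the BIOS and kernel settings mentioned in No. 3.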

> I have a spare four-core, 64-bit Celeron box (I bought it for a purpose
> that's gone away). I've been wondering what to do with it, so maybe it can
> replace the Atom box. It's powerful enough to compile its own software,
> whereas the Atom needs help. Whichever I use, its job will be as a server
> of DNS, LAN mail, time and git. Maybe print too. Also it will fetch my
> ISP's POP mail and serve it over IMAP to this box.
>
> > The self-test capability of storage media is almost universally
> > horrible and you generally don't get a failure report until your data
> > has already been lost. If your SMART output gives you the raw
> > statistics on the device instead of just pass/fail then analyzing that
> > usually gives a better indication of whether something is about to go
> > wrong.
>
> It seems to report only pass/fail, so that's not much help.
>
> Decisions, decisions...

Do the short and long smartctl self-tests report no errors, assuming the
disk comes with S.M.A.R.T. capability?
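
In case it helps, here is a small sketch of how those self-tests could be
started and read back with smartctl. The function names and the SMARTCTL
override variable are hypothetical conveniences of mine, not part of
smartmontools. The smartctl options themselves (-t, -l selftest, -A) are the
standard ones, and the device name is whatever your SSD shows up as.

```shell
# SMARTCTL is a hypothetical override so the binary can be swapped out when
# trying this; by default the real smartctl is used.
SMARTCTL="${SMARTCTL:-smartctl}"

# Start a self-test. $1 is the device, $2 is "short" or "long".
smart_start_test() {
    "$SMARTCTL" -t "$2" "$1"
}

# Read back what the drive recorded. $1 is the device.
smart_results() {
    "$SMARTCTL" -l selftest "$1"   # log of completed self-tests
    "$SMARTCTL" -A "$1"            # raw attribute table, not just pass/fail
}
```

As root, that would be something like 'smart_start_test /dev/sda long', then,
once the test has had time to finish, 'smart_results /dev/sda'. Note that
/dev/sda is only my guess at the device name on the Atom box.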

--
Regards,
Mick