
From: Bill Kenworthy <billk@×××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: ceph on gentoo?
Date: Wed, 24 Dec 2014 04:34:26
Message-Id: 549A423A.7060102@iinet.net.au
In Reply to: Re: [gentoo-user] Re: ceph on gentoo? by Rich Freeman
On 24/12/14 11:24, Rich Freeman wrote:
> On Tue, Dec 23, 2014 at 4:08 PM, Holger Hoffstätte
> <holger.hoffstaette@××××××××××.com> wrote:
>> On Tue, 23 Dec 2014 21:54:00 +0100, Stefan G. Weichinger wrote:
>>
>>> In the other direction: what protects against these errors you mention?
>>
>> ceph scrub :)
>>
>
> Are you sure about that? I was under the impression that it just
> checked that everything was retrievable. I'm not sure if it compares
> all the copies of everything to make sure that they match, and if they
> don't match I don't think that it has any way to know which one is
> right. I believe an algorithm just picks one as the official version,
> and it may or may not be identical to the one that was originally
> stored.
>
> If the data is on btrfs then it is protected from silent corruption
> since the filesystem will give an error when that node tries to read a
> file, and presumably the cluster will find another copy elsewhere. On
> the other hand if the file were logically overwritten in some way
> above the btrfs layer then btrfs won't complain and the cluster won't
> realize the file has been corrupted.
>
> If I'm wrong on this by all means point me to the truth. From
> everything I read though I don't think that ceph maintains a list of
> checksums on all the data that is stored while it is at rest.
>
> --
> Rich
>

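For reference, the two kinds of scrub can be kicked off by hand,
roughly like this (from memory, so check the docs for your ceph
version; 0.0 is just a placeholder PG id):

   ceph pg dump            # list PGs and their last-scrub timestamps
   ceph pg scrub 0.0       # light scrub: compares object sizes/metadata across replicas
   ceph pg deep-scrub 0.0  # deep scrub: reads back and compares the data itself, much more I/O
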
Scrub used to pick up and fix errors - well, mostly fix. Sometimes the
whole thing collapses in a heap. The problem with small systems is that
they are already very I/O restricted, and adding a scrub or deep scrub
slows them down noticeably further. On terabytes of data a scrub would
take many hours, after which checking the logs might turn up yet
another error message, so it had to be triggered again. I suspect some
of the errors I got were btrfs-related, but ceph certainly contributed
its share. Not sure of the cause, but they "seemed" to occur whenever
the cluster was doing anything other than idling. As I used the "golden
master/clone" approach for VMs, corruption in the wrong place was very
noticeable :(
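
The check-and-retrigger loop I mean went roughly like this (again from
memory; the OSD mount point below is only an example path):

   ceph health detail      # shows any PGs flagged inconsistent after a scrub
   ceph pg deep-scrub 0.0  # re-scrub the suspect PG
   ceph pg repair 0.0      # "repair" just declares one copy authoritative
   btrfs scrub start /var/lib/ceph/osd/ceph-0  # check the underlying btrfs too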

By the time I gave up it was getting better, but I came to the
conclusion that the expensive upgrades I needed to fix the I/O problems
of running lots of VMs at once weren't worth it.

BillK