On 24/12/14 11:24, Rich Freeman wrote:
> On Tue, Dec 23, 2014 at 4:08 PM, Holger Hoffstätte
> <holger.hoffstaette@××××××××××.com> wrote:
>> On Tue, 23 Dec 2014 21:54:00 +0100, Stefan G. Weichinger wrote:
>>
>>> In the other direction: what protects against these errors you mention?
>>
>> ceph scrub :)
>>
>
> Are you sure about that? I was under the impression that it just
> checked that everything was retrievable. I'm not sure if it compares
> all the copies of everything to make sure that they match, and if they
> don't match I don't think that it has any way to know which one is
> right. I believe an algorithm just picks one as the official version,
> and it may or may not be identical to the one that was originally
> stored.
>
> If the data is on btrfs then it is protected from silent corruption
> since the filesystem will give an error when that node tries to read a
> file, and presumably the cluster will find another copy elsewhere. On
> the other hand if the file were logically overwritten in some way
> above the btrfs layer then btrfs won't complain and the cluster won't
> realize the file has been corrupted.
>
> If I'm wrong on this by all means point me to the truth. From
> everything I read though I don't think that ceph maintains a list of
> checksums on all the data that is stored while it is at rest.
>
> --
> Rich
>

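To illustrate Rich's point about "picking one as the official version":
without some record of what was originally written, a scrub that merely
compares replicas can only elect one of them, and the winner may itself be
the corrupted copy. A minimal Python sketch of the two strategies - purely
illustrative, not Ceph's actual code, and the function names are made up:

import hashlib
from collections import Counter

def pick_by_majority(replicas):
    # With no record of the original, a scrub can only compare the
    # copies and elect one, e.g. the most common value.
    winner, _ = Counter(replicas).most_common(1)[0]
    return winner  # may or may not be what the client wrote

def pick_by_checksum(replicas, stored_digest):
    # If a checksum taken at write time is kept, the scrub can verify
    # each copy and return one that still matches the original.
    for copy in replicas:
        if hashlib.sha256(copy).hexdigest() == stored_digest:
            return copy
    return None  # every replica is damaged, but at least we know it

original = b"object data as written by the client"
stored = hashlib.sha256(original).hexdigest()

# two of three replicas silently corrupted after the write:
replicas = [b"object data as writfen by the client",
            b"object data as writfen by the client",
            original]

print(pick_by_majority(replicas) == original)          # False - the bad copy wins
print(pick_by_checksum(replicas, stored) == original)  # True

The same limitation is behind the btrfs point: filesystem checksums catch
bit rot on read, but a file logically overwritten above the filesystem is
checksummed as perfectly valid new data.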

Scrub used to pick up and fix errors - well, mostly fix; sometimes the
whole thing collapsed in a heap. The problem with small systems is that
they are already very I/O restricted, and adding a scrub or deep scrub
slows them down very noticeably more. On terabytes of data it would take
many hours, after which checking the logs might turn up another error
message, so it had to be triggered again. I suspect some of the errors I
got were btrfs related, but ceph certainly contributed its share. I'm not
sure of the cause, but they "seemed" to occur whenever the cluster was
doing anything other than sitting idle. As I used the "golden
master/clone" approach to VMs, corruption in the wrong place was very
noticeable :(
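
The "check the logs, trigger it again" cycle I mean can be scripted; here
is a rough Python sketch driving the ceph CLI - the health-output parsing
is a guess and the exact wording varies between releases:

import subprocess

def ceph(*args):
    # Shell out to the ceph CLI; assumes an admin keyring is readable.
    return subprocess.run(["ceph", *args], capture_output=True,
                          text=True, check=True).stdout

# 'ceph health detail' names inconsistent PGs after a scrub, in lines
# roughly like "pg 2.7 is active+clean+inconsistent, acting [1,0]".
inconsistent = [line.split()[1]
                for line in ceph("health", "detail").splitlines()
                if line.strip().startswith("pg ") and "inconsistent" in line]

for pgid in inconsistent:
    # re-run a deep scrub (or "repair") and check health again later
    ceph("pg", "deep-scrub", pgid)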

By the time I gave up it was getting better, but I came to the conclusion
that the expensive upgrades I needed to fix the I/O problems of running
lots of VMs at once weren't worth it.

BillK