On Sat, Sep 16, 2017 at 9:43 AM, Kai Krakow <hurikhan77@×××××.com> wrote:
>
> Actually, I'm running across 3x 1TB here on my desktop, with mraid1 and
> draid0. Combined with bcache it gives confident performance.
>

Not entirely sure I'd use the word "confident" to describe a
filesystem where the loss of one disk guarantees that:
1. You will lose data (no data redundancy).
2. But the filesystem will be able to tell you exactly what data you
lost (as metadata will be fine).

>
> I was very happy with XFS for a long time but switched to btrfs when
> it became usable, due to compression and such. But compression
> performance seems to have gotten worse lately: IO performance drops
> due to hogged CPUs even though my system really isn't that incapable.
>

Btrfs performance is pretty bad in general right now. The problem is
that the developers simply haven't gotten around to optimizing it
fully, mainly because they're more focused on getting rid of the data
corruption bugs (which is of course the right priority). For example,
in raid1 mode btrfs picks which mirror to read from based on whether
the requesting process's PID is even or odd, without any regard to
disk utilization.

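To illustrate, the read-balancing policy amounts to something like
the toy Python sketch below (just a model of the idea as described,
not the actual kernel code):

    # Toy model of btrfs raid1 read balancing as described above --
    # NOT the real kernel code, just the idea: the reader's PID
    # parity picks the mirror, ignoring how busy each disk is.
    import os

    def pick_mirror(num_mirrors: int = 2) -> int:
        return os.getpid() % num_mirrors

    print(f"PID {os.getpid()} reads from mirror {pick_mirror()}")

So every read issued by a given process lands on the same mirror for
the life of that process, no matter how loaded that disk already is.
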
When I moved to zfs I noticed a huge performance boost.

Fundamentally I don't see why btrfs can't perform just as well as the
others. It just isn't there yet.

> What's still cool is that I don't need to manage volumes since the
> volume manager is built into btrfs. XFS on LVM was not that
> flexible. If btrfs didn't have this feature, I probably would have
> switched back to XFS already.

My main concern with xfs/ext4 is that neither provides on-disk
checksums or protection against the raid write hole (where a crash
between the data and parity writes leaves a stripe silently
inconsistent).

I just switched motherboards a few weeks ago, and either a connection
or a SATA port was bad, because one of my drives was getting a TON of
checksum errors on zfs. I moved it to an LSI card and scrubbed, and
while it took forever and zfs degraded the array more than once due
to the high error rate, eventually it patched up all the errors and
now the array is working without issue. I didn't suffer more than a
bit of inconvenience, but with even mdadm raid1 I'd have had a HUGE
headache trying to recover from that (doing who knows how much
troubleshooting before realizing I had to do a slow full restore from
backup with the system down).

I just don't see how a modern filesystem can get away without full
checksum support. It is a bit odd that it has taken so long for Ceph
to introduce it, and I'm still not sure whether it is truly
end-to-end, or whether there is some point in the data's life where
it isn't protected by checksums. If I were designing something like
Ceph, I'd checksum the data at the client the moment it enters
storage, store the checksum and the data independently, and then
retrieve both and verify at the client when the data leaves storage.
Then you're protected against corruption at any layer below that.
You could of course add further checks to catch errors sooner, before
the client even sees them. I think the issue is that Ceph was
originally designed for object storage and they just figured the
application would be responsible for data integrity.

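A minimal sketch of that end-to-end scheme, with a toy in-memory
"backend" standing in for whatever storage stack sits between the two
client-side hops:

    # Sketch only: checksum on the way in, verify on the way out, and
    # store the checksum independently of the data. The dicts below
    # are stand-ins for the real storage layers being distrusted here.
    import hashlib

    _data = {}       # toy backend for object payloads
    _checksums = {}  # checksums kept separately from the data

    def put(key: str, payload: bytes) -> None:
        # Checksum at the client the moment the data enters storage.
        _checksums[key] = hashlib.sha256(payload).hexdigest()
        _data[key] = payload

    def get(key: str) -> bytes:
        # Verify at the client the moment the data leaves storage;
        # corruption in any layer in between shows up right here.
        payload = _data[key]
        if hashlib.sha256(payload).hexdigest() != _checksums[key]:
            raise IOError(f"end-to-end checksum mismatch for {key!r}")
        return payload

    put("obj1", b"some payload")
    assert get("obj1") == b"some payload"
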
The other benefit of checksums is that, if they're done right, scrubs
can go a lot faster, because you don't have to scrub all the
redundancy data synchronously. You can just start an idle-priority
read thread on every drive and pause it anytime that drive is
accessed, so an access on one drive won't slow down the others. With
traditional RAID you have to read all the redundancy data
synchronously, because you can't check the integrity of any of it
without the full set. I think even ZFS is stuck doing synchronous
reads due to how it stores/computes the checksums. This is something
btrfs got right.

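Here's a toy sketch of that per-drive scrub model, assuming every
block carries its own checksum so each device can be verified in
isolation (the ToyDevice class and its contents are invented for
illustration):

    # Each device gets its own scrub worker; a busy device pauses
    # only its own worker, because per-block checksums make each
    # device independently verifiable -- no full-stripe reads needed.
    import hashlib
    import threading
    import time

    class ToyDevice:
        def __init__(self, name, blocks):
            self.name = name
            # Store (data, checksum) pairs, as a checksumming fs would.
            self.blocks = [(b, hashlib.sha256(b).hexdigest())
                           for b in blocks]
            self.busy = threading.Event()  # set while foreground I/O runs

        def scrub(self):
            for i, (data, csum) in enumerate(self.blocks):
                while self.busy.is_set():
                    time.sleep(0.05)  # yield to foreground I/O on this disk
                if hashlib.sha256(data).hexdigest() != csum:
                    print(f"{self.name}: block {i} corrupt; "
                          "rebuild it from another copy")

    devices = [ToyDevice(f"disk{i}", [b"block-a", b"block-b"])
               for i in range(3)]
    workers = [threading.Thread(target=d.scrub) for d in devices]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
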
>
>> For the moment I'm
>> relying more on zfs.
>
> How does it perform memory-wise? In particular, I'm currently using
> bees[1] for deduplication: it uses a 1G memory-mapped file (you can
> choose other sizes if you want), and it picks up new files really
> fast, within a minute. I don't think zfs can do anything like that
> with the same resources.

I'm not using deduplication, but my understanding is that zfs
deduplication:
1. Works just fine.
2. Uses a TON of RAM.

So, it might not be your cup of tea. There is no way to do
semi-offline dedup as with btrfs ("semi-offline" in the sense that
the filesystem stays fully running - you just periodically scan for
dups and fix them after the fact, versus detecting them in realtime).
With a semi-offline mode the performance hit only comes at a time of
my choosing, versus using gobs of RAM all the time to detect what are
probably fairly rare dups. The sketch below shows roughly what that
after-the-fact scan amounts to.

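This is only a toy sketch of the detection step: it hashes whole
files, whereas real tools like bees or duperemove work at extent
granularity and then deduplicate in place via clone/reflink ioctls.
The path is a placeholder.

    # Toy version of the offline "scan for dups" step: hash every
    # file under a root and report groups with identical contents.
    import hashlib
    import os
    from collections import defaultdict

    def find_dups(root):
        by_hash = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as f:
                        digest = hashlib.sha256(f.read()).hexdigest()
                except OSError:
                    continue  # skip unreadable/special files
                by_hash[digest].append(path)
        return [paths for paths in by_hash.values() if len(paths) > 1]

    for group in find_dups("/some/path"):
        print("duplicates:", group)
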
That aside, I find zfs works fine memory-wise (again, I don't use
dedup). It has its own cache (the ARC) that isn't fully integrated
into the kernel's native page cache, so it tends to hold on to a lot
more RAM than other filesystems, but you can tune this behavior so
that it stays fairly tame.

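For example, with ZFS on Linux the ARC's ceiling can be capped via
the zfs_arc_max module parameter (the 4 GiB value below is just an
illustration - pick whatever fits your machine):

    # /etc/modprobe.d/zfs.conf -- cap the ARC at 4 GiB (4 * 2^30 bytes)
    options zfs zfs_arc_max=4294967296

The same parameter can also be changed at runtime through
/sys/module/zfs/parameters/zfs_arc_max.
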
--
Rich