Gentoo Archives: gentoo-dev

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: RFC: using .xz for doc/man/info compression
Date: Tue, 13 May 2014 17:27:41
Message-Id: pan$8dbef$4f96a0f6$eda0c1b3$7f9f5663@cox.net
In Reply to: Re: [gentoo-dev] RFC: using .xz for doc/man/info compression by Rich Freeman
Rich Freeman posted on Tue, 13 May 2014 08:18:25 -0400 as excerpted:

> Btrfs also supports file inlining, so every byte saved on small files
> does actually help (I believe the data structure that stores the inlined
> data doesn't have a fixed record size).

There's an option for it (the max_inline mount option), although I've not
screwed with it and don't know the default without looking it up.

The overall metadata node size (set at mkfs.btrfs time) originally
defaulted to the filesystem block size, which is the memory page size,
thus 4096 bytes on x86/amd64 and, I believe, arm. However, the metadata
node size default recently changed to 16KiB (or the page size where that
is larger than 16KiB). I'd guess there are still more 4KiB node size
users due to all the legacy btrfs out there, but 16KiB will certainly be
the majority at some point.

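As a rough sketch of the default rule just described (page size originally, now 16KiB or the page size if that is larger), here is a small Python function. The function name and the `legacy` flag are hypothetical illustrations, not anything from btrfs-progs itself.

```python
def default_nodesize(page_size: int, legacy: bool = False) -> int:
    """Default btrfs metadata node size, as described in this post.

    legacy=True models older mkfs.btrfs, which used the page size;
    otherwise the newer default applies: 16 KiB, or the page size if larger.
    (Hypothetical helper for illustration only.)
    """
    if legacy:
        return page_size
    return max(16 * 1024, page_size)


# x86/amd64 with 4 KiB pages under old and new defaults, plus a 64 KiB-page system
for page, legacy in ((4096, True), (4096, False), (65536, False)):
    print(page, legacy, default_nodesize(page, legacy))
```

Because the node size is fixed when the filesystem is created, an existing filesystem keeps whatever default was in effect at mkfs time, which is why the older 4KiB-node filesystems linger.
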
The maximum inline size for an individual file is certainly smaller than
the metadata node size, but again, I've not messed with that, so I don't
know its actual default.

> Then again, btrfs also supports lzo compression and I believe this is
> fairly widely used, so I'm not sure that the impact of not compressing
> small files will be felt.

Of course there's zlib (the same deflate compression gzip uses) as well,
and it's the (now legacy) default if compression is specified but the
type isn't, although lzo is recommended as faster with "good enough"
compression.

The other factor to consider is replication mode. On a single-device
filesystem the data replication mode is single by default, with metadata
dup (two copies), except on a detected ssd, where the metadata default is
(somewhat controversially) single, because some ssds do internal
deduplication that could quietly collapse the two copies into one. On
multi-device filesystems the metadata default is raid1 (two copies,
regardless of the number of devices), while the data default remains
single.

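The defaults just listed can be summarized as a small lookup, sketched in Python below. It encodes only what this post describes for mkfs.btrfs at the time; later btrfs-progs releases may choose differently. `Profiles`, `default_profiles` and `copies` are hypothetical names for illustration.

```python
from typing import NamedTuple


class Profiles(NamedTuple):
    data: str
    metadata: str


def default_profiles(num_devices: int, ssd_detected: bool = False) -> Profiles:
    """Default replication profiles as described in this post (hypothetical helper)."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    if num_devices == 1:
        # Single device: metadata is dup (two copies), except single on a detected ssd.
        return Profiles(data="single", metadata="single" if ssd_detected else "dup")
    # Multi-device: two-copy raid1 metadata regardless of device count; data stays single.
    return Profiles(data="single", metadata="raid1")


def copies(profile: str) -> int:
    """Number of stored copies for the profiles mentioned in this post."""
    return {"single": 1, "dup": 2, "raid1": 2}[profile]


print(default_profiles(1))                     # spinning single device
print(default_profiles(1, ssd_detected=True))  # single device, detected ssd
print(default_profiles(3))                     # multi-device
```

Note that dup and raid1 both mean two metadata copies for size purposes; they differ only in whether those copies may sit on the same device.
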
So from a size perspective, assuming the defaults of single data and dup
or raid1 metadata, uncompressed, the cutover should be near 2048 bytes.
Below that, the two inlined metadata copies together still take less
space than one 4096-byte data block; above it, storing the file in a
single-mode data extent should be more efficient.

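The cutover arithmetic can be sketched in Python. This is a simplified model under stated assumptions, not how btrfs actually accounts space: inlined data costs its size times the number of metadata copies, a data extent costs whole 4096-byte blocks times the number of data copies, and per-item metadata overhead and compression are ignored. The function names are hypothetical.

```python
import math

BLOCK_SIZE = 4096  # data block size on x86/amd64, per the post


def inline_footprint(size: int, metadata_copies: int = 2) -> int:
    """Bytes used if the file is inlined in metadata (dup or raid1 = 2 copies).
    Ignores per-item header overhead (simplifying assumption)."""
    return size * metadata_copies


def extent_footprint(size: int, data_copies: int = 1, block_size: int = BLOCK_SIZE) -> int:
    """Bytes used if the file goes into a data extent, rounded up to whole blocks.
    Ignores the extent's own metadata item (simplifying assumption)."""
    blocks = max(1, math.ceil(size / block_size))
    return blocks * block_size * data_copies


def cutover(metadata_copies: int = 2, data_copies: int = 1, block_size: int = BLOCK_SIZE) -> int:
    """Largest file size, within one block, for which inlining is strictly smaller."""
    best = 0
    for size in range(1, block_size + 1):
        if inline_footprint(size, metadata_copies) < extent_footprint(size, data_copies, block_size):
            best = size
    return best


for size in (512, 2047, 2048, 3000):
    print(size, inline_footprint(size), extent_footprint(size))
print("cutover:", cutover())
print("cutover with raid1 data:", cutover(data_copies=2))
```

With single data and two-copy metadata, inlining is strictly smaller up to 2047 bytes and breaks even at 2048, matching the estimate above. With raid1 data as well (data_copies=2), the cutover moves to just under the full 4096-byte block, so on an all-raid1 filesystem inlining makes little size difference.
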
Bottom line: there are enough btrfs variables (inline size, data vs.
metadata replication modes, metadata node size, and whether and how
compression is used), and the chance that gentoo btrfs users are tweaking
at least one of them is high enough, that I'm not sure a generic ideal
cutover makes a lot of sense. To the extent that there is one, it's
likely to be near 2048 bytes.

FWIW, I believe I'm still using portage's default bzip2 doc compression
here, although in the context of this thread I should really re-examine
that, since I use compress=lzo at the filesystem level. Both data and
metadata are raid1 here, so inlining doesn't matter for size, except
that AFAIK inlined data is NOT compressed while data extents can be. So
portage-level compression is likely to make even less difference in the
borderline case: with bzip2, the file may end up small enough to be
inlined (and not btrfs-compressed), whereas without portage-level
compression it would be too big to inline and would instead be stored as
a transparently lzo-compressed btrfs data extent.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman