Gentoo Archives: gentoo-user

From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Filesystem choice for NVMe SSD
Date: Sun, 24 Jan 2016 01:13:44
Message-Id: CAGfcS_m_=vDCBrm0wz6PCKLPC66PV8UykyzSAC2MWSmnFQA2sw@mail.gmail.com
In Reply to: [gentoo-user] Filesystem choice for NVMe SSD by Andrew Savchenko
1 On Sat, Jan 23, 2016 at 7:44 PM, Andrew Savchenko <bircoph@g.o> wrote:
2 >
3 > a) EXT4 is a good, extremely robust solution. Its reliability is
4 > beyond question: on my old box with bad memory banks it kept my data
5 > safe for years, and almost all losses were recoverable. It also has
6 > some SSD-oriented features like discard support, plus stripe and
7 > stride, which can be aligned to the erase block size to optimize
8 > erase operations and reduce wear-out.
9
10 I think EXT4 is the conservative solution. It is also more flexible
11 than xfs, and a decent performer all around.
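
To make the stride/stripe point from the quoted text concrete: the arithmetic is just the erase block size divided by the filesystem block size. Below is a minimal Python sketch. The 4 MiB erase block is purely an assumed figure (the real value for this drive is not known), the helper name is hypothetical, and the resulting string is meant to be passed to mke2fs via its -E extended options.

```python
def ext4_stride_options(erase_block_bytes, fs_block_bytes=4096):
    """Build an mke2fs -E option string for an ASSUMED erase block size.

    Both sizes are in bytes. The erase block size is a guess unless the
    vendor documents it.
    """
    if erase_block_bytes % fs_block_bytes != 0:
        raise ValueError("erase block must be a multiple of the fs block size")
    # Number of filesystem blocks that fit in one erase block.
    stride = erase_block_bytes // fs_block_bytes
    # On a single device (no RAID) stride and stripe width are the same.
    return f"stride={stride},stripe_width={stride}"


if __name__ == "__main__":
    # Hypothetical 4 MiB erase block with 4 KiB filesystem blocks.
    print(ext4_stride_options(4 * 1024 * 1024))
```

The output would be used along the lines of `mkfs.ext4 -E <options> <device>`. If the guess is wrong the filesystem still works; it just loses whatever benefit the alignment would have given.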
12
13 > b) In some tests XFS is better than EXT4 ([1] slides 16-18; [2]).
14 > Though I had data loss on XFS on unclean shutdown events in the
15 > past. This was about 5 years ago, so XFS robustness should have
16 > improved, of course, but I still remember the pain :/
17
18 Maybe it improved, but xfs probably hasn't changed much at all. It
19 isn't really a focus of development as far as I'm aware. I probably
20 wouldn't use it. I used to use it, but was frustrated with its
21 inability to shrink and the zero-files "feature" (files turning up
21 zeroed after an unclean shutdown).
22
23 > c) F2FS looks very interesting, it has really good flash-oriented
24 > design [3]. Also it seems to beat EXT4 on PCIe SSD ([3] chapter
25 > 3.2.2, pages 9-10) and everything else on compile test ([2] page 5)
26 > which should be close to the type of workload I'm interested in
27 > (though all tests in [2] have extra raid layer). The only thing
28 > that bothers me is some data loss reports for F2FS found on the
29 > net, though all I found dates back to 2012-2014 and F2FS has an fsck
30 > tool now, thus it should be more reliable these days.
31
32 So, F2FS is of course very promising on flash. It should be the most
33 efficient solution in terms of even wear of your drive. I'd think
34 that lots of short-term files like compiling would actually be a good
35 use case for it, since discarded files don't need to be rewritten when
36 it rolls over. But, I won't argue with the benchmarks.
37
38 It probably will improve as well as it matures. However, it isn't
39 nearly as mature as ext4 or xfs, so from a data-integrity standpoint
40 you're definitely at higher risk. If you're regularly backing up and
41 don't care about a very low risk of problems, it is probably a good
42 choice.
43
44 > d) I'm not sure about BTRFS, since it is very sophisticated and I'm
45 > not interested in its advanced features such as snapshots,
46 > checksums, subvolumes and so on. In some tests [2] it tends to
47 > achieve better performance than ext4, but due to its sophisticated
48 > nature it is bound to add more code paths and latency than other
49 > solutions.
50
51 The only reasons to use btrfs at this point are all those advanced
52 features, such as being able to copy a directory with 10 million small
53 files in it in 5 seconds. I'd think that might be useful for
54 development, but of course git does the same thing. Git and btrfs are
55 actually somewhat similar in principle. Take git with the ability to
56 erase blobs and mirror things and that is kind of like btrfs.
57
58 Btrfs is also immature, and while data loss on an n-1 longterm kernel
59 like 3.18 is fairly rare, it does happen. It is designed with ssd in
60 mind but generally tends to underperform the write-in-place
61 filesystems, mainly because the code hasn't been tuned for speed yet,
61 and of course it can't write in place.
62
63 If you REALLY don't care about the data integrity features and such
64 then I don't think it is the best solution for you.
65
66 > P.S. Is aligning to erase block size really important for NVMe? I
67 > can't find the erase block size for this drive (Samsung MZ-VKV512)
68 > either in official docs or on the net...
69
70 Unless the erase blocks are a single sector in size then I'd think
71 that alignment would matter. Now, for F2FS alignment probably matters
72 far less than other filesystems since the only blocks on the entire
73 drive that may potentially be partially erased are the ones that
74 border two log regions. F2FS just writes each block in a region once,
75 and then trims an entire contiguous region when it fills the previous
76 region up. Large contiguous trims with individual blocks being
77 written once are basically a best-case for flash, which is of course
78 why it works that way. You should still ensure it is aligned, but not
79 much will happen if it isn't, I'd think.
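
Checking alignment is simple arithmetic once you pick an erase block size to test against. Here is a rough Python sketch. The helper name is hypothetical, the 512-byte sector size is an assumption about how the partition table reports offsets, and the erase block sizes tried are guesses, since the real figure for this drive isn't documented.

```python
def is_aligned(start_sector, erase_block_bytes, sector_bytes=512):
    """Return True if a partition's first byte falls on an erase block border.

    start_sector is the partition's starting sector as shown by a
    partitioning tool; erase_block_bytes is an ASSUMED erase block size.
    """
    return (start_sector * sector_bytes) % erase_block_bytes == 0


if __name__ == "__main__":
    one_mib = 1024 * 1024
    # A partition starting at sector 2048 sits exactly 1 MiB into the disk.
    for guess in (one_mib, 4 * one_mib):
        print(guess, is_aligned(2048, guess))
```

A start at sector 2048 passes for any erase block size that divides 1 MiB evenly, but not for a larger one such as 4 MiB. If you don't know the real size, starting the partition at a generous power-of-two offset covers more of the possible values.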
80
81 For something like ext4 where blocks are constantly overwritten I'd
82 think that poor alignment is going to really hurt your performance.
83 Btrfs might be somewhere in-between - it doesn't overwrite data in
84 place, but it does write all over the disk so it would constantly
85 be hitting erase block borders if not aligned. That is just a
86 hand-waving argument - I have no idea how they work in practice.
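
To put a number on the hand-waving, here is a toy Python sketch that counts how many erase blocks a single write touches. It is only a model of the geometry: the function name and the 1 MiB erase block are assumptions, and real drives remap writes internally, so this says nothing about what the controller actually does.

```python
def erase_blocks_touched(offset, length, erase_block_bytes):
    """Count the erase blocks overlapped by a write of `length` bytes
    starting at byte `offset`, for an ASSUMED erase block size."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // erase_block_bytes
    last = (offset + length - 1) // erase_block_bytes
    return last - first + 1


if __name__ == "__main__":
    eb = 1024 * 1024  # hypothetical 1 MiB erase block
    # An erase-block-sized write, aligned and then shifted by 4 KiB.
    print(erase_blocks_touched(0, eb, eb))
    print(erase_blocks_touched(4096, eb, eb))
```

In this model the aligned write stays inside one erase block while the shifted one straddles two, which is the border-hitting effect described above.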
87
88 --
89 Rich

Replies

Subject Author
Re: [gentoo-user] Filesystem choice for NVMe SSD Andrew Savchenko <bircoph@g.o>