Gentoo Archives: gentoo-user

From: Rich Freeman <rich0@g.o>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: USB crucial file recovery
Date: Thu, 01 Sep 2016 23:06:39
Message-Id: CAGfcS_mX3N+JwMhsNKZCxqxwxeuDLhKVn9sXXvt9PoAxEyAUMg@mail.gmail.com
In Reply to: [gentoo-user] Re: USB crucial file recovery by Kai Krakow
On Thu, Sep 1, 2016 at 6:35 PM, Kai Krakow <hurikhan77@×××××.com> wrote:
> On Tue, 30 Aug 2016 17:59:02 -0400, Rich Freeman <rich0@g.o> wrote:
>
>>
>> That depends on the mode of operation. In journal=data I believe
>> everything gets written twice, which should make it fairly immune to
>> most forms of corruption.
>
> No, journal != data integrity. Journals only ensure that data is
> written transactionally. You won't end up with messed-up metadata,
> and from an API perspective, with journal=data, a partially written
> block of data will be rewritten after recovering from a crash - up to
> the last fsync. If that last fsync happened to land halfway into a
> file: well, then only your work up to that half of the file is
> written.

Well, sure, but all an application needs to do is make sure it calls
write on whole files, and not half-files. It doesn't need to fsync as
far as I'm aware. It just needs to write consistent files in one
system call. Then that write either will or won't make it to disk,
but you won't get half of a write.
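
Roughly this sort of thing is all I mean (just a sketch; the path and
contents are made up):

/* Sketch: hand the kernel one complete, consistent file image in a
 * single write() call, so there is never a half-file state to journal.
 * The path and contents are made up for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "complete, consistent file contents\n";
    int fd = open("/tmp/example.conf", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* one write() covering the whole file, not a series of partial ones */
    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");
        close(fd);
        return 1;
    }
    return close(fd) ? 1 : 0;
}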

> Journals only ensure consistency at the API level, not integrity.

Correct, but this is way better than metadata-only journaling or just
ordering data, which protects the metadata but doesn't ensure your
files aren't garbled even if the application is careful.

>
> If you need integrity, so the file system can tell you whether your
> file is broken or not, you need checksums.
>

Btrfs and zfs fail in the exact same way in this particular regard.
If you call write with half of a file, btrfs/zfs will tell you that
half of that file was successfully written. But that doesn't help with
the other half of the file that the kernel hasn't been told about.

The checksumming in these filesystems really only protects data from
modification after it is written. Sectors that were only half-written
during an outage and have inconsistent checksums probably won't even
be looked at during an fsck/mount, because the filesystem is just
going to replay the journal and write right over them (or to some new
block, still treating the half-written data as unallocated). These
filesystems don't go scrubbing the disk to figure out what happened;
they just replay the log back to the last checkpoint. The checksums
are just used during routine reads to ensure the data wasn't somehow
corrupted after it was written, in which case a good copy is used,
assuming one exists. If not, at least you'll know about the problem.
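
Conceptually the read path amounts to something like this (just a
sketch of the idea, not actual btrfs/zfs code; the toy checksum and
the two "copies" are stand-ins):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* toy checksum standing in for crc32c/sha256 */
static uint32_t csum(const uint8_t *p, size_t n)
{
    uint32_t c = 0;
    for (size_t i = 0; i < n; i++)
        c = c * 31 + p[i];
    return c;
}

struct copy { uint8_t data[16]; uint32_t stored; };

/* return the first copy whose data still matches its stored checksum */
static const uint8_t *read_verified(const struct copy *copies, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (csum(copies[i].data, sizeof copies[i].data) == copies[i].stored)
            return copies[i].data;
    return NULL;   /* every copy is bad: at least you know about it */
}

int main(void)
{
    struct copy copies[2] = {{{0}, 0}, {{0}, 0}};
    memcpy(copies[0].data, "garbled copy!!!", 16);  /* checksum won't match */
    memcpy(copies[1].data, "good copy here.", 16);
    copies[1].stored = csum(copies[1].data, sizeof copies[1].data);

    const uint8_t *ok = read_verified(copies, 2);
    puts(ok ? (const char *)ok : "unrecoverable");
    return 0;
}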

> If you need a way to recover from a half-written file, you need a CoW
> file system where you could, by luck, go back some generations.

Only if you've kept snapshots, or plan to hex-edit your disk/etc. The
solution here is to correctly use the system calls.
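
For replacing an existing file, the careful pattern is a temp file
plus rename(); the fsync is the belt-and-suspenders part, and as I
said above it may not be strictly needed with data journaling. A
sketch, with made-up paths:

/* Sketch: write a temp file, fsync it, then rename() it over the old
 * name, so after a crash you see either the old contents or the
 * complete new contents, never a half-written mix.  Paths are made up
 * for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int replace_file(const char *path, const char *tmp,
                        const void *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    /* rename() is atomic with respect to the namespace */
    return rename(tmp, path);
}

int main(void)
{
    const char data[] = "new contents\n";
    if (replace_file("/tmp/example.conf", "/tmp/example.conf.tmp",
                     data, sizeof(data) - 1) != 0) {
        perror("replace_file");
        return 1;
    }
    return 0;
}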

>
>> f2fs would also have this benefit. Data is not overwritten in-place
>> in a log-based filesystem; they're essentially journaled by their
>> design (actually, they're basically what you get if you ditch the
>> regular part of the filesystem and keep nothing but the journal).
>
> This is log-structured, not journaled. You pointed that out, yes, but
> you weakened that by writing "basically the same". I think the
> difference is important. Mostly because the journal is a fixed area on
> the disk, while a log-structured file system has no such journal.

My point was that they're equivalent from the standpoint that every
write either completes or fails and you don't get half-written data.
Yes, I know how f2fs actually works, and this wasn't intended to be a
primer on log-based filesystems. The COW filesystems have similar
benefits since they don't overwrite data in place, other than maybe
their superblocks (or whatever you call them). I don't know what the
on-disk format of zfs is, but btrfs has multiple copies of the tree
root with a generation number, so if something dies partway it is
really easy for it to figure out where it left off (if none of the
roots were updated then any partial tree structures laid down are in
unallocated space and just get rewritten on the next commit, and if
any were written then you have a fully consistent new tree used to
update the remaining roots).
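
Conceptually the recovery step is nothing more than this (a sketch of
the idea only, not the real btrfs on-disk structures):

/* Sketch of the idea only (not the real btrfs format): at mount time,
 * pick the newest tree root copy that is intact.  A crash mid-commit
 * just means the newest generation never got fully written, so you
 * fall back to the previous one and the partial trees sit in
 * unallocated space. */
#include <stdint.h>
#include <stdio.h>

struct root_copy {
    uint64_t generation;   /* bumped on every commit */
    int      intact;       /* 1 if the copy's checksum verified */
};

static const struct root_copy *pick_root(const struct root_copy *r, int n)
{
    const struct root_copy *best = NULL;
    for (int i = 0; i < n; i++)
        if (r[i].intact && (!best || r[i].generation > best->generation))
            best = &r[i];
    return best;
}

int main(void)
{
    /* one copy from the interrupted commit (not intact), two older ones */
    struct root_copy copies[] = { {42, 0}, {41, 1}, {41, 1} };
    const struct root_copy *best = pick_root(copies, 3);
    if (best)
        printf("mounting from generation %llu\n",
               (unsigned long long)best->generation);
    return 0;
}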

One of these days I'll have to read up on the on-disk format of zfs,
as I suspect it would make an interesting contrast with btrfs.

>
> This point was raised because it supports checksums, not because it
> supports CoW.

Sure, but both provide benefits in these contexts. And the COW
filesystems are also the only ones I'm aware of (at least in popular
use) that have checksums.

>
> Log-structured file systems are, btw, interesting for write-mostly
> workloads on spinning disks because head movements are minimized.
> They don't automatically help dumb/simple flash translation layers;
> that takes a little more logic, exploiting the internal structure of
> flash (writing only sequentially in page-sized blocks, garbage
> collection and reuse only at the erase-block level). F2fs and bcache
> (as a caching layer) do this. Not sure about the others.

Sure. It is just really easy to do big block erases in a log-based
filesystem, since everything tends to be written (and overwritten)
sequentially. You can of course build a log-based filesystem that
doesn't perform well on flash. It would still tend to have the
benefits of data journaling (for free; the cost is fragmentation,
which is of course a bigger issue on spinning disks).
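
The sequential-write part is the whole trick; a toy sketch of the idea
(sizes and the "flash" array are made up, nothing here is f2fs code):

/* Sketch: a log-structured layout appends at the head of the log in
 * erase-block-sized segments, so the device erases whole blocks
 * instead of read-modify-writing small in-place updates. */
#include <stdio.h>
#include <string.h>

#define ERASE_BLOCK 8          /* bytes per erase block (tiny, for demo) */
#define NBLOCKS     4

static char flash[NBLOCKS][ERASE_BLOCK];
static int  head_block, head_off;   /* current write position in the log */

/* append data at the head of the log, erasing whole blocks as we wrap */
static void log_append(const char *data, int len)
{
    for (int i = 0; i < len; i++) {
        if (head_off == 0)
            memset(flash[head_block], 0xFF, ERASE_BLOCK); /* whole-block erase */
        flash[head_block][head_off++] = data[i];
        if (head_off == ERASE_BLOCK) {
            head_off = 0;
            head_block = (head_block + 1) % NBLOCKS;      /* strictly sequential */
        }
    }
}

int main(void)
{
    log_append("hello, ", 7);
    log_append("log-structured world", 20);
    printf("head is now at block %d, offset %d\n", head_block, head_off);
    return 0;
}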

--
Rich