On Thu, Sep 1, 2016 at 6:35 PM, Kai Krakow <hurikhan77@×××××.com> wrote:
> On Tue, 30 Aug 2016 17:59:02 -0400, Rich Freeman <rich0@g.o> wrote:
>
>>
>> That depends on the mode of operation. In journal=data I believe
>> everything gets written twice, which should make it fairly immune to
>> most forms of corruption.
>
> No, journal != data integrity. A journal only ensures that data is
> written transactionally. You won't end up with messed-up metadata,
> and from an API perspective, with journal=data, a partially written
> block of data will be rewritten after recovering from a crash - up
> to the last fsync. If that last fsync happened halfway into a file:
> well, then only your work up to that half of the file is there.

Well, sure, but all an application needs to do is make sure it calls
write on whole files, not half-files. It doesn't need to fsync as far
as I'm aware; it just needs to write consistent file contents in one
system call. Then that write either will or won't make it to disk,
but you won't get half of a write.

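Something like this is all I mean - just a sketch with made-up names,
not anything from a real codebase:

    /* Sketch of the "whole file in one write()" idea -- the path and
     * buffer are hypothetical. */
    #include <fcntl.h>
    #include <unistd.h>

    int save_whole_file(const char *path, const char *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        /* One write() covering the complete, consistent contents;
         * with journal=data that data block either lands in full
         * after a crash or not at all, never halfway. */
        ssize_t n = write(fd, buf, len);
        close(fd);
        return (n == (ssize_t)len) ? 0 : -1;
    }

(Strictly, write() may return a short count, in which case you'd have
to retry the remainder, but the point stands: one consistent buffer,
one call.)
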
> Journals only ensure consistency at the API level, not integrity.

Correct, but this is way better than neither journaling nor ordering
data, which protects the metadata but doesn't ensure your files aren't
garbled even if the application is careful.

>
> If you need integrity, so that the file system can tell you whether
> your file is broken or not, you need checksums.
>

Btrfs and zfs fail in the exact same way in this particular regard.
If you call write with half of a file, btrfs/zfs will tell you that
half of that file was successfully written. But they can't vouch for
the other half of the file that the kernel was never told about.

The checksumming in these filesystems really only protects data from
modification after it is written. Sectors that were only half-written
during an outage, and thus have inconsistent checksums, probably won't
even be looked at during an fsck/mount, because the filesystem is just
going to replay the journal and write right over them (or to some new
block, still treating the half-written data as unallocated). These
filesystems don't go scrubbing the disk to figure out what happened;
they just replay the log back to the last checkpoint. The checksums
are only used during routine reads to ensure the data wasn't somehow
corrupted after it was written, in which case a good copy is used,
assuming one exists. If not, at least you'll know about the problem.

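The read path, in other words, looks roughly like this - a sketch of
the idea, not anyone's actual code, and crc32c() is a stand-in:

    #include <stdint.h>
    #include <stddef.h>

    struct block_copy {
        const uint8_t *data;   /* one stored copy of the block */
        size_t len;
        uint32_t stored_csum;  /* checksum recorded at write time */
    };

    /* Assumed helper; real filesystems use crc32c or stronger. */
    extern uint32_t crc32c(const uint8_t *buf, size_t len);

    /* Return the first copy whose contents still match its checksum,
     * or NULL if every copy is bad and the read must fail. */
    const struct block_copy *read_verified(const struct block_copy *c,
                                           int ncopies)
    {
        for (int i = 0; i < ncopies; i++)
            if (crc32c(c[i].data, c[i].len) == c[i].stored_csum)
                return &c[i];
        return NULL;
    }
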
> If you need a way to recover from a half-written file, you need a CoW
> file system where, with luck, you can go back a few generations.

Only if you've kept snapshots, or plan to hex-edit your disk/etc. The
real solution here is to use the system calls correctly.

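The conventional pattern for replacing a file safely is something like
this (again just a sketch; the fsync is the portable belt-and-braces
step, which per the above you may not even need with journal=data):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int replace_file(const char *path, const char *tmp,
                     const char *buf, size_t len)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        /* rename() swaps the new file in atomically: readers see the
         * old contents or the new contents, never a mix. */
        return rename(tmp, path);
    }
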
>
>> f2fs would also have this benefit. Data is not overwritten in-place
>> in a log-based filesystem; they're essentially journaled by their
>> design (actually, they're basically what you get if you ditch the
>> regular part of the filesystem and keep nothing but the journal).
>
> This is log-structured, not journaled. You pointed that out, yes, but
> you weakened it by writing "basically the same". I think the
> difference is important, mostly because the journal is a fixed area on
> the disk, while a log-structured file system has no such journal.

My point was that they're equivalent from the standpoint that every
write either completes or fails and you don't get half-written data.
Yes, I know how f2fs actually works, and this wasn't intended to be a
primer on log-based filesystems. The COW filesystems have similar
benefits since they don't overwrite data in place, other than maybe
their superblocks (or whatever you call them). I don't know the
on-disk format of zfs, but btrfs keeps multiple copies of the tree
root, each with a generation number, so if something dies partway
through a commit it is really easy to figure out where it left off:
if none of the roots were updated, any partial tree structures laid
down are in unallocated space and just get rewritten on the next
commit, and if any were written, you have a fully consistent new tree
that is used to update the remaining roots.

One of these days I'll have to read up on the on-disk format of zfs, as
I suspect it would make an interesting contrast with btrfs.

>
> This point was raised because it supports checksums, not because it
> supports CoW.

Sure, but both provide benefits in these contexts. And the COW
filesystems are also the only ones I'm aware of (at least in popular
use) that have checksums.

>
> Log-structured file systems are, btw, interesting for write-mostly
> workloads on spinning disks because head movements are minimized.
> They do not automatically help dumb/simple flash translation layers.
> That takes a little more logic, exploiting the internal structure of
> flash (writing only sequentially in page-sized blocks, garbage
> collection and reuse only at the erase-block level). F2fs and bcache
> (as a caching layer) do this. Not sure about the others.

Sure. It is just really easy to do big block erases in a log-based
filesystem since everything tends to be written (and overwritten)
sequentially. You can of course build a log-based filesystem that
doesn't perform well on flash. They would still tend to have the
benefits of data journaling (for free; the cost is fragmentation,
which is of course a bigger issue on spinning disks).

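The allocation policy that makes this work is trivially simple - a toy
sketch, with a made-up segment size standing in for the erase block:

    #include <stdint.h>

    #define SEGMENT_SIZE (2 * 1024 * 1024)  /* pretend erase-block size */

    struct log_head {
        uint64_t segment;  /* segment currently being filled */
        uint32_t offset;   /* next free byte within it */
    };

    /* Every write goes to the log head, strictly sequentially.  Once
     * a segment's live data has been rewritten further down the log,
     * the whole segment can be erased in one shot. */
    uint64_t log_alloc(struct log_head *h, uint32_t len)
    {
        if (h->offset + len > SEGMENT_SIZE) {  /* segment full */
            h->segment++;
            h->offset = 0;
        }
        uint64_t addr = h->segment * SEGMENT_SIZE + h->offset;
        h->offset += len;
        return addr;
    }
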
-- |
Rich |