Gentoo Archives: gentoo-user

From: Kai Krakow <hurikhan77@×××××.com>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: WARNING: Crucial MX300 drives SUUUUUCK!!!!
Date: Tue, 07 Mar 2017 08:07:49
Message-Id: 20170307090727.70d3a83f@jupiter.sol.kaishome.de
In Reply to: Re: [gentoo-user] Re: WARNING: Crucial MX300 drives SUUUUUCK!!!! by "Poison BL."
On Mon, 6 Mar 2017 09:09:48 -0500,
"Poison BL." <poisonbl@×××××.com> wrote:

> On Mon, Mar 6, 2017 at 2:23 AM, Kai Krakow <hurikhan77@×××××.com>
> wrote:
>
> > On Tue, 14 Feb 2017 16:14:23 -0500,
> > "Poison BL." <poisonbl@×××××.com> wrote:
> > > I actually see both sides of it... as nice as it is to have a
> > > chance to recover the information from between the last backup
> > > and the death of the drive, the reduced chance of corrupt data
> > > from a silently failing (spinning) disk making it into backups is
> > > a bit of a good balancing point for me.
> >
> > I've seen borgbackup give me good protection against this. First,
> > it doesn't back up files which are already in the backup. So if
> > data silently changed, it won't make it into the backup. Second, it
> > does incremental backups. Even if something broke and made it into
> > the backup, you can still go back weeks or months to get the file
> > back. The algorithm is very efficient. And every incremental backup
> > is a full backup at the same time - so you can thin out backup
> > history by deleting any backup at any time (it's not like
> > traditional incremental backup, which always needs the parent
> > backup).
> >
> > OTOH, this means that every data block is only stored once. If
> > silent data corruption hits here, you lose the complete history of
> > this file (and maybe others using the same deduplicated block).
> >
> > For the numbers: I'm storing my 1.7 TB system onto a 3 TB disk
> > which is 2.2 TB full now. But the backup history spans almost a
> > year now (daily backups).
> >
> > As a sort of protection against silent data corruption, you could
> > rsync borgbackup to a remote location. The differences are usually
> > small, so that should be a fast operation. Maybe to some cloud
> > storage or RAID protected NAS which can detect and correct silent
> > data corruption (like ZFS or btrfs based systems).
> >
> >
> > --
> > Regards,
> > Kai
> >
> > Replies to list-only preferred.
> >
>
> That's some impressive backup density... and I haven't looked into
> borgbackup, but it sounds like it runs on the same principles as the
> rsync+hardlink based scripts I've seen, though those will back up
> files that've silently changed, since the checksums won't match any
> more, but that won't blow away previous copies of the file either.
> I'll have to give it a try!

Borgbackup seems to check inodes to get a list of changed files really
fast. For me it only needs a few minutes to scan through millions of
files; rsync is way slower, and even "find" feels slower. Taking a
daily backup usually takes 8-12 minutes for me (depending on the
delta), and thinning old backups out of the backup set takes another
1-2 minutes.
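
If it helps to picture it: the whole daily run boils down to a
"create" followed by a "prune". A minimal Python sketch of such a run
(untested; the repository path, source paths and retention numbers are
just examples, not my actual setup):

  import subprocess

  REPO = "/mnt/backup/borg-repo"   # example path only

  def borg(*args):
      # abort the script if borg exits with a non-zero status
      subprocess.run(["borg", *args], check=True)

  # create a new archive named after the current date; data already in
  # the repository is deduplicated, not stored again
  borg("create", "--stats", "--compression", "lz4",
       REPO + "::{now:%Y-%m-%d}", "/home", "/etc")

  # thin out history; any archive can be deleted because every archive
  # is a full backup on its own
  borg("prune", "--keep-daily", "7", "--keep-weekly", "4",
       "--keep-monthly", "12", REPO)

The prune step is what keeps the repository from growing without bound
while still keeping a long history around.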

> As for protecting against the backup set itself getting silent
> corruption, an rsync to a remote location would help, but you would
> have to ensure it doesn't overwrite anything already there that
> may've changed, only create new.

Use only the timestamp check in rsync, not the contents check. This
should work for borgbackup, as it only ever creates newer files, never
older ones.
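
Roughly like this, as a Python sketch (paths are just examples):

  import subprocess

  SRC = "/mnt/backup/borg-repo/"    # local borg repository (example)
  DST = "nas:/backup/borg-repo/"    # remote mirror (example)

  subprocess.run([
      "rsync", "-a",
      # no -c/--checksum: rsync's default quick check compares only
      # file size and modification time, i.e. the "timestamp check"
      # "--ignore-existing",  # optionally never overwrite what's
      #                       # already there, as suggested above
      SRC, DST,
  ], check=True)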

> Also, making the initial clone would
> take ages, I suspect, since it would have to rebuild the hardlink set
> for everything (again, assuming that's the trick borgbackup's using).

No, that's not the trick. Files are stored as chunks. Chunks are split
based on a moving-window checksumming algorithm to detect duplicate
file blocks. So deduplication is not done at file level but at
sub-file level (block level with variable block sizes).
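
To illustrate the idea - this is only a toy sketch in Python, not
borg's actual algorithm (borg uses a seeded buzhash and much larger,
configurable chunk sizes): a rolling hash over the last few dozen
bytes decides where a chunk ends, so boundaries depend on the content
itself rather than on fixed offsets, and identical blocks are found
again even if they shift around inside a file:

  import hashlib
  import os
  from collections import deque

  def chunks(data, window=48, mask=(1 << 12) - 1):
      # cut a chunk whenever the low 12 bits of a rolling hash over
      # the last `window` bytes are zero (~4 KiB average chunk size)
      base, mod = 257, (1 << 61) - 1
      pow_w = pow(base, window, mod)  # weight of the byte leaving the window
      buf, h, start = deque(), 0, 0
      for i, b in enumerate(data):
          buf.append(b)
          h = (h * base + b) % mod
          if len(buf) > window:
              h = (h - buf.popleft() * pow_w) % mod
          if len(buf) >= window and (h & mask) == 0:
              piece = data[start:i + 1]
              yield hashlib.sha256(piece).hexdigest(), piece
              start, h = i + 1, 0
              buf.clear()
      if start < len(data):
          piece = data[start:]
          yield hashlib.sha256(piece).hexdigest(), piece

  blob = os.urandom(1 << 15)
  data = blob + b"X" + blob      # same content again, shifted by one byte
  store = {}                     # chunk hash -> chunk (the "repository")
  for digest, piece in chunks(data):
      store.setdefault(digest, piece)
  # most chunks of the second copy dedup against the first despite the shift
  print(len(data), sum(len(p) for p in store.values()))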

Additionally, those chunks can be compressed with lz4, gzip, and I
think xz (the latter being painfully slow of course).
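
For a rough feel of the trade-off, a quick stdlib-only Python sketch
(zlib standing in for gzip, lzma for xz; lz4 needs a third-party
module, so it is left out here - the numbers are machine-dependent, of
course, and the sample data is made up):

  import lzma
  import time
  import zlib

  # synthetic, compressible sample data standing in for one chunk
  chunk = b"Mar  7 08:07:49 jupiter borg[1234]: deduplicated chunk\n" * 20000

  for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
      t0 = time.perf_counter()
      out = compress(chunk)
      print(name, len(out), "bytes in",
            round(time.perf_counter() - t0, 3), "s")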

> One of the best options is to house the base backup set itself on
> something like zfs or btrfs on a system with ecc ram, and maintain
> checksums of everything on the side (crc32 would likely suffice, but
> sha1's fast enough these days there's almost no excuse not to use
> it). It might be possible to task tripwire to keep tabs on that side
> of it, now that I consider it. While the filesystem itself in that
> case is trying its best to prevent issues, there's always that slim
> risk that there's a bug in the filesystem code itself that eats
> something, hence the added layer of paranoia. Also, with ZFS for the
> base data set,
> you gain in-place compression,

That's already done by borgbackup.

> dedup

That's also done by borgbackup.

> if you're feeling
> adventurous

You don't have to, because you can use a simpler filesystem for
borgbackup. I'm storing on xfs and still plan to sync to a remote
location.

> (not really worth it unless you have multiple very
> similar backup sets for different systems), block level checksums,
> redundancy across physical disks, in place snapshots, and the ability
> to use zfs send/receive to do snapshot backups of the backup set
> itself.
>
> I managed to corrupt some data with zfs (w/ dedup, on gentoo) shared
> out over nfs a while back on a box with way too little ram (nothing
> important, throwaway VM images), hence the paranoia of secondary
> checksum auditing and still replicating the backup set for things
> that might be important.



--
Regards,
Kai

Replies to list-only preferred.