On Mon, 6 Mar 2017 09:09:48 -0500,
"Poison BL." <poisonbl@×××××.com> wrote:

> On Mon, Mar 6, 2017 at 2:23 AM, Kai Krakow <hurikhan77@×××××.com>
> wrote:
>
> > On Tue, 14 Feb 2017 16:14:23 -0500,
> > "Poison BL." <poisonbl@×××××.com> wrote:
> > > I actually see both sides of it... as nice as it is to have a
> > > chance to recover the information from between the last backup
> > > and the death of the drive, the reduced chance of corrupt data
> > > from a silently failing (spinning) disk making it into backups is
> > > a bit of a good balancing point for me.
> >
> > I've seen borgbackup give me good protection against this. First,
> > it doesn't back up files which are already in the backup. So if
> > data silently changed, it won't make it into the backup. Second,
> > it does incremental backups. Even if something broke and made it
> > into the backup, you can eventually go back weeks or months to get
> > the file back. The algorithm is very efficient. And every
> > incremental backup is a full backup at the same time - so you can
> > thin out backup history by deleting any backup at any time (so
> > it's not like traditional incremental backup, which always needs
> > the parent backup).
> >
> > OTOH, this means that every data block is only stored once. If
> > silent data corruption hits here, you lose the complete history of
> > this file (and maybe of others using the same deduplicated block).
> >
> > For the numbers: I'm storing my 1.7 TB system on a 3 TB disk which
> > is 2.2 TB full now. But the backup history is almost 1 year now
> > (daily backups).
> >
> > As a sort of protection against silent data corruption, you could
> > rsync the borgbackup repository to a remote location. The
> > differences are usually small, so that should be a fast operation.
> > Maybe to some cloud storage or a RAID-protected NAS which can
> > detect and correct silent data corruption (like ZFS or btrfs based
> > systems).
> >
> >
> > --
> > Regards,
> > Kai
> >
> > Replies to list-only preferred.
> >
>
> That's some impressive backup density... I haven't looked into
> borgbackup, but it sounds like it runs on the same principles as the
> rsync+hardlink based scripts I've seen. Those will still back up
> files that've silently changed (since the checksums won't match any
> more), but they won't blow away previous copies of the file either.
> I'll have to give it a try!

Borgbackup seems to check inodes to get a listing of changed files
really fast. It only needs a few minutes to scan through millions of
files for me; rsync is way slower, and even "find" feels slower.
Taking a daily backup usually takes 8-12 minutes for me (depending on
the delta), and thinning old backups out of the backup set takes
another 1-2 minutes.
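
For reference, such a daily run boils down to two commands. This is
only a sketch (the repository path and the backed-up paths are
placeholders, and the flags are borg 1.x syntax as I recall them, not
my exact invocation):

```shell
# Hypothetical repo path and source paths; borg 1.x syntax, untested sketch.
# Create a new archive, named after the host and the current timestamp:
borg create --stats /backup/borg-repo::'{hostname}-{now}' /home /etc /var

# Thin out the history: keep 7 daily, 4 weekly, 12 monthly archives;
# any archive can be deleted independently since each one is "full".
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /backup/borg-repo
```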

> As for protecting the backup set itself against silent corruption,
> an rsync to a remote location would help, but you would have to
> ensure it doesn't overwrite anything already there that may've
> changed, only create new.

Use a timestamp check only in rsync, not a content check. This should
work for borgbackup as it only creates newer files, never older ones.
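
Something like the following should do; here sketched with local
stand-in directories instead of a real remote (the paths are
placeholders for the demo):

```shell
# Local stand-ins for the borg repo and its mirror:
src=$(mktemp -d)   # stands in for the local borg repository
dst=$(mktemp -d)   # stands in for the remote mirror
echo "segment data" > "$src/data0"

# -a uses rsync's default quick check (size + mtime) to decide what to
# send - deliberately NOT --checksum, so a file whose contents rotted
# silently (same size, same mtime) is never re-sent over a good copy.
rsync -a "$src/" "$dst/"   # first run copies everything
rsync -a "$src/" "$dst/"   # re-run: size/mtime unchanged, nothing re-sent
cat "$dst/data0"           # prints "segment data"
```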

> Also, making the initial clone would
> take ages, I suspect, since it would have to rebuild the hardlink set
> for everything (again, assuming that's the trick borgbackup's using).

No, that's not the trick. Files are stored as chunks. Chunks are split
based on a moving-window checksum algorithm that detects duplicate
file blocks. So deduplication is not done at file level but at subfile
level (block level, with variable block sizes).

Additionally, those chunks can be compressed with lz4, gzip, and I
think xz (the latter being painfully slow, of course).
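
Both knobs are exposed on the command line. The numbers below are the
borg 1.x defaults as far as I remember them, so treat this as an
illustration rather than gospel (command sketch, not run here):

```shell
# Untested sketch, borg 1.x flags, hypothetical repo path.
# Chunker params: min 2^19, max 2^23, target ~2^21 bytes per chunk,
# 4095-byte rolling-hash window (these are the documented defaults).
# Each chunk is then compressed independently; zlib is the gzip-style
# option, lzma the slow xz-style one.
borg create --chunker-params 19,23,21,4095 \
    --compression lz4 \
    /backup/borg-repo::'{hostname}-{now}' /home
```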

> One of the best options is to house the base backup set itself on
> something like zfs or btrfs on a system with ecc ram, and maintain
> checksums of everything on the side (crc32 would likely suffice, but
> sha1's fast enough these days there's almost no excuse not to use
> it). It might be possible to task tripwire to keep tabs on that side
> of it, now that I consider it. While the filesystem itself in that
> case is trying its best to prevent issues, there's always that slim
> risk that there's a bug in the filesystem code itself that eats
> something, hence the added layer of paranoia. Also, with ZFS for the
> base data set, you gain in-place compression,

Is already done by borgbackup.

> dedup

Is also done by borgbackup.

> if you're feeling adventurous

You don't have to, because you can use a simpler filesystem for
borgbackup. I'm storing on xfs and still plan to sync to a remote.

> (not really worth it unless you have multiple very
> similar backup sets for different systems), block level checksums,
> redundancy across physical disks, in-place snapshots, and the ability
> to use zfs send/receive to do snapshot backups of the backup set
> itself.
>
> I managed to corrupt some data with zfs (w/ dedup, on gentoo) shared
> out over nfs a while back on a box with way too little ram (nothing
> important, throwaway VM images), hence the paranoia of secondary
> checksum auditing and still replicating the backup set for things
> that might be important.
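
For that kind of secondary auditing, borgbackup can also check its own
store: chunks are checksummed inside the repository, and there is a
check command for it (flag names from borg 1.x as I recall; sketch with
a placeholder repo path, not run here):

```shell
# Untested sketch, borg 1.x, hypothetical repo path. The plain check
# verifies repository structure and archive metadata; --verify-data
# additionally re-reads and re-checksums every chunk - slow, but it
# catches silent corruption inside the repository itself.
borg check --verify-data /backup/borg-repo
```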


--
Regards,
Kai

Replies to list-only preferred.