
From: Laurence Perkins <lperkins@×××××××.net>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: RE: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 17:24:40
Message-Id: MW2PR07MB405825541C9188C995C2F978D2689@MW2PR07MB4058.namprd07.prod.outlook.com
In Reply to: Re: [gentoo-user] Backup program that compresses data but only changes new files. by Dale
>>
>
>
>Duplicity sounds interesting except that I already have the drive encrypted. Keep in mind, these are external drives that I hook up long enough to complete the backups then back in a fire safe they go. The reason I mentioned being like rsync, I don't want to rebuild a backup from scratch each time as that would be time consuming. I thought of using Kbackup ages ago and it rebuilds from scratch each time but it does have the option of compressing. That might work for small stuff but not many TBs of it. Back in the early 90's, I remember using a backup software that was incremental. It would only update files that changed and would do it over several floppy disks and compressed it as well. Something like that nowadays is likely rare if it exists at all since floppies are long dead. I either need to split my backup into two pieces or compress my data. That is why I mentioned if there is a way to backup first part of alphabet in one command, switch disks and then do second part of alphabet to another disk.
>
>Mostly, I just want to add compression to what I do now. I figure there is a tool for it but no idea what it is called. Another method is splitting into two parts. In the long run, either should work and may end up needing both at some point. :/ If I could add both now, save me some problems later on. I guess.
>
>I might add, I also thought about using a Raspberry Pi thingy and having sort of a small scale NAS thing. I'm not sure about that thing either tho. Plus, they pricey right now. $$$
>
>Dale
>
>:-) :-)
>

Ok, so you have a few options here. Duplicity and Borg seem to be two of the most popular, and with good reason. They are quite powerful.

Duplicity stands out due to the massive number of storage backends it supports, meaning that the difference between backing up to your on-site disks or shooting it off over the Internet to practically any storage service you care to think of is one parameter. (And I recommend, if nothing else, coordinating with a friend in a different city to do precisely this. Fire safes are good to have, but the contents don't always survive a really big fire.)
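
As a rough sketch (paths and hostnames here are made up, so adjust to taste; since your drives are already encrypted you can tell duplicity to skip its own gpg layer for the local copy):

    # Full backup to the local backup disk, with duplicity's own encryption turned off
    duplicity full --no-encryption /home/dale file:///mnt/backupdisk/dale

    # Later runs are incremental automatically; force a fresh full once a month
    duplicity --no-encryption --full-if-older-than 1M /home/dale file:///mnt/backupdisk/dale

    # Same data, different backend: the only change is the target URL
    # (leave encryption on for anything that goes off-site)
    duplicity --full-if-older-than 1M /home/dale sftp://dale@friend.example.com/backups/dale

    # Restore is the same command with source and destination swapped
    duplicity restore --no-encryption file:///mnt/backupdisk/dale /home/dale.restored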

Borg is pickier: it only works directly with a local disk or over ssh. But that's because it has a potent, chunk-based storage algorithm similar to what rsync uses to save transfer bandwidth. It's very good at finding duplicate files, or even duplicate pieces of files, and storing them only once. This makes it amazingly good for things like VM images or other large files which accumulate small changes over time, or full OS backups (you'd be amazed how many duplicate files there are across a Linux OS).
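
A minimal sketch, assuming a repo path, compression choice, and archive naming of my own invention:

    # One-time repository setup; borg handles encryption itself
    borg init --encryption=repokey /mnt/backupdisk/borgrepo

    # Each run creates a new archive; unchanged chunks are stored only once
    borg create --stats --compression zstd \
        /mnt/backupdisk/borgrepo::'{hostname}-{now}' /home/dale

    # Same idea over ssh to a remote box that also has borg installed
    borg create --stats ssh://dale@friend.example.com/./borgrepo::'{hostname}-{now}' /home/dale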

Now, if you want to stick with old stuff that you thoroughly understand, that's fine too. For a dirt simple program capable of incremental backups and splitting the archive between disks, you're looking for...

wait for it...

tar.

Its ability to detect files which have changed is largely dependent on filesystem timestamps and the snapshot file it keeps from previous runs, so you have to make sure your usage pattern respects those. And it doesn't really do deduplication. But it actually has a reasonable set of backup features, including archive splitting. Your backup storage doesn't even need to support random access, and doesn't even need a filesystem. A bunch of my backups are on BD-REs. You just tell tar how big the disk is, pop it in, and hit go. When it's full it asks for another one. There are a few updated versions of tar which add things like indexes for fast seeking and other features which are handy on large data sets.
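
A quick sketch of what that looks like with GNU tar (sizes and paths are invented; note that GNU tar won't combine its built-in compression with multi-volume mode, so compress first or skip it when spanning disks):

    # Level-0 incremental backup, compressed, with a snapshot file recording what was saved
    tar --create --gzip --listed-incremental=/root/backup.snar \
        --file=/mnt/backupdisk/home-full.tar.gz /home/dale

    # Later runs reusing the same snapshot file only pick up files that changed
    tar --create --gzip --listed-incremental=/root/backup.snar \
        --file=/mnt/backupdisk/home-incr.tar.gz /home/dale

    # Spanning an archive across disks: tell tar the volume size and it prompts for the next one
    tar --create --multi-volume --tape-length=20G \
        --file=/mnt/backupdisk/home.tar /home/dale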

Personally these days I tend to use Borg, because it deduplicates really well, and archives can be thinned out in any order. It's also useful that you can put the repository in "append only" mode so that if anyone gets ransomware onto your system it's much more difficult for them to corrupt your backups.
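
For example, continuing with the repo path I assumed above:

    # Thin out old archives on whatever schedule suits you
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backupdisk/borgrepo

    # Flag the repository itself as append-only
    borg config /mnt/backupdisk/borgrepo append_only 1

    # Or, for a remote repo, pin the ssh key to an append-only borg serve
    # (one line in ~/.ssh/authorized_keys on the server side)
    command="borg serve --append-only --restrict-to-repository /srv/borgrepo",restrict ssh-ed25519 AAAA... dale@desktop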

The other thing is data integrity checking on your storage. Yes, disks have built-in ECC, but it's not terribly good. As annoying as it might be to have to hook up more than one disk at a time, BTRFS RAID recovers not only from complete read failures, it also keeps its own per-block checksums so that it can detect and repair even single bit flips. And it supports in-line compression. (How well that works obviously depends on how compressible your data is.) You can do similar things with LVM and/or mdraid, but the BTRFS checksums are the most comprehensive I've seen so far.
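
Roughly like this (device names and mount point are placeholders):

    # Mirror two backup disks, data and metadata on both
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc

    # Mount with transparent compression
    mount -o compress=zstd /dev/sdb /mnt/backup

    # Periodically verify every checksum and repair from the good copy
    btrfs scrub start -B /mnt/backup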

For optical media there's dvdisaster, which can generate Reed-Solomon redundancy data in a variety of ways. (Yes, I know, nobody uses optical any more... But what other storage is easily available that's EMP-proof? Solar flares can be wicked when they happen.)
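
If I remember the switches right, it goes something like this (device and filenames are just examples):

    # Read the disc to an image, then create a separate RS01 error-correction file for it
    dvdisaster -r -d /dev/sr0 -i backup.iso
    dvdisaster -c -mRS01 -i backup.iso -e backup.ecc

    # Later, verify and repair a damaged read using that ecc file
    dvdisaster -f -i backup.iso -e backup.ecc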

And there's one more that I haven't used in years, and I'm not sure how well it would work with Gentoo, but it was still alive as of 2020. mondorescue.org is an interesting concept: it takes your currently running system and all the data on it and turns it into a bootable image, with disk-spanning as necessary. It's designed primarily for CentOS, and I've only ever used it with Debian, but when it works it makes bare-metal restores really simple. Boot your backup drive, swap disks when prompted if necessary, and when it's done, there you are, everything right where you left it.
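
From old notes, so double-check the flags against the mondoarchive man page before trusting it:

    # Back up the whole running system to DVD-sized ISO images, excluding the backup target itself
    mondoarchive -Oi -d /mnt/backupdisk/mondo -s 4480m -E /mnt/backupdisk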

LMP