
From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 07:05:31
Message-Id: e99e39ea-6250-6008-5cd6-37fa84e22c7a@gmail.com
In Reply to: Re: [gentoo-user] Backup program that compresses data but only changes new files. by Rich Freeman
Rich Freeman wrote:
> On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@×××××.com> wrote:
>> Right now, I'm using rsync which doesn't compress files but does just
>> update things that have changed. I'd like to find some way, software
>> but maybe there is already a tool I'm unaware of, to compress data and
>> work a lot like rsync otherwise.
> So, how important is it that it work exactly like rsync?
>
> I use duplicity, in part because I've been using it forever. Restic
> seems to be a similar program most are using these days which I
> haven't looked at super-closely but I'd look at that first if starting
> out.
>
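For reference, a first duplicity run against a locally mounted drive
looks something like this (paths made up; duplicity encrypts with GnuPG
by default unless told not to):

  # Initial full backup to an external drive mounted at /mnt/backup.
  duplicity full /home/dale file:///mnt/backup/dale

  # Later runs without "full" become incrementals against that set.
  duplicity /home/dale file:///mnt/backup/dale
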
> Duplicity uses librsync, so it backs up exactly the same data as rsync
> would, except instead of replicating entire files, it creates streams
> of data more like something like tar. So if you back up a million
> small files you might get out 1-3 big files. It can compress and
> encrypt the data as you wish. The downside is that you don't end up
> with something that looks like your original files - you have to run
> the restore process to extract them all back out. It is extremely
> space-efficient though - if 1 byte changes in the middle of a 10GB
> file you'll end up just backing up maybe a kilobyte or so (whatever
> the block size is), which is just like rsync.
>
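Restoring runs the other way around, with source and target swapped;
e.g. (file name made up):

  # Extract the whole backup set into an empty directory.
  duplicity restore file:///mnt/backup/dale /tmp/restore

  # Or pull a single file back out (--path-to-restore is the newer
  # spelling of the same option).
  duplicity restore --file-to-restore Documents/notes.txt \
      file:///mnt/backup/dale /tmp/notes.txt
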
> Typically you rely on metadata to find files that change which is
> fast, but I'm guessing you can tell these programs to do a deep scan
> which of course requires reading the entire contents, and that will
> discover anything that was modified without changing ctime/mtime.
>
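rsync's version of that deep scan is --checksum, for what it's worth:

  # Hash every file instead of trusting size and mtime; much slower,
  # but it catches changes that never touched the timestamps.
  rsync -a --checksum /home/dale/ /mnt/backup/dale/
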
> The output files can be split to any size, and the index info (the
> metadata) is separate from the raw data. If you're storing to
> offline/remote/cloud/whatever storage typically you keep the metadata
> cached locally to speed retrieval and to figure out what files have
> changed for incrementals. However, if the local cache isn't there
> then it will fetch just the indexes from wherever it is stored
> (they're small).
>
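In duplicity the split size is --volsize (in megabytes), if memory
serves:

  # Cut the archive volumes at roughly 1 GB instead of the default.
  duplicity --volsize 1024 /home/dale file:///mnt/backup/dale
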
> It has support for many cloud services - I store mine to AWS S3.
>
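The target is just a URL, so S3 looks roughly like this (the exact
scheme varies between duplicity versions and backends, so check the
manpage for yours):

  # Credentials for the S3 backend come from the environment.
  export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...
  duplicity /home/dale s3://my-backup-bucket/dale
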
> There are also some options that are a little closer to rsync like
> rsnapshot and burp. Those don't store compressed (unless there is an
> option for that or something), but they do let you rotate through
> multiple backups and they'll set up hard links/etc so that they are
> de-duplicated. Of course hard links are at the file level so if 1
> byte inside a file changes you'll end up with two full copies. It
> will still only transfer a single block so the bandwidth requirements
> are similar to rsync.

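For reference, rsnapshot is driven entirely by /etc/rsnapshot.conf; a
minimal sketch (the file format requires TABs between fields):

  # /etc/rsnapshot.conf -- fields must be TAB-separated, not spaces.
  snapshot_root   /mnt/backup/snapshots/
  retain          daily   7
  retain          weekly  4
  backup          /home/dale/     localhost/

  # Then run the rotations from cron, e.g.:
  rsnapshot daily
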
Duplicity sounds interesting, except that I already have the drive
encrypted.  Keep in mind, these are external drives that I hook up long
enough to complete the backups, then back in a fire safe they go.  The
reason I mentioned being like rsync is that I don't want to rebuild a
backup from scratch each time, as that would be time consuming.  I
thought of using Kbackup ages ago; it rebuilds from scratch each time,
but it does have the option of compressing.  That might work for small
stuff but not many TBs of it.  Back in the early '90s, I remember using
backup software that was incremental.  It would only update files that
changed, would span several floppy disks, and compressed the data as
well.  Something like that nowadays is likely rare, if it exists at
all, since floppies are long dead.  I either need to split my backup
into two pieces or compress my data.  That is why I asked if there is a
way to back up the first part of the alphabet in one command, switch
disks, and then do the second part of the alphabet to another disk.

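For what it's worth, the two-disk split can be done with plain shell
globs and two rsync runs, assuming everything sits under one directory;
something like:

  # First half of the alphabet to the first disk...
  rsync -av /home/dale/data/[a-mA-M]* /mnt/disk1/data/

  # ...then swap disks and send the rest (names that don't start with
  # a letter would need their own pattern or a third run).
  rsync -av /home/dale/data/[n-zN-Z]* /mnt/disk2/data/
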
Mostly, I just want to add compression to what I do now.  I figure
there is a tool for that, but I have no idea what it is called.  The
other method is splitting the backup into two parts.  In the long run
either should work, and I may end up needing both at some point.  :/
If I could add both now, it would save me some problems later on, I
guess.

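GNU tar can still do the compressed-incremental trick from the floppy
days via --listed-incremental; a rough sketch:

  # Full (level-0) backup; tar records file state in the .snar file.
  tar --listed-incremental=/mnt/backup/dale.snar \
      -czf /mnt/backup/dale-full.tar.gz /home/dale

  # Re-running with the same snapshot file archives only what changed
  # since, and gzip still applies to each incremental archive.
  tar --listed-incremental=/mnt/backup/dale.snar \
      -czf /mnt/backup/dale-inc1.tar.gz /home/dale
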
I might add, I also thought about using a Raspberry Pi and building a
sort of small-scale NAS.  I'm not sure about that idea either, though.
Plus, they're pricey right now.  $$$

Dale

:-)  :-)
