On Sun, Aug 14, 2022 at 6:44 PM Dale <rdalek1967@×××××.com> wrote:
>
> Right now, I'm using rsync which doesn't compress files but does just
> update things that have changed. I'd like to find some way, software
> but maybe there is already a tool I'm unaware of, to compress data and
> work a lot like rsync otherwise.

So, how important is it that it work exactly like rsync?

I use duplicity, in part because I've been using it forever. Restic
seems to be a similar program that most people are using these days; I
haven't looked at it super-closely, but I'd look at it first if I were
starting out.

Duplicity uses librsync, so it backs up exactly the same data as rsync
would, except that instead of replicating entire files, it creates
streams of data, more like tar. So if you back up a million small
files you might get out 1-3 big files. It can compress and encrypt
the data as you wish. The downside is that you don't end up with
something that looks like your original files - you have to run the
restore process to extract them all back out. It is extremely
space-efficient, though: if 1 byte changes in the middle of a 10GB
file, you'll end up backing up maybe a kilobyte or so (whatever the
block size is), just like rsync.
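
To make the block-level point concrete, here is a minimal sketch (not
duplicity's or librsync's actual code - the real thing uses rolling
checksums so it can also handle insertions that shift data) of why a
1-byte change only costs about one block. The block size and function
names are made up for illustration:

```python
import hashlib

BLOCK_SIZE = 1024  # illustrative; real tools choose their own block size

def block_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size block so changed blocks can be identified."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Return the indexes of blocks that differ between two file versions."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, (a, b) in enumerate(zip(old_h, new_h)) if a != b]

# A 10 MiB file with a single byte flipped in the middle...
old = bytes(10 * 1024 * 1024)
new = bytearray(old)
new[5 * 1024 * 1024] = 0xFF

# ...dirties exactly one 1 KiB block, which is all an incremental stores.
print(changed_blocks(old, bytes(new)))  # [5120]
```

Out of 10,240 blocks in that file, only one needs to go into the
incremental, which is where the space efficiency comes from.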

Typically these programs rely on metadata to find files that have
changed, which is fast, but I'm guessing you can tell them to do a
deep scan, which of course requires reading the entire contents and
will discover anything that was modified without changing ctime/mtime.
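
Here's a small sketch of the difference between the two scan modes
(my own illustration, not any particular tool's code): the fast path
trusts size and mtime, while the deep scan hashes the contents and
catches edits that preserved the timestamp.

```python
import hashlib
import os
import tempfile

def looks_changed(path: str, known_size: int, known_mtime_ns: int) -> bool:
    """Fast path: trust filesystem metadata, as normal incrementals do."""
    st = os.stat(path)
    return st.st_size != known_size or st.st_mtime_ns != known_mtime_ns

def really_changed(path: str, known_sha256: str) -> bool:
    """Deep scan: read the whole file; catches timestamp-preserving edits."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() != known_sha256

# Record what the "previous backup" knew about the file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"hello world")
st = os.stat(path)
size, mtime_ns = st.st_size, st.st_mtime_ns
digest = hashlib.sha256(b"hello world").hexdigest()

# Modify the contents without changing the size, then put the mtime back.
with open(path, "wb") as f:
    f.write(b"hellX world")
os.utime(path, ns=(st.st_atime_ns, mtime_ns))

print(looks_changed(path, size, mtime_ns))  # False - metadata check misses it
print(really_changed(path, digest))         # True  - deep scan catches it
```

That's the trade-off in a nutshell: the metadata check is one stat()
per file, while the deep scan reads every byte on disk.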

The output files can be split to any size, and the index info (the
metadata) is separate from the raw data. If you're storing to
offline/remote/cloud/whatever storage, you typically keep the metadata
cached locally to speed retrieval and to figure out which files have
changed for incrementals. However, if the local cache isn't there, it
will fetch just the indexes from wherever they are stored (they're
small).

It has support for many cloud services - I store mine in AWS S3.

There are also some options a little closer to rsync, like rsnapshot
and burp. Those don't store the data compressed (unless there's an
option for that or something), but they do let you rotate through
multiple backups, and they'll set up hard links/etc so that unchanged
files are de-duplicated. Of course, hard links work at the file
level, so if 1 byte inside a file changes you'll end up with two full
copies. They will still only transfer a single block, though, so the
bandwidth requirements are similar to rsync.
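
The hard-link trick those tools use can be shown in a few lines
(rsnapshot itself does this via cp -al / rsync rather than Python;
the paths here are just illustrative). Two snapshot directories share
one inode for an unchanged file, so the data is stored only once:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
snap_old = os.path.join(tmp, "daily.1")  # yesterday's snapshot
snap_new = os.path.join(tmp, "daily.0")  # today's snapshot
os.makedirs(snap_old)
os.makedirs(snap_new)

# Yesterday's snapshot holds the file.
f1 = os.path.join(snap_old, "notes.txt")
with open(f1, "w") as f:
    f.write("unchanged content")

# Today's snapshot hard-links the unchanged file instead of copying it...
f2 = os.path.join(snap_new, "notes.txt")
os.link(f1, f2)

# ...so both directory entries point at one inode and one copy of the data.
print(os.stat(f1).st_ino == os.stat(f2).st_ino)  # True
print(os.stat(f1).st_nlink)                      # 2
```

And that's exactly why the de-duplication is all-or-nothing per file:
a changed file gets its own new inode, so even a 1-byte change means a
second full copy on disk.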

--
Rich