Gentoo Archives: gentoo-user

From: William Kenworthy <billk@×××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 06:02:01
Message-Id: b387c1eb-2116-4112-5ac2-0aadafe45667@iinet.net.au
In Reply to: [gentoo-user] Backup program that compresses data but only changes new files. by Dale
On 15/8/22 06:44, Dale wrote:
> Howdy,
>
> With my new fiber internet, my poor disks are getting a work out, and
> also filling up.  First casualty, my backup disk.  I have one directory
> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> is right now and it's still trying to pack in files.
>
>
> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
>
>
> Right now, I'm using rsync which doesn't compress files but does just
> update things that have changed.  I'd like to find some way, software
> but maybe there is already a tool I'm unaware of, to compress data and
> work a lot like rsync otherwise.  I looked in app-backup and there is a
> lot of options but not sure which fits best for what I want to do.
> Again, backup a directory, compress and only update with changed or new
> files.  Generally, it only adds files but sometimes a file gets replaced
> as well.  Same name but different size.
>
> I was trying to go through the list in app-backup one by one but to be
> honest, most links included only go to github or something and usually
> doesn't tell anything about how it works or anything.  Basically, as far
> as seeing if it does what I want, it's useless. It sort of reminds me of
> quite a few USE flag descriptions.
>
> I plan to buy another hard drive pretty soon.  Next month is possible.
> If there is nothing available that does what I want, is there a way to
> use rsync and have it set to backup files starting with "a" through "k"
> to one spot and then backup "l" through "z" to another?  I could then
> split the files into two parts.  I use a script to do this now, if one
> could call my little things scripts, so even a complicated command could
> work, just may need help figuring out the command.
>
> Thoughts?  Ideas?
>
> Dale
>
> :-)  :-)
>
The questions you need to ask are how compressible the data is, and how
much duplication is in there.  Rsync's biggest disadvantage is that it
doesn't keep history, so if you need to restore something from last week
you are SOL.  Honestly, rsync is not a backup program and should only be
used the way you use it now, for data you don't value, as an rsync
archive is a disaster waiting to happen from a backup point of view.

Look into dirvish - it uses hard links to keep files current but safe,
and is easy to restore (a backup looks like an exact copy, so you just
cp the files back if needed).  The downside is that it hammers the hard
disk and has no compression, so its only deduplication is via history
(my backups stabilised at about 2x the original size for ~2yrs of
history), though you can use something like btrfs, which has
filesystem-level compression.

My current program is borgbackup, which is very sophisticated in how it
stores data - it's probably your best bet, in fact.  I am storing
literally tens of TB of raw data on a 4TB USB3 disk (going back years),
and yes, I do restore regularly - not just for disasters, but for
space-efficient long term storage I access only rarely.

e.g.:

A single host:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                3.07 TB              1.96 TB            151.80 GB

                       Unique chunks         Total chunks
Chunk index:                 1026085             22285913

Then there is my offline storage - it backs up ~15 hosts (in repos like
the above) plus data storage like 22 years of email, etc.  Each host
backs up to its own repo, then the offline storage backs that up.  The
deduplicated size is the actual on-disk size ... compression varies, as
it's whatever I used at the time the backup was taken ... currently I
have it set to "auto,zstd,11", but it can be mixed in the same repo (a
repo is a single backup set - you can nest repos, which is what I do -
so ~45TB stored on a 4TB offline disk).  One advantage of a system like
this is that chunked data rarely changes, so only the differences are
backed up (read the borgbackup docs - interesting).
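In miniature, the chunk idea is: split the data into chunks, hash each one, and store a chunk only if its hash is new.  Borg actually uses variable-sized, content-defined chunking with a rolling hash (plus encryption), so this fixed-size sketch is only an illustration of the principle, not its implementation:

```shell
# Toy chunk store: fixed 1 MiB chunks, deduplicated by SHA-256.
# (Illustration only - borg's real chunker is content-defined.)
mkdir -p chunks
split -b 1M backup.img piece.
for p in piece.*; do
    h=$(sha256sum "$p" | cut -d' ' -f1)
    # store the chunk only if we haven't seen this hash before
    [ -e "chunks/$h" ] || cp "$p" "chunks/$h"
done
```

Identical chunks land on disk exactly once, which is why repeated or slowly-changing data (like nested repos, or 22 years of email) deduplicates so well.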

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               28.69 TB             28.69 TB              3.81 TB

                       Unique chunks         Total chunks
Chunk index:
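On the a-k / l-z split you asked about: rsync's include/exclude filters can do it from a single source tree.  Something like this should work (the paths are placeholders - substitute your own):

```shell
#!/bin/sh
# Split one backup across two disks by the first letter of the
# top-level entries.  SRC/DEST1/DEST2 are placeholder paths.
SRC=/mnt/data/
DEST1=/mnt/backup1/
DEST2=/mnt/backup2/

# Top-level entries starting a-k go to the first disk; the bare
# '--exclude=/*' then drops every other top-level entry.
rsync -a --delete --include='/[a-kA-K]*' --exclude='/*' "$SRC" "$DEST1"

# Everything else (l-z, digits, dotfiles, ...) goes to the second disk.
rsync -a --delete --exclude='/[a-kA-K]*' "$SRC" "$DEST2"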
