Gentoo Archives: gentoo-user

From: John Covici <covici@××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 09:45:45
Message-Id: m34jydj0w1.wl-covici@ccs.covici.com
In Reply to: Re: [gentoo-user] Backup program that compresses data but only changes new files. by Dale
On Mon, 15 Aug 2022 04:33:44 -0400,
Dale wrote:
>
> William Kenworthy wrote:
> >
> > On 15/8/22 06:44, Dale wrote:
> >> Howdy,
> >>
> >> With my new fiber internet, my poor disks are getting a workout, and
> >> also filling up.  First casualty, my backup disk.  I have one directory
> >> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> >> is right now and it's still trying to pack in files.
> >>
> >>
> >> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
> >>
> >>
> >> Right now, I'm using rsync, which doesn't compress files but does just
> >> update things that have changed.  I'd like to find some way, software,
> >> or maybe there is already a tool I'm unaware of, to compress data and
> >> otherwise work a lot like rsync.  I looked in app-backup and there are
> >> a lot of options, but I'm not sure which fits best for what I want to
> >> do.  Again: back up a directory, compress, and only update with changed
> >> or new files.  Generally, it only adds files, but sometimes a file gets
> >> replaced as well.  Same name but different size.
> >>
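> >> (What I run now is basically just plain rsync - something along
> >> these lines, with the paths made up:
> >>
> >>   rsync -av /home/dale/stuff/ /mnt/8tb/stuff/
> >>
> >> so new or changed files get copied and everything else is left
> >> alone.)
> >>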
> >> I was trying to go through the list in app-backup one by one, but to be
> >> honest, most of the links included only go to github or something and
> >> usually don't tell anything about how the program works.  Basically, as
> >> far as seeing if it does what I want, it's useless.  It sort of reminds
> >> me of quite a few USE flag descriptions.
> >>
> >> I plan to buy another hard drive pretty soon.  Next month is possible.
> >> If there is nothing available that does what I want, is there a way to
> >> use rsync and have it set to back up files starting with "a" through "k"
> >> to one spot and then back up "l" through "z" to another?  I could then
> >> split the files into two parts.  I use a script to do this now, if one
> >> could call my little things scripts, so even a complicated command could
> >> work, just may need help figuring out the command.
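> >>
> >> Something like this is what I'm picturing, if the filter syntax is
> >> even close to right (paths made up):
> >>
> >>   rsync -av --include='/[a-k]*' --exclude='/*' /home/dale/stuff/ /mnt/8tb/stuff/
> >>   rsync -av --include='/[l-z]*' --exclude='/*' /home/dale/stuff/ /mnt/new/stuff/
> >>
> >> the idea being that --exclude='/*' only blocks the top-level names
> >> the include didn't match, so everything under a matched directory
> >> still gets copied.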
> >>
> >> Thoughts?  Ideas?
> >>
> >> Dale
> >>
> >> :-)  :-)
> >>
> > The questions you need to ask are how compressible the data is and how
> > much duplication is in there.  Rsync's biggest disadvantage is that it
> > doesn't keep history, so if you need to restore something from last
> > week you are SOL.  Honestly, rsync is not a backup program and should
> > only be used the way you do for data you don't value, as an rsync
> > archive is a disaster waiting to happen from a backup point of view.
> >
> > Look into dirvish - it uses hard links to keep files current but safe,
> > and is easy to restore (it looks like an exact copy, so you just cp the
> > files back if needed).  The downside is that it hammers the hard disk
> > and has no compression, so the only deduplication is via history (my
> > backups stabilised at about 2x the original size for ~2yrs of history) -
> > though you can use something like btrfs, which has filesystem-level
> > compression.
> >
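> > (It's essentially the same trick you can do by hand with rsync's
> > --link-dest - a minimal sketch, with the dates and paths made up:
> >
> >   rsync -a --link-dest=/backup/2022-08-14 /data/ /backup/2022-08-15/
> >
> > Unchanged files become hard links into the previous snapshot, so
> > every snapshot looks like a full copy but only changed files take
> > new space.)
> >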
> > My current program is borgbackup, which is very sophisticated in how it
> > stores data - it's probably your best bet, in fact.  I am storing
> > literally tens of TB of raw data on a 4TB usb3 disk (going back years -
> > and yes, I do restore regularly, and not just for disasters but for
> > space-efficient long-term storage I access only rarely).
> >
> > e.g.:
> >
> > A single host:
> >
> > ------------------------------------------------------------------------------
> >                    Original size    Compressed size    Deduplicated size
> > All archives:            3.07 TB            1.96 TB            151.80 GB
> >
> >                    Unique chunks       Total chunks
> > Chunk index:             1026085           22285913
> >
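> > (For the curious, that summary is just what querying the repo
> > prints, e.g.:
> >
> >   borg info /path/to/repo
> >
> > with the path made up, of course.)
> >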
> > Then there is my offline storage - it backs up ~15 hosts (in repos
> > like the above) plus data storage like 22 years of email etc.  Each host
> > backs up to its own repo, then the offline storage backs that up.  The
> > deduplicated size is the actual on-disk size ... compression varies, as
> > it's whatever I used at the time the backup was taken ... currently I
> > have it set to "auto,zstd,11", but it can be mixed in the same repo (a
> > repo is a single backup set - you can nest repos, which is what I do -
> > so ~45TB stored on a 4TB offline disk).  One advantage of a system
> > like this is that chunked data rarely changes, so only the differences
> > get backed up (read the borgbackup docs - interesting).
> >
> > ------------------------------------------------------------------------------
> >                    Original size    Compressed size    Deduplicated size
> > All archives:           28.69 TB           28.69 TB              3.81 TB
> >
> >                    Unique chunks       Total chunks
> > Chunk index:
> >
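> > If you want to try it, the basic flow is only a few commands - a
> > rough sketch with made-up paths and a retention policy picked out
> > of the air:
> >
> >   borg init --encryption=repokey /mnt/backup/repo
> >   borg create --stats --compression auto,zstd,11 \
> >       /mnt/backup/repo::'{hostname}-{now}' /home/dale/stuff
> >   borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/repo
> >
> > Each "borg create" makes a new archive, but the chunk deduplication
> > means only data that changed since the last run gets stored.
> >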
>
>
> For the particular drive in question, it is 99.99% videos.  I don't want
> to lose any quality, but I'm not sure how much they can be compressed, to
> be honest.  It could be they are already as compressed as they can be
> without losing resolution etc.  I've been lucky so far.  I don't think
> I've ever needed a file back after a backup replaced it with a bad
> working copy.  Example: I update a video only to find the newer copy is
> corrupt and want the old one back.  I've done it a time or two, but I
> tend to find that before I do backups.  Still, it is a downside and
> something I've thought about before.  I figure when it does happen, it
> will be something hard to replace.  Just letting the devil have his
> day.  :-(
>
> For that reason, I find the version-type backups interesting.  It is a
> safer method.  You can have a new file but also keep an older file as
> well, just in case the new file takes a bad turn.  It is an interesting
> thought.  It's one that not only I should consider, but anyone really.
>
> As I posted in another reply, I found a 10TB drive that should be here
> by the time I do a fresh set of backups.  This will give me more time to
> consider things.  Have I said this before a while back???  :/
>
zfs would solve your problem of corruption, even without versioning.
You do a scrub at short intervals, and at least you would know if a
file is corrupted.  Of course, redundancy is better, such as mirroring,
and backups take a very short time, because when sending from one zfs
to another it knows exactly what bytes to send.

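A sketch of what I mean, with the pool and dataset names made up:

  zpool scrub tank
  zfs snapshot tank/videos@2022-08-15
  zfs send -i tank/videos@2022-08-08 tank/videos@2022-08-15 | \
      zfs receive backup/videos

The scrub reads everything and verifies the checksums; the incremental
send only ships the blocks that changed between the two snapshots.
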
--
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

John Covici wb2una
covici@××××××××××.com

Replies

Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Author:  Wol <antlists@××××××××××××.uk>