Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 08:33:51
Message-Id: 72bf015e-a102-1df7-151b-3d4138b6a640@gmail.com
In Reply to: Re: [gentoo-user] Backup program that compresses data but only changes new files. by William Kenworthy
William Kenworthy wrote:
>
> On 15/8/22 06:44, Dale wrote:
>> Howdy,
>>
>> With my new fiber internet, my poor disks are getting a workout, and
>> also filling up.  First casualty, my backup disk.  I have one directory
>> that is . . . well . . . huge.  It's about 7TB or so.  This is where it
>> is right now and it's still trying to pack in files.
>>
>>
>> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
>>
>>
>> Right now, I'm using rsync, which doesn't compress files but does only
>> update things that have changed.  I'd like to find some software (or
>> maybe there is already a tool I'm unaware of) that compresses data but
>> otherwise works a lot like rsync.  I looked in app-backup and there are
>> a lot of options, but I'm not sure which fits best for what I want to
>> do.  Again: back up a directory, compress it, and only update it with
>> changed or new files.  Generally, it only adds files, but sometimes a
>> file gets replaced as well: same name but different size.
>>
>> I was trying to go through the list in app-backup one by one, but to be
>> honest, most of the links only go to GitHub or something and usually
>> don't say anything about how the program works.  Basically, as far as
>> seeing whether it does what I want, they're useless.  It sort of reminds
>> me of quite a few USE flag descriptions.
>>
>> I plan to buy another hard drive pretty soon.  Next month is possible.
>> If there is nothing available that does what I want, is there a way to
>> use rsync and have it set to back up files starting with "a" through "k"
>> to one spot and then back up "l" through "z" to another?  I could then
>> split the files into two parts.  I use a script to do this now, if one
>> could call my little things scripts, so even a complicated command could
>> work; I just may need help figuring out the command.
>>
>> Thoughts?  Ideas?
>>
>> Dale
>>
>> :-)  :-)
>>
> The questions you need to ask are how compressible the data is and how
> much duplication is in there.  Rsync's biggest disadvantage is that it
> doesn't keep history, so if you need to restore something from last
> week you are SOL.  Honestly, rsync is not a backup program and should
> only be used the way you do for data you don't value, as an rsync
> archive is a disaster waiting to happen from a backup point of view.
>
> Look into dirvish: it uses hard links to keep files current but safe,
> and it is easy to restore (it looks like an exact copy, so you cp the
> files back if needed).  The downside is that it hammers the hard disk
> and has no compression, so its only deduplication is via history (my
> backups stabilised at about 2x original size for ~2 years of history),
> though you can use something like btrfs, which has filesystem-level
> compression.
>
> My current program is borgbackup, which is very sophisticated in how it
> stores data; it's probably your best bet, in fact.  I am storing
> literally tens of TB of raw data on a 4TB USB3 disk (going back years),
> and yes, I do restore regularly, not just for disasters but also for
> space-efficient long-term storage of data I access only rarely.
>
> e.g.:
>
> A single host:
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size    Deduplicated size
> All archives:                3.07 TB              1.96 TB            151.80 GB
>
>                        Unique chunks         Total chunks
> Chunk index:                 1026085             22285913
>
>
> Then there is my offline storage: it backs up ~15 hosts (in repos
> like the above) plus data storage like 22 years of email etc.  Each
> host backs up to its own repo, then the offline storage backs that up.
> The deduplicated size is the actual on-disk size.  Compression varies,
> as it's whatever I used at the time the backup was taken; currently I
> have it set to "auto,zstd,11", but it can be mixed in the same repo (a
> repo is a single backup set; you can nest repos, which is what I do,
> so ~45TB is stored on a 4TB offline disk).  One advantage of a system
> like this is that chunked data rarely changes, so it's only the
> differences that are backed up (read the borgbackup docs; interesting).
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size    Deduplicated size
> All archives:               28.69 TB             28.69 TB              3.81 TB
>
>                        Unique chunks         Total chunks
> Chunk index:
>
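The chunk-level deduplication William describes can be illustrated with a toy chunk store. In his single-host stats, 3.07 TB of archives occupy 151.80 GB after deduplication (roughly 20x). The offline repo stores 28.69 TB in 3.81 TB (roughly 7.5x).

The sketch below is a simplified stand-in and not borg's code. Borg cuts data into variable-size chunks with a rolling-hash (buzhash) chunker, so inserting data only disturbs the chunks near the edit. This sketch uses fixed 4 KiB chunks keyed by SHA-256, so it only deduplicates well when data is appended or repeated. `ChunkStore`, `add_archive` and `CHUNK_SIZE` are hypothetical names.

```python
import hashlib
import random
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; borg's chunks vary in size


@dataclass
class ChunkStore:
    """Toy content-addressed store: every unique chunk is kept once,
    keyed by its SHA-256 digest; an archive is just a list of chunk keys."""
    chunks: dict[str, bytes] = field(default_factory=dict)
    archives: dict[str, list[str]] = field(default_factory=dict)

    def add_archive(self, name: str, data: bytes) -> int:
        """Split data into fixed-size chunks, store the unseen ones,
        record the archive's chunk list, and return how many chunks were new."""
        recipe: list[str] = []
        new = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                new += 1
            recipe.append(key)
        self.archives[name] = recipe
        return new

    def restore(self, name: str) -> bytes:
        """Rebuild an archive by concatenating its chunks in order."""
        return b"".join(self.chunks[key] for key in self.archives[name])

    def stored_bytes(self) -> int:
        """Space used by unique chunks (this toy does no compression)."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    rng = random.Random(0)
    monday = rng.randbytes(10 * CHUNK_SIZE)   # ten distinct chunks of random data
    tuesday = monday + b"a newly added file"  # same data plus a few appended bytes
    store = ChunkStore()
    print("new chunks, monday: ", store.add_archive("monday", monday))
    print("new chunks, tuesday:", store.add_archive("tuesday", tuesday))
    print("stored:", store.stored_bytes(), "bytes for", len(monday) + len(tuesday), "bytes of archives")
    assert store.restore("tuesday") == tuesday
```

The demo should report 10 new chunks for the first archive and only 1 for the second. The second archive shares every full chunk with the first, so only the tail holding the appended bytes is stored again.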

For the particular drive in question, it is 99.99% videos.  I don't want
to lose any quality, but to be honest I'm not sure how much they can be
compressed.  It could be they are already as compressed as they can be
without losing resolution etc.  I've been lucky so far.  I don't think
I've ever lost something on the working copy and then run a backup that
wiped out the good copy too.  Example: I update a video only to find the
newer copy is corrupt, and I want the old one back.  I've done it a time
or two, but I tend to find that before I do backups.  Still, it is a
downside and something I've thought about before.  I figure when it does
happen, it will be something hard to replace.  Just letting the devil
have his day.  :-(
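Whether already-compressed video will shrink any further is easy to check on a sample before picking a tool. The sketch below compresses the first few MiB of each file given on the command line and reports the result. It also loosely imitates the idea behind borg's "auto,zstd,11" setting mentioned above: borg's auto mode tries lz4 first and only applies the configured compressor if the data looks compressible.

Here, zlib and lzma from Python's standard library stand in for lz4 and zstd. The 4 MiB sample size, the 0.97 threshold and the function names are assumptions for illustration.

```python
import lzma
import os
import sys
import zlib

SAMPLE_BYTES = 4 * 1024 * 1024  # assumption: 4 MiB is a representative sample
WORTH_IT = 0.97                 # assumption: compress only if it saves more than 3%


def ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size as a fraction of original size (1.0 means no gain)."""
    return len(compressed) / len(original) if original else 1.0


def auto_compress(data: bytes) -> tuple[str, bytes]:
    """Loose imitation of an 'auto' mode: a fast zlib trial decides whether the
    slower lzma pass is worth running; incompressible data is stored raw."""
    if ratio(data, zlib.compress(data, 1)) >= WORTH_IT:
        return "none", data
    return "lzma", lzma.compress(data)


def sample_file(path: str) -> bytes:
    """Read at most SAMPLE_BYTES from the start of a file."""
    with open(path, "rb") as f:
        return f.read(SAMPLE_BYTES)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        data = sample_file(path)
        method, out = auto_compress(data)
        print(f"{os.path.basename(path)}: {method}, {ratio(data, out):.3f} of original")
```

Most video codecs already compress heavily, so for a drive full of videos the "none" branch is the likely outcome. The sample will show that for the actual files rather than relying on a guess.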

For that reason, I find versioned backups interesting.  It is a safer
method: you can have the new file but also an older copy, just in case
the new file takes a bad turn.  It is an interesting thought, and one that
not only I but anyone really should consider.
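Versioned backups using the hard-link approach described for dirvish can be sketched as follows. Each run creates a new snapshot directory. Files unchanged since the previous snapshot are hard-linked to the old copy, which costs no extra space, and only new or changed files are copied.

This is a minimal sketch and not how dirvish itself is implemented (dirvish drives rsync). It assumes "changed" means a different size or modification time, similar to rsync's default quick check. The function names and command-line layout are hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def unchanged(src: Path, prev: Path) -> bool:
    """Quick check in the spirit of rsync's default: same size and same
    modification time (whole seconds) means the file is treated as unchanged."""
    if not prev.is_file():
        return False
    a, b = src.stat(), prev.stat()
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source: Path, dest: Path, previous: Optional[Path]) -> tuple[int, int]:
    """Build snapshot `dest` of `source`. Files unchanged since `previous` are
    hard-linked to the old copy; new or changed files are copied.
    Returns (linked, copied). `dest` and `previous` must share a filesystem."""
    linked = copied = 0
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        out = dest / rel
        if src.is_dir():
            out.mkdir(parents=True, exist_ok=True)
            continue
        if not src.is_file():
            continue  # skip sockets, FIFOs, broken symlinks, etc.
        out.parent.mkdir(parents=True, exist_ok=True)
        prev = previous / rel if previous is not None else None
        if prev is not None and unchanged(src, prev):
            os.link(prev, out)      # second name for the same inode: no extra space
            linked += 1
        else:
            shutil.copy2(src, out)  # copy2 preserves mtime for the next comparison
            copied += 1
    return linked, copied


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: SOURCE NEW_SNAPSHOT [PREVIOUS_SNAPSHOT]
    prev_snap = Path(sys.argv[3]) if len(sys.argv) > 3 else None
    print(snapshot(Path(sys.argv[1]), Path(sys.argv[2]), prev_snap))
```

Every snapshot looks like a full copy, so restoring an older version is just a `cp` out of the right snapshot. Deleting an old snapshot only frees the files that no newer snapshot still links to.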

As I posted in another reply, I found a 10TB drive that should be here
by the time I do a fresh set of backups.  This will give me more time to
consider things.  Have I said this before, a while back???  :/
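The "a" through "k" / "l" through "z" split asked about in the original post can be checked before buying or filling a second drive. The sketch below sums how many bytes each half would need and prints matching rsync dry-run commands.

Assumptions:
- The split is decided by the first character of each top-level entry in the source directory, not by every file's own name.
- Uppercase A-K counts with a-k, and everything else (digits, dot-files, non-ASCII) goes to the second disk.
- `/mnt/disk1` and `/mnt/disk2` are placeholder mount points.

The filter rules use anchored `--include`/`--exclude` patterns. Double-check them against the FILTER RULES section of the rsync man page and the dry-run output before relying on them.

```python
import os
import shlex
import sys
from pathlib import Path

# Hypothetical split rule: top-level names starting with a-k or A-K go to disk 1,
# everything else (l-z, digits, dot-files, non-ASCII) goes to disk 2.
FIRST_HALF = frozenset("abcdefghijkABCDEFGHIJK")

# Matching rsync filter rules, meant for a source given with a trailing slash.
# A leading "/" anchors a pattern to the transfer root and "*" stops at "/",
# so only top-level names are tested; files inside an included directory
# match neither rule and are therefore transferred.
RULES_DISK1 = ["--include=/[a-kA-K]*", "--exclude=/*"]
RULES_DISK2 = ["--exclude=/[a-kA-K]*"]


def first_half(name: str) -> bool:
    """True if a top-level name belongs on disk 1 under the rule above."""
    return name[:1] in FIRST_HALF


def tree_size(path: Path) -> int:
    """Bytes in regular files at or below path; symlinks count as zero."""
    if path.is_symlink():
        return 0
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            p = Path(root) / name
            if p.is_file() and not p.is_symlink():
                total += p.stat().st_size
    return total


def split_sizes(source: Path) -> tuple[int, int]:
    """Total bytes that would land on disk 1 and on disk 2."""
    first = second = 0
    for entry in source.iterdir():
        if first_half(entry.name):
            first += tree_size(entry)
        else:
            second += tree_size(entry)
    return first, second


if __name__ == "__main__" and len(sys.argv) == 2:
    src = Path(sys.argv[1])
    one, two = split_sizes(src)
    gib = 1024 ** 3
    print(f"disk 1 (a-k): {one / gib:.1f} GiB   disk 2 (rest): {two / gib:.1f} GiB")
    # -n (--dry-run) only lists what would be copied; drop it once the lists look right.
    print(shlex.join(["rsync", "-a", "-n", "-v", *RULES_DISK1, f"{src}/", "/mnt/disk1/"]))
    print(shlex.join(["rsync", "-a", "-n", "-v", *RULES_DISK2, f"{src}/", "/mnt/disk2/"]))
```

Splitting at the top level keeps each directory whole on one disk, which makes restores simpler. The cost is that one very large directory can unbalance the two halves, and the size report will show that.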

Dale

:-)  :-)
