Gentoo Archives: gentoo-user

From: Dale <rdalek1967@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] How to compress lots of tarballs
Date: Wed, 29 Sep 2021 20:58:52
Message-Id: 46a2a568-575a-49be-a16b-5c2d59faee54@gmail.com
In Reply to: RE: [gentoo-user] How to compress lots of tarballs by Laurence Perkins
Laurence Perkins wrote:
>>
>> Curious question here.  As you may recall, I back up to an external hard drive.  Would it make sense to use that software for an external hard drive?  Right now, I'm just doing file updates with rsync and the drive is encrypted.  Thing is, I'm going to have to split into three drives soon.  So, compressing may help.  Since it is video files, it may not help much, but I'm not sure about that.  Just curious.
>>
>> Dale
>>
>> :-)  :-)
>>
>>
> If I understand correctly you're using rsync+tar and then keeping a set of copies of various ages.

Actually, it is uncompressed and just stores one version and one copy.


>
> If you lose a single file that you want to restore and have to go hunting for it, with tar you can only list the files in the archive by reading through the entire thing, and can only extract by reading from the beginning until you stumble across the matching filename.  So with large archives to hunt through, that could take... a while...
>
> dar is compatible with tar (pretty sure, would have to look again, but I remember that being one of its main selling points) but adds an index at the end of the file, allowing listing of the contents and jumping to particular files without having to read the entire thing.  Won't help with your space shortage, but will make searching and single-file restores much faster.
>
> Duplicity and similar have the indices, and additionally a full+incremental scheme.  So searching is reasonably quick, and restoring likewise doesn't have to grovel over all the data.  Restore can be slower than tar or dar, though, because it has to restore first from the full backup and then walk through however many incrementals are necessary to get the version you want.  This comes with a substantial space savings, though, since each set of archive files after the full contains only the pieces which actually changed.  Coupled with compression, that might solve your space issues for a while longer.
>
> Borg and similar break the files into variable-size chunks and store each chunk indexed by its content hash.  So each chunk gets stored exactly once, regardless of how many times it may occur in the data set.  Backups then become simply lists of file attributes and the chunks they contain.  This results both in storing only changes between backup runs and in deduplication of commonly-occurring data chunks across the entire backup.  The database-like structure also means that all backups can be searched and restored from in roughly equal amounts of time, and that backup sets can be deleted in any order.  Many of them (Borg included) also allow mounting backup sets via FUSE.  The disadvantage is that restore requires a compatible version of the backup tool rather than just a generic utility.
>
> LMP


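[Editor's note: the chunk-store scheme Laurence describes can be sketched as a toy Python example.  This uses fixed-size chunks and no encryption, unlike real Borg, which uses variable-size content-defined chunks; it only illustrates the content-hash deduplication idea.]

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept exactly
    once, keyed by its SHA-256 hash.  A 'backup' is just the ordered
    list of chunk hashes for a file."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}          # hash -> chunk bytes

    def store(self, data: bytes) -> list:
        """Split data into chunks, keep each unique chunk once,
        and return the list of chunk hashes (the backup entry)."""
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)   # dedup: first copy wins
            hashes.append(h)
        return hashes

    def restore(self, hashes) -> bytes:
        """Reassemble a file from its chunk-hash list."""
        return b"".join(self.chunks[h] for h in hashes)

store = ChunkStore()
backup1 = store.store(b"AAAABBBBCCCC")   # chunks AAAA, BBBB, CCCC
backup2 = store.store(b"AAAABBBBDDDD")   # shares AAAA and BBBB with backup1
assert store.restore(backup1) == b"AAAABBBBCCCC"
assert store.restore(backup2) == b"AAAABBBBDDDD"
# Six logical chunks across the two backups, but only four stored:
assert len(store.chunks) == 4
```

Note how the second backup costs only the storage of its one new chunk; that is the "only changes between backup runs" property, and deleting either backup list leaves the other restorable.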
I guess that is the downside of not having just plain uncompressed
files.  Thing is, so far, I've never needed to restore a single file or
even several files, so it's not a big deal for me.  If I accidentally
delete something, though, that could be a problem if it has already left
the trash.

Since the drive also uses LVM, someone mentioned using snapshots.  Still
not real clear on those even though I've read a bit about them.  Some of
the backup techniques are confusing to me.  I get plain files, even
incremental to an extent, but some of the new stuff just muddies the water.
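
[Editor's note: for reference, the LVM snapshot idea mentioned above looks roughly like this.  A hypothetical sketch: the volume group name `vg0`, volume name `backup`, sizes, and paths are all made up, and the commands need root.]

```shell
# Create a 10G copy-on-write snapshot of /dev/vg0/backup.
# Only blocks that change after this point consume space in the snapshot.
lvcreate --snapshot --size 10G --name backup-snap /dev/vg0/backup

# Mount it read-only and copy out a file as it existed at snapshot time,
# e.g. something deleted from the live volume since then.
mount -o ro /dev/vg0/backup-snap /mnt/snap
cp /mnt/snap/path/to/deleted-file ~/recovered-file

# When done, unmount and drop the snapshot to reclaim the reserved space.
umount /mnt/snap
lvremove /dev/vg0/backup-snap
```

The snapshot is not a backup by itself (it lives on the same drive), but it gives a frozen point-in-time view to restore or rsync from.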

I really need to just build a file server, RAID or something.  :/

Dale

:-)  :-)

Replies

Subject Author
Re: [gentoo-user] How to compress lots of tarballs Wols Lists <antlists@××××××××××××.uk>