Gentoo Archives: gentoo-user

From: Laurence Perkins <lperkins@×××××××.net>
To: "gentoo-user@l.g.o" <gentoo-user@l.g.o>
Subject: RE: [gentoo-user] How to compress lots of tarballs
Date: Tue, 28 Sep 2021 17:43:16
Message-Id: MW2PR07MB4058563304B1A1A240350CF2D2A89@MW2PR07MB4058.namprd07.prod.outlook.com
In Reply to: Re: [gentoo-user] How to compress lots of tarballs by Peter Humphrey
>On Monday, 27 September 2021 14:30:36 BST Peter Humphrey wrote:
>> On Monday, 27 September 2021 02:39:19 BST Adam Carter wrote:
>> > On Sun, Sep 26, 2021 at 8:57 PM Peter Humphrey <peter@××××××××××××.uk> wrote:
>> > > Hello list,
>> > >
>> > > I have an external USB-3 drive with various system backups. There
>> > > are 350 .tar files (not .tar.gz etc.), amounting to 2.5TB. I was sure I
>> > > wouldn't need to compress them, so I didn't, but now I think I'm
>> > > going to have to.
>> > > Is there a reasonably efficient way to do this?
>> >
>> > find <mountpoint> -name \*tar -exec zstd -TN {} \;
>> >
>> > Where N is the number of cores you want to allocate. zstd -T0 (or just
>> > zstdmt) if you want to use all the available cores. I use zstd for
>> > everything now as it's as good as or better than all the others in
>> > the general case.
>> >
>> > Parallel means it uses more than one core, so on a modern machine it
>> > is much faster.
>>
>> Thanks to all who've helped. I can't avoid feeling, though, that the
>> main bottleneck has been missed: that I have to read and write on a USB-3 drive.
>> It's just taken 23 minutes to copy the current system backup from
>> USB-3 to SATA SSD: 108GB in 8 .tar files.
>
>I was premature. In contrast to the 23 minutes to copy the files from USB-3 to internal SSD, zstd -T0 took 3:22 to compress them onto another internal SSD. I watched /bin/top and didn't see more than 250% CPU (this is a 24-CPU box) with next-to-nothing else running. The result was 65G of .tar.zst files.
>
>So, at negligible cost in CPU load*, I can achieve a 40% saving in space. Of course, I'll have to manage the process myself, and I still have to copy the compressed files back to USB-3 - but then I am retired, so what else do I have to do? :)
>
>Thanks again, all who've helped.
>
>* ...so I can continue running my 5 BOINC projects at the same time.
>
>--
>Regards,
>Peter.

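If you stay with plain zstd, the separate copy step can probably be folded into the compression step, since zstd can read each tarball from the USB drive and write the result straight to an internal SSD. Below is a rough sketch of that idea, not a tested procedure. The function name and both directory arguments are made up for illustration, and it assumes file names without embedded newlines and enough free space on the USB drive to hold a compressed copy before the original is removed.

```shell
#!/bin/sh
# Sketch only: compress_tarballs and its two directory arguments are
# placeholders, not anything taken from Peter's actual setup.

compress_tarballs() {
    src=$1        # where the .tar files live (the slow USB drive)
    scratch=$2    # fast scratch directory for the compressed output

    find "$src" -type f -name '*.tar' | while IFS= read -r tarball; do
        out="$scratch/$(basename "$tarball").zst"

        # Read from USB once; write the compressed stream to the scratch disk.
        zstd -T0 -q -f -o "$out" "$tarball" || {
            echo "compress failed: $tarball" >&2
            continue
        }

        # Check the compressed file's integrity before touching the original.
        zstd -t -q "$out" || {
            echo "verify failed: $out" >&2
            continue
        }

        # Put the compressed copy next to the original, then drop the original.
        mv "$out" "$tarball.zst" && rm -- "$tarball"
    done
}
```

Something like `compress_tarballs /mnt/usb /mnt/ssd/scratch` would then walk the whole drive one tarball at a time, so each file crosses the USB link once uncompressed and once compressed. Try it on a copy of one backup first.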
There are also backup tools which will handle the compression step for you.

app-backup/duplicity uses a similar tar-file-and-index system, with a periodic full backup followed by chains of incrementals. It also keeps a condensed list of file hashes from previous runs, so unlike rsync it doesn't have to re-read the entire archive to determine what changed.
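As a rough illustration of the full-then-incremental pattern, a duplicity setup might look something like the sketch below. The wrapper function names, the source directory and the target path are placeholders for illustration. `--no-encryption` is used only to keep the example free of GnuPG setup; leave it out if you want encrypted backups.

```shell
#!/bin/sh
# Sketch only: the dup_* wrappers and their arguments are placeholders.
# $1 is the directory to back up, $2 an absolute path on the backup drive.

# First run, or whenever you want to start a fresh chain.
dup_full() {
    duplicity full --no-encryption "$1" "file://$2"
}

# Later runs: store only what changed since the previous backup in the chain.
dup_incr() {
    duplicity incremental --no-encryption "$1" "file://$2"
}

# Show the full/incremental chains held at the target ($1 is the target path).
dup_status() {
    duplicity collection-status "file://$1"
}
```

For example, `dup_full /home /mnt/usb/duplicity` once, then `dup_incr /home /mnt/usb/duplicity` on subsequent runs. The volumes it writes are already compressed, so there is no separate zstd pass to manage.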

app-backup/borgbackup is more complex, but is very good at deduplicating file data, which saves even more space. Furthermore, it can store backups for multiple systems and deduplicate between them, so if you have any other machines you can keep their backups in the same repository as well, potentially at negligible space cost if there is a lot of redundancy between them.
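A minimal borg workflow might look like the sketch below. It uses borg 1.x command syntax; the wrapper function names and the repository path are placeholders, and `--encryption=none` is chosen only to keep the example short.

```shell
#!/bin/sh
# Sketch only, borg 1.x syntax: the borg_* wrappers and paths are placeholders.

# One-time: create the repository ($1 is the repository path).
# Pick a real encryption mode instead of "none" if the drive leaves the house.
borg_setup() {
    borg init --encryption=none "$1"
}

# Per run: archive directory $2 into repository $1 under a
# host-and-timestamp name. Chunks already present in the repository
# are not stored a second time.
borg_backup() {
    borg create --stats --compression zstd "$1::{hostname}-{now}" "$2"
}

# List the archives held in repository $1.
borg_archives() {
    borg list "$1"
}
```

After `borg_setup /mnt/usb/borg`, each `borg_backup /mnt/usb/borg /home` adds a new archive, and the `--stats` output shows how much of it was deduplicated. Other machines can point at the same repository (over ssh, for instance) to get the cross-system deduplication.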

LMP
