Gentoo Archives: gentoo-user

From: William Kenworthy <billk@×××××××××.au>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 06:02:01
Message-Id: b387c1eb-2116-4112-5ac2-0aadafe45667@iinet.net.au
In Reply to: [gentoo-user] Backup program that compresses data but only changes new files. by Dale
On 15/8/22 06:44, Dale wrote:
> Howdy,
>
> With my new fiber internet, my poor disks are getting a work out, and
> also filling up.  First casualty, my backup disk.  I have one directory
> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> is right now and it's still trying to pack in files.
>
>
> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
>
>
> Right now, I'm using rsync which doesn't compress files but does just
> update things that have changed.  I'd like to find some way, software
> but maybe there is already a tool I'm unaware of, to compress data and
> work a lot like rsync otherwise.  I looked in app-backup and there is a
> lot of options but not sure which fits best for what I want to do.
> Again, backup a directory, compress and only update with changed or new
> files.  Generally, it only adds files but sometimes a file gets replaced
> as well.  Same name but different size.
>
> I was trying to go through the list in app-backup one by one but to be
> honest, most links included only go to github or something and usually
> doesn't tell anything about how it works or anything.  Basically, as far
> as seeing if it does what I want, it's useless. It sort of reminds me of
> quite a few USE flag descriptions.
>
> I plan to buy another hard drive pretty soon.  Next month is possible.
> If there is nothing available that does what I want, is there a way to
> use rsync and have it set to backup files starting with "a" through "k"
> to one spot and then backup "l" through "z" to another?  I could then
> split the files into two parts.  I use a script to do this now, if one
> could call my little things scripts, so even a complicated command could
> work, just may need help figuring out the command.
>
> Thoughts?  Ideas?
>
> Dale
>
> :-)  :-)
>
The questions you need to ask are how compressible the data is, and how
much duplication is in there.  Rsync's biggest disadvantage is that it
doesn't keep history, so if you need to restore something from last week
you are SOL.  Honestly, rsync is not a backup program and should only be
used the way you use it now, for data you don't value, as an rsync
archive is a disaster waiting to happen from a backup point of view.

Look into dirvish - it uses hard links to keep files current but safe,
and is easy to restore (a backup looks like an exact copy, so you just
cp the files back if needed).  The downside is that it hammers the hard
disk and has no compression, so its only deduplication is via history
(my backups stabilised at about 2x the original size for ~2yrs of
history), though you can use something like btrfs, which has
filesystem-level compression.

My current program is borgbackup, which is very sophisticated in how it
stores data - it's probably your best bet, in fact.  I am storing
literally tens of TB of raw data on a 4TB USB3 disk (going back years),
and yes, I do restore regularly - not just for disasters, but for
space-efficient long term storage I access only rarely.

e.g.:

A single host:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                3.07 TB              1.96 TB            151.80 GB

                       Unique chunks         Total chunks
Chunk index:                 1026085             22285913

Then there is my offline storage - it backs up ~15 hosts (in repos like
the above) plus data storage like 22 years of email, etc.  Each host
backs up to its own repo, then the offline storage backs that up.  The
deduplicated size is the actual on-disk size ... compression varies, as
it's whatever I used at the time the backup was taken ... currently I
have it set to "auto,zstd,11", but it can be mixed in the same repo (a
repo is a single backup set - you can nest repos, which is what I do -
so ~45TB stored on a 4TB offline disk).  One advantage of a system like
this is that chunked data rarely changes, so only the differences are
backed up (read the borgbackup docs - interesting).
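In miniature, the chunk idea is: split the data into chunks, hash each one, and store a chunk only if its hash is new.  Borg actually uses variable-sized, content-defined chunking with a rolling hash (plus encryption), so this fixed-size sketch is only an illustration of the principle, not its implementation:

```shell
# Toy chunk store: fixed 1 MiB chunks, deduplicated by SHA-256.
# (Illustration only - borg's real chunker is content-defined.)
mkdir -p chunks
split -b 1M backup.img piece.
for p in piece.*; do
    h=$(sha256sum "$p" | cut -d' ' -f1)
    # store the chunk only if we haven't seen this hash before
    [ -e "chunks/$h" ] || cp "$p" "chunks/$h"
done
```

Identical chunks land on disk exactly once, which is why repeated or slowly-changing data (like nested repos, or 22 years of email) deduplicates so well.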

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               28.69 TB             28.69 TB              3.81 TB

                       Unique chunks         Total chunks
Chunk index:
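On the a-k / l-z split you asked about: rsync's include/exclude filters can do it from a single source tree.  Something like this should work (the paths are placeholders - substitute your own):

```shell
#!/bin/sh
# Split one backup across two disks by the first letter of the
# top-level entries.  SRC/DEST1/DEST2 are placeholder paths.
SRC=/mnt/data/
DEST1=/mnt/backup1/
DEST2=/mnt/backup2/

# Top-level entries starting a-k go to the first disk; the bare
# '--exclude=/*' then drops every other top-level entry.
rsync -a --delete --include='/[a-kA-K]*' --exclude='/*' "$SRC" "$DEST1"

# Everything else (l-z, digits, dotfiles, ...) goes to the second disk.
rsync -a --delete --exclude='/[a-kA-K]*' "$SRC" "$DEST2"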
