Gentoo Archives: gentoo-user

From: John Covici <covici@××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Date: Mon, 15 Aug 2022 09:45:45
Message-Id: m34jydj0w1.wl-covici@ccs.covici.com
In Reply to: Re: [gentoo-user] Backup program that compresses data but only changes new files. by Dale
On Mon, 15 Aug 2022 04:33:44 -0400,
Dale wrote:
>
> William Kenworthy wrote:
> >
> > On 15/8/22 06:44, Dale wrote:
> >> Howdy,
> >>
> >> With my new fiber internet, my poor disks are getting a workout, and
> >> also filling up.  First casualty, my backup disk.  I have one directory
> >> that is . . . well . . . huge.  It's about 7TBs or so.  This is where it
> >> is right now and it's still trying to pack in files.
> >>
> >>
> >> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
> >>
> >>
> >> Right now, I'm using rsync, which doesn't compress files but does just
> >> update things that have changed.  I'd like to find some way, software,
> >> or maybe there is already a tool I'm unaware of, to compress data and
> >> otherwise work a lot like rsync.  I looked in app-backup and there are
> >> a lot of options, but I'm not sure which fits best for what I want to
> >> do.  Again: back up a directory, compress, and only update with changed
> >> or new files.  Generally, it only adds files, but sometimes a file gets
> >> replaced as well.  Same name but different size.
> >>
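> >> (What I run now is basically just plain rsync - something along
> >> these lines, with the paths made up:
> >>
> >>   rsync -av /home/dale/stuff/ /mnt/8tb/stuff/
> >>
> >> so new or changed files get copied and everything else is left
> >> alone.)
> >>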
> >> I was trying to go through the list in app-backup one by one, but to be
> >> honest, most of the links included only go to github or something and
> >> usually don't tell anything about how the program works.  Basically, as
> >> far as seeing if it does what I want, it's useless.  It sort of reminds
> >> me of quite a few USE flag descriptions.
> >>
> >> I plan to buy another hard drive pretty soon.  Next month is possible.
> >> If there is nothing available that does what I want, is there a way to
> >> use rsync and have it set to back up files starting with "a" through "k"
> >> to one spot and then back up "l" through "z" to another?  I could then
> >> split the files into two parts.  I use a script to do this now, if one
> >> could call my little things scripts, so even a complicated command could
> >> work, just may need help figuring out the command.
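> >>
> >> Something like this is what I'm picturing, if the filter syntax is
> >> even close to right (paths made up):
> >>
> >>   rsync -av --include='/[a-k]*' --exclude='/*' /home/dale/stuff/ /mnt/8tb/stuff/
> >>   rsync -av --include='/[l-z]*' --exclude='/*' /home/dale/stuff/ /mnt/new/stuff/
> >>
> >> the idea being that --exclude='/*' only blocks the top-level names
> >> the include didn't match, so everything under a matched directory
> >> still gets copied.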
> >>
> >> Thoughts?  Ideas?
> >>
> >> Dale
> >>
> >> :-)  :-)
> >>
> > The questions you need to ask are how compressible the data is and how
> > much duplication is in there.  Rsync's biggest disadvantage is that it
> > doesn't keep history, so if you need to restore something from last
> > week you are SOL.  Honestly, rsync is not a backup program and should
> > only be used the way you do for data you don't value, as an rsync
> > archive is a disaster waiting to happen from a backup point of view.
> >
> > Look into dirvish - it uses hard links to keep files current but safe,
> > and is easy to restore (it looks like an exact copy, so you just cp the
> > files back if needed).  The downside is that it hammers the hard disk
> > and has no compression, so the only deduplication is via history (my
> > backups stabilised at about 2x the original size for ~2yrs of history) -
> > though you can use something like btrfs, which has filesystem-level
> > compression.
> >
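> > (It's essentially the same trick you can do by hand with rsync's
> > --link-dest - a minimal sketch, with the dates and paths made up:
> >
> >   rsync -a --link-dest=/backup/2022-08-14 /data/ /backup/2022-08-15/
> >
> > Unchanged files become hard links into the previous snapshot, so
> > every snapshot looks like a full copy but only changed files take
> > new space.)
> >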
> > My current program is borgbackup, which is very sophisticated in how it
> > stores data - it's probably your best bet, in fact.  I am storing
> > literally tens of TB of raw data on a 4TB usb3 disk (going back years -
> > and yes, I do restore regularly, and not just for disasters but for
> > space-efficient long-term storage I access only rarely).
> >
> > e.g.:
> >
> > A single host:
> >
> > ------------------------------------------------------------------------------
> >                    Original size    Compressed size    Deduplicated size
> > All archives:            3.07 TB            1.96 TB            151.80 GB
> >
> >                    Unique chunks       Total chunks
> > Chunk index:             1026085           22285913
> >
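> > (For the curious, that summary is just what querying the repo
> > prints, e.g.:
> >
> >   borg info /path/to/repo
> >
> > with the path made up, of course.)
> >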
> > Then there is my offline storage - it backs up ~15 hosts (in repos
> > like the above) plus data storage like 22 years of email etc.  Each host
> > backs up to its own repo, then the offline storage backs that up.  The
> > deduplicated size is the actual on-disk size ... compression varies, as
> > it's whatever I used at the time the backup was taken ... currently I
> > have it set to "auto,zstd,11", but it can be mixed in the same repo (a
> > repo is a single backup set - you can nest repos, which is what I do -
> > so ~45TB stored on a 4TB offline disk).  One advantage of a system
> > like this is that chunked data rarely changes, so only the differences
> > get backed up (read the borgbackup docs - interesting).
> >
> > ------------------------------------------------------------------------------
> >                    Original size    Compressed size    Deduplicated size
> > All archives:           28.69 TB           28.69 TB              3.81 TB
> >
> >                    Unique chunks       Total chunks
> > Chunk index:
> >
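> > If you want to try it, the basic flow is only a few commands - a
> > rough sketch with made-up paths and a retention policy picked out
> > of the air:
> >
> >   borg init --encryption=repokey /mnt/backup/repo
> >   borg create --stats --compression auto,zstd,11 \
> >       /mnt/backup/repo::'{hostname}-{now}' /home/dale/stuff
> >   borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/repo
> >
> > Each "borg create" makes a new archive, but the chunk deduplication
> > means only data that changed since the last run gets stored.
> >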
>
>
> For the particular drive in question, it is 99.99% videos.  I don't want
> to lose any quality, but I'm not sure how much they can be compressed, to
> be honest.  It could be they are already as compressed as they can be
> without losing resolution etc.  I've been lucky so far.  I don't think
> I've ever needed a file back after a backup replaced it with a bad
> working copy.  Example: I update a video only to find the newer copy is
> corrupt and want the old one back.  I've done it a time or two, but I
> tend to find that before I do backups.  Still, it is a downside and
> something I've thought about before.  I figure when it does happen, it
> will be something hard to replace.  Just letting the devil have his
> day.  :-(
>
> For that reason, I find the version-type backups interesting.  It is a
> safer method.  You can have a new file but also keep an older file as
> well, just in case the new file takes a bad turn.  It is an interesting
> thought.  It's one that not only I should consider, but anyone really.
>
> As I posted in another reply, I found a 10TB drive that should be here
> by the time I do a fresh set of backups.  This will give me more time to
> consider things.  Have I said this before a while back???  :/
>
zfs would solve your problem of corruption, even without versioning.
You do a scrub at short intervals, and at least you would know if a
file is corrupted.  Of course, redundancy is better, such as mirroring,
and backups take a very short time, because when sending from one zfs
to another it knows exactly what bytes to send.

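A sketch of what I mean, with the pool and dataset names made up:

  zpool scrub tank
  zfs snapshot tank/videos@2022-08-15
  zfs send -i tank/videos@2022-08-08 tank/videos@2022-08-15 | \
      zfs receive backup/videos

The scrub reads everything and verifies the checksums; the incremental
send only ships the blocks that changed between the two snapshots.
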
--
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

John Covici wb2una
covici@××××××××××.com

Replies

Subject: Re: [gentoo-user] Backup program that compresses data but only changes new files.
Author:  Wol <antlists@××××××××××××.uk>