Gentoo Archives: gentoo-server

From: Alex Efros <powerman@××××××××××××××××××.com>
To: gentoo-server@l.g.o
Subject: Re: [gentoo-server] [OT] Mirroring/backing-up a large 15Million (6TB) file collection
Date: Wed, 19 Apr 2006 12:51:16
Message-Id: 20060419124828.GA1641@home.power
In Reply to: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection by jos houtman
Hi!

On Wed, Apr 19, 2006 at 02:08:56PM +0200, jos houtman wrote:
> current situation:
> - The collection is stored in a set of subdirectories each containing
> 50.000 files. (1-50000,50001-100000, etc). There are ~300 subdirs in use
> now.
> - Files are never deleted.
> - In the future it can happen that files change. My expectation is that
> at most a few thousand files a day will change, scattered over the whole
> collection with an emphasis on the most recent files.
[cut]
> Only problem is constructing the list and capturing the knowledge while
> it is available, two options exist:
> At system level this can be done using for example I-notify, this
> requires a user-daemon. If the daemon crashes changes will be missed
> though.
> At application (the one making the changes) level this can also be done,
> when the application crashes no changes are made, so nothing is missed.
> But it does require making the backup dependent on the application. Not
> an ideal situation.

First, this issue isn't Gentoo-specific, so I think it should at least be
marked [OT] in the subject. ;-)

My experience with complex backups says: it's nearly impossible to make
an effective (fast and reliable) backup of a complex application without
writing that application with backup in mind.

In your case that means, for example, that the best solution to the
backup issue is probably to change the way files are changed, so that a
changed file isn't really CHANGED; instead, its new version is just ADDED
to the collection. This way it will be enough to remember which file was
backed up last by the previous backup and continue from that file on the
next backup (I suppose all your files are numbered: "(1-50000,50001-100000, etc)").
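
A minimal sketch of the "remember the last backed-up file" idea, under loudly stated assumptions: file number N is (hypothetically) stored as <root>/<bucket>/<N> with bucket = (N-1)/50000, IDs are assigned contiguously, and a changed file is always ADDED under a new ID. The function name, state file and archive path are illustrative only; GNU tar is assumed.

```bash
#!/usr/bin/env bash
# Sketch: back up only the files added since the previous run.
# ASSUMED (hypothetical) layout: file N is stored as <root>/<bucket>/<N>,
#   bucket = (N-1)/50000, i.e. 0 for 1-50000, 1 for 50001-100000, ...
# ASSUMED: IDs are contiguous and files are never deleted or rewritten.

PER_DIR=50000

# incremental_backup ROOT STATE_FILE ARCHIVE   (hypothetical helper)
# The body runs in a subshell so `set -e` and the EXIT trap stay local.
# ARCHIVE should be an absolute path.
incremental_backup() (
    set -euo pipefail
    root=$1 state=$2 archive=$3

    # ID of the newest file saved by the previous run (0 on the first run).
    last=0
    if [ -f "$state" ]; then last=$(cat "$state"); fi

    list=$(mktemp)
    trap 'rm -f "$list"' EXIT

    # Step forward from last+1 until the first ID that does not exist yet;
    # the work done depends on the number of new files, not the collection size.
    id=$((last + 1))
    while [ -e "$root/$(( (id - 1) / PER_DIR ))/$id" ]; do
        printf '%s\n' "$(( (id - 1) / PER_DIR ))/$id" >> "$list"
        id=$((id + 1))
    done

    if [ "$id" -gt $((last + 1)) ]; then
        # Archive only the new files, stored with paths relative to ROOT.
        tar -C "$root" -cf "$archive" -T "$list"
        # Advance the marker only after tar succeeded; rename keeps it atomic.
        printf '%s\n' "$((id - 1))" > "$state.tmp"
        mv -f "$state.tmp" "$state"
    fi
)
```

For example, a nightly cron job could run `incremental_backup /collection /var/lib/backup/last-id /backup/new-$(date +%F).tar` (all paths hypothetical). If a run finds no new files, it creates no archive and leaves the marker untouched.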

This way the backup will not depend on the collection size (only on the
number of added files), and it will not depend on some "special feature"
in the application (like constructing a list of changed files), which may
have bugs.

If your application needs the newer version of a file to have the same
name as the previous version, and this behaviour can't be changed, then
you can consider some special solutions, like: after ADDING the newer
version to the collection, replace the previous version with a symlink
to the newer version. To back up these symlinks you will need an
additional step like:
find /collection -type l -print0 | xargs -0 tar ...
I've no idea whether "find -type l" will be fast enough for you, but I
suppose it will be much, much faster than rsync, simply because it
doesn't need to read all the files in the collection and calculate their
checksums.
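
A sketch of the symlink variant, using the same hypothetical layout (file N stored as <root>/<(N-1)/50000>/<N>); the helper names are invented for illustration. Instead of `xargs -0 tar`, it pipes the NUL-separated list straight into GNU tar with `--null -T -`. With xargs, a very long list may be split across several tar invocations, and each `tar -c` would overwrite the archive written by the previous one.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the "symlink to newer version" idea.
# ASSUMED layout: file N is stored as <root>/<(N-1)/50000>/<N>. GNU tar assumed.

PER_DIR=50000

# Print the path of file N relative to the collection root.
bucket_path() { printf '%s/%s' "$(( ($1 - 1) / PER_DIR ))" "$1"; }

# replace_with_link ROOT OLD_ID NEW_ID
# NEW_ID must already have been ADDED as an ordinary new file. The old file
# is swapped for a relative symlink; mv uses rename(2), so readers never
# see a missing file in between.
replace_with_link() {
    local root=$1 old new link
    old=$(bucket_path "$2")
    new=$(bucket_path "$3")
    link="$root/$old.tmp-link"
    ln -s "../$new" "$link"       # relative to the old file's bucket dir
    mv -f "$link" "$root/$old"
}

# backup_links ROOT ARCHIVE   (ARCHIVE should be an absolute path)
# Archives only the symlinks themselves (tar does not follow them without -h).
# A single tar process reads the whole NUL-terminated list from stdin.
backup_links() {
    (cd "$1" && find . -type l -print0 | tar -cf "$2" --null -T -)
}
```

Because the symlinks are stored as symlinks, restoring this archive on top of a restored copy of the regular files recreates the "old name points to newest version" mapping.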

--
WBR, Alex.
--
gentoo-server@g.o mailing list