Gentoo Archives: gentoo-server

From: "Björn Gustafsson" <kex3@×××××.nu>
To: gentoo-server@l.g.o
Subject: Re: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection
Date: Wed, 19 Apr 2006 14:03:03
Message-Id: 444642B3.7020903@towel.nu
In Reply to: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection by jos houtman
jos houtman wrote:
> - The collection is saved on a 9TB system.
> - The backups are two off-site 4TB systems, the collection needs to be
> split over these.
Now what kind of systems are these? Home-grown arrays or "real" ones? In
the latter case, are there no vendor-provided approaches to this?

I'm not sure how this would apply to regular filesystems (no idea which
one you use, though), but in "larger" (not size-wise) systems, a bitmap
of the filesystem is kept in a separate location, and disk areas with
changed or added files are marked as dirty and transferred to the remote
host either immediately (with synchronous I/O), as soon as possible
(async I/O), or when requested (veeeery async I/O ;)). This is a rather
effective system, with the backup speed mainly dependent on the size you
choose for the bitmap (larger bitmap => smaller blocks => potentially
less data to send) and on the transfer speed. Restructuring data on the
physical disk would also mark a large number of blocks as dirty, all of
which would then have to be transferred.

I suppose that approach on a standard Linux filesystem would require
some extensive hacking of the filesystem code, which probably isn't the
first route to try.
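The bitmap scheme described above can be illustrated with a small Python sketch. It is a hypothetical model, not any particular filesystem's or vendor's implementation: the class name `DirtyBitmap`, the in-memory `bytearray`, and the 1 GiB device with 64 MiB versus 64 KiB blocks are assumptions chosen for illustration, and persisting the bitmap to its separate location is left out. It shows the trade-off mentioned: a larger bitmap (smaller blocks) means less data to send for the same writes.

```python
from __future__ import annotations


class DirtyBitmap:
    """Hypothetical write-intent bitmap: one bit per fixed-size region of a disk."""

    def __init__(self, device_size: int, block_size: int) -> None:
        self.block_size = block_size
        self.nblocks = (device_size + block_size - 1) // block_size
        # One bit per block; a real system would persist this apart from the data.
        self.bits = bytearray((self.nblocks + 7) // 8)

    def mark_write(self, offset: int, length: int) -> None:
        """Flag every block touched by a write of `length` bytes at byte `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for b in range(first, last + 1):
            self.bits[b // 8] |= 1 << (b % 8)

    def dirty_blocks(self) -> list[int]:
        """Indices of the blocks that must be sent to the remote host."""
        return [b for b in range(self.nblocks) if (self.bits[b // 8] >> (b % 8)) & 1]

    def bytes_to_transfer(self) -> int:
        return len(self.dirty_blocks()) * self.block_size

    def clear(self) -> None:
        """Reset once the dirty blocks have been copied off-site."""
        self.bits = bytearray(len(self.bits))


if __name__ == "__main__":
    GiB, MiB, KiB = 1024**3, 1024**2, 1024
    coarse = DirtyBitmap(1 * GiB, 64 * MiB)  # 16 blocks -> 2-byte bitmap
    fine = DirtyBitmap(1 * GiB, 64 * KiB)    # 16384 blocks -> 2048-byte bitmap
    for bm in (coarse, fine):
        bm.mark_write(10_000, 4 * KiB)       # the same two small 4 KiB writes
        bm.mark_write(500 * MiB, 4 * KiB)
    print(coarse.bytes_to_transfer() // KiB)  # 131072 (i.e. 128 MiB)
    print(fine.bytes_to_transfer() // KiB)    # 128
```

With the same two 4 KiB writes, the 2-byte coarse bitmap forces 128 MiB to be resent, while the 2 KiB fine bitmap needs only 128 KiB.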

> - Our backup-window is the whole day as long as this does not provide a
> performance drain. Reality is that we need to use the quiet night hours
> 0 to 8.
> - The collection is stored in a set of subdirectories each containing
> 50.000 files. (1-50000,50001-100000, etc). There are ~300 subdirs in use
> now.

Marking folders as dirty is another solution; however, 50k files per
folder is a bit big. Tracking dirty files in chunks of, say, 50 or 100
would be a halfway solution, but that would depend on the application
[see below].
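A minimal sketch of the chunked variant, assuming files are identified by a sequential 1-based number that matches the subdirectory ranges quoted above (1-50000, 50001-100000, ...). `FILES_PER_CHUNK`, `chunk_of`, `chunk_files`, and `DirtyChunks` are hypothetical names; the application (or a daemon) would call `mark()` whenever it changes a file, and the nightly job would copy only the files in dirty chunks.

```python
from __future__ import annotations

FILES_PER_SUBDIR = 50_000  # from the layout described in the quoted message
FILES_PER_CHUNK = 100      # hypothetical chunk size; 50 works the same way


def chunk_of(file_id: int) -> tuple[int, int]:
    """Map a 1-based file number to (subdir, chunk within that subdir).

    Subdir 1 holds files 1-50000, subdir 2 holds 50001-100000, and so on.
    """
    subdir, offset = divmod(file_id - 1, FILES_PER_SUBDIR)
    return subdir + 1, offset // FILES_PER_CHUNK


def chunk_files(subdir: int, chunk: int) -> range:
    """The file numbers covered by one chunk."""
    first = (subdir - 1) * FILES_PER_SUBDIR + chunk * FILES_PER_CHUNK + 1
    return range(first, first + FILES_PER_CHUNK)


class DirtyChunks:
    """Set of chunks touched since the last backup run."""

    def __init__(self) -> None:
        self.dirty: set[tuple[int, int]] = set()

    def mark(self, file_id: int) -> None:
        self.dirty.add(chunk_of(file_id))

    def files_to_copy(self) -> list[int]:
        """Every file number in a dirty chunk, i.e. the nightly copy list."""
        return [f for key in sorted(self.dirty) for f in chunk_files(*key)]


if __name__ == "__main__":
    d = DirtyChunks()
    for file_id in (5, 7, 50_001):
        d.mark(file_id)
    print(sorted(d.dirty))         # [(1, 0), (2, 0)]
    print(len(d.files_to_copy()))  # 200
```

With 100-file chunks, the ~300 subdirectories amount to about 150,000 (300 × 500) flags, and one changed file costs a 100-file copy instead of a pass over its whole 50,000-file subdirectory. A chunk at the end of the collection may list numbers for files that do not exist yet, so the copy step should skip missing files.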

> Only problem is constructing the list and capturing the knowledge while
> it is available, two options exist:
> At system level this can be done using for example inotify, this
> requires a user-daemon. If the daemon crashes changes will be missed
> though.
> At application (the one making the changes) level this can also be done,
> when the application crashes no changes are made, so nothing is missed.
> But it does require making the backup dependent on the application. Not
> an ideal situation.
Sure, it's not ideal, but as you put it yourself, "when the application
crashes no changes are made", so there's no real loss in that case.
Provided, of course, that nobody accidentally comments out the wrong
lines of code ;)
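Application-level tracking can keep the "nothing is missed" property if each change is recorded before it is made. Below is a minimal Python sketch under stated assumptions: the application calls the hypothetical `ChangeLog.record()` with a file's path just before modifying it, and the nightly backup job calls `take_batch()`, copies the returned files, then calls `finish_batch()`. The file names `changes.log`, `changes.batch`, and `changes.lock` are made up for illustration, and `fcntl.flock` makes this POSIX-only, which suits a Linux server.

```python
from __future__ import annotations

import fcntl
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


class ChangeLog:
    """Hypothetical append-only list of changed files, written by the application."""

    def __init__(self, directory: str) -> None:
        d = Path(directory)
        self.log = d / "changes.log"        # entries not yet picked up by a backup run
        self.batch = d / "changes.batch"    # entries the current backup run is copying
        self.lockfile = d / "changes.lock"  # serializes appends against log rotation

    @contextmanager
    def _lock(self):
        with open(self.lockfile, "a") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)

    def record(self, path: str) -> None:
        """Call just *before* modifying `path`; the entry is fsync'ed to disk first."""
        with self._lock(), open(self.log, "a", encoding="utf-8") as f:
            f.write(path + "\n")
            f.flush()
            os.fsync(f.fileno())

    def take_batch(self) -> list[str]:
        """Return the de-duplicated paths the backup run should copy."""
        if not self.batch.exists():  # if it exists, an interrupted run is redone first
            with self._lock():
                if not self.log.exists():
                    return []
                os.replace(self.log, self.batch)  # atomic rename on POSIX
        with open(self.batch, encoding="utf-8") as f:
            return list(dict.fromkeys(line.rstrip("\n") for line in f if line.strip()))

    def finish_batch(self) -> None:
        """Call only after every path from take_batch() has been copied off-site."""
        self.batch.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        changes = ChangeLog(d)
        for p in ("1/17", "1/17", "2/50002"):  # hypothetical "subdir/file" paths
            changes.record(p)
        print(changes.take_batch())  # ['1/17', '2/50002']
        changes.finish_batch()
```

Logging before the change means a crash can at worst leave an entry for a file that was never modified, which simply gets copied again. Rotating the log under the lock means entries written during a backup run go into a fresh log for the next run instead of being lost, and a batch left over from an interrupted run is returned again until `finish_batch()` is called.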

Not sure if this is of any help to you; I've mainly been involved with
hardware-based setups of this kind, so I'm at a loss as to how they
relate to a software approach. And I'm lacking caffeine ;)

/Björn
--
gentoo-server@g.o mailing list
