Gentoo Archives: gentoo-server

From:	"Björn Gustafsson" <kex3@×××××.nu>
To:	gentoo-server@l.g.o
Subject:	Re: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection
Date:	Wed, 19 Apr 2006 14:03:03
Message-Id:	`444642B3.7020903@towel.nu`
In Reply to:	[gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection by jos houtman

jos houtman wrote:
> - The collection is saved on a 9TB system.
> - The backups are two off-site 4TB systems, the collections needs to be
> split over these.
Now what kind of systems are these? Home-grown arrays or "real" ones? In
the latter case, are there no vendor-provided approaches to this?

I'm not sure how this would apply to regular filesystems (I don't know
which one you use, though), but in "larger" (not size-wise) systems, a
bitmap of the filesystem is kept in a separate location, and disk areas
with changed or added files are marked as dirty and transferred to the
remote host either immediately (with synchronous I/O), as soon as
possible (async I/O), or when requested (veeeery async I/O ;)). This is a
rather effective system, with the backup speed mainly dependent on the
size you choose for the bitmap (larger bitmap => smaller blocks =>
potentially less data) and on transfer speed. Restructuring the data on
the physical disk would also trigger a large update of blocks to be
transferred.
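The dirty-bitmap idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular filesystem or replication product does it: it assumes a hypothetical hook that calls `mark_write(offset, length)` for every write, and it exists to show the block-size trade-off (smaller blocks mean a bigger bitmap but less data to send).

```python
from __future__ import annotations


class DirtyBitmap:
    """One bit per fixed-size block; a set bit means "changed since last sync"."""

    def __init__(self, device_size: int, block_size: int) -> None:
        self.block_size = block_size
        # Round up so a partial block at the end of the device is covered.
        self.nblocks = (device_size + block_size - 1) // block_size
        # Bits packed eight to a byte.
        self.bits = bytearray((self.nblocks + 7) // 8)

    def mark_write(self, offset: int, length: int) -> None:
        """Mark every block touched by the byte range [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self) -> list[int]:
        """Indices of all dirty blocks, in ascending order."""
        return [b for b in range(self.nblocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def bytes_to_transfer(self) -> int:
        """Data to send: whole blocks, even if only a few bytes changed."""
        return len(self.dirty_blocks()) * self.block_size

    def clear(self) -> None:
        """Reset once the dirty blocks have reached the remote host."""
        self.bits = bytearray(len(self.bits))


# Same 1 GiB device, same small write straddling a 4 KiB boundary.
GIB = 1 << 30
fine = DirtyBitmap(GIB, 4096)          # 4 KiB blocks
coarse = DirtyBitmap(GIB, 1 << 20)     # 1 MiB blocks
for bitmap in (fine, coarse):
    bitmap.mark_write(4090, 10)
print(len(fine.bits), fine.bytes_to_transfer())
print(len(coarse.bits), coarse.bytes_to_transfer())
```

With 4 KiB blocks the bitmap takes 32768 bytes and the 10-byte write costs 8192 bytes of transfer (two blocks); with 1 MiB blocks the bitmap shrinks to 128 bytes, but the same write costs 1048576 bytes.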

I suppose that approach on a standard Linux filesystem would require
some extensive hacking of the fs code, which probably isn't the first
route to try.

> - Our backup-window is the whole day as long as this does not provide a
> performance drain. Reality is that we need to use the quiet night hours
> 0 to 8.
> - The collection is stored in a set of subdirectories each containing
> 50.000 files. (1-50000,50001-100000, etc). There are ~300 subdirs in use
> now.

Marking folders as dirty is another solution, but 50k files per folder is
a bit big. Tracking dirty files in chunks of, say, 50 or 100 would be a
halfway solution, but that would depend on the application [see below].
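Tracking dirty chunks instead of whole 50.000-file directories could look roughly like this. The chunk size of 100, the 1-based file IDs and the `1-50000`-style directory names are assumptions read off the numbering described above; `ChunkTracker` and the helper names are hypothetical. Because 50000 is a multiple of 100, a chunk never spans two subdirectories.

```python
from __future__ import annotations

FILES_PER_DIR = 50_000   # from the description: 50.000 files per subdirectory
CHUNK_SIZE = 100         # hypothetical choice ("say 50 or 100")


def chunk_of(file_id: int) -> int:
    """Global chunk index for a 1-based file ID."""
    return (file_id - 1) // CHUNK_SIZE


def subdir_name(file_id: int) -> str:
    """Subdirectory holding a file, assuming names like '1-50000'."""
    start = (file_id - 1) // FILES_PER_DIR * FILES_PER_DIR + 1
    return f"{start}-{start + FILES_PER_DIR - 1}"


class ChunkTracker:
    """Remembers which chunks of CHUNK_SIZE files changed since the last backup."""

    def __init__(self) -> None:
        self.dirty: set[int] = set()

    def mark(self, file_id: int) -> None:
        self.dirty.add(chunk_of(file_id))

    def ranges(self) -> list[tuple[str, int, int]]:
        """(subdir, first_id, last_id) for every dirty chunk, in order."""
        out = []
        for chunk in sorted(self.dirty):
            first = chunk * CHUNK_SIZE + 1
            last = first + CHUNK_SIZE - 1
            out.append((subdir_name(first), first, last))
        return out


tracker = ChunkTracker()
for fid in (7, 42, 50_001, 123_456):
    tracker.mark(fid)
for subdir, first, last in tracker.ranges():
    print(subdir, first, last)
```

Marking files 7, 42, 50001 and 123456 yields three chunks to copy: files 1-100 in `1-50000`, 50001-50100 in `50001-100000`, and 123401-123500 in `100001-150000`. Files 7 and 42 share a chunk, so that chunk is only transferred once.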

> Only problem is constructing the list and capturing the knowledge while
> it is available, two options exist:
> At system level this can be done using for example I-notify, this
> requires a user-daemon. If the daemon crashes changes will be missed
> though.
> At application (the one making the changes) level this can also be done,
> when the application crashes no changes are made, so nothing is missed.
> But it does require making the backup dependent on the application. Not
> an ideal situation.
Sure, it's not ideal, but as you put it yourself, "when the application
crashes no changes are made", so there's no real loss in that case.
Provided, of course, that nobody accidentally comments out the wrong lines
of code ;)
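An application-level change list along these lines might be sketched as follows. Everything here is hypothetical: the journal file, its `.lock` and `.processing` companions, and the function names. It is Unix-only (it uses `fcntl.flock`). The key choice is ordering: the application records the file ID before changing the file, so a crash can at worst cause an unnecessary copy, never a missed one. The nightly job (0 to 8) rotates the journal under the lock, backs up the listed files, and deletes the rotated copy only after that succeeded; if it crashes, the next run retries the same batch.

```python
from __future__ import annotations

import contextlib
import fcntl
import os


@contextlib.contextmanager
def _locked(journal_path: str):
    """Hold an exclusive flock on a side file for the duration of the block."""
    fd = os.open(journal_path + ".lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def record_change(journal_path: str, file_id: int) -> None:
    """Called by the application BEFORE it writes file_id.

    The entry is fsync'ed, so a crash after this call costs at most
    one unnecessary copy.
    """
    with _locked(journal_path):
        fd = os.open(journal_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            os.write(fd, f"{file_id}\n".encode())
            os.fsync(fd)
        finally:
            os.close(fd)


def take_journal(journal_path: str) -> set[int]:
    """Called by the nightly backup job: return the IDs to copy.

    The journal is renamed under the lock, so new entries go to a fresh file.
    If a previous run crashed before finish_journal(), its batch is returned
    again instead of being overwritten.
    """
    rotated = journal_path + ".processing"
    if not os.path.exists(rotated):
        with _locked(journal_path):
            try:
                os.replace(journal_path, rotated)
            except FileNotFoundError:
                return set()  # nothing changed since the last run
    with open(rotated) as f:
        return {int(line) for line in f if line.strip()}


def finish_journal(journal_path: str) -> None:
    """Drop the processed batch once every listed file has been copied."""
    os.remove(journal_path + ".processing")
```

Unlike an inotify daemon, nothing here has to be running for changes to be captured; the trade-off is the one pointed out above, that every code path which modifies files must call `record_change`.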


Not sure if this is of any help to you; I've mainly worked with these
kinds of setups using hardware solutions, so I'm at a loss as to how they
relate to a software approach. And I'm lacking caffeine ;)

/Björn
--
gentoo-server@g.o mailing list

Replies

Subject	Author
Re: [gentoo-server][OT] Mirroring/backing-up a large 15Million (6TB) file collection	jos houtman <jos@×××××.nl>
