Gentoo Archives: gentoo-server

From: "Longman
To: "'gentoo-server@l.g.o'" <gentoo-server@l.g.o>
Subject: RE: [gentoo-server] Re: [OT] Mirroring/backing-up a large
Date: Wed, 19 Apr 2006 17:13:03
Message-Id: 4BB1E365BF26D311914A00805FA6A1C11FCD2311@admsrvnt02.enet.sharplabs.com
> In normal circumstances, databases are more efficient at
> handling lookups than filesystems.
>
> In your image application database, use a timestamp field
> that is updated whenever images are added or updated.
>
> Generate your backup jobs based on queries to this database
> instead of requiring rsync to do its differencing thing. For
> example, you can automate a process that queries the database
> for images that have been updated or added since the last
> time it ran and generates a file list or backup job that only
> copies over new or updated images based on the timestamp.
> You would have to somehow map within the database the actual
> physical location of the files if you are not already doing
> it, in addition to using squid/apache to translate to the client.
>
> That is the first step.

Mikey's right here. You cannot expect your filesystem to return the
query results you need; you need to take this "out of band". You could
even store your database on a separate filesystem so that it doesn't
consume I/O on the disk array you need for the backups.
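
For illustration only, here is a minimal Python sketch of the
timestamp-driven file list Mikey describes. It assumes a hypothetical
table images(path TEXT, updated_at INTEGER) holding paths relative to
the image root and Unix timestamps that the application sets on every
add or update. The table, column and function names are made up for the
example, and sqlite3 just stands in for whatever database you actually use.

```python
import sqlite3
from typing import List


def changed_since(conn: sqlite3.Connection, last_run: int) -> List[str]:
    """Return paths of images added or updated after last_run (Unix time).

    Assumes a hypothetical table images(path TEXT, updated_at INTEGER)
    whose updated_at column the application sets on every add/update.
    """
    rows = conn.execute(
        "SELECT path FROM images WHERE updated_at > ? ORDER BY path",
        (last_run,),
    )
    return [path for (path,) in rows]


def write_file_list(paths: List[str], list_file: str) -> None:
    """Write one path per line, the format rsync's --files-from reads."""
    with open(list_file, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")


if __name__ == "__main__":
    # Demo against an in-memory database with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, updated_at INTEGER)")
    conn.executemany(
        "INSERT INTO images VALUES (?, ?)",
        [("a/1.jpg", 100), ("a/2.jpg", 250), ("b/3.jpg", 300)],
    )
    print(changed_since(conn, 200))  # ['a/2.jpg', 'b/3.jpg']
```

The resulting list file can be handed to rsync's --files-from option,
e.g. rsync -a --files-from=changed.txt /srv/images/ backuphost:/srv/images/
(paths and host are placeholders). With --files-from, -a does not imply
-r, which is fine for a plain list of files. Record the time before
running the query and save it as the next last_run, so images updated
while the copy is running are picked up on the next pass instead of skipped.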

> The second step is to ditch storing everything on a single
> 9TB system that cannot be backed up efficiently. Distribute
> the storage of the images across clusters or whatever. For
> example, peel off 1TB of images onto a single server, then
> update the database (or apache/squid mapping) to point to the
> new location. Nine 1TB boxes would be far less prone to
> catastrophic failure and much easier to replicate, mirror or
> back up than a single 9TB box. This is what I call the
> "google approach" ;) Use cheap commodity hardware and smart
> implementation to distribute/scale the load.
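
A rough sketch of the "peel off 1TB" step, again in Python with sqlite3
standing in for the real database. It assumes a hypothetical
images(path, size_bytes, host) table where host is the mapping that
apache/squid consults. plan_batch (a hypothetical helper) picks a batch
that fits on the new box; repoint (also hypothetical) updates the mapping,
and should only be called after the files have actually been copied and
verified, which this sketch does not do itself.

```python
import sqlite3
from typing import List


def plan_batch(conn: sqlite3.Connection, src_host: str, capacity_bytes: int) -> List[str]:
    """Pick images currently on src_host, in path order, until the new box is full.

    Assumes a hypothetical table images(path TEXT, size_bytes INTEGER, host TEXT).
    Stops at the first image that would overflow capacity_bytes.
    """
    batch: List[str] = []
    used = 0
    rows = conn.execute(
        "SELECT path, size_bytes FROM images WHERE host = ? ORDER BY path",
        (src_host,),
    ).fetchall()
    for path, size in rows:
        if used + size > capacity_bytes:
            break
        batch.append(path)
        used += size
    return batch


def repoint(conn: sqlite3.Connection, paths: List[str], dst_host: str) -> None:
    """Point the mapping for paths at dst_host.

    Call only after the files have been copied to dst_host and verified.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "UPDATE images SET host = ? WHERE path = ?",
            [(dst_host, p) for p in paths],
        )


if __name__ == "__main__":
    # Made-up data: three 400-byte images on a host called "big".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, size_bytes INTEGER, host TEXT)")
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [("a.jpg", 400, "big"), ("b.jpg", 400, "big"), ("c.jpg", 400, "big")],
    )
    batch = plan_batch(conn, "big", 1000)
    print(batch)  # ['a.jpg', 'b.jpg']
    # ... copy and verify the files on the new box here ...
    repoint(conn, batch, "node1")
```

Stopping at the first image that does not fit keeps each box holding a
contiguous range of paths, which tends to keep any apache/squid rewrite
rules simple. Doing the update in one transaction means the mapping never
points clients at a half-moved batch.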

Many years ago I was at a Progress database conference, and one very useful
presentation covered what happens to performance when a data store grows
without a matching increase in the bandwidth available to it. The speaker's
example showed how performance drops when you have one large database but
still only a single channel for access. His point was to increase the number
of channels along with the size of the store; otherwise you actually lose
performance. This parallels Mikey's discussion above: spread out your disks
if possible. Your problem is that you cannot get more channels into your
backup store, so you'll have to consider either a separate local backup SAN
or a provider with more bandwidth.
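
To put rough numbers on the channel argument, here is the idealized
arithmetic as a small Python function. The 100 MB/s per-channel rate is an
assumed figure for illustration, not something measured in this thread,
and the model ignores seeks, contention and protocol overhead: copy time
is simply total size divided by (per-channel rate times number of channels).

```python
TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte


def copy_hours(total_bytes: float, rate_per_channel: float, channels: int = 1) -> float:
    """Idealized hours to copy total_bytes split evenly across independent
    channels, each sustaining rate_per_channel bytes per second."""
    if channels < 1:
        raise ValueError("need at least one channel")
    return total_bytes / (rate_per_channel * channels) / 3600


if __name__ == "__main__":
    # 100 MB/s per channel is an assumed, illustrative rate.
    print(copy_hours(9 * TB, 100 * MB))               # 25.0
    print(round(copy_hours(9 * TB, 100 * MB, 9), 2))  # 2.78
```

Under those assumptions a single channel needs 25 hours to move 9TB, while
nine 1TB boxes each copied over its own channel finish in under 3 hours.
Real figures will be worse, but the ratio is the speaker's point.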

Bill
--
gentoo-server@g.o mailing list