> In normal circumstances, databases are more efficient at
> handling lookups than filesystems.
>
> In your image application database, use a timestamp field
> that is updated whenever images are added or updated.
>
> Generate your backup jobs based on queries to this database
> instead of requiring rsync to do its differencing thing. For
> example, you can automate a process that queries the database
> for images that have been updated or added since the last
> time it ran, and generate a file list or backup job that only
> copies over new or updated images based on the timestamp.
> You would have to somehow map the actual physical location of
> the files within the database, if you are not already doing
> it, in addition to using squid/apache to translate to the client.
>
> That is the first step.
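The query-driven incremental backup Mikey describes could be sketched roughly like this; the table, column names, and paths are all invented for illustration, and the real image schema will differ:

```python
import sqlite3
import time

def files_changed_since(conn, last_run):
    """Return paths of images added or updated after last_run (epoch seconds)."""
    rows = conn.execute(
        "SELECT path FROM images WHERE updated_at > ? ORDER BY updated_at",
        (last_run,),
    )
    return [r[0] for r in rows]

# Demo with an in-memory database standing in for the real image DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT, updated_at REAL)")
now = time.time()
conn.executemany(
    "INSERT INTO images (path, updated_at) VALUES (?, ?)",
    [("/img/a.jpg", now - 86400),   # updated yesterday
     ("/img/b.jpg", now - 60),      # updated a minute ago
     ("/img/c.jpg", now - 30)],     # updated just now
)
last_run = now - 3600  # pretend the last backup ran an hour ago

changed = files_changed_since(conn, last_run)
print(changed)

# Hand the list to rsync instead of letting it walk 9TB of directories:
with open("/tmp/backup-list.txt", "w") as f:
    f.write("\n".join(changed) + "\n")
# then something like: rsync -a --files-from=/tmp/backup-list.txt / backuphost:/backup/
```

The win is that rsync never has to stat the whole tree; the database already knows what changed.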
|
Mikey's right here. You cannot expect your filesystem to be able to return
the query results you need. You need to take this "out of band". You could
even store your database on a separate filesystem so you don't use I/O in
the disk array that you need for the backups.
|
> The second step is to ditch storing everything on a single
> 9TB system that cannot be backed up efficiently. Distribute
> the storage of the images on clusters or whatever. For
> example, peel off 1TB of images onto a single server, then
> update the database (or apache/squid mapping) to point to the
> new location. Nine 1TB boxes would be far less prone to
> catastrophic failure and much easier to
> replicate/mirror/backup than a single 9TB box. This is what
> I call the "google approach" ;) Use cheap commodity hardware
> and smart implementation to distribute/scale the load.
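The remapping step could be as simple as a prefix-to-host table that the front end (or the squid/apache rewrite layer) consults when peeling images off onto new boxes. Hostnames and the path layout below are purely illustrative:

```python
# Hypothetical mapping layer: records which box now holds each slice
# of the image store. Everything here is made up for illustration.
shard_for_prefix = {
    "2005/": "img1.example.com",   # first 1TB peeled off to a new server
    "2006/": "img2.example.com",   # next 1TB
}
DEFAULT_SHARD = "img0.example.com"  # the original big box

def host_for(image_path):
    """Resolve which storage server currently serves a given image path."""
    for prefix, host in shard_for_prefix.items():
        if image_path.startswith(prefix):
            return host
    return DEFAULT_SHARD

path = "2005/07/cat.jpg"
url = "http://%s/%s" % (host_for(path), path)
print(url)
```

Migrating another terabyte is then just copying the files and adding one row to the map; clients never see the physical layout change.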
|
Many years ago, I was at a Progress database conference, and one very useful
presentation was about the effect on performance of growing a data store
without increasing the bandwidth available to it. The speaker's example
showed how performance decreases when you have one large database but still
only a single channel for access. His point was to increase the number of
channels along with the size of the store; otherwise you actually lose
performance. This is essentially Mikey's point above. Spread out your disks
if possible. Your problem is that you cannot get more channels into your
backup store, so you'll have to think about either a separate local backup
SAN or a provider with more bandwidth.
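The speaker's point is easy to see with some back-of-the-envelope arithmetic; the ~100 MB/s channel figure below is assumed, not measured:

```python
# Rough arithmetic: a full pass over the store through one channel
# versus nine 1TB boxes each with their own channel.
TB = 1024 ** 4
channel_Bps = 100 * 1024 ** 2   # assume one ~100 MB/s channel per box

single = (9 * TB) / channel_Bps        # one 9TB box, one channel
parallel = (1 * TB) / channel_Bps      # nine 1TB boxes, nine channels in parallel

print("single box: %.1f hours" % (single / 3600))
print("nine boxes in parallel: %.1f hours" % (parallel / 3600))
```

Under these assumptions a full pass drops from roughly a day to a few hours, which is the whole argument for scaling channels with the store.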
|
Bill
--
gentoo-server@g.o mailing list