I've only been able to loosely follow this thread, but what's the layout
of the 6TB? I would hope that this isn't an IDE farm! A SAN should be
considered. From the sound of it, that kind of budget (~$12K US) isn't
there, even for a low-end IBM DS4100 (dual controller). Then again, the
HBAs needed to connect the SAN may not work or be supported under
Gentoo. What hardware are you using?

Also, what's the filesystem? Some filesystems have distinct performance
advantages over others for different situations. Google can lead you to
whitepapers comparing EXT3, XFS, JFS, Reiser, etc.

Lastly, test how your filesystem would perform with more of a
hierarchical structure. Instead of burying 50K files in each directory,
would a tree of smaller directories perform better for you? We've had
performance problems with VxFS on Solaris on a file vault similar to
yours when the directory size grows beyond roughly 10K-15K files (I
forget what the exact number was). By creating subdirs (and, if needed,
sub-subdirs, and sub-sub-subdirs, etc.), and modifying the controlling
programs, the access/update times can possibly be dramatically reduced.
And, yes, this IS basically manual management of a b-tree index, which
databases like Oracle (and perhaps PostgreSQL) do so much better. :)
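
Something like the sketch below is what I mean. It's untested, and the
/vault root and the two-level MD5 fan-out are placeholder assumptions,
not anything from your actual setup:

    import hashlib
    import os

    VAULT = "/vault"   # hypothetical root of the file store

    def shard_path(filename: str) -> str:
        """Spread files over a two-level tree of 256x256 subdirs so
        each leaf directory stays far below the 10K-15K range."""
        h = hashlib.md5(filename.encode()).hexdigest()
        return os.path.join(VAULT, h[:2], h[2:4], filename)

    path = shard_path("img_000123.jpg")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    print(path)   # e.g. /vault/3f/a2/img_000123.jpg

With 65,536 leaf directories, even a few hundred million files average
out to only a few thousand per directory.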

Just some thoughts...

Rich


-----Original Message-----
From: Longman, Bill [mailto:longman@×××××××××.com]
Sent: Wednesday, April 19, 2006 12:12 PM
To: 'gentoo-server@l.g.o'
Subject: RE: [gentoo-server] Re: [OT] Mirroring/backing-up a large


> In normal circumstances, databases are more efficient at
> handling lookups than filesystems.
>
> In your image application database, use a timestamp field
> that is updated whenever images are added or updated.
>
> Generate your backup jobs based on queries to this database
> instead of requiring rsync to do its differencing thing. For
> example you can automate a process that queries the database
> for images that have been updated or added since the last
> time it ran and generate a file list or backup job that only
> copies over new or updated images based on the timestamp.
> You would have to somehow map within the database the actual
> physical location of the files if you are not already doing
> it, in addition to using squid/apache to translate to the client.
>
> That is the first step.

Mikey's right here. You cannot expect your filesystem to be able to
return the query results you need. You need to take this "out of band".
You could even store your database on a separate filesystem so you
don't use I/O in the disk array that you need for the backups.
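
To make that first step concrete, here's a rough sketch. The table and
column names, the SQLite catalog, and the state file are all my own
assumptions; the real schema will differ:

    import datetime
    import pathlib
    import sqlite3
    import subprocess

    DB    = "/var/lib/imagevault/images.db"       # hypothetical catalog
    STATE = pathlib.Path("/var/lib/imagevault/last_run")
    ROOT  = "/vault/"         # store root; DB paths are relative to it

    # Record "now" before querying, so updates made while we run get
    # picked up next time instead of being skipped.
    now = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
    last_run = (STATE.read_text().strip() if STATE.exists()
                else "1970-01-01 00:00:00")

    # Ask the database which images changed -- no filesystem walk,
    # no rsync differencing over 6TB.
    con = sqlite3.connect(DB)
    with open("/tmp/changed.list", "w") as out:
        for (path,) in con.execute(
                "SELECT path FROM images WHERE updated_at > ?",
                (last_run,)):
            out.write(path + "\n")

    # Ship only the new or updated files to the backup host.
    subprocess.run(["rsync", "-a", "--files-from=/tmp/changed.list",
                    ROOT, "backuphost:/vault/"], check=True)

    STATE.write_text(now)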

> The second step is to ditch storing everything on a single
> 9TB system that cannot be backed up efficiently. Distribute
> the storage of the images on clusters or whatever. For
> example peel off 1TB of images onto a single server, then
> update the database (or apache/squid mapping) to point to the
> new location. 9 1TB boxes would be far less prone to
> catastrophic failure and much easier to
> replicate/mirror/backup than a single 9TB box. This is what
> I call the "google approach" ;) Use cheap commodity hardware
> and smart implementation to distribute/scale the load.
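
As a rough sketch of that mapping (the host names and hash scheme are
invented for illustration, not from anyone's actual setup):

    import hashlib

    # Hypothetical pool of 1TB storage nodes.
    NODES = ["img1.example.com", "img2.example.com", "img3.example.com"]

    def node_for(image_name: str) -> str:
        # Deterministic hash, so the apache/squid front end can find
        # an image's owner without a database round trip.
        digest = hashlib.md5(image_name.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    print(node_for("img_000123.jpg"))   # e.g. img2.example.com

The catch with pure hashing is that adding a node reshuffles most
images; keeping the name-to-node mapping in the image database, as
Mikey suggests, trades a lookup for painless rebalancing.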

Many years ago, I was at a Progress database conference, and one very
useful presentation was about the effect on performance of growing a
data store without increasing the bandwidth available to it. The
speaker's example showed how your performance decreases when you have
one large database but still only a single channel for access. His
point was to increase the number of channels along with the size of the
store, otherwise you actually lose performance. This is tantamount to
Mikey's discussion above. Spread out your disks if possible. Your
problem is that you cannot get more channels into your backup store, so
you'll have to think about either a separate local backup SAN or a
provider with more bandwidth.

Bill
--
gentoo-server@g.o mailing list