Gentoo Archives: gentoo-server

From: "Jesse
To: gentoo-server@l.g.o
Subject: RE: [gentoo-server] Re: [OT] Mirroring/backing-up a large
Date: Wed, 19 Apr 2006 18:23:32
Message-Id: FB5D3CCFCECC2948B5DCF4CABDBE66975460FC@QTEX1.qg.com
1 I've only been able to loosely follow this thread, but what's the layout
2 of the 6TB? I would hope that this isn't an IDE farm! A SAN should be
3 considered. From the sound of it, that kind of budget (~$12K US) isn't
4 there, even for a low-end IBM DS4100 (dual controller). Then again, the
5 HBAs needed to connect the SAN may not work or be supported under
6 Gentoo. What hardware are you using?
7
8 Also, what's the filesystem? Some filesystems have distinct performance
9 advantages over others for different situations. Google can lead you to
10 whitepapers comparing EXT3, XFS, JFS, Reiser, etc.
11
12 Lastly, test how your filesystem would perform with a more
13 hierarchical structure. Instead of burying 50K files in each directory,
14 would a tree of smaller directories perform better for you? We've had
15 performance problems with VxFS on Solaris on a file vault similar to
16 yours when the directory size is larger than 10K-15K files (I forget
17 what the number was). By creating subdirs (and, if needed, sub-subdirs,
18 and sub-sub-subdirs, etc.), and modifying the controlling programs, the
19 access/update times can possibly be dramatically reduced. And, yes,
20 this IS basically manual management of a b-tree index, which databases
21 like Oracle (and perhaps PostgreSQL) do so much better. :)
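
To make the subdirectory idea concrete, here is a minimal sketch in Python. It is not what we run on the VxFS vault; the function name, the choice of MD5, and the two-level layout are all assumptions picked for illustration. It spreads files across a tree by using the first characters of a hash of the filename as directory names.

```python
import hashlib
from pathlib import Path

def sharded_path(root: Path, filename: str, levels: int = 2, width: int = 2) -> Path:
    """Return root/<h1>/<h2>/.../filename, where each <hN> is a slice of a hash.

    Hypothetical helper: the hash only spreads names evenly across
    directories; it is not used for security.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, filename)

if __name__ == "__main__":
    # The same name always maps to the same subdirectory.
    print(sharded_path(Path("vault"), "photo-000123.jpg"))
```

With two levels of two hex characters there are 256 x 256 = 65,536 leaf directories, so the controlling programs can compute a file's location from its name alone instead of scanning one huge directory.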
22
23 Just some thoughts...
24
25 Rich
26
27
28 -----Original Message-----
29 From: Longman, Bill [mailto:longman@×××××××××.com]
30 Sent: Wednesday, April 19, 2006 12:12 PM
31 To: 'gentoo-server@l.g.o'
32 Subject: RE: [gentoo-server] Re: [OT] Mirroring/backing-up a large
33
34
35 > In normal circumstances, databases are more efficient at
36 > handling lookups than filesystems.
37 >
38 > In your image application database, use a timestamp field
39 > that is updated whenever images are added or updated.
40 >
41 > Generate your backup jobs based on queries to this database
42 > instead of requiring rsync to do its differencing thing. For
43 > example you can automate a process that queries the database
44 > for images that have been updated or added since the last
45 > time it ran and generate a file list or backup job that only
46 > copies over new or updated images based on the timestamp.
47 > You would have to somehow map within the database the actual
48 > physical location of the files if you are not already doing
49 > it, in addition to using squid/apache to translate to the client.
50 >
51 > That is the first step.
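
A rough sketch of that first step, in Python with an in-memory SQLite database standing in for the real image database. The table name `images`, its columns `path` and `updated_at`, and both function names are assumptions for illustration, not the poster's actual schema. Timestamps are stored as ISO-8601 strings so that string comparison matches time order.

```python
import sqlite3

def changed_since(conn, last_run):
    """Return paths of images added or updated after `last_run` (ISO-8601 string)."""
    rows = conn.execute(
        "SELECT path FROM images WHERE updated_at > ? ORDER BY path",
        (last_run,),
    )
    return [row[0] for row in rows]

def write_file_list(paths, list_file):
    """Write one path per line, the format a copy job can read as a file list."""
    with open(list_file, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO images VALUES (?, ?)",
        [
            ("a/1.jpg", "2006-04-18T10:00:00"),
            ("a/2.jpg", "2006-04-19T09:00:00"),
        ],
    )
    # Only a/2.jpg changed after the (hypothetical) previous run.
    print(changed_since(conn, "2006-04-19T00:00:00"))
```

The resulting list could then be handed to the copy tool, for example through rsync's `--files-from` option, so that only the listed files are transferred and the whole tree does not have to be walked.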
52
53 MIkey's right here. You cannot expect your
54 filesystem to be able to return the query
55 results you need. You need to take this
56 "out of band". You could even store your
57 database on a separate filesystem so you
58 don't use I/O in the disk array that you
59 need for the backups.
60
61 > The second step is to ditch storing everything on a single
62 > 9TB system that cannot be backed up efficiently. Distribute
63 > the storage of the images on clusters or whatever. For
64 > example peel off 1TB of images onto a single server, then
65 > update the database (or apache/squid mapping) to point to the
66 > new location. 9 1TB boxes would be far less prone to
67 > catastrophic failure and much easier to
68 > replicate/mirror/backup than a single 9TB box. This is what
69 > I call the "google approach" ;) Use cheap commodity hardware
70 > and smart implementation to distribute/scale the load.
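
The "peel off and repoint" step can be sketched as a lookup table from image to host. In the Python below a plain dict stands in for the database (or apache/squid mapping) table, and the host names, the URL layout, and both function names are made up for illustration.

```python
def relocate(locations, image_ids, new_host):
    """Point a batch of images at `new_host`.

    Meant to be called only after the files themselves have been copied
    to the new server; `locations` maps image id -> host name.
    """
    for image_id in image_ids:
        locations[image_id] = new_host

def url_for(locations, image_id):
    """Build the URL a front end would redirect or proxy to (hypothetical layout)."""
    return f"http://{locations[image_id]}/images/{image_id}"

if __name__ == "__main__":
    locations = {101: "img1.example.com", 102: "img1.example.com"}
    relocate(locations, [102], "img2.example.com")
    print(url_for(locations, 102))
```

Because clients only ever see the result of the lookup, images can be moved one batch at a time without changing anything on the client side.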
71
72 Many years ago, I was at a Progress database conference,
73 and one very useful presentation was about the effect on
74 performance of growing a data store without increasing
75 the bandwidth available to it. The speaker's example
76 showed how performance decreases when you have one
77 large database but still only a single channel for
78 access. His point was to increase the number of
79 channels along with the size of the store; otherwise
80 you actually lose performance. This amounts to MIkey's
81 point above: spread out your disks if possible. Your
82 problem is that you cannot get more channels into
83 your backup store, so you'll have to think about
84 either a separate local backup SAN or a provider
85 with more bandwidth.
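
A back-of-the-envelope illustration of the channel argument, in Python. The 100 MB/s per-channel figure is an assumed round number, not a measurement from anyone's hardware, and the sketch assumes channels add up linearly, which real systems only approximate.

```python
def transfer_hours(size_tb, mb_per_sec_per_channel, channels=1):
    """Hours needed to move `size_tb` terabytes (decimal, 1 TB = 1,000,000 MB)."""
    total_mb = size_tb * 1_000_000
    seconds = total_mb / (mb_per_sec_per_channel * channels)
    return seconds / 3600

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "channel(s):", round(transfer_hours(6, 100, n), 1), "hours")
```

Under those assumptions a full copy of 6TB over one channel takes about 16.7 hours and four channels bring it down to about 4.2 hours. If the store grows and the channel count does not, the copy time grows in step with the data.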
86
87 Bill
88 --
89 gentoo-server@g.o mailing list
90
