Gentoo Archives: gentoo-server

From: jos houtman <jos@×××××.nl>
To: gentoo-server@l.g.o
Subject: Re: [gentoo-server][OT] Mirroring/backing-up a large 15Million (6TB) file collection
Date: Wed, 19 Apr 2006 15:45:02
Message-Id: 44465A3D.1060502@hyves.nl
In Reply to: Re: [gentoo-server] Mirroring/backing-up a large 15Million (6TB) file collection by "Björn Gustafsson"
Thank you all for the replies. Let me answer them in one reply.

Björn Gustafsson wrote:

> Now what kind of systems are these? Home-grown arrays or "real" ones?
> In the latter case, are there no vendor-provided approaches to this?

These are what you would call homegrown. We are now looking at more
serious systems, but they also cost serious money.

> I'm not sure how this would apply to regular filesystems (no idea
> which one you use though), but in "larger" (not size-wise) systems, a
> bitmap of the filesystem is kept in a separate location, and
> disk areas with changed or added files are marked as dirty, and
> transferred to the remote host either immediately (with synchronous
> i/o), as soon as possible (async i/o), or when requested (veeeery
> async i/o ;)). This is a rather effective system, with the backup speed
> mainly dependent on the size you would choose for the bitmap (large
> bitmap => smaller blocks => potentially less data) and transfer
> speed. Restructuring of data on the physical disk would also create a
> major update of blocks to be transferred.

Hmm, I like that. Is it possible to filter the things that are
transferred? For example, I don't want to mirror deletions.

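The dirty-bitmap scheme quoted above can be sketched roughly as follows.
This is only an in-memory illustration: the class name, the 4096-byte
block size and the method names are assumptions, not the interface of
any real storage product.

```python
BLOCK_SIZE = 4096  # assumed size of the disk area covered by one bit


class DirtyBitmap:
    """Hypothetical tracker: one bit per block, set when the block is written."""

    def __init__(self, device_size, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.n_blocks = -(-device_size // block_size)  # ceiling division
        self.bits = bytearray(-(-self.n_blocks // 8))

    def mark_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self):
        """Return the block numbers that still have to be transferred."""
        return [b for b in range(self.n_blocks)
                if (self.bits[b // 8] >> (b % 8)) & 1]

    def clear(self):
        """Reset the bitmap once the dirty blocks have reached the remote host."""
        self.bits = bytearray(len(self.bits))
```

A smaller block size means more bits to keep, but less unchanged data
sent along with each changed block, which is the trade-off described in
the quote.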
> Marking folders as dirty is another solution, however 50k files is a
> bit big. Implementing dirty files in chunks of say 50 or 100 would be
> a half-way solution, but that'd be dependent on the application [see
> below].

Another nice suggestion. If the list of individual edited files is
getting too big, we can indeed start working with groups.

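Working with groups instead of individual files could look something
like the sketch below. It assumes files carry numeric ids starting at 1
and uses a group size of 100, taken from the "50 or 100" suggestion; the
function names are made up for illustration.

```python
GROUP_SIZE = 100  # assumed chunk size


def group_of(file_id, group_size=GROUP_SIZE):
    """Return the (first_id, last_id) range of the group holding file_id."""
    first = ((file_id - 1) // group_size) * group_size + 1
    return first, first + group_size - 1


def mark_changed(dirty_groups, file_id, group_size=GROUP_SIZE):
    """Record that file_id changed by marking its whole group as dirty."""
    dirty_groups.add(group_of(file_id, group_size))


# Usage: many edits inside one group cost only a single set entry.
dirty = set()
for changed_id in (101, 150, 200, 11809373):
    mark_changed(dirty, changed_id)
```

The backup then re-syncs every file in each dirty group, so the list
stays short at the price of copying some unchanged files.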
From Alex Efros:

> In your case that means, for example: probably the best solution to
> the backup issue is to change the way files are changed, so that a
> changed file isn't really CHANGED; instead the new version is just
> ADDED to the collection. This way it will be enough for you to just
> remember which file was backed up last by the previous backup, and on
> the next backup continue from that file (I suppose all your files are
> numbered: "(1-50000,50001-100000, etc)").
>
> This way the backup will not depend on the collection size (only on
> the number of added files) and will not depend on some "special
> feature" in the application (like constructing a list of changed
> files) which may have bugs.

I really like this solution. It has several advantages:
- It's really simple.
- It requires no interaction with the application.
- It adds a little overhead in disk space, but that's probably
  negligible compared to the 20/30% of pictures we don't need but just
  don't delete. We don't want to take the risk of deleting the wrong
  files, and of course if the service is ever misused, we still have
  evidence.
- No need to construct a list, just a last-backed-up photo id somewhere.

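A minimal sketch of the "remember the last backed-up id" approach,
assuming photos are append-only and numbered consecutively. The state
file format and the `copy_photo` callback are hypothetical; a real run
would plug in whatever actually transfers a file.

```python
import json
import os


def load_last_id(state_path):
    """Return the last backed-up photo id, or 0 if no backup has run yet."""
    try:
        with open(state_path) as f:
            return json.load(f)["last_id"]
    except FileNotFoundError:
        return 0


def save_last_id(state_path, last_id):
    """Write the state to a temp file and rename it over the old one."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"last_id": last_id}, f)
    os.replace(tmp_path, state_path)


def run_backup(state_path, newest_id, copy_photo):
    """Copy photos last_id+1 .. newest_id and return how many were copied."""
    copied = 0
    for photo_id in range(load_last_id(state_path) + 1, newest_id + 1):
        copy_photo(photo_id)
        # Saved after every photo so an interrupted run resumes where it
        # stopped; saving once per batch would be cheaper.
        save_last_id(state_path, photo_id)
        copied += 1
    return copied
```

The work per run depends only on the number of new photos, not on the
size of the whole collection.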
From Mikey:

> Certainly every file is not linked directly in a web page?
>
> Why not keep links in a database that point to the correct location
> on disk for the images themselves? Then all you need to do is query
> the database for a timestamp field that has changed and you know what
> you need to back up, and at any time you can move the underlying files
> around and update the links in the database...

You can be fairly sure that at least 80% of the photos are accessible on
the website.
I don't really understand you here, but I think we already have what you
mean.
But for completeness, this is roughly how the system works.
We already keep a record of the photos in the db for
bookkeeping/userinfo/accessrights/albums/etc.
The actual location of the image file is determined by the id plus a
secret, so image 11809373 can be accessed using
http://interval1.rendered.startpda.net/11800001-11850000/11809373_120_120_sfi5.jpeg.
This allows us to do resizing (120_120) and to serve the content with a
simple system of Apache servers and Squids.

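The URL layout in the example can be reproduced with a few lines. The
interval size of 50000 is inferred from the 11800001-11850000 directory
name; how the secret is derived and whether the host name varies per
interval is not stated here, so both are simply passed in as assumptions.

```python
INTERVAL_SIZE = 50000  # inferred from the directory name in the example
BASE_URL = "http://interval1.rendered.startpda.net"  # assumed fixed host


def interval_dir(photo_id, size=INTERVAL_SIZE):
    """Return the directory name covering photo_id, e.g. 11800001-11850000."""
    first = ((photo_id - 1) // size) * size + 1
    return f"{first}-{first + size - 1}"


def photo_url(photo_id, width, height, secret, base_url=BASE_URL):
    """Build the URL of one rendered size of a photo."""
    name = f"{photo_id}_{width}_{height}_{secret}.jpeg"
    return f"{base_url}/{interval_dir(photo_id)}/{name}"


print(photo_url(11809373, 120, 120, "sfi5"))
```

Because the directory follows from the id alone, a backup that walks ids
in order also walks the directories in order.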
jos
--
gentoo-server@g.o mailing list