Thank you all for the replies. Let me answer them in one message.

Björn Gustafsson wrote:

> Now what kind of systems are these? Home-grown arrays or "real" ones?
> In the latter case, are there no vendor-provided approaches to this?

These are what you would call home-grown. We are now looking at more
serious systems, but they also cost serious money.

> I'm not sure how this would apply to regular filesystems (no idea
> which one you use, though), but in "larger" (not size-wise) systems, a
> bitmap of the filesystem is kept in a separate location, and disk
> areas with changed or added files are marked as dirty and transferred
> to the remote host either immediately (with synchronous I/O), as soon
> as possible (async I/O), or when requested (veeeery async I/O ;)).
> This is a rather effective system, with the backup speed mainly
> dependent on the bitmap size you choose (larger bitmap => smaller
> blocks => potentially less data) and on the transfer speed.
> Restructuring of data on the physical disk would also mark a large
> number of blocks as dirty, all of which would then be transferred.
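For illustration, a minimal Python sketch of the bookkeeping half of the dirty-bitmap scheme, assuming a volume divided into fixed-size blocks with one bit per block. `DirtyBitmap` and its methods are hypothetical names; intercepting the writes and actually shipping the blocks to the remote host (synchronously, asynchronously, or on request) is left out.

```python
class DirtyBitmap:
    """Track which fixed-size blocks of a volume changed since the last sync.

    Hypothetical helper: one bit per block. A larger bitmap means smaller
    blocks and less data to resend per write, at the cost of more memory.
    """

    def __init__(self, volume_size: int, block_size: int) -> None:
        self.block_size = block_size
        self.num_blocks = (volume_size + block_size - 1) // block_size
        self.bits = bytearray((self.num_blocks + 7) // 8)

    def mark_write(self, offset: int, length: int) -> None:
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        # Clamp to the end of the volume so out-of-range writes are ignored.
        for block in range(first, min(last, self.num_blocks - 1) + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def dirty_blocks(self) -> list[int]:
        """Block numbers that must be sent to the remote host, in order."""
        return [b for b in range(self.num_blocks)
                if self.bits[b // 8] & (1 << (b % 8))]

    def clear(self) -> None:
        """Reset the map once the dirty blocks have been transferred."""
        self.bits = bytearray(len(self.bits))
```

The block size is the trade-off mentioned above: with 4 KiB blocks a one-byte write resends 4 KiB, while with 1 MiB blocks the same change resends a full megabyte.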
Hmm, I like that. Is it possible to filter what gets transferred? For
example, I don't want to mirror deletions.
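At the block level a deletion is just another set of changed blocks, so a filter like "don't mirror deletions" is easier to express per file. A minimal sketch, assuming a file-level one-way copy where the modification time decides what changed; `additive_sync` is a hypothetical helper, not part of any tool. rsync behaves the same way by default, since it only removes files on the receiving side when one of its `--delete` options is used.

```python
import os
import shutil


def additive_sync(src_root: str, dst_root: str) -> list[str]:
    """Copy new or changed files from src_root to dst_root.

    Hypothetical helper: files that exist only in dst_root are never
    removed, so deletions on the source are not mirrored.
    Returns the relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # Copy when the backup lacks the file or the source is newer.
            # copy2 preserves mtime, so an unchanged file is skipped next time.
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```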
> Marking folders as dirty is another solution; however, 50k files is a
> bit big. Tracking dirty files in chunks of, say, 50 or 100 would be a
> halfway solution, but that'd be dependent on the application [see
> below].

Another nice suggestion. If the list of individually edited files gets
too big, we can indeed start working with groups.
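A sketch of the chunked variant, assuming files carry sequential integer ids (as in the "1-50000, 50001-100000" numbering mentioned further down); `chunk_of` and `DirtyChunks` are hypothetical names. Only the chunk number of a changed file is recorded, so the dirty set stays small at the cost of re-copying unchanged neighbors in the same chunk.

```python
def chunk_of(photo_id: int, chunk_size: int = 100) -> int:
    """Map a 1-based file id to its chunk: ids 1-100 -> 0, 101-200 -> 1, ..."""
    return (photo_id - 1) // chunk_size


class DirtyChunks:
    """Hypothetical tracker that remembers dirty chunks instead of files."""

    def __init__(self, chunk_size: int = 100) -> None:
        self.chunk_size = chunk_size
        self.dirty: set[int] = set()

    def mark_changed(self, photo_id: int) -> None:
        self.dirty.add(chunk_of(photo_id, self.chunk_size))

    def ids_to_back_up(self) -> list[range]:
        """Id ranges of all dirty chunks, in ascending order."""
        return [range(c * self.chunk_size + 1, (c + 1) * self.chunk_size + 1)
                for c in sorted(self.dirty)]
```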
From Alex Efros:

In your case, that means, for example, that the best solution to the
backup issue is probably to change the way files are modified: a changed
file isn't really CHANGED; instead, the new version is just ADDED to the
collection. That way it is enough to remember which file was backed up
last by the previous backup and to continue from that file on the next
backup (I suppose all your files are numbered: "(1-50000,50001-100000, etc)").

This way the backup will not depend on the collection size (only on the
number of added files), and it will not depend on some "special feature"
in the application (like constructing a list of changed files), which
may have bugs.

I really like this solution. It has several advantages:
- it's really simple
- it requires no interaction with the application
- it adds a little disk-space overhead, but that's probably negligible
  compared to the 20-30% of pictures we don't need but don't delete
  anyway. We don't want to take the risk of deleting the wrong files,
  and of course, if the service is ever misused, we still have evidence.
- there's no need to construct a list, just a last-backed-up photo id
  stored somewhere
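Alex's scheme boils down to a high-water mark. A minimal sketch, assuming an append-only collection where each file is named `<id>.<ext>` in one flat directory and the last backed-up id is kept in a small JSON state file; `incremental_backup`, the file layout, and the state format are all hypothetical.

```python
import json
import os
import shutil


def incremental_backup(src_dir: str, dst_dir: str, state_file: str) -> int:
    """Copy files whose numeric id is above the last backed-up id.

    Hypothetical helper. Assumes a changed photo gets a new, higher id
    instead of being overwritten. Returns the new high-water mark.
    """
    last_id = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            last_id = json.load(f)["last_id"]

    new_files = []
    for name in os.listdir(src_dir):
        stem = name.split(".", 1)[0]
        if stem.isdigit() and int(stem) > last_id:
            new_files.append((int(stem), name))

    os.makedirs(dst_dir, exist_ok=True)
    for photo_id, name in sorted(new_files):
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        # Advance the mark only after the copy succeeded, so an interrupted
        # run resumes from the first file that was not copied.
        last_id = photo_id
        with open(state_file, "w") as f:
            json.dump({"last_id": last_id}, f)
    return last_id
```

With the real range directories (`11800001-11850000/` and so on), only the directory containing the last backed-up id and the ones after it would need to be listed, which keeps the scan independent of the collection size as well.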
From Mikey:

Surely not every file is linked directly from a web page?

Why not keep links in a database that point to the correct location on
disk for the images themselves? Then all you need to do is query the
database for records whose timestamp field has changed, and you know
what you need to back up. At any time you can also move the underlying
files around and update the links in the database...
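Mikey's suggestion as a sketch, assuming a `photos` table with a `modified_at` Unix timestamp that the application updates on every change; the schema and `changed_since` are hypothetical, and SQLite stands in for whatever database is actually used. The index on `modified_at` keeps the query from scanning the whole table.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE photos (
    id          INTEGER PRIMARY KEY,
    path        TEXT NOT NULL,     -- location of the image file on disk
    modified_at INTEGER NOT NULL   -- Unix time of the last change
);
CREATE INDEX photos_modified_at ON photos (modified_at);
"""


def changed_since(conn: sqlite3.Connection, last_backup: int) -> list[tuple[int, str]]:
    """Return (id, path) of photos modified after the last backup time."""
    rows = conn.execute(
        "SELECT id, path FROM photos WHERE modified_at > ? ORDER BY id",
        (last_backup,),
    )
    return rows.fetchall()
```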
You can be fairly sure that at least 80% of the photos are accessible
on the website.

I don't really understand you here, but I think we already have what
you mean. For completeness, this is roughly how the system works: we
already keep a record of the photos in the DB for bookkeeping, user
info, access rights, albums, etc. The actual location of the image file
is determined by the id plus a secret, so image 11809373 can be accessed
at
http://interval1.rendered.startpda.net/11800001-11850000/11809373_120_120_sfi5.jpeg.
This lets us do resizing (120_120) and serve the content with a simple
setup of Apache servers and Squid caches.
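Since the location follows from the id, the range directory can be computed rather than looked up. A sketch, assuming 50,000 ids per directory (inferred from `11800001-11850000` in the example URL) and treating the secret-derived suffix (`sfi5`) as an opaque input, because how it is derived from the secret isn't described; `bucket_dir` and `image_path` are hypothetical helpers.

```python
BUCKET_SIZE = 50_000  # assumption: inferred from "11800001-11850000"


def bucket_dir(photo_id: int, bucket_size: int = BUCKET_SIZE) -> str:
    """Name of the range directory holding a photo, e.g. '11800001-11850000'."""
    start = (photo_id - 1) // bucket_size * bucket_size + 1
    return f"{start}-{start + bucket_size - 1}"


def image_path(photo_id: int, width: int, height: int, token: str) -> str:
    """Relative URL path of a rendered image.

    `token` stands for the secret-derived part of the file name; it is
    passed in because its derivation is not known here.
    """
    return f"{bucket_dir(photo_id)}/{photo_id}_{width}_{height}_{token}.jpeg"
```

For an id-based backup, `bucket_dir(last_id)` also names the first directory that needs to be examined on the next run.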
80 |
|
81 |
jos |
82 |
-- |
83 |
gentoo-server@g.o mailing list |