On 02.10.2010 14:11, Volker Armin Hemmann wrote:
> On Saturday 02 October 2010, Florian Philipp wrote:
[...]
>>
>> Assumptions:
>>
>> 1. Seek time is constant. For HDDs we can take an average value. Of
>> course this doesn't work for tapes. They have a seek time which
>> increases linearly with the distance between the fragments.
>
> I think you misunderstood my remark.
>
> Tapes try to stream. Take an old DLT drive with 5-10 MB/s streaming speed.
> Slow, isn't it?
>
> But when you do a backup on such an old tape, even with a modern hard disk
> you have problems keeping it streaming. As soon as you hit a directory with
> many small files, like ~/Mail or /usr/portage, you are screwed.
>
> Yes, you have a wonderful 100 MB/s when you read a big, fat file. Or a
> single small file. But when you have hundreds, thousands or hundreds of
> thousands of small files, hard disks suck.
> And your tape drive has to stop and rewind every couple of seconds because
> your hard disks were not able to keep up the required 10 MB/s. Truly
> pathetic.

Well, that's exactly what my little math shows. When you read 4 kB files,
you can end up with 0.0065 * 50 MB/s ≈ 0.32 MB/s effective throughput
(worst case).
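
A minimal sketch of the arithmetic behind that figure, assuming the model
is "one access per file, then a sequential read". The 12 ms access time,
the binary reading of kB and MB, and the function name are assumptions,
picked because they land close to the 0.0065 factor quoted above; they are
not taken from the original calculation.

```python
def effective_throughput(file_size, seq_rate, access_time):
    """Return (efficiency, throughput) for reading many files of file_size bytes.

    Model: every file costs one access (seek + rotational delay) followed by
    a sequential transfer at seq_rate bytes per second.
    """
    transfer_time = file_size / seq_rate
    efficiency = transfer_time / (access_time + transfer_time)
    return efficiency, efficiency * seq_rate


KIB = 1024
MIB = 1024 * 1024

# Assumed values: 4 KiB files, 50 MiB/s sequential rate, 12 ms per access.
eff, rate = effective_throughput(4 * KIB, 50 * MIB, 0.012)
print(f"efficiency: {eff:.4f}")
print(f"throughput: {rate / MIB:.2f} MiB/s")
```

With larger files the transfer time dominates and the efficiency approaches
1, which is why a single big file streams fine and a directory full of small
ones does not.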
>
> Besides, seek times are not constant ;)
>

Sure they aren't. That's why it is stated as an assumption. It is just a
model. Like every model, it has its limits.[1] It doesn't take into
account prefetching, caching and NCQ/TCQ, for example.

Still, it is a valid assumption: *On average*, the read/write head has to
move about a third of the full stroke to reach its next position (the mean
distance between two uniformly random track positions), and it has to wait
for half a rotation until the right block is under the head. If we assume
that fragments are uniformly distributed over the whole disk, we can simply
take an average value for seek times.
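
As a rough sketch of that averaging argument: the average access time is
the average seek time plus half a rotation. The 8 ms seek time and the
7200 rpm spindle speed below are assumed, typical-looking values, not
numbers from this thread, and the function name is hypothetical.

```python
def average_access_time(avg_seek_s, rpm):
    """Average seek time plus the average rotational delay (half a rotation)."""
    rotation_s = 60.0 / rpm  # time for one full rotation, in seconds
    return avg_seek_s + rotation_s / 2.0


# Assumed drive: 8 ms average seek, 7200 rpm.
t = average_access_time(0.008, 7200)
print(f"average access time: {t * 1000:.2f} ms")
```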

The model also doesn't take into account that even with no
fragmentation, there might be some seek operations: Blocks on an HDD are
organized in rings (tracks), not as a spiral like the groove on a
good old LP. That means that at some point, the r/w head has to switch
to the next track when the file does not reside on one track alone.

[1] A bit off-topic: I work in applied sciences and engineering. There
I've learned two basic rules about models: 1. Truth doesn't matter,
usefulness does. 2. Every model has its limits. Knowing these limits is
the single most important thing when using a model.