On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@××××××××××××××.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>> Now, I don't mind getting a bit dirty learning to use this
>> correctly, but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page, the word 'sector' doesn't come up. It's my
>> understanding that Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The hard drive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, to be compatible in hardware, the hard drive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside, X1 ... X8, each of size 512B. If your OS clusters reads/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> This is what is known as a read-modify-write operation. Thus the disk
> becomes a lot less efficient.
>
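
The passes listed above can be counted with a small sketch. This is only an
illustration, assuming 512-byte addressable sectors and 4096-byte physical
sectors as described in this thread. The function name write_cost and the
sample start sectors are made up for the example, and real drive firmware
may well behave differently.

```python
LOGICAL = 512     # bytes per addressable (logical) sector, per the thread
PHYSICAL = 4096   # bytes per physical sector, per the thread

def write_cost(start_lba, length_bytes=4096):
    """Return (physical sectors touched, sectors needing read-modify-write).

    start_lba is the first 512-byte sector of the write (hypothetical input).
    """
    start = start_lba * LOGICAL
    end = start + length_bytes  # exclusive end offset in bytes
    touched = range(start // PHYSICAL, (end - 1) // PHYSICAL + 1)
    # A physical sector needs read-modify-write if the write covers only part of it.
    rmw = sum(1 for p in touched
              if start > p * PHYSICAL or end < (p + 1) * PHYSICAL)
    return len(touched), rmw

aligned = write_cost(8)      # starts exactly on a physical sector boundary
misaligned = write_cost(12)  # starts halfway into a physical sector
```

Under these assumptions the aligned 4K write touches one physical sector with
no read-modify-write, while the write that starts halfway in touches two
physical sectors and both need it, matching the BBBB/CCCC picture.
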
> ----------
>
> Now, I don't know if this is the actual problem causing your
> performance problems, but it may be. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, you see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
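
The arithmetic behind the paragraph above can be sketched as follows. The 255
heads and 63 sectors per track are the geometry cfdisk reports later in this
thread; treating sector 63 as the start of the first partition is an
assumption about the traditional DOS-style layout, and the names are made up
for the example.

```python
HEADS = 255             # from the cfdisk geometry quoted in this thread
SECTORS_PER_TRACK = 63  # the "ancient times" default
PER_PHYSICAL = 8        # 512-byte sectors per 4K physical sector

def is_aligned(lba):
    """True if a 512-byte sector number falls on a 4K physical boundary."""
    return lba % PER_PHYSICAL == 0

cylinder = HEADS * SECTORS_PER_TRACK       # sectors per cylinder
first_partition_start = SECTORS_PER_TRACK  # assumed traditional start sector

# Cylinder boundaries (cylinders 1 through 32) that happen to land on a 4K boundary.
aligned_cylinders = [c for c in range(1, 33) if is_aligned(c * cylinder)]
```

With these numbers a cylinder is 16065 sectors, which leaves a remainder of 1
when divided by 8, so only every eighth cylinder boundary is aligned, and a
partition starting at sector 63 is not.
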
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition sizes are a multiple of 8 in the former, or an integer in
> the latter.
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. So being my own understanding, it may not
> be completely correct.
>
> HTH,
>
> W
> --
> Willie W. Wong wwong@××××××××××××××.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
> et vice versa ~~~ I. Newton
>
>

Willie,
Thanks. Your description above is pretty much consistent (I think)
with the information I found at the WD site explaining how the data is
being physically packed on the drive. Since I have the OS set up
on a different drive, I was able to blow away all the partitions, so I
just created 1 large 1T partition, but I don't think that deals with
the exact problem you outline.

I'll have to study how to change the geometry. I do see that cfdisk
is reporting 255/63/121601. Am I to choose a size that is __smaller__
than 63 but a multiple of 8? I.e. - 56? And then if I do that, does the
partitioning of the drive just ignore those last 7 sectors and reduce
capacity to 56/63, i.e. by about 11%?

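As a rough check on the geometry question above, here is a small sketch that
compares the candidate sectors-per-track values with the 255 heads cfdisk
reports. The function name is made up for the example, and it only does the
divisibility arithmetic: it says nothing about whether the tools accept a
given value or how much capacity ends up usable.

```python
PER_PHYSICAL = 8  # 512-byte sectors per 4K physical sector
HEADS = 255       # head count reported by cfdisk in this thread

def cylinder_remainder(sectors_per_track, heads=HEADS):
    """Remainder of one cylinder's length (in 512-byte sectors) modulo 8."""
    return (heads * sectors_per_track) % PER_PHYSICAL

# 63 is the current value; 56 and 64 are the alternatives being asked about.
remainders = {spt: cylinder_remainder(spt) for spt in (63, 56, 64)}
```

A remainder of 0 means every cylinder boundary would fall on a 4K boundary;
with these inputs that holds for 56 and 64 but not for 63.
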
Or is it legal to push the number of sectors up to 64? I would have
thought that the sector count would be driven by really low level
formatting and I shouldn't be messing with that.

Assuming I have done what you are suggesting, then with 7
blocks/track I need to choose the starting position of each
partition to be aligned to the start of a new 8-sector block?

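One way to picture choosing aligned starting positions, as a minimal sketch:
round each proposed start sector up to the next multiple of 8. The helper
align_up and the sample sector numbers are assumptions for illustration, not
something fdisk or cfdisk provides under that name.

```python
def align_up(lba, multiple=8):
    """Round a 512-byte sector number up to the next multiple (8 = one 4K sector)."""
    return -(-lba // multiple) * multiple  # ceiling division, then scale back up

first_start = align_up(63)      # the old-style first partition start
second_start = align_up(16065)  # one 255 x 63 cylinder in from the start of the disk
```

Here sector 63 moves up to 64 and sector 16065 moves up to 16072, so each
adjustment skips at most 7 sectors.
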
It's very strange that the disk industry chose anything that's not
2^X but I guess they did.

As per your and Volker's suggestions I'm going to study the proper
way to align partitions before I do anything more. I did find a small
program called 'fio' that does some interesting drive testing
including seek time testing. I need to study how to really use it
though. It can set up multiple threads to simulate loads that are more
like real-world use.

Thanks to you both for the responses.

Cheers,
Mark