On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
> <QUOTE>
> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>
> Pros: Quiet, cool-running, big cache
>
> Cons: The 4KB physical sectors are a problem waiting to happen. If you
> misalign your partitions, disk performance can suffer. I ran
> benchmarks in Linux using a number of filesystems, and I found that
> with most filesystems, read performance and write performance with
> large files didn't suffer with misaligned partitions, but writes of
> many small files (unpacking a Linux kernel archive) could take several
> times as long with misaligned partitions as with aligned partitions.
> WD's advice about who needs to be concerned is overly simplistic,
> IMHO, and it's flat-out wrong for Linux, although it's probably
> accurate for 90% of buyers (those who run Windows or Mac OS and use
> their standard partitioning tools). If you're not part of that 90%,
> though, and if you don't fully understand this new technology and how
> to handle it, buy a drive with conventional 512-byte sectors!
> </QUOTE>
>
> Now, I don't mind getting a bit dirty learning to use this
> correctly but I'm wondering what that means in a practical sense.
> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
> understanding the Linux 'blocks' are groups of sectors. True? If the
> disk must use 4K sectors then what - the smallest block has to be 4K
> and I'm using 1 sector per block? It seems that ext3 doesn't support
> anything larger than 4K?

The problem is not in how you make the filesystem with mke2fs, but in
how you partitioned the disk with fdisk. I'm sure I am making some
small mistakes in the explanation below, but it goes something like
this:

a) The hard drive with 4K sectors allows the head to efficiently
read/write 4K-sized blocks at a time.
b) However, to remain compatible, the hard drive still allows
512B-sized blocks to be addressed. In reality, this means that you can
individually address the eight 512B-sized chunks of each 4K block,
but each will count as a separate operation. To illustrate: say the
hardware has some sector X of size 4K. It has 8 addressable slots
inside, X1 ... X8, each of size 512B. If your OS clusters reads/writes
at the 512B level, it will send 8 commands to read the info in those 8
blocks separately. If your OS clusters in 4K, it will send one
command. So in the naive analysis I give here, it will take 8 times
as long for the 512B addressing to read the same data, since it will
take 8 passes, each time inefficiently reading only 1/8 of the
data required. Now in reality, drives are smarter than that: if all 8
of those are sent in sequence, sometimes the drive will cluster them
together in one read.
c) A problem occurs, however, when your OS deals with 4K clusters but
the partition you made is offset! Imagine the physical sectors of
your disk looking like

AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD

but when you made your partitions, somehow you partitioned it as

....YYYYYYYYZZZZZZZZWWWWWWWW....

This is possible because the drive allows addressing in 512B chunks.
So for some reason one of your partitions starts halfway inside a
physical sector. What is the problem with this? Now suppose your OS
sends data to be written to the ZZZZZZZZ block. If it were completely
aligned, the drive would just move the head to the block and
overwrite it with this information. But since half of the block is
over the BBBB physical sector, and half over CCCC, what the disk now
needs to do is

pass 1) read BBBBBBBB
pass 2) modify the second half of BBBB to match the first half of ZZZZ
pass 3) write BBBBBBBB
pass 4) read CCCCCCCC
pass 5) modify the first half of CCCC to match the second half of ZZZZ
pass 6) write CCCCCCCC

This is what is known as a read-modify-write operation. Thus the disk
becomes a lot less efficient.
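
To make the picture in (c) a bit more concrete, here is a rough Python
sketch of my own (it is only arithmetic, not what any drive firmware
actually runs). It assumes 512B addressable sectors, 4K physical
sectors and a 4K filesystem block; the function names are made up for
illustration. Given the partition's starting sector, it lists which
physical sectors a single block write lands on and whether each one is
covered completely.

```python
LOGICAL = 512     # bytes per addressable (logical) sector
PHYSICAL = 4096   # bytes per physical sector
BLOCK = 4096      # filesystem block size (assumed to be 4K here)


def physical_sectors_touched(partition_start_lba, block_index):
    """Return (physical_sector_index, fully_covered) pairs for one block write.

    partition_start_lba is the partition's first sector, counted in
    512B units from the start of the disk.
    """
    start = partition_start_lba * LOGICAL + block_index * BLOCK
    end = start + BLOCK
    first = start // PHYSICAL
    last = (end - 1) // PHYSICAL
    touched = []
    for p in range(first, last + 1):
        p_start = p * PHYSICAL
        p_end = p_start + PHYSICAL
        # A physical sector can be overwritten directly only if the
        # write covers all of it; otherwise it must be read first.
        fully_covered = start <= p_start and end >= p_end
        touched.append((p, fully_covered))
    return touched


def needs_read_modify_write(partition_start_lba, block_index):
    """True if any touched physical sector is only partly overwritten."""
    touched = physical_sectors_touched(partition_start_lba, block_index)
    return any(not full for _, full in touched)


# The diagram above: the partition starts 4 slots in, and ZZZZZZZZ is
# the second block (index 1) of that partition.
print(physical_sectors_touched(4, 1))
# An aligned partition starting at slot 8, first block.
print(physical_sectors_touched(8, 0))
```

In the misaligned case the block straddles physical sectors 1 and 2
(B and C in the diagram), neither fully covered, which is where the six
passes come from. In the aligned case it covers exactly one physical
sector.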

----------

Now, I don't know if this is the actual problem causing your
performance problems. But this may be it. When you use fdisk, it
defaults to aligning partitions to cylinder boundaries, and uses the
default (from ancient times) value of 63 (512B-sized) sectors per
track. Since 63 is not evenly divisible by 8, quite likely some of
your partitions are not aligned to the physical sector boundaries.
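
A quick way to see the 63-versus-8 problem is to do the arithmetic. The
sketch below is only an illustration under assumptions of mine: it
takes the legacy geometry to be 255 heads and 63 sectors per track (the
255 is my assumption, your disk may report something else), puts the
first partition at sector 63, and puts later partitions on cylinder
boundaries.

```python
SECTORS_PER_PHYSICAL = 8   # 4096 / 512
SECTORS_PER_TRACK = 63     # the legacy default mentioned above
HEADS = 255                # assumption: a common legacy geometry value


def is_aligned(start_lba):
    """True if a start sector (in 512B units) sits on a 4K boundary."""
    return start_lba % SECTORS_PER_PHYSICAL == 0


first_partition = SECTORS_PER_TRACK            # sector 63
cylinder = HEADS * SECTORS_PER_TRACK           # sectors per cylinder

# Candidate starts: the first partition, then the first 8 cylinder boundaries.
cylinder_starts = [c * cylinder for c in range(1, 9)]
report = {lba: is_aligned(lba) for lba in [first_partition] + cylinder_starts}

for lba, ok in report.items():
    print(lba, "aligned" if ok else "MISALIGNED")
```

With these numbers a cylinder is 16065 sectors, which leaves a
remainder of 1 when divided by 8, so only every eighth cylinder
boundary happens to fall on a 4K boundary.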

If you use cfdisk, you can try to change the geometry with the command
g. Or you can use the command u to change the units used in the
partitioning to either sectors or megabytes, and make sure your
partition starts and sizes are a multiple of 8 in the former, or an
integer in the latter.
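
If you want to work out the numbers before typing them into the
partitioner, something like the following may help. Again this is just
a sketch with made-up helper names: it rounds a proposed start sector
up to the next multiple of 8 and lays partitions out back to back. The
megabyte conversion assumes a megabyte means 1024*1024 bytes, which I
have not verified for cfdisk, so check what your tool actually uses.

```python
SECTORS_PER_PHYSICAL = 8   # 4096 / 512


def align_up(lba, multiple=SECTORS_PER_PHYSICAL):
    """Round a start sector up to the next multiple of `multiple`."""
    return ((lba + multiple - 1) // multiple) * multiple


def mib_to_sectors(mib):
    """512B sectors in `mib` megabytes, ASSUMING 1 MB = 1024 * 1024 bytes."""
    return mib * 1024 * 1024 // 512


def layout(first_start, sizes):
    """Start sectors of partitions placed back to back (sizes in sectors)."""
    starts = []
    lba = first_start
    for size in sizes:
        starts.append(lba)
        lba += size
    return starts


first = align_up(63)                       # move the usual start of 63 forward
good = layout(first, [2048, 4096, 1000])   # every size is a multiple of 8
bad = layout(first, [2048, 4097, 1000])    # one odd size shifts what follows

print(first, good, bad)
print([s % SECTORS_PER_PHYSICAL == 0 for s in good])
print([s % SECTORS_PER_PHYSICAL == 0 for s in bad])
```

The point of the second layout is that a single size that is not a
multiple of 8 misaligns every partition after it, even when the first
start is fine.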

Again, take what I wrote with a grain of salt: this information came
from the research I did a little while back after reading the Slashdot
article on this 4K switch. Since this is just my own understanding, it
may not be completely correct.

HTH,

W
--
Willie W. Wong wwong@××××××××××××××.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton