Gentoo Archives: gentoo-user

From: Willie Wong <wwong@××××××××××××××.EDU>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far
Date: Sun, 07 Feb 2010 20:10:06
Message-Id: 20100207193947.GB30196@math.princeton.edu
In Reply to: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far by Mark Knecht
1 On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
2 > <QUOTE>
3 > 4KB physical sectors: KNOW WHAT YOU'RE DOING!
4 >
5 > Pros: Quiet, cool-running, big cache
6 >
7 > Cons: The 4KB physical sectors are a problem waiting to happen. If you
8 > misalign your partitions, disk performance can suffer. I ran
9 > benchmarks in Linux using a number of filesystems, and I found that
10 > with most filesystems, read performance and write performance with
11 > large files didn't suffer with misaligned partitions, but writes of
12 > many small files (unpacking a Linux kernel archive) could take several
13 > times as long with misaligned partitions as with aligned partitions.
14 > WD's advice about who needs to be concerned is overly simplistic,
15 > IMHO, and it's flat-out wrong for Linux, although it's probably
16 > accurate for 90% of buyers (those who run Windows or Mac OS and use
17 > their standard partitioning tools). If you're not part of that 90%,
18 > though, and if you don't fully understand this new technology and how
19 > to handle it, buy a drive with conventional 512-byte sectors!
20 > </QUOTE>
21 >
22 > Now, I don't mind getting a bit dirty learning to use this
23 > correctly but I'm wondering what that means in a practical sense.
24 > Reading the mke2fs man page the word 'sector' doesn't come up. It's my
25 > understanding the Linux 'blocks' are groups of sectors. True? If the
26 > disk must use 4K sectors then what - the smallest block has to be 4K
27 > and I'm using 1 sector per block? It seems that ext3 doesn't support
28 > anything larger than 4K?
29
30 The problem arises not when you make the filesystem with mke2fs, but
31 when you partition the disk with fdisk. I'm sure I am making some
32 small mistakes in the explanation below, but it goes something like
33 this:
34
35 a) A hard drive with 4K sectors lets the head efficiently
36 read/write 4K-sized blocks at a time.
37 b) However, for backward compatibility, the hard drive still allows
38 512B-sized blocks to be addressed. In reality, this means that you can
39 individually address the eight 512B-sized chunks of each 4K block,
40 but each will count as a separate operation. To illustrate: say the
41 hardware has some sector X of size 4K. It has 8 addressable slots
42 X1 ... X8 inside, each of size 512B. If your OS clusters reads/writes
43 at the 512B level, it will send 8 commands to read the info in those 8
44 slots separately. If your OS clusters in 4K, it will send one
45 command. So in the naive analysis I give here, it will take 8 times
46 as long for the 512B addressing to read the same data, since it will
47 take 8 passes, each time inefficiently reading only 1/8 of the
48 data required. Now in reality, drives are smarter than that: if all 8
49 of those are sent in sequence, sometimes the drive will cluster them
50 together into one read.
51 c) A problem occurs, however, when your OS deals with 4K clusters but
52 the partition you made is offset! Imagine the
53 physical read sectors of your disk looking like
54
55 AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
56
57 but when you make your partitions, somehow you partitioned it
58
59 ....YYYYYYYYZZZZZZZZWWWWWWWW....
60
61 This is possible because the drive allows addressing in 512B chunks.
62 So for some reason one of your partitions starts halfway inside a
63 physical sector. What is the problem with this? Now suppose your OS
64 sends data to be written to the ZZZZZZZZ block. If it were completely
65 aligned, the drive would just move the head to the block and
66 overwrite it with this information. But since half of the block is
67 over the BBBB physical sector, and half over CCCC, what the disk now
68 needs to do is to
69
70 pass 1) read BBBBBBBB
71 pass 2) modify the second half of BBBB to match the first half of ZZZZ
72 pass 3) write BBBBBBBB
73 pass 4) read CCCCCCCC
74 pass 5) modify the first half of CCCC to match the second half of ZZZZ
75 pass 6) write CCCCCCCC
76
77 This is what is known as a read-modify-write operation. Thus the disk
78 becomes a lot less efficient.
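
The picture above can be turned into a small counting exercise. What follows is only a rough sketch, assuming 512B logical sectors, 4K physical sectors, and physical sectors numbered from 0 (so A=0, B=1, C=2); the function name is made up for illustration and this is not how any real drive firmware works.

```python
LOGICAL = 512                 # bytes per addressable (logical) sector
PHYSICAL = 4096               # bytes per physical sector
RATIO = PHYSICAL // LOGICAL   # 8 logical sectors per physical sector


def physical_sectors_touched(start_lba, count):
    """Return (touched, partial) for a write of `count` logical sectors.

    touched: every physical sector the write overlaps.
    partial: those only partly covered, i.e. the ones that would need a
             read-modify-write in the simplified model described above.
    """
    end_lba = start_lba + count            # exclusive end
    first = start_lba // RATIO
    last = (end_lba - 1) // RATIO
    touched = list(range(first, last + 1))
    partial = []
    for p in touched:
        lo = max(start_lba, p * RATIO)
        hi = min(end_lba, (p + 1) * RATIO)
        if hi - lo < RATIO:                # write covers only part of p
            partial.append(p)
    return touched, partial


# Aligned 4K write landing exactly on B: one sector, no read-modify-write.
print(physical_sectors_touched(8, 8))
# The ZZZZZZZZ block from the diagram (starts 4 logical sectors into B):
# both B and C are only half covered.
print(physical_sectors_touched(12, 8))
```

In this toy model the aligned write touches one physical sector and the misaligned one touches two, both partially, which is the six-pass sequence listed above.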
79
80 ----------
81
82 Now, I don't know if this is the actual problem causing your
83 performance problems. But this may be it. When you use fdisk, it
84 defaults to aligning partitions to cylinder boundaries, and uses the
85 default (from ancient times) value of 63 x (512B sized) sectors per
86 track. Since 63 is not evenly divisible by 8, you see that quite
87 likely some of your partitions are not aligned to the physical sector
88 boundaries.
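
To make the arithmetic concrete, here is a quick sketch of the divisibility check. It assumes 8 logical sectors per physical sector and, for the cylinder case, the common fake geometry of 255 heads; the head count and the function names are my own assumptions, not something taken from the drive.

```python
SECTORS_PER_TRACK = 63        # legacy default mentioned above
HEADS = 255                   # assumed legacy head count, for illustration
LOGICAL_PER_PHYSICAL = 8      # 4096 / 512


def misalignment(start_lba):
    """How many 512B sectors past a 4K boundary this start address is."""
    return start_lba % LOGICAL_PER_PHYSICAL


def aligned_track_starts(n_tracks):
    """Track indices whose first sector falls on a 4K boundary."""
    return [k for k in range(n_tracks)
            if misalignment(k * SECTORS_PER_TRACK) == 0]


print(misalignment(63))                          # 7
print(misalignment(HEADS * SECTORS_PER_TRACK))   # 16065 % 8 == 1
print(aligned_track_starts(20))                  # [0, 8, 16]
```

Since 63 and 8 share no common factor, only every eighth track boundary lands on a 4K boundary, so a partition placed on an arbitrary track or cylinder boundary is more likely misaligned than not.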
89
90 If you use cfdisk, you can try to change the geometry with the command
91 g. Or you can use the command u to change the units used in the
92 partitioning to either sectors or megabytes, and make sure your
93 partition starts and sizes are a multiple of 8 in the former, or an
94 integer in the latter.
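
As a rough aid for picking numbers to type into the partitioner, the sketch below rounds a start sector up to the next multiple of 8 and converts whole megabytes to sectors. It assumes a megabyte here means 2^20 bytes; if the tool counts 10^6 bytes instead, a whole number of megabytes is not even a whole number of 512B sectors, so check which one you have. The helper names are hypothetical.

```python
LOGICAL = 512                 # bytes per logical sector
LOGICAL_PER_PHYSICAL = 8      # 4096 / 512
MIB = 1024 * 1024             # assumption: "megabyte" means 2**20 bytes


def align_up(lba, ratio=LOGICAL_PER_PHYSICAL):
    """Round a sector number up to the next multiple of `ratio`."""
    return -(-lba // ratio) * ratio


def mib_to_sectors(n_mib):
    """Number of 512B sectors in a whole number of MiB."""
    return n_mib * MIB // LOGICAL


print(align_up(63))           # 64: first 4K boundary at or after sector 63
print(mib_to_sectors(1))      # 2048, which is a multiple of 8
```

Under that assumption any integer number of MiB is automatically a multiple of 8 sectors, which is why sizing in whole megabytes keeps later partitions aligned once the first one starts on a 4K boundary.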
95
96 Again, take what I wrote with a grain of salt: this information came
97 from the research I did a little while back after reading the Slashdot
98 article on this 4K switch. Since it is just my own understanding, it
99 may not be completely correct.
100
101 HTH,
102
103 W
104 --
105 Willie W. Wong wwong@××××××××××××××.edu
106 Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
107 et vice versa ~~~ I. Newton
