Gentoo Archives: gentoo-user

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far
Date: Sun, 07 Feb 2010 21:03:32
Message-Id: 5bdc1c8b1002071231pd809728y69e4f5e7eede9918@mail.gmail.com
In Reply to: Re: [gentoo-user] 1-Terabyte drives - 4K sector sizes? -> bar performance so far by Willie Wong
1 On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@××××××××××××××.edu> wrote:
2 > On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
3 >> <QUOTE>
4 >> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
5 >>
6 >> Pros: Quiet, cool-running, big cache
7 >>
8 >> Cons: The 4KB physical sectors are a problem waiting to happen. If you
9 >> misalign your partitions, disk performance can suffer. I ran
10 >> benchmarks in Linux using a number of filesystems, and I found that
11 >> with most filesystems, read performance and write performance with
12 >> large files didn't suffer with misaligned partitions, but writes of
13 >> many small files (unpacking a Linux kernel archive) could take several
14 >> times as long with misaligned partitions as with aligned partitions.
15 >> WD's advice about who needs to be concerned is overly simplistic,
16 >> IMHO, and it's flat-out wrong for Linux, although it's probably
17 >> accurate for 90% of buyers (those who run Windows or Mac OS and use
18 >> their standard partitioning tools). If you're not part of that 90%,
19 >> though, and if you don't fully understand this new technology and how
20 >> to handle it, buy a drive with conventional 512-byte sectors!
21 >> </QUOTE>
22 >>
23 >>    Now, I don't mind getting a bit dirty learning to use this
24 >> correctly but I'm wondering what that means in a practical sense.
25 >> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
26 >> understanding the Linux 'blocks' are groups of sectors. True? If the
27 >> disk must use 4K sectors then what - the smallest block has to be 4K
28 >> and I'm using 1 sector per block? It seems that ext3 doesn't support
29 >> anything larger than 4K?
30 >
31 > The problem is not when you are making the filesystem with mke2fs, but
32 > when you partitioned the disk using fdisk. I'm sure I am making some
33 > small mistakes in the explanation below, but it goes something like
34 > this:
35 >
36 > a) The harddrive with 4K sectors allows the head to efficiently
37 > read/write 4K sized blocks at a time.
38 > b) However, to be compatible in hardware, the harddrive allows 512B
39 > sized blocks to be addressed. In reality, this means that you can
40 > individually address the 8 512B-sized chunks of the 4K sized blocks,
41 > but each will count as a separate operation. To illustrate: say the
42 > hardware has some sector X of size 4K. It has 8 addressable slots
43 > inside X1 ... X8 each of size 512B. If your OS clusters read/writes on
44 > the 512B level, it will send 8 commands to read the info in those 8
45 > blocks separately. If your OS clusters in 4K, it will send one
46 > command. So in the stupid analysis I give here, it will take 8 times
47 > as long for the 512B addressing to read the same data, since it will
48 > take 8 passes, and each time inefficiently reading only 1/8 of the
49 > data required. Now in reality, drives are smarter than that: if all 8
50 > of those are sent in sequence, sometimes the drives will cluster them
51 > together in one read.
52 > c) A problem occurs, however, when your OS deals with 4K clusters but
53 > when you make the partition, the partition is offset! Imagine the
54 > physical read sectors of your disk looking like
55 >
56 > AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
57 >
58 > but when you make your partitions, somehow you partitioned it
59 >
60 > ....YYYYYYYYZZZZZZZZWWWWWWWW....
61 >
62 > This is possible because the drive allows addressing by 512B chunks.
63 > So for some reason one of your partitions starts halfway inside a
64 > physical sector. What is the problem with this? Now suppose your OS
65 > sends data to be written to the ZZZZZZZZ block. If it were completely
66 > aligned, the drive would just move the head to the block and
67 > overwrite it with this information. But since half of the block is
68 > over the BBBB physical sector, and half over CCCC, what the disk now
69 > needs to do is to
70 >
71 > pass 1) read BBBBBBBB
72 > pass 2) modify the second half of BBBB to match the first half of ZZZZ
73 > pass 3) write BBBBBBBB
74 > pass 4) read CCCCCCCC
75 > pass 5) modify the first half of CCCC to match the second half of ZZZZ
76 > pass 6) write CCCCCCCC
77 >
78 > This is what is known as a read-modify-write operation. Thus the disk
79 > becomes a lot less efficient.
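The six passes above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how any drive firmware actually works; the function name and its parameters are made up for the example, and it assumes 512B logical sectors on 4096B physical sectors as described in the thread.

```python
PHYSICAL = 4096  # bytes per physical sector (as described in the thread)
LOGICAL = 512    # bytes per addressable logical sector


def physical_sectors_touched(partition_start_lba, block_index, block_size=4096):
    """Return (first, last, partial) for one filesystem block write.

    partition_start_lba: where the partition starts, in 512B logical sectors.
    block_index: index of the filesystem block inside the partition.
    partial: how many physical sectors are only partly covered by the write,
             i.e. how many read-modify-write cycles the drive would need.
    """
    start_byte = partition_start_lba * LOGICAL + block_index * block_size
    end_byte = start_byte + block_size  # exclusive
    first = start_byte // PHYSICAL
    last = (end_byte - 1) // PHYSICAL

    partial = 0
    for sector in range(first, last + 1):
        sector_start = sector * PHYSICAL
        sector_end = sector_start + PHYSICAL
        # A sector the write does not fully cover must be read and patched first.
        if start_byte > sector_start or end_byte < sector_end:
            partial += 1
    return first, last, partial


# Partition offset by half a physical sector (4 logical sectors), as in the
# YYYY/ZZZZ/WWWW diagram: block 1 (ZZZZZZZZ) straddles sectors 1 (B) and 2 (C).
print(physical_sectors_touched(4, 1))   # (1, 2, 2)
# Partition starting on a physical sector boundary: one full-sector overwrite.
print(physical_sectors_touched(64, 0))  # (8, 8, 0)
```

A result with partial == 0 corresponds to the plain overwrite case; partial == 2 corresponds to passes 1 through 6.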
80 >
81 > ----------
82 >
83 > Now, I don't know if this is what is actually causing your
84 > performance problems, but it may be. When you use fdisk, it
85 > defaults to aligning partitions to cylinder boundaries, and uses the
86 > default (from ancient times) value of 63 x (512B sized) sectors per
87 > track. Since 63 is not evenly divisible by 8, you see that quite
88 > likely some of your partitions are not aligned to the physical sector
89 > boundaries.
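To make the "63 is not divisible by 8" point concrete, here is a small Python check. It is a sketch under the assumptions used in this thread (512B logical sectors, 4096B physical sectors, 255 heads and 63 sectors per track); the helper names are invented for illustration.

```python
SECTORS_PER_PHYSICAL = 4096 // 512  # 8 logical sectors per physical sector


def is_aligned(start_lba):
    """True if a partition starting at this 512B sector sits on a 4K boundary."""
    return start_lba % SECTORS_PER_PHYSICAL == 0


def misalignment_bytes(start_lba):
    """How far past the previous 4K boundary the partition starts, in bytes."""
    return (start_lba % SECTORS_PER_PHYSICAL) * 512


def aligned_cylinder_starts(heads, sectors_per_track, count):
    """Cylinder numbers in 1..count whose first sector falls on a 4K boundary."""
    per_cylinder = heads * sectors_per_track
    return [c for c in range(1, count + 1) if is_aligned(c * per_cylinder)]


print(is_aligned(63))                       # False
print(misalignment_bytes(63))               # 3584
print(aligned_cylinder_starts(255, 63, 16)) # [8, 16]
```

With this geometry a cylinder holds 255 x 63 = 16065 sectors, which leaves a remainder of 1 when divided by 8, so only every eighth cylinder boundary happens to line up with a physical sector.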
90 >
91 > If you use cfdisk, you can try to change the geometry with the command
92 > g. Or you can use the command u to change the units used in the
93 > partitioning to either sectors or megabytes, and make sure your
94 > partition starts and sizes are a multiple of 8 in the former, or an integer in
95 > the latter.
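When working in sector units, picking a start value comes down to rounding up to the next multiple of 8. A minimal Python sketch, assuming 8 logical sectors per physical sector; align_up is a hypothetical helper, not part of fdisk or cfdisk:

```python
def align_up(lba, multiple=8):
    """Round a 512B-sector number up to the next multiple of `multiple`."""
    return -(-lba // multiple) * multiple  # ceiling division, then scale back


print(align_up(63))     # 64
print(align_up(64))     # 64
print(align_up(16065))  # 16072
```

So a partition that fdisk would have started at sector 63 could start at sector 64 instead, at the cost of one unused 512B sector.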
96 >
97 > Again, take what I wrote with a grain of salt: this information came
98 > from the research I did a little while back after reading the slashdot
99 > article on this 4K switch. Since this is my own understanding, it may
100 > not be completely correct.
101 >
102 > HTH,
103 >
104 > W
105 > --
106 > Willie W. Wong                                     wwong@××××××××××××××.edu
107 > Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
108 >         et vice versa   ~~~  I. Newton
109 >
110 >
111
112 Willie,
113 Thanks. Your description above is pretty much consistent (I think)
114 with the information I found at the WD site explaining how the data is
115 being physically packed on the drive. Since I have the OS set up
116 on a different drive, I was able to blow away all the partitions, so I
117 just created one large 1T partition. I don't think that deals with
118 the exact problem you outline, though.
119
120 I'll have to study how to change the geometry. I do see that cfdisk
121 is reporting 255/63/121601. Am I to choose a size that is __smaller__
122 than 63 but a multiple of 8? I.e. - 56? And then if I do that does the
123 partitioning of the drive just ignore those last 7 sectors and reduce
124 capacity to 56/63, i.e. by about 11%?
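The arithmetic behind that 11% figure, taking the reported 255/63/121601 geometry at face value, can be checked with a short Python sketch. Whether the reported geometry actually limits usable capacity is a separate question that this does not answer; capacity_bytes is just an illustrative helper.

```python
def capacity_bytes(cylinders, heads, sectors_per_track, sector_size=512):
    """Capacity implied by a cylinders/heads/sectors geometry."""
    return cylinders * heads * sectors_per_track * sector_size


reported = capacity_bytes(121601, 255, 63)  # geometry shown by cfdisk
reduced = capacity_bytes(121601, 255, 56)   # same, with 56 sectors per track
lost_fraction = 1 - reduced / reported      # 7/63 = 1/9

print(reported)                 # 1000202273280
print(round(lost_fraction, 4))  # 0.1111
```

The reported geometry works out to roughly 1.0 TB, and dropping from 63 to 56 sectors per track would cut that figure by one ninth.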
125
126 Or is it legal to push the number of sectors up to 64? I would have
127 thought that the sector count would be driven by really low level
128 formatting and I shouldn't be messing with that.
129
130 Assuming I have done what you are suggesting, then with 7
131 blocks/track do I need to choose the starting position of each
132 partition to be aligned to the start of a new 8-sector block?
133
134 It's very strange that the disk industry chose anything that's not
135 2^X but I guess they did.
136
137 As per your and Volker's suggestions I'm going to study the proper
138 way to align partitions before I do anything more. I did find a small
139 program called 'fio' that does some interesting drive testing
140 including seek time testing. I need to study how to really use it
141 though. It can set up multiple threads to simulate loads that are more
142 real-world like.
143
144 Thanks to you both for the responses.
145
146 Cheers,
147 Mark
