On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <wwong@××××××××××××××.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>> Now, I don't mind getting a bit dirty learning to use this
>> correctly, but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page, the word 'sector' doesn't come up. It's my
>> understanding that Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partition the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The hard drive with 4K sectors allows the head to efficiently
> read/write 4K-sized blocks at a time.
> b) However, to be compatible in hardware, the hard drive allows
> 512B-sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of a 4K-sized block,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside, X1 ... X8, each of size 512B. If your OS clusters reads/writes
> at the 512B level, it will send 8 commands to read the info in those 8
> chunks separately. If your OS clusters at 4K, it will send one
> command. So in the naive analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those commands are sent in sequence, sometimes the drive will
> cluster them together into one read.
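The 8-to-1 command-count arithmetic above can be sketched as a toy model (illustrative Python only, not anything a real OS or drive firmware does - real systems batch and cache far more cleverly):

```python
PHYSICAL_SECTOR = 4096  # bytes the head reads/writes in one pass
LOGICAL_SECTOR = 512    # bytes per addressable chunk (legacy compatibility)

def commands_needed(total_bytes, cluster_size):
    """Commands issued if the OS clusters its I/O at cluster_size bytes."""
    return total_bytes // cluster_size

# Reading one 4K physical sector:
print(commands_needed(PHYSICAL_SECTOR, LOGICAL_SECTOR))   # 8 commands at 512B
print(commands_needed(PHYSICAL_SECTOR, PHYSICAL_SECTOR))  # 1 command at 4K
```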
> c) A problem occurs, however, when your OS deals with 4K clusters
> but, when you made the partition, the partition was offset! Imagine
> the physical sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you made your partitions, somehow you partitioned it as
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> This is what is known as a read-modify-write operation. Thus the disk
> becomes a lot less efficient.
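The alignment condition itself is simple arithmetic: a partition whose start, counted in 512B logical sectors, is a multiple of 8 begins on a 4K physical boundary. A minimal sketch, using the two sector numbers that come up later in this thread:

```python
SECTORS_PER_PHYSICAL = 4096 // 512  # 8 logical sectors per 4K physical sector

def is_aligned(start_sector):
    """True if a partition starting at this 512B sector sits on a 4K boundary."""
    return start_sector % SECTORS_PER_PHYSICAL == 0

print(is_aligned(63))  # False - the classic fdisk default start, misaligned
print(is_aligned(64))  # True  - shifted by one logical sector, aligned
```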
>
> ----------
>
> Now, I don't know if this is the actual problem causing your
> performance problems. But it may be. When you use fdisk, it
> defaults to aligning partitions to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B-sized) sectors per
> track. Since 63 is not evenly divisible by 8, you can see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
> If you use cfdisk, you can try to change the geometry with the
> command g. Or you can use the command u to change the units used in
> the partitioning to either sectors or megabytes, and make sure your
> partition boundaries fall on a multiple of 8 in the former, or an
> integer in the latter.
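To see why the old CHS defaults almost guarantee misalignment: with the legacy fake geometry of 255 heads and 63 sectors/track (the same geometry fdisk reports further down), a cylinder is 255 x 63 = 16065 logical sectors, which is not a multiple of 8, so cylinder-aligned partition starts keep drifting off the 4K boundaries. A quick check:

```python
HEADS, SECTORS_PER_TRACK = 255, 63           # legacy fake geometry fdisk assumes
SECTORS_PER_CYL = HEADS * SECTORS_PER_TRACK  # 16065 logical sectors per cylinder

# Starts of the first partition (sector 63) and the next few
# cylinder-boundary starts, in 512B logical sectors:
starts = [63] + [n * SECTORS_PER_CYL for n in range(1, 4)]
for s in starts:
    print(s, "aligned" if s % 8 == 0 else "misaligned")  # all misaligned
```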
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. Since it is my own understanding, it may not
> be completely correct.
>
> HTH,
>
> W
> --
> Willie W. Wong                          wwong@××××××××××××××.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
> et vice versa ~~~ I. Newton
>

Hi Willie,
   OK - it turns out that if I start fdisk with the -u option it shows
me sector numbers. Looking at the original partition, created using
default values, the starting sector was 63 - probably about the
worst value it could be. As a test I blew away that partition and
created a new one starting at 64 instead, and the untar results are
vastly improved - down to roughly 20 seconds from 8-10 minutes. That's
roughly twice as fast as the old 120GB SATA2 drive I was using to test
the system out while I debugged this issue.

There's still some variability, but there are probably other things
running on the box - screen savers and stuff - that account for some
of that.

I'm still a little fuzzy about what happens to the extra sectors at
the end of a track. Are they used, so that I pay a little bit of
overhead reading data off of them, or are they ignored, so that I lose
capacity? I think it must be the former, as my partition isn't all that
much less than 1TB.

Again, many thanks to you and Volker for pointing this issue out.

Cheers,
Mark

gandalf TestMount # fdisk -u /dev/sdb

The number of cylinders for this disk is set to 121601.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x67929f10

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              64  1953525167   976762552   83  Linux

Command (m for help): q

gandalf TestMount # df -H
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda3              110G   8.6G    96G   9% /
udev                    11M   177k    11M   2% /dev
shm                    2.0G      0   2.0G   0% /dev/shm
/dev/sdb1              985G   210M   935G   1% /mnt/TestMount
gandalf TestMount #


gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real    0m23.275s
user    0m8.614s
sys     0m2.644s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real    0m3.720s
user    0m0.118s
sys     0m1.822s
gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real    0m13.828s
user    0m8.911s
sys     0m2.653s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real    0m19.718s
user    0m0.128s
sys     0m2.025s
gandalf TestMount # mkdir usr
gandalf TestMount # time tar xjf /portage-latest.tar.bz2 -C /mnt/TestMount/usr

real    0m25.777s
user    0m8.579s
sys     0m2.660s
gandalf TestMount # time rm -rf /mnt/TestMount/usr/

real    0m2.564s
user    0m0.112s
sys     0m1.805s
gandalf TestMount #