Gentoo Archives: gentoo-amd64

From: Mark Knecht <markknecht@×××××.com>
To: Gentoo AMD64 <gentoo-amd64@l.g.o>
Subject: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Fri, 21 Jun 2013 17:40:55
Message-Id: CAK2H+ecNGaQ7BfAaWtLkQhg3T-pC56tejDKL6kK+qydLa8YWyg@mail.gmail.com
In Reply to: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value? by Duncan <1i5t5.duncan@cox.net>
On Fri, Jun 21, 2013 at 12:31 AM, Duncan <1i5t5.duncan@×××.net> wrote:
> Mark Knecht posted on Thu, 20 Jun 2013 12:10:04 -0700 as excerpted:
>
>> Does anyone know of info on how the starting sector number might
>> impact RAID performance under Gentoo? The drives are WD-500G RE3 drives
>> shown here:
>>
>> http://www.amazon.com/Western-Digital-WD5002ABYS-3-5-inch-Enterprise/dp/
> B001EMZPD0/ref=cm_cr_pr_product_top
>>
>> These are NOT 4k sector sized drives.
>>
>> Specifically I'm a 5-drive RAID6 for about 1.45TB of storage. My
>> benchmarking seems abysmal at around 40MB/S using dd copying large
>> files.
>> It's higher, around 80MB/S if the file being transferred is coming from
>> an SSD, but even 80MB/S seems slow to me. I see a LOT of wait time in
>> top.
>> And my 'large file' copies might not be large enough as the machine has
>> 24GB of DRAM and I've only been copying 21GB so it's possible some of
>> that is cached.
>
> I /suspect/ that the problem isn't striping, tho that can be a factor,
> but rather, your choice of raid6. Note that I personally ran md/raid-6
> here for awhile, so I know a bit of what I'm talking about. I didn't
> realize the full implications of what I was setting up originally, or I'd
> have not chosen raid6 in the first place, but live and learn as they say,
> and that I did.
>
> General rule, raid6 is abysmal for writing and gets dramatically worse as
> fragmentation sets in, tho reading is reasonable. The reason is that in
> ordered to properly parity-check and write out less-than-full-stripe
> writes, the system must effectively read-in the existing data and merge
> it with the new data, then recalculate the parity, before writing the new
> data AND 100% of the (two-way in raid-6) parity. Further, because raid
> sits below the filesystem level, it knows nothing about what parts of the
> filesystem are actually used, and must read and write the FULL data
> stripe (perhaps minus the new data bit, I'm not sure), including parts
> that will be empty on a freshly formatted filesystem.
>
> So with 4k block sizes on a 5-device raid6, you'd have 20k stripes, 12k
> in data across three devices, and 8k of parity across the other two
> devices. Now you go to write a 1k file, but in ordered to do so the full
> 12k of existing data must be read in, even on an empty filesystem,
> because the RAID doesn't know it's empty! Then the new data must be
> merged in and new checksums created, then the full 20k must be written
> back out, certainly the 8k of parity, but also likely the full 12k of
> data even if most of it is simply rewrite, but almost certainly at least
> the 4k strip on the device the new data is written to.
>
<SNIP>


Hi Duncan,
   Wonderful post, but much too long to carry on a conversation
in-line. As you sound pretty sure of your understanding and history
I'll assume you're right 100% of the time, but only about 80% of the
post feels right to me at this point, so let's assume I have much to
learn and go from there. I expect that others here are in a similar
situation to mine - they use RAID but have little hard data on what
the different parts of the system are doing or how to optimize them.
That's certainly true in my case. I hope this thread, over the near
or longer term, will help me and potentially others.

   In thinking about this issue this morning, it seems important to
get down to basics and verify as much as possible, step by step, so
that I don't layer good work on top of bad assumptions. To that end,
before I move much farther forward, let me document a few things
about my system and the hardware available to work with, and see if
you, Rich, Bob, Volker or anyone else wants to chime in about what is
correct, what isn't, or a better way to use it.

Basic Machine - ASUS Rampage II Extreme motherboard (4/1/2010) + 24GB
DDR3 + Core i7-980X Extreme processor (6 cores / 12 threads)
1 SSD - 120GB SATA3 on its own controller
5+ HDD - WD5002ABYS RAID Edition 3 (SATA 3Gb/s) drives on the Intel
integrated controllers

(NOTE: I could possibly go to a 6-drive RAID if I made some changes in
the box, but that's for later.)
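
   Since the thread subject is the starting sector value, one basic
thing I can verify directly is where each RAID member partition
actually starts. A quick sketch of how I'd check it (assuming the
members are partitions like sdb1 through sdf1, which I haven't
confirmed here):

fdisk -l /dev/sdb                  # the Start column is the first sector
cat /sys/block/sdb/sdb1/start      # same number straight from sysfs

Since these are not 4k-sector drives the alignment may not matter
much, but at least it would be verified rather than assumed.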

According to the WD spec
(http://www.wdc.com/en/library/spec/2879-701281.pdf) the 500GB drives
sustain 113 MB/s to the drive. Using hdparm I measure 107 MB/s or
higher for all 5 drives:

c2RAID6 ~ # hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   17374 MB in 2.00 seconds = 8696.12 MB/sec
 Timing buffered disk reads: 322 MB in 3.00 seconds = 107.20 MB/sec
c2RAID6 ~ #
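
   One more check I haven't done yet: running hdparm on all five
drives at the same time, to see whether the Intel controller tops out
below roughly 5 x 107 MB/s. A rough sketch, assuming the RAID members
are sdb through sdf:

for d in sdb sdc sdd sde sdf; do hdparm -t /dev/$d & done; wait

If the per-drive numbers drop well below 107 MB/s when run together,
the controller is part of the story rather than the RAID level.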

The SSD, on its own PCI Express controller, clocks in at about
250 MB/s for reads:

c2RAID6 ~ # hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   17492 MB in 2.00 seconds = 8754.42 MB/sec
 Timing buffered disk reads: 760 MB in 3.00 seconds = 253.28 MB/sec
c2RAID6 ~ #


TESTING: I'm using dd to test. It gives an easy-to-read result anyway
and seems to be used a lot. I can use bonnie++ or IOzone later, but I
don't think that's necessary quite yet. Since I have 24GB of RAM and
don't want cached data to affect the test speeds, I do the following:

1) Using dd I created a 50GB file for copying using the following
commands:

cd /mnt/fastVM
dd if=/dev/random of=random1 bs=1000 count=0 seek=$[1000*1000*50]

mark@c2RAID6 /VirtualMachines/bonnie $ ls -alh /mnt/fastVM/ran*
-rw-r--r-- 1 mark mark 47G Jun 21 07:10 /mnt/fastVM/random1
mark@c2RAID6 /VirtualMachines/bonnie $
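
   One caveat about that dd command: with count=0 and only a seek=,
dd doesn't actually write any data, it just sets the file's length,
so as far as I understand it random1 is a sparse 47G file that reads
back as zeros despite the name. Whether that skews the copy numbers
I'm not sure, but it's easy to check by comparing apparent size to
the space actually allocated:

ls -lh /mnt/fastVM/random1     # apparent size (47G)
du -h /mnt/fastVM/random1      # allocated size; much smaller means sparse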

2) To ensure that nothing is cached and the copies are (hopefully)
completely fair, as root I do the following between each test:

sync
free -h
echo 3 > /proc/sys/vm/drop_caches
free -h

An example:

c2RAID6 ~ # sync
c2RAID6 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:           23G        23G       129M         0B       8.5M        21G
-/+ buffers/cache:        1.6G       21G
Swap:          12G         0B        12G
c2RAID6 ~ # echo 3 > /proc/sys/vm/drop_caches
c2RAID6 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:           23G       2.6G        20G         0B       884K       1.3G
-/+ buffers/cache:        1.3G        22G
Swap:          12G         0B        12G
c2RAID6 ~ #
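
   To avoid fat-fingering that between runs I may wrap it in a tiny
script; just a sketch (the name dropcaches.sh is mine):

#!/bin/bash
# run as root: flush dirty pages, then drop page cache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
free -h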

3) As a first test I copy the 50GB file, using dd, from the SSD to
the RAID6. As long as reading the SSD is much faster than writing to
the RAID6, this should primarily be a test of the RAID6 write speed:

mark@c2RAID6 /VirtualMachines/bonnie $ dd if=/mnt/fastVM/random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 339.173 s, 147 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $

If I clear the cache as above and rerun the test it's always 145-155 MB/s.
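
   One thing the record counts tell me: 97656250 records for 50 GB
works out to dd's default 512-byte block size. I may redo these with
a larger block size, and possibly direct I/O so the page cache can't
help at all; a sketch of what I have in mind (not what was run above):

dd if=/mnt/fastVM/random1 of=SDDCopy bs=1M oflag=direct
dd if=/mnt/fastVM/random1 of=SDDCopy bs=1M conv=fdatasync   # or: keep the cache but time the final flush too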

4) As a second test I read from the RAID6 and write back to the
RAID6. I see MUCH lower speeds, again repeatable:

mark@c2RAID6 /VirtualMachines/bonnie $ dd if=SDDCopy of=HDDWrite
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 1187.07 s, 42.1 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $

5) As a final test, and just looking for problems if any, I do an SSD
to SSD copy, which clocked in at close to 200 MB/s:

mark@c2RAID6 /mnt/fastVM $ dd if=random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 251.105 s, 199 MB/s
mark@c2RAID6 /mnt/fastVM $

   So, given that this RAID6 was grown yesterday from an array that
had existed for a year or two, I'm not sure of its fragmentation, or
even how to determine that at this time. However, it seems my problem
is RAID6 reads, not RAID6 writes, at least to new and probably
never-used disk space.
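
   If anyone knows a better way I'm all ears, but what I was planning
to look at for the array geometry and per-file fragmentation is
roughly this (assuming the array is /dev/md0, which I haven't
confirmed here, and that the filesystem is one filefrag understands):

cat /proc/mdstat
mdadm --detail /dev/md0                    # chunk size, layout, member order
filefrag /VirtualMachines/bonnie/SDDCopy   # extent count for one of the test files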

   I will report more later, but I can already state that, just using
top, there's never much CPU usage during these copies but a LOT of
wait time when reading the RAID6. It really appears the system is
spinning its wheels waiting for the RAID to get data from the disks.
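
   Next time I run the RAID6-to-RAID6 copy I'll try to watch the
individual drives while it runs; something like this (iostat comes
from sys-apps/sysstat on Gentoo, if I remember the package right):

iostat -x 5     # per-device await, %util and throughput, every 5 seconds

If one member shows much higher await or %util than the others, that
would point at a single slow drive rather than the RAID level itself.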

   One place where I wanted to double-check your thinking: my thought
is that a RAID1 will _NEVER_ outperform the hdparm -tT read speeds,
as it has to read from three drives and make sure they are all good
before returning data to the user. I don't see how that could ever be
faster than what a single-drive file system could do, which for these
drives would be the 113 MB/s WD spec number, correct? As I'm currently
getting 145 MB/s, it appears on the surface that the RAID6 is
providing some value, at least in these early days of use. Maybe it
will degrade over time, though.

Comments?

Cheers,
Mark
