On Fri, Jun 21, 2013 at 12:31 AM, Duncan <1i5t5.duncan@×××.net> wrote:
> Mark Knecht posted on Thu, 20 Jun 2013 12:10:04 -0700 as excerpted:
>
>> Does anyone know of info on how the starting sector number might
>> impact RAID performance under Gentoo? The drives are WD-500G RE3 drives
>> shown here:
>>
>> http://www.amazon.com/Western-Digital-WD5002ABYS-3-5-inch-Enterprise/dp/B001EMZPD0/ref=cm_cr_pr_product_top
>>
>> These are NOT 4k sector sized drives.
>>
>> Specifically I'm running a 5-drive RAID6 for about 1.45TB of storage. My
>> benchmarking seems abysmal at around 40MB/S using dd copying large
>> files.
>> It's higher, around 80MB/S if the file being transferred is coming from
>> an SSD, but even 80MB/S seems slow to me. I see a LOT of wait time in
>> top.
>> And my 'large file' copies might not be large enough as the machine has
>> 24GB of DRAM and I've only been copying 21GB so it's possible some of
>> that is cached.
>
> I /suspect/ that the problem isn't striping, tho that can be a factor,
> but rather, your choice of raid6. Note that I personally ran md/raid-6
> here for awhile, so I know a bit of what I'm talking about. I didn't
> realize the full implications of what I was setting up originally, or I'd
> have not chosen raid6 in the first place, but live and learn as they say,
> and that I did.
>
> General rule, raid6 is abysmal for writing and gets dramatically worse as
> fragmentation sets in, tho reading is reasonable. The reason is that in
> order to properly parity-check and write out less-than-full-stripe
> writes, the system must effectively read in the existing data and merge
> it with the new data, then recalculate the parity, before writing the new
> data AND 100% of the (two-way in raid-6) parity. Further, because raid
> sits below the filesystem level, it knows nothing about what parts of the
> filesystem are actually used, and must read and write the FULL data
> stripe (perhaps minus the new data bit, I'm not sure), including parts
> that will be empty on a freshly formatted filesystem.
>
> So with 4k block sizes on a 5-device raid6, you'd have 20k stripes, 12k
> of data across three devices, and 8k of parity across the other two
> devices. Now you go to write a 1k file, but in order to do so the full
> 12k of existing data must be read in, even on an empty filesystem,
> because the RAID doesn't know it's empty! Then the new data must be
> merged in and new checksums created, then the full 20k must be written
> back out, certainly the 8k of parity, but also likely the full 12k of
> data even if most of it is simply rewrite, but almost certainly at least
> the 4k strip on the device the new data is written to.
>
<SNIP>

Hi Duncan,
   Wonderful post, but much too long to carry on a conversation
in-line. As you sound pretty sure of your understanding and history
I'll assume you're right, but only maybe 80% of the post feels right
to me at this point, so let's assume I have much to learn and go from
there. I expect others here are in a similar situation to me - they
use RAID but have little hard data on what the different parts of the
system are doing or how to optimize them. That's certainly true in my
case. I hope this thread, over the near or far term, helps a bit for
me and potentially others.

   Thinking about this issue this morning, it seems important to get
back to basics and verify as much as possible, step by step, so that
I don't layer good work on top of bad assumptions. To that end, and
before I move much farther forward, let me document a few things
about my system and the hardware available to work with, and see if
you, Rich, Bob, Volker or anyone else wants to chime in on what is
correct, what isn't, or a better way to use it.

Basic machine - ASUS Rampage II Extreme motherboard (4/1/2010) + 24GB
DDR3 + Core i7-980X Extreme 6-core/12-thread processor
1 SSD - 120GB SATA3 on its own controller
5+ HDD - WD5002ABYS RAID Edition 3 SATA3 drives using the Intel
integrated controllers

(NOTE: I could possibly go to a 6-drive RAID if I made some changes
in the box, but that's for later.)

According to the WD spec
(http://www.wdc.com/en/library/spec/2879-701281.pdf) the 500GB drives
sustain 113MB/S to the drive. Using hdparm I measure 107MB/S or higher
for all 5 drives:

c2RAID6 ~ # hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   17374 MB in  2.00 seconds = 8696.12 MB/sec
 Timing buffered disk reads: 322 MB in  3.00 seconds = 107.20 MB/sec
c2RAID6 ~ #
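
(The other four drives give similar numbers. To check them all in one
pass, a loop along these lines works; the sdb-sdf device names are
just what they happen to be on this box:)

for d in sdb sdc sdd sde sdf; do hdparm -t /dev/$d; done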

The SSD, on its own PCI Express controller, clocks in at about 250MB/S for reads:

c2RAID6 ~ # hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   17492 MB in  2.00 seconds = 8754.42 MB/sec
 Timing buffered disk reads: 760 MB in  3.00 seconds = 253.28 MB/sec
c2RAID6 ~ #

TESTING: I'm using dd to test. It gives an easy-to-read result and
seems to be widely used. I can use bonnie++ or IOzone later, but I
don't think that's necessary quite yet. Since I have 24GB of RAM and
don't want cached data to affect the test speeds, I do the following:

1) Using dd I created a 50GB file for copying, using the following commands:

cd /mnt/fastVM
dd if=/dev/random of=random1 bs=1000 count=0 seek=$[1000*1000*50]

mark@c2RAID6 /VirtualMachines/bonnie $ ls -alh /mnt/fastVM/ran*
-rw-r--r-- 1 mark mark 47G Jun 21 07:10 /mnt/fastVM/random1
mark@c2RAID6 /VirtualMachines/bonnie $
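
(One caveat: with count=0 and a large seek, dd only creates a sparse
file, so reading it back returns zeros rather than real random data.
If that ever matters, something like the following actually fills the
file, though it's much slower; the bs/count values are just one
reasonable choice for roughly 50GB:)

dd if=/dev/urandom of=random1 bs=1M count=50000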

2) To ensure that nothing is cached and the copies are (hopefully)
completely fair, as root I do the following between each test:

sync
free -h
echo 3 > /proc/sys/vm/drop_caches
free -h

An example:

c2RAID6 ~ # sync
c2RAID6 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:           23G        23G       129M         0B       8.5M        21G
-/+ buffers/cache:        1.6G        21G
Swap:          12G         0B        12G
c2RAID6 ~ # echo 3 > /proc/sys/vm/drop_caches
c2RAID6 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:           23G       2.6G        20G         0B       884K       1.3G
-/+ buffers/cache:        1.3G        22G
Swap:          12G         0B        12G
c2RAID6 ~ #

3) As a first test I copy the 50GB file from the SSD to the RAID6
using dd. As long as reading the SSD is much faster than writing the
RAID6, this should primarily be a test of the RAID6 write speed:

mark@c2RAID6 /VirtualMachines/bonnie $ dd if=/mnt/fastVM/random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 339.173 s, 147 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $

If I clear the caches as above and rerun the test it's always 145-155MB/S.
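
(One thing worth noting: without a bs= argument dd uses 512-byte
blocks - hence the 97656250 records - which adds per-block overhead,
and the timing ends before the last of the data is flushed to disk.
A variant along these lines, with a larger block size and a final
fdatasync, might be a fairer write test; bs=1M is just a guess at a
sensible value:)

dd if=/mnt/fastVM/random1 of=SDDCopy bs=1M conv=fdatasync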

4) As a second test I read from the RAID6 and write back to the RAID6.
I see MUCH lower speeds, again repeatable:

mark@c2RAID6 /VirtualMachines/bonnie $ dd if=SDDCopy of=HDDWrite
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 1187.07 s, 42.1 MB/s
mark@c2RAID6 /VirtualMachines/bonnie $
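
(To separate the read side from the write side, a read-only pass that
discards the data should show the raw RAID6 streaming read rate, e.g.:)

dd if=SDDCopy of=/dev/null bs=1M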

5) As a final test, just looking for problems if any, I do an SSD to
SSD copy, which clocks in at close to 200MB/S:

mark@c2RAID6 /mnt/fastVM $ dd if=random1 of=SDDCopy
97656250+0 records in
97656250+0 records out
50000000000 bytes (50 GB) copied, 251.105 s, 199 MB/s
mark@c2RAID6 /mnt/fastVM $

   So, given that this RAID6 was grown yesterday from something that
has existed for a year or two, I'm not sure of its fragmentation, or
even how to determine that at this time. However, it seems my problem
is RAID6 reads, not RAID6 writes, at least to new and probably
never-used disk space.
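
If the array details matter (chunk size, layout, number of devices),
I can post the output of something like the following; /dev/md0 here
is just a stand-in for whatever the array device actually is:

cat /proc/mdstat
mdadm --detail /dev/md0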

   I will report more later, but I can state that just using top
there's never much CPU usage during these tests, but a LOT of wait
time when reading the RAID6. It really appears the system is spinning
its wheels waiting for the RAID to get data off the disks.
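
Next time I run the copies I'll also try watching the individual
drives with iostat from the sysstat package, to see whether one
member is dragging the others down; roughly something like this
(again, sdb-sdf are assumed device names):

iostat -x 5 /dev/sd[b-f]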

   One place where I wanted to double-check your thinking: my thought
is that a RAID1 will _NEVER_ outperform the hdparm -tT read speeds,
as it has to read from three drives and make sure they are all good
before returning data to the user. I don't see how that could ever be
faster than what a single-drive file system could do, which for these
drives would be the 113MB/S WD spec number, correct? As I'm currently
getting 145MB/S, it appears on the surface that the RAID6 is providing
some value, at least in these early days of use. Maybe it will degrade
over time though.
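
(Rather than reasoning it out, one way to sanity-check that would be
to point hdparm at the md device itself and compare against the
single-drive numbers above; /dev/md0 again stands in for the real
array device:)

hdparm -tT /dev/md0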

Comments?

Cheers,
Mark