Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value?
Date: Fri, 28 Jun 2013 03:36:35
Message-Id: pan$dcf36$c3d7c9cc$7915dd10$d07d87ae@cox.net
In Reply to: Re: [gentoo-amd64] Re: Is my RAID performance bad possibly due to starting sector value? by Mark Knecht
Mark Knecht posted on Sat, 22 Jun 2013 18:48:15 -0700 as excerpted:

> Duncan,

Again, following up now that it's my "weekend" and I have a chance...

> Actually, using your idea of piping things to /dev/null it appears
> that the random number generator itself is only capable of 15MB/s on my
> machine. It doesn't change much based on block size or number of bytes
> I pipe.

=:^(

Well, you tried.

> If this speed is representative of how well that works then I think
> I have to use a file. It appears this guy gets similar values:
>
> http://www.globallinuxsecurity.pro/quickly-fill-a-disk-with-random-bits-without-dev-urandom/

Wow, that's a very nice idea he has there! I'll have to remember that!
The same idea should work for creating any relatively large random file,
regardless of final use. Just crypt-setup the thing and dd /dev/zero
into it.
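
(For anyone who doesn't want to chase the link, a minimal sketch of that
recipe as I understand it, untested here, with /dev/sdX standing in for
whatever device or loop device you're actually filling, so obviously only
point it at something you're willing to scribble over:

cryptsetup -c aes-xts-plain64 -d /dev/urandom create scratch /dev/sdX
dd if=/dev/zero of=/dev/mapper/scratch bs=$((1024*1024))
cryptsetup remove scratch

The zeros get encrypted on the way through, so what lands on the device
looks random, at cipher speed rather than /dev/urandom speed.)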

FWIW, you're doing better than my system does, however. I seem to run
about 13 MB/s from /dev/urandom (up to 13.7 depending on blocksize). And
back to the random vs urandom discussion, random totally blocked here
after a few dozen bytes, waiting for more random data to be generated.
So the fact that you actually got a usefully sized file out of it does
indicate that you must have hardware random and that it's apparently
working well.

> On the other hand, piping /dev/zero appears to be very fast -
> basically the speed of the processor I think:
>
> $ dd if=/dev/zero of=/dev/null bs=4096 count=$[1000]
> 1000+0 records in
> 1000+0 records out
> 4096000 bytes (4.1 MB) copied, 0.000622594 s, 6.6 GB/s

What's most interesting to me when I tried that here is that unlike
urandom, zero's output varies DRAMATICALLY by blocksize. With
bs=$((1024*1024)) (aka 1MB), I get 14.3 GB/s, tho at the default bs=512,
I get only 1.2 GB/s. (Trying a few more values, 1024*512 gives me very
similar 14.5 GB/s, 1024*64 is already down to 13.2 GB/s, 1024*128=13.9
and 1024*256=14.1, while on the high side 1024*1024*2 is already down to
10.2 GB/s. So quarter MB to one MB seems the ideal range, on my
hardware.)
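
(If anyone wants to repeat that sweep without retyping, a quick loop
along these lines should do it; the count is scaled so every run moves
the same 4 GiB total, and the numbers will of course be machine-specific:

for bs in 512 $((64*1024)) $((256*1024)) $((1024*1024)) $((4*1024*1024)); do
  echo "bs=$bs"
  dd if=/dev/zero of=/dev/null bs=$bs count=$((4*1024*1024*1024/bs))
done

dd prints the throughput for each block size on stderr as it finishes.)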

But of course, if your device is compressible-data speed-sensitive, as
are say the sandforce-controller-based ssds, /dev/zero isn't going to
give you anything like the real-world benchmark random data would (tho it
should be a great best-case compressible-data test). Tho it's unlikely
to matter on most spinning rust, AFAIK, or on SSDs like my Corsair
Neutrons (Link_A_Media/LAMD-based controller), which list being
data-compression agnostic as a bullet-point feature, unlike the
sandforce-based SSDs.

Since /dev/zero is so fast, I'd probably do a few initial tests to
determine whether compressible data makes a difference on what you're
testing, then use /dev/zero if it doesn't appear to, to get a reasonable
base config, then finally double-check that against random data again.
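
(Concretely, that quick check might look something like this, with
/dev/sdX again a hypothetical test target you don't mind overwriting,
/tmp/random.testfile a pre-made random file, and oflag=direct there so
the page cache doesn't flatter the numbers:

dd if=/dev/zero of=/dev/sdX bs=$((1024*1024)) count=1024 oflag=direct
dd if=/tmp/random.testfile of=/dev/sdX bs=$((1024*1024)) count=1024 oflag=direct

If the two speeds come out close, the device doesn't care about
compressibility and /dev/zero is fine for the rest of the tuning.)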

Meanwhile, here's another idea for random data, seeing as /dev/urandom is
speed limited. Up to your memory constraints anyway, you should be able
to dd if=/dev/urandom of=/some/file/on/tmpfs . Then you can
dd if=/tmpfs/file of=/dev/test/target, or if you want a bigger file than
a direct tmpfs file will let you use, try something like this:

cat /tmpfs/file /tmpfs/file /tmpfs/file | dd of=/dev/test/target

... which would give you 3X the data size of /tmpfs/file.

(Man, testing that with a 10 GB tmpfs file (on a 12 GB tmpfs /tmp), I can
see how slow that 13 MB/s /dev/urandom actually is as I'm creating
it! OUCH! I waited awhile before I started typing this comment... I've
been typing slowly and looking at the usage graph as I type, and I'm
still only at maybe 8 gigs, depending on where my cache usage was when I
started, right now!)
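
(Next time I'd probably just watch it numerically instead of squinting at
the usage graph, something like:

watch -n5 'df -h /tmp'

... which rechecks the tmpfs usage every five seconds.)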

cd /tmp

dd if=/dev/urandom of=/tmp/10gig.testfile bs=$((1024*1024)) count=10240

(10240 records, 10737418240 bytes, tho it says 11 GB copied; I guess dd
uses 10^3 multipliers. Anyway, ~783 s, 13.7 MB/s.)

ls -l 10gig.testfile

(confirm the size, 10737418240 bytes)

cat 10gig.testfile 10gig.testfile 10gig.testfile \
10gig.testfile 10gig.testfile | dd of=/dev/null

(that's 5x, yielding a power-of-2 50 GB, 104857600+0 records, 53687091200
bytes, ~140s, 385 MB/s at the default 512-byte blocksize)

Wow, what a difference block size makes there, too! Trying the above cat/
dd with bs=$((1024*1024)) (1MB) yields ~30s, 1.8 GB/s!

1GB block size (1024*1024*1024) yields about the same, 30s, 1.8 GB/s.

LOL dd didn't like my idea to try a 10 GB buffer size!

dd: memory exhausted by input buffer of size 10737418240 bytes (10 GiB)

(No wonder, as that'd be 10GB in tmpfs/cache and a 10GB buffer, and I'm
/only/ running 16 gigs RAM and no swap! But it won't take 2 GB either.
Checking, it looks like as my normal user I'm running a ulimit of 1-gig
memory size, 2-gig virtual-size, so I'm sort of surprised it took the 1GB
buffer... maybe that counts against virtual only or something?)
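
(For the record, the quick way to check is just:

ulimit -a

... and look at the max memory size (-m) and virtual memory (-v) lines,
reported in kbytes.)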

Low side again, ~90s, 599 MB/s @ 1KB (1024 byte) bs, already a dramatic
improvement from the 140s, 385 MB/s of the default 512-byte block.

2KB bs yields 52s, 1 GB/s.

16KB bs yields 31s, 1.7 GB/s, near optimum already.

High side again, 1024*1024*4 (4MB) bs appears to be best-case, just under
29s, 1.9 GB/s. Going to 8MB takes another second, 1.8 GB/s again. (4MB
isn't the page size, of course, which is 4KB normally and 2MB for
hugepages on amd64, so the peak there is presumably a buffering sweet
spot rather than anything page-related.)

FWIW, cat seems to run just over 100% single-core saturation while dd
seems to run just under, @97% or so.

Running two instances in parallel (using the peak 4MB block size, 1.9 GB/
s with a single run) seems to cut performance some, but not nearly in
half. (I got 1.5 GB/s and 1.6 GB/s, but I started one then switched to a
different terminal to start the other, so they only overlapped by maybe
30s or so of the 35s on each.)
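
(To get them genuinely overlapped next time, starting both from one shell
would be cleaner, something along these lines:

cat 10gig.testfile{,,,,} | dd of=/dev/null bs=$((4*1024*1024)) &
cat 10gig.testfile{,,,,} | dd of=/dev/null bs=$((4*1024*1024)) &
wait

... with the brace expansion just saving the retyping of the five
filenames, and wait holding the shell until both dd's report in.)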

OK, so that's all memory/cpu since neither end is actual storage, but
that does give me a reasonable base against which to benchmark actual
storage (rust or ssd), if I wished.

What's interesting is that by what I guess is pure coincidence, my 385
MB/s original 512-byte blocksize figure is reasonably close to what the
SSD read benchmarks are with hdparm. IIRC the hdparm/ssd numbers were
somewhat higher, but not so much so (470 MB/sec I just tested). But the
bus speed maxes out not /too/ far above that (500-600 MB/sec,
theoretically 600 MB/sec on SATA-600, but real world obviously won't
/quite/ hit that, IIRC the best numbers I've seen anywhere are 585 or so).
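
(That's hdparm -t /dev/sdX for the device read timing, by the way, or
hdparm -T for the cache-only number; substitute your actual device, of
course.)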

So now I guess I send this and do some more testing of real devices, now
that you've provoked my curiosity and I have the 50 GB of (mostly)
pseudorandom data sitting in tmpfs already. Maybe I'll post those
results later.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
