Gentoo Archives: gentoo-user

From: Mark Knecht <markknecht@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] PacketShader - firewall using GPU
Date: Fri, 23 Sep 2011 15:18:18
Message-Id: CAK2H+ee=nBx8ZDU1AtfnoH0gg4Q2xNHyaR3-102A4+TZJmvViA@mail.gmail.com
In Reply to: Re: [gentoo-user] PacketShader - firewall using GPU by Michael Mol
On Fri, Sep 23, 2011 at 6:49 AM, Michael Mol <mikemol@×××××.com> wrote:
> On Fri, Sep 23, 2011 at 12:06 AM, Pandu Poluan <pandu@××××××.info> wrote:
>> Saw this on the pfSense list:
>>
>> http://shader.kaist.edu/packetshader/
>>
>> anyone interested in trying?
>
> I see a lot of graphs touting high throughput, but what about latency?
> That's the kind of stuff that gets in my way when I'm messing with
> things like VOIP.
>
> My first thought when I saw they were using a GPU for processing was
> concerns about latency:
> 1) RTT between a video card and the CPU will cause an increase in
> latency from doing processing on-CPU. Maybe DMA between the video card
> and NICs could help with this, but I don't know. Certainly newer CPUs
> with on-die GPUs will have an advantage here.
> 2) GPGPU coding favors batch processing over small streams. That's
> part of its nature, after all. That means that processed packets would
> come out of the GPU side of the engine in bursts.
>
> They also tout a huge preallocated packet buffer, and I'm not sure
> that's a good thing, either. It may or may not cause latency problems,
> depending on how they use it.
>
> They don't talk about latency at all, except for one sentence:
> "Forwarding table lookup is highly memory-intensive, and GPU can
> acclerate it with both latency hiding capability and bandwidth."
>
> --
> :wq

While I'm not a programmer at all, I have been playing with some CUDA
programming this year. The comments below are based on that GPU
framework and might differ for other frameworks.

1) I don't think GPU latencies are much different from CPU latencies.
A lot of the transfer can be done with DMA, so the CPU is hardly
involved once the pointers are set up. It depends on the system, of
course, but the GPU is pretty close to the action, so getting started
should be quite fast.
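
As a rough illustration of what I mean, here is a toy CUDA sketch (my
own example, not anything from PacketShader; the buffer size and the
kernel are made up) that uses a pinned host buffer and asynchronous
copies on a stream. Once the pointers are set up, the CPU just
enqueues the work and the transfers themselves are DMA:

/* toy example: pinned host memory + async copies, CPU only enqueues */
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void touch(unsigned char *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] ^= 0xff;          /* stand-in for real per-byte work */
}

int main(void)
{
    const int N = 1 << 20;       /* 1 MB buffer, made-up size */
    unsigned char *h_buf, *d_buf;
    cudaStream_t stream;

    cudaHostAlloc((void **)&h_buf, N, cudaHostAllocDefault); /* pinned => DMA-able */
    cudaMalloc((void **)&d_buf, N);
    cudaStreamCreate(&stream);

    /* from here on the CPU just queues work and walks away */
    cudaMemcpyAsync(d_buf, h_buf, N, cudaMemcpyHostToDevice, stream);
    touch<<<(N + 255) / 256, 256, 0, stream>>>(d_buf, N);
    cudaMemcpyAsync(h_buf, d_buf, N, cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);
    printf("round trip done\n");

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaStreamDestroy(stream);
    return 0;
}
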
2) The big deal with GPUs is that they really pay off when you need to
do a lot of identical calculations on different data in parallel. A
book I read, plus some material online, suggested they don't pay off
speed-wise until you are doing at least 100 operations in parallel.
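
To make that concrete, here is a toy kernel (the array size and the
formula are invented for the example) where every thread applies the
identical formula to its own element. With only a handful of elements
the launch overhead swamps any gain, which is roughly where that
100-operations rule of thumb comes from:

/* toy example: the same calculation applied to many elements at once */
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(const float *in, float *out, float f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * f;      /* identical operation, different data */
}

int main(void)
{
    const int n = 4096;          /* well past ~100, so the GPU can win */
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in,  n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, 1.07f, n);
    cudaDeviceSynchronize();

    printf("launched %d parallel operations\n", n);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
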
3) You do have to get the data into the GPU, so for things that use
fixed data blocks, like shading graphical elements, the data can be
loaded once and reused over and over. That can be very fast. In my
case it's financial data being evaluated 1000 ways, so that's
effective. For data like a packet I don't know how many ways there are
to evaluate it, so I can't say what the value would be.
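
The load-once, reuse-many-times pattern looks something like this
sketch (the table contents, sizes and kernel are all made up): the
fixed block is uploaded a single time, and every later launch reads it
without re-sending it:

/* toy example: fixed table uploaded once, reused across many launches */
#include <cuda_runtime.h>
#include <stdio.h>

#define TABLE_SIZE 256

__constant__ float d_table[TABLE_SIZE];   /* fixed data, loaded once */

__global__ void evaluate(const int *keys, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = d_table[keys[i] & (TABLE_SIZE - 1)];  /* reuse the table */
}

int main(void)
{
    float h_table[TABLE_SIZE];
    for (int i = 0; i < TABLE_SIZE; ++i)
        h_table[i] = (float)i;            /* stand-in for real fixed data */

    /* one-time upload of the fixed block */
    cudaMemcpyToSymbol(d_table, h_table, sizeof(h_table));

    const int n = 1024;
    int   *d_keys;
    float *d_out;
    cudaMalloc((void **)&d_keys, n * sizeof(int));
    cudaMalloc((void **)&d_out,  n * sizeof(float));
    cudaMemset(d_keys, 0, n * sizeof(int));

    /* the per-batch data changes; the table is never re-sent */
    for (int batch = 0; batch < 1000; ++batch)
        evaluate<<<(n + 255) / 256, 256>>>(d_keys, d_out, n);

    cudaDeviceSynchronize();
    printf("ran 1000 batches against one uploaded table\n");

    cudaFree(d_keys);
    cudaFree(d_out);
    return 0;
}
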
Nonetheless, it's an interesting idea, and it certainly offloads work
from the CPU, whose cycles might be better used for other things.

My NVidia GTX 465 has 352 CUDA cores while the GS8200 has only 8, so
there can be a huge difference depending on which GPU you have
available.
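
If you want to check what a given card offers, a short device query
like this (just a generic snippet, nothing PacketShader-specific)
prints the multiprocessor count; the CUDA core count is that figure
times the cores per multiprocessor for the architecture, e.g. 11 SMs
x 32 cores on my GTX 465:

/* list each CUDA device with its compute capability and SM count */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("%s: compute %d.%d, %d multiprocessors\n",
               prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }
    return 0;
}
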
Just some thoughts,
Mark

Replies

Subject Author
Re: [gentoo-user] PacketShader - firewall using GPU Michael Mol <mikemol@×××××.com>
[gentoo-user] Re: PacketShader - firewall using GPU James <wireless@×××××××××××.com>