On Fri, Sep 23, 2011 at 6:49 AM, Michael Mol <mikemol@×××××.com> wrote:
> On Fri, Sep 23, 2011 at 12:06 AM, Pandu Poluan <pandu@××××××.info> wrote:
>> Saw this on the pfSense list:
>>
>> http://shader.kaist.edu/packetshader/
>>
>> anyone interested in trying?
>
> I see a lot of graphs touting high throughput, but what about latency?
> That's the kind of stuff that gets in my way when I'm messing with
> things like VoIP.
>
> My first thought when I saw they were using a GPU for processing was
> concerns about latency:
> 1) RTT between a video card and the CPU will cause an increase in
> latency over doing processing on-CPU. Maybe DMA between the video card
> and NICs could help with this, but I don't know. Certainly newer CPUs
> with on-die GPUs will have an advantage here.
> 2) GPGPU coding favors batch processing over small streams. That's
> part of its nature, after all. That means that processed packets would
> come out of the GPU side of the engine in bursts.
>
> They also tout a huge preallocated packet buffer, and I'm not sure
> that's a good thing, either. It may or may not cause latency problems,
> depending on how they use it.
>
> They don't talk about latency at all, except for one sentence:
> "Forwarding table lookup is highly memory-intensive, and GPU can
> accelerate it with both latency hiding capability and bandwidth."
>
> --
> :wq

While I'm not a programmer by trade, I have been playing with some CUDA
programming this year. The couple of comments below are based on that
GPU framework and might differ for others.

1) I don't think the GPU latencies are much different than CPU
latencies. A lot of the transfer can be done with DMA, so the CPU is
hardly involved once the pointers are set up. Of course it depends on
the system, but the GPU is pretty close to the action, so it should
get started quite quickly.

2) The big deal with GPUs is that they really pay off when you need to
do a lot of the same calculations on different data in parallel. A
book I read, plus some online material, suggested they don't pay off
speed-wise until you are doing at least 100 operations in parallel.

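A quick back-of-the-envelope sketch of the batching trade-off Michael
raised (all numbers here are hypothetical, just to show the arithmetic):

```python
# Batching packets before a GPU kernel launch adds queuing delay: the
# first packet in a batch sits waiting for the rest of the batch to
# arrive before anything gets processed.

def batch_fill_delay_us(batch_size, packets_per_sec):
    """Worst-case wait (in microseconds) for the first packet while the
    batch fills, at a given arrival rate."""
    return (batch_size - 1) / packets_per_sec * 1e6

# At a hypothetical arrival rate of 1M packets/sec:
print(batch_fill_delay_us(64, 1_000_000))    # 63.0   -> ~63 us extra
print(batch_fill_delay_us(1024, 1_000_000))  # 1023.0 -> ~1 ms extra
```

So the batch size that keeps the GPU busy is also the knob that decides
how bursty, and how delayed, the output gets.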
3) You do have to get the data into the GPU, so for things that use
fixed data blocks, like shading graphical elements, that data can be
loaded once and reused over and over. That can be very fast. In my
case it's financial data getting evaluated 1000 ways, so that's
effective. For data like a packet, I don't know how many ways there
are to evaluate it, so I cannot suggest what the value would be.

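That reuse point can be put in numbers: if the upload to the GPU happens
once and the data is then evaluated many times, the transfer cost gets
amortized away. A sketch with made-up timings:

```python
# Amortizing a one-time host-to-GPU upload over repeated evaluations.
# Timings are invented purely for illustration.

def avg_ms_per_eval(upload_ms, eval_ms, n_evals):
    """Average cost per evaluation when the data is uploaded once and
    evaluated n_evals times."""
    return (upload_ms + eval_ms * n_evals) / n_evals

# 10 ms upload, 0.5 ms per on-GPU evaluation:
print(avg_ms_per_eval(10.0, 0.5, 1))     # 10.5 -> upload dominates
print(avg_ms_per_eval(10.0, 0.5, 1000))  # 0.51 -> upload nearly free
```

A packet that is looked at once and then forwarded gets no such reuse,
which is why the per-packet transfer cost matters so much more there.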
Nonetheless it's an interesting idea, and it certainly offloads CPU
cycles that might be better used for other things.

My NVidia GTX 465 has 352 CUDA cores while the GS8200 has only 8, so
there can be a huge difference based on which GPU you have available.

Just some thoughts,
Mark