On Sunday 08 May 2005 23:37, pageexec@××××××××.hu wrote:
> > Are there any numbers (benchmarks) about the performance penalty of
> > pageexec and/or segmexec on intel x86 machines?
>
> i remember only the kernel compiles on P3, for SEGMEXEC the slowdown
> was around 1-2%, for PAGEEXEC on 2.2/2.4 it was around 30-40% and
> on 2.6 it was 2-3%.

why is there such a difference between using PAGEEXEC on 2.4 and using it on
2.6?

> i recall spender benchmarking PAGEEXEC on an
> athlon and 2.4 and it was something like 20-25%. on P4 PAGEEXEC is
> very bad (maybe a 100x slowdown, i don't think anyone bothered to
> benchmark it precisely ;-), you don't want to use it.
>
> > The idea that I have is that page-exec on x86 involves a page-fault
> > for every (execute) access to a new page that will be treated by
> > pax... and that is performance-wise .. bad..
>
> not quite, you get an extra page fault for every data access to
> a page that the DTLB doesn't yet have an entry for. the larger
> the DTLB the smaller the number of these extra page faults (that's
> why the athlon is better than any intel).
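
A toy model of the DTLB point above, as a sketch only. It assumes a
fully associative DTLB with LRU replacement, which real CPUs need not
use; the function name, the trace and the entry counts are made up for
illustration and are not measurements of any processor. Each miss in
the model stands for one extra page fault as described in the quote.

```python
from collections import OrderedDict

def extra_faults(page_trace, dtlb_entries):
    """Count data accesses that miss a toy LRU DTLB (assumed model)."""
    dtlb = OrderedDict()          # page number -> True, oldest entry first
    misses = 0
    for page in page_trace:
        if page in dtlb:
            dtlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                   # miss: one extra fault in this model
            dtlb[page] = True
            if len(dtlb) > dtlb_entries:
                dtlb.popitem(last=False)  # evict the least recently used page
    return misses

trace = [1, 2, 1, 3, 1, 2]        # hypothetical sequence of data page numbers
small = extra_faults(trace, 2)    # hypothetical 2-entry DTLB
large = extra_faults(trace, 3)    # hypothetical 3-entry DTLB
print(small, large)               # prints "4 3"
```

With the same trace the larger table takes fewer misses, which is the
effect the quote attributes to the athlon's larger DTLB.
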
>
> > And that segmexec is a different approach that involves mirroring the
> > process address space on two segments with different "write"
> > permissions, and comparing those two, to check if there was any
> > overwrite of the code segment.
>
> nope, the difference is not in writability (executable pages are
> non-writable, regardless where they are), it's about being present
> or not in the 'upper' half of the address space (which happens to
> be the code segment), hence being present there equals to being
> executable, non-executable otherwise.
>
> > This would mean doubling the mem-usage, at least for the code-segment
> > in segmexec mode.
>
> what's doubled is the virtual memory usage (or you can say that the
> address space is halved), the underlying physical memory usage doesn't
> change (that's the whole point of vma mirroring - it creates two virtual
> mappings for the same physical page).
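
The "two virtual mappings for the same physical page" idea can be seen
from user space with a small sketch. This is only an analogy: it maps
one temporary file twice with shared mappings, whereas PaX does its
mirroring inside the kernel. The names mirror_demo, lower and upper are
hypothetical, and the sketch assumes a Unix-like system where mmap
defaults to a shared mapping.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

def mirror_demo():
    """Map one page of a temp file twice; write via one view, read via the other."""
    with tempfile.TemporaryFile() as f:
        f.truncate(PAGE)                      # back the mappings with one page
        lower = mmap.mmap(f.fileno(), PAGE)   # first virtual mapping
        upper = mmap.mmap(f.fileno(), PAGE)   # second mapping of the same page
        try:
            lower[:5] = b"hello"              # store through the first mapping
            return bytes(upper[:5])           # load through the second mapping
        finally:
            lower.close()
            upper.close()

result = mirror_demo()
print(result)                                 # prints b'hello'
```

The write is visible through the second mapping without any copy, so
the virtual address space used is doubled while the backing memory is
not.
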
>
> > And in arches that support no-exec pages (as sparc or amd64), what are
> > the performance penalties? Anyone can give me some pointers?
>
> except for ppc there should be nothing measurable there (well, maybe
> some contrived benchmark can show something on sparc/sparc64 because
> the fast path of the TLB load handler is 2 instructions longer, but
> i'd hardly call that 'penalty').
>
> > stuff like: kernel compiles, mysql benches, or... any other benchmark
> > is good for me.. just to "grasp" an idea..
>
> maybe http://www.grsecurity.net./grsecurity-slide_files/frame.htm helps
> although it's quite old and benchmarks only PAGEEXEC.

--

Pedro João Lopes Venda
email: pjvenda < at > arrakis.dhis.org
http://arrakis.dhis.org