Gentoo Archives: gentoo-user

From: Florian Philipp <lists@×××××××××××.net>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] hard drive encryption
Date: Tue, 13 Mar 2012 20:16:33
Message-Id: 4F5FAAD0.2040300@binarywings.net
In Reply to: Re: [gentoo-user] hard drive encryption by Michael Mol
1 Am 13.03.2012 20:38, schrieb Michael Mol:
2 > On Tue, Mar 13, 2012 at 3:07 PM, Stroller
3 > <stroller@××××××××××××××××××.uk> wrote:
4 >>
5 >> On 13 March 2012, at 18:18, Michael Mol wrote:
6 >>> ...
7 >>>> So I assume the i586 version is better for you --- unless GCC
8 >>>> suddenly got a lot better at optimizing code.
9 >>>
10 >>> Since when, exactly? GCC isn't the best compiler at optimization,
11 >>> but I fully expect current versions to produce better code for
12 >>> x86-64 than hand-tuned i586. Wider registers, more registers,
13 >>> crypto acceleration instructions and SIMD instructions are all
14 >>> very nice to have. I don't know the specifics of AES, though, or
15 >>> what kind of crypto algorithm it is, so it's entirely possible
16 >>> that one can't effectively parallelize it except in some
17 >>> relatively unique circumstances.
18 >>
19 >> Do you have much experience of writing assembler?
20 >>
21 >> I don't, and I'm not an expert on this, but I've read the odd blog
22 >> article on this subject over the years.
23 >
24 > Similar level of experience here. I can read it, even debug it from
25 > time to time. A few regular bloggers on the subject are like candy.
26 > And I used to have pagetable.org, Ars's Technopaedia and specsheets
27 > for early x86 and motorola processors memorized. For the past couple
28 > years, I've been focusing on reading blogs of language and compiler
29 > authors, academics involved in proofing, testing and improving them,
30 > etc.
31 >
32 >>
33 >> What I've read often has the programmer looking at the compiled gcc
34 >> bytecode and examining what it does. The compiler might not care
35 >> how many registers it uses, and thus a variable might find itself
36 >> frequently swapped back into RAM; the programmer does not have any
37 >> control over the compiler, and IIRC some flags reserve a register
38 >> for degugging (IIRC -fomit-frame-pointer disables this). I think
39 >> it's possible to use registers more efficiently by swapping them
40 >> (??) or by using bitwise comparisons and other tricks.
41 >
42 > Sure; it's cheaper to null out a register by XORing it with itself
43 > than setting it to 0.
44 >
45 >>
46 >> Assembler optimisation is only used on sections of code that are at
47 >> the core of a loop - that are called hundreds or thousands (even
48 >> millions?) of times during the program's execution. It's not for
49 >> code, such as reading the .config file or initialisation, which is
50 >> only called once. Because the code in the core of the loop is
51 >> called so often, you don't have to achieve much of an optimisation
52 >> for the aggregate to be much more considerable.
53 >
54 > Sure; optimize the hell out of the code where you spend most of your
55 > time. I wasn't aware that gcc passed up on safe optimization
56 > opportunities, though.
57 >
58 >>
59 >> The operations in question may only be constitute a few lines of C,
60 >> or a handful of machine operations, so it boils down to an
61 >> algorithm that a human programmer is capable of getting a grip on
62 >> and comprehending. Whilst compilers are clearly more efficient for
63 >> large programs, on this micro scale, humans are more clever and
64 >> creative than machines.
65 >
66 > I disagree. With defined semantics for the source and target, a
67 > computer's cleverness is limited only by the computational and
68 > memory expense of its search algorithms. Humans get through this by
69 > making habit various optimizations, but those habits become less
70 > useful as additional paths and instructions are added. As system
71 > complexity increases, humans operate on personally cached techniques
72 > derived from simpler systems. I would expect very, very few people to
73 > be intimately familiar with the the majority of optimization
74 > possibilities present on an amdfam10 processor or a core2. Compiler's
75 > aren't necessarily familiar with them, either; they're just quicker
76 > at discovering them, given knowledge of the individual instructions
77 > and the rules of language semantics.
78 >
79 >>
80 >> Encryption / decryption is an example of code that lends itself to
81 >> this kind of optimisation. In particular AES was designed, I
82 >> believe, to be amenable to implementation in this way. The reason
83 >> for that was that it was desirable to have it run on embedded
84 >> devices and on dedicated chips. So it boils down to a simple
85 >> bitswap operation (??) - the plaintext is modified by the
86 >> encryption key, input and output as a fast stream. Each byte goes
87 >> in, each byte goes out, the same function performed on each one.
88 >
89 > I'd be willing to posit that you're right here, though if there
90 > isn't a per-byte feedback mechanism, SIMD instructions would come
91 > into serious play. But I expect there's a per-byte feedback
92 > mechanism, so parallelization would likely come in the form of
93 > processing simultaneous streams.
94 >
95 >>
96 >> Another operation that lends itself to assembler optimisation is
97 >> video decoding - the video is encoded only once, and then may be
98 >> played back hundreds or millions of times by different people. The
99 >> same operations must be repeated a number of times on each frame,
100 >> then c 25 - 60 frames are decoded per second, so at least 90,000
101 >> frames per hour. Again, the smallest optimisation is worthwhile.
102 >
103 > Absolutely. My position, though, is that compilers are quicker and
104 > more capable of discovering optimization possibilities than humans
105 > are, when the target architecture changes. Sure, you've got several
106 > dozen video codecs in, say, ffmpeg, and perhaps it all boils down to
107 > less than a dozen very common cases of inner loop code. With
108 > hand-tuned optimization, you'd need to fork your assembly patch for
109 > each new processor feature that comes out, and then work to find the
110 > most efficient way to execute code on that processor.
111 >
112 > There's also cases where processor features get changed. I don't
113 > remember the name of the instruction (it had something to do with
114 > stack operations) in x86, but Intel switched it from a 0-cycle
115 > instruction to something more expensive. Any code which assumed that
116 > instruction was a 0-cycle instruction now became less efficient. A
117 > compiler (presuming it has a knowledge of the target processor's
118 > instruction set properties) would have an easier time coping with
119 > that change than a human would.
120 >
121 > I'm not saying humans are useless; this is just one of those areas
122 > which is sufficiently complex-yet-deterministic that sufficient
123 > knowledge of the source and target environments would give a
124 > computer the edge over a human in finding the optimal sequence of
125 > CPU instructions.
126 >
127
128 This thread is becoming ridiculously long. Just as a last side-note:
129
130 One of the primary reasons that the IA64 architecture failed was that it
131 relied on the compiler to optimize the code in order to exploit the
132 massive instruction-level parallelism the CPU offered. Compilers never
133 became good enough for the job. Of course, that happended in the
134 nineties and we have much better compilers now (and x86 is easier to
135 handle for compilers). But on the other hand: That was Intel's next big
136 thing and if they couldn't make the compilers work, I have no reason to
137 believe in their efficiency now.
138
139 Regards,
140 Florian Philipp

Attachments

File name MIME type
signature.asc application/pgp-signature

Replies

Subject Author
Re: [gentoo-user] hard drive encryption Florian Philipp <lists@×××××××××××.net>