Gentoo Archives: gentoo-user

From: Michael Mol <mikemol@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] hard drive encryption
Date: Tue, 13 Mar 2012 19:39:55
Message-Id: CA+czFiAN4wekKZoyq+tazViksqAuBp3aBMR3v=5Yj_uZUGZgsw@mail.gmail.com
In Reply to: Re: [gentoo-user] hard drive encryption by Stroller
1 On Tue, Mar 13, 2012 at 3:07 PM, Stroller
2 <stroller@××××××××××××××××××.uk> wrote:
3 >
4 > On 13 March 2012, at 18:18, Michael Mol wrote:
5 >> ...
6 >>> So I assume the i586
7 >>> version is better for you --- unless GCC suddenly got a lot better at
8 >>> optimizing code.
9 >>
10 >> Since when, exactly? GCC isn't the best compiler at optimization, but
11 >> I fully expect current versions to produce better code for x86-64 than
12 >> hand-tuned i586. Wider registers, more registers, crypto acceleration
13 >> instructions and SIMD instructions are all very nice to have. I don't
14 >> know the specifics of AES, though, or what kind of crypto algorithm it
15 >> is, so it's entirely possible that one can't effectively parallelize
16 >> it except in some relatively unique circumstances.
17 >
18 > Do you have much experience of writing assembler?
19 >
20 > I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years.
21
22 Similar level of experience here. I can read it, even debug it from
23 time to time. A few regular bloggers on the subject are like candy.
24 And I used to have pagetable.org, Ars's Technopaedia and specsheets
25 for early x86 and motorola processors memorized. For the past couple
26 years, I've been focusing on reading blogs of language and compiler
27 authors, academics involved in proofing, testing and improving them,
28 etc.
29
30 >
31 > What I've read often has the programmer looking at the compiled gcc output and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks.
32
33 Sure; it's cheaper to null out a register by XORing it with itself
34 than setting it to 0.
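
A minimal sketch of the identity behind that trick, in C. The function names are made up for illustration. On x86, `xor eax, eax` has a shorter encoding than `mov eax, 0` (2 bytes versus 5), and GCC with optimization enabled will typically pick the XOR form on its own when asked for a plain zero, so there is rarely a reason to spell it out by hand in C.

```c
#include <stdint.h>

/* Hypothetical helper: any value XORed with itself is zero. */
uint32_t zero_by_xor(uint32_t x)
{
    return x ^ x;
}

/* The plain-C way to ask for zero; the compiler chooses the instruction. */
uint32_t zero_by_constant(void)
{
    return 0;
}
```

Comparing the output of `gcc -O2 -S` for both functions is a quick way to see what the compiler actually emits on a given target.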
35
36 >
37 > Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate gain to be considerable.
38
39 Sure; optimize the hell out of the code where you spend most of your
40 time. I wasn't aware that gcc passed up on safe optimization
41 opportunities, though.
42
43 >
44 > The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines.
45
46 I disagree. With defined semantics for the source and target, a
47 computer's cleverness is limited only by the computational and memory
48 expense of its search algorithms. Humans get through this by making a
49 habit of various optimizations, but those habits become less useful as
50 additional paths and instructions are added. As system complexity
51 increases, humans operate on personally cached techniques derived from
52 simpler systems. I would expect very, very few people to be intimately
53 familiar with the majority of optimization possibilities present
54 on an amdfam10 processor or a core2. Compilers aren't necessarily
55 familiar with them, either; they're just quicker at discovering them,
56 given knowledge of the individual instructions and the rules of
57 language semantics.
58
59 >
60 > Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one.
61
62 I'd be willing to posit that you're right here, though if there isn't
63 a per-byte feedback mechanism, SIMD instructions would come into
64 serious play. But I expect there's a per-byte feedback mechanism, so
65 parallelization would likely come in the form of processing
66 simultaneous streams.
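
To make the feedback point concrete, here is a toy sketch in C. It is NOT AES and not any real cipher mode; the XOR "cipher" and the function names are invented purely to show the dependency structure. In the first loop every output byte depends only on its own input, so iterations are independent and a vectorizer can process many at once. In the second, each output feeds into the next, so the loop carries a dependency from one iteration to the following one.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy transform, NOT AES: each output byte depends only on its own input. */
void toy_independent(const uint8_t *in, uint8_t *out, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(in[i] ^ key);
}

/* Toy chained transform: each output byte is mixed into the next one,
 * so iteration i cannot start before iteration i-1 has finished. */
void toy_chained(const uint8_t *in, uint8_t *out, size_t n,
                 uint8_t key, uint8_t iv)
{
    uint8_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        prev = (uint8_t)(in[i] ^ key ^ prev);
        out[i] = prev;
    }
}

/* Inverse of toy_chained. It reads only already-known input bytes, so its
 * iterations are independent again. in and out must not overlap. */
void toy_unchain(const uint8_t *in, uint8_t *out, size_t n,
                 uint8_t key, uint8_t iv)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t prev = (i == 0) ? iv : in[i - 1];
        out[i] = (uint8_t)(in[i] ^ key ^ prev);
    }
}
```

With a chained scheme like the middle function, the obvious way to use wide hardware is the one mentioned above: run several independent streams side by side.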
67
68 >
69 > Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c. 25-60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile.
70
71 Absolutely. My position, though, is that compilers are quicker and
72 more capable of discovering optimization possibilities than humans
73 are, when the target architecture changes. Sure, you've got several
74 dozen video codecs in, say, ffmpeg, and perhaps it all boils down to
75 less than a dozen very common cases of inner loop code. With
76 hand-tuned optimization, you'd need to fork your assembly patch for
77 each new processor feature that comes out, and then work to find the
78 most efficient way to execute code on that processor.
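
As a rough illustration of the alternative, here is an inner-loop kernel written in plain C. The sum of absolute differences is my own choice of example (it is a common building block in video code), and `sad_row` is a made-up name. The idea is that the same source can be rebuilt with a different `-march=` value and `-O3`, leaving instruction selection to the compiler instead of maintaining one assembly variant per processor. Whether a given GCC version actually vectorizes it, or matches hand-tuned code, is something to measure rather than assume.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences over two rows of 8-bit pixels.
 * Illustrative kernel only; not taken from ffmpeg or any real codec. */
uint32_t sad_row(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];      /* range -255..255 */
        sum += (uint32_t)(d < 0 ? -d : d);  /* absolute value */
    }
    return sum;
}
```

Building it once with `-O3 -march=core2` and once with `-O3 -march=amdfam10`, then diffing the `-S` output, shows how much of the per-processor tuning the compiler is doing on its own.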
79
80 There are also cases where processor features get changed. I don't
81 remember the name of the instruction (it had something to do with
82 stack operations) in x86, but Intel switched it from a 0-cycle
83 instruction to something more expensive. Any code which assumed that
84 instruction was a 0-cycle instruction now became less efficient. A
85 compiler (presuming it has a knowledge of the target processor's
86 instruction set properties) would have an easier time coping with that
87 change than a human would.
88
89 I'm not saying humans are useless; this is just one of those areas
90 which is sufficiently complex-yet-deterministic that sufficient
91 knowledge of the source and target environments would give a computer
92 the edge over a human in finding the optimal sequence of CPU
93 instructions.
94
95 --
96 :wq
