On 13.03.2012 20:07, Stroller wrote:
>
> On 13 March 2012, at 18:18, Michael Mol wrote:
>> ...
>>> So I assume the i586 version is better for you --- unless GCC
>>> suddenly got a lot better at optimizing code.
>>
>> Since when, exactly? GCC isn't the best compiler at optimization,
>> but I fully expect current versions to produce better code for
>> x86-64 than hand-tuned i586. Wider registers, more registers,
>> crypto acceleration instructions and SIMD instructions are all very
>> nice to have. I don't know the specifics of AES, though, or what
>> kind of crypto algorithm it is, so it's entirely possible that one
>> can't effectively parallelize it except in some relatively unique
>> circumstances.
>
> Do you have much experience of writing assembler?
>
> I don't, and I'm not an expert on this, but I've read the odd blog
> article on this subject over the years.
>
> What I've read often has the programmer looking at the machine code
> gcc produces and examining what it does. The compiler might not care
> how many registers it uses, and thus a variable might find itself
> frequently swapped back into RAM; the programmer does not have any
> control over the compiler, and IIRC some flags reserve a register for
> debugging (IIRC -fomit-frame-pointer disables this). I think it's
> possible to use registers more efficiently by swapping them (??) or
> by using bitwise comparisons and other tricks.
>

You recall correctly about the frame pointer.

Concerning the register usage: I'm no expert in this field, either, but
I think the main issue is not simply register allocation but branch and
exception prediction and so on. The compiler can optimize either for a
seamless continuation when the jump happens or for when it doesn't. A
human or a just-in-time compiler can handle these cases better by
predicting the outcome or -- in the case of a JIT -- by analyzing the
outcome of the first few iterations.

OT: IIRC, register reuse is also the main performance problem of
state-of-the-art JavaScript engines at the moment. For the code they
compile at runtime, they are nearly as good as `gcc -O0`, but they have
the same problem concerning registers (GCC with -O0 produces code that
works exactly as you describe above: storing the result to memory after
every computation and loading it again).
|
> Assembler optimisation is only used on sections of code that are at
> the core of a loop - that are called hundreds or thousands (even
> millions?) of times during the program's execution. It's not for
> code, such as reading the .config file or initialisation, which is
> only called once. Because the code in the core of the loop is called
> so often, you don't have to achieve much of an optimisation for the
> aggregate saving to be considerable.
>
> The operations in question may only constitute a few lines of C,
> or a handful of machine operations, so it boils down to an algorithm
> that a human programmer is capable of getting a grip on and
> comprehending. Whilst compilers are clearly more efficient for large
> programs, on this micro scale, humans are more clever and creative
> than machines.
>
> Encryption / decryption is an example of code that lends itself to
> this kind of optimisation. In particular AES was designed, I believe,
> to be amenable to implementation in this way. The reason for that was
> that it was desirable to have it run on embedded devices and on
> dedicated chips. So it boils down to a simple bitswap operation (??)
> - the plaintext is modified by the encryption key, input and output
> as a fast stream. Each byte goes in, each byte goes out, the same
> function performed on each one.
>

Well, sort of. First off, you are right: AES was designed with hardware
implementations in mind.
|
The algorithm boils down to a number of substitution and permutation
networks and XOR operations (I assume that's what you meant by
bitswap). If you look at the portable C code
(/usr/src/linux/crypto/aes_generic.c), you can see that it mostly
consists of lookup tables and XORs.
|
The thing about "each byte goes in, each byte goes out", however, is a
bit wrong. What you are thinking of is a stream cipher like RC4. AES is
a block cipher: it takes a fixed-size (in this case 128-bit) input
block, XORs it with the encryption (sub-)key and shuffles it around
according to the exact algorithm.
|
> Another operation that lends itself to assembler optimisation is
> video decoding - the video is encoded only once, and then may be
> played back hundreds or millions of times by different people. The
> same operations must be repeated a number of times on each frame,
> and c. 25-60 frames are decoded per second, so at least 90,000
> frames per hour. Again, the smallest optimisation is worthwhile.
>
> Stroller.
>
>