Gentoo Archives: gentoo-commits

From: Sven Vermeulen <sven.vermeulen@××××××.be>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] proj/hardened-docs:master commit in: xml/integrity/
Date: Mon, 30 Jul 2012 19:24:35
Message-Id: 1343676169.380cd1dcbd2b712ca5a850f77cb7aedbe83818d9.SwifT@gentoo
1 commit: 380cd1dcbd2b712ca5a850f77cb7aedbe83818d9
2 Author: Sven Vermeulen <sven.vermeulen <AT> siphos <DOT> be>
3 AuthorDate: Mon Jul 30 19:22:49 2012 +0000
4 Commit: Sven Vermeulen <sven.vermeulen <AT> siphos <DOT> be>
5 CommitDate: Mon Jul 30 19:22:49 2012 +0000
6 URL: http://git.overlays.gentoo.org/gitweb/?p=proj/hardened-docs.git;a=commit;h=380cd1dc
7
8 Adding concepts guide for integrity subproject
9
10 ---
11 xml/integrity/concepts.xml | 524 ++++++++++++++++++++++++++++++++++++++++++++
12 1 files changed, 524 insertions(+), 0 deletions(-)
13
14 diff --git a/xml/integrity/concepts.xml b/xml/integrity/concepts.xml
15 new file mode 100644
16 index 0000000..c8859f9
17 --- /dev/null
18 +++ b/xml/integrity/concepts.xml
19 @@ -0,0 +1,524 @@
20 +<?xml version='1.0' encoding='UTF-8'?>
21 +<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
22 +<!-- $Header$ -->
23 +
24 +<guide lang="en">
25 +<title>Integrity - Introduction and Concepts</title>
26 +
27 +<author title="Author">
28 + <mail link="swift"/>
29 +</author>
30 +
31 +<abstract>
32 +Integrity validation is a wide field in which many technologies play a role.
33 +This guide aims to offer a high-level view on what integrity validation is all
34 +about and how the various technologies work together to achieve a (hopefully)
35 +more secure environment to work in.
36 +</abstract>
37 +
38 +<!-- The content of this document is licensed under the CC-BY-SA license -->
39 +<!-- See http://creativecommons.org/licenses/by-sa/3.0 -->
40 +<license version="3.0" />
41 +
42 +<version>1</version>
43 +<date>2012-07-30</date>
44 +
45 +<chapter>
46 +<title>It is about trust</title>
47 +<section>
48 +<title>Introduction</title>
49 +<body>
50 +
51 +<p>
52 +Integrity is about trusting components within your environment, and in our case
53 +the workstations, servers and machines you work on. You definitely want to be
54 +certain that the workstation you type your credentials on to log on to the
55 +infrastructure is not compromised in any way. This "trust" in your environment
56 +is a combination of various factors: physical security, system security patching
57 +process, secure configuration, access controls and more.
58 +</p>
59 +
60 +<p>
61 +Integrity plays a role in this security field: it tries to ensure that the
62 +systems have not been tampered with by malicious people or organizations. And
63 +this tamperproof-ness extends to a wide range of components that need to be
64 +validated. You probably want to be certain that the binaries that are run (and
65 +libraries that are loaded) are those you built yourself (in case of Gentoo) or
66 +were provided to you by someone (or something) you trust. And that the Linux
67 +kernel you booted (and the modules that are loaded) are those you made, and not
68 +someone else's.
69 +</p>
70 +
71 +<p>
72 +Most people trust themselves and see integrity as proof that things are still
73 +as they built them. But to support this claim, the systems you
74 +use to ensure integrity need to be trusted too: you want to make sure that
75 +whatever system is in place to offer you the final yes/no on the integrity only
76 +uses trusted information (did it really validate the binary?) and services (is it
77 +not running on a compromised system?). To support these claims, many ideas,
78 +technologies, processes and algorithms have been developed.
79 +</p>
80 +
81 +<p>
82 +In this document, we will talk about a few of those, and how they fit into the
83 +Gentoo Hardened Integrity subproject's vision and roadmap.
84 +</p>
85 +
86 +</body>
87 +</section>
88 +</chapter>
89 +
90 +<chapter>
91 +<title>Hash results</title>
92 +<section>
93 +<title>Algorithmically validating a file's content</title>
94 +<body>
95 +
96 +<p>
97 +Hashes are a primary method for validating whether a file (or other resource)
98 +has not been changed since it was first inspected. A hash is the result (most
99 +often a number or ordered set of numbers) of a mathematical calculation on the
100 +content of a file, and exhibits the following properties:
101 +</p>
102 +
103 +<ul>
104 + <li>
105 + The resulting number has a <e>small (often fixed-size) length</e>.
106 + This is necessary to allow fast verification of whether two hash values are the
107 + same, but also to allow storing the value in a secure location (which is,
108 + more often than not, much more restricted in space).
109 + </li>
110 + <li>
111 + The hash function always <e>returns the same hash</e> (output) when the file it
112 + inspects has not been changed (input). Otherwise it'll be impossible to
113 + ensure that the file content hasn't changed.
114 + </li>
115 + <li>
116 + The hash function is fast to run (the calculation of a hash result does not
117 + take up too much time or too many resources). Without this property, it would
118 + take too long to generate and validate hash results, leading to users
119 + being dissatisfied (and more likely to disable the validation altogether).
120 + </li>
121 + <li>
122 + The hash result <e>cannot be used to reconstruct</e> the file. Although this is
123 + often seen as a result of the first property (small length), it is important
124 + because hash results are often also seen as a "public validation" of data
125 + that is otherwise private in nature. In other words, many processes rely on
126 + the inability of users (or hackers) to reverse-engineer information based on
127 + its hash result. A good example is password databases, which
128 + <e>should</e> store hashes of the passwords, not the passwords themselves.
129 + </li>
130 + <li>
131 + Given a hash result, it is near impossible to find another file with the
132 + same hash result (or to create such a file yourself). Since the hash result
133 + is limited in space, there are many inputs that will map onto the same
134 + hash result. The power of a good hash function is that it is not feasible to
135 + find them (or calculate them) except by brute force. When such a match is
136 + found, it is called a <e>collision</e>.
137 + </li>
138 +</ul>
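<p>
A few of these properties can be observed directly. The following is a minimal
sketch, assuming Python's standard <c>hashlib</c> module and SHA-256 as the hash
function; the helper name <c>digest</c> is only chosen for this illustration.
</p>

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Fixed size: a 1-byte input and a 1,000,000-byte input both yield
# a digest of 64 hexadecimal characters (256 bits).
small = digest(b"a")
large = digest(b"a" * 1_000_000)
print(len(small), len(large))  # 64 64

# Deterministic: hashing the same input again returns the same digest.
print(digest(b"a") == small)  # True

# Changing the input changes the digest.
print(digest(b"b") != small)  # True
```

<p>
The sketch does not demonstrate collision resistance or irreversibility; those
are properties of the underlying algorithm and cannot be shown by a short test.
</p>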
139 +
140 +<p>
141 +Compared with checksums, hashes try to be more cryptographically secure (and as
142 +such more effort is put into the last property to make sure collisions are very
143 +hard to obtain). Some even try to generate hash results in such a way that the
144 +time needed to calculate a hash cannot be used to obtain information about the
145 +data (such as whether it contains more 0s than 1s).
146 +</p>
147 +
148 +</body>
149 +</section>
150 +<section>
151 +<title>Hashes in integrity validation</title>
152 +<body>
153 +
154 +<p>
155 +Integrity validation services are often based on hash generation and validation.
156 +Tools such as <uri link="http://www.tripwire.org/">tripwire</uri> or <uri
157 +link="http://aide.sourceforge.net/">AIDE</uri> generate hashes of files and
158 +directories on your systems and then ask you to store them safely. When you want
159 +the integrity of your systems checked, you provide this information to the
160 +program (most likely in a read-only manner since you don't want this list to
161 +be modified while validating) which then recalculates the hashes of the files
162 +and compares them with the given list. Any changes in files are detected and can
163 +be reported to you (or the administrator).
164 +</p>
165 +
166 +<p>
167 +A popular hash function is SHA-1 (which you can generate and validate using the
168 +<c>sha1sum</c> command), which gained momentum after MD5 (using <c>md5sum</c>)
169 +was found to be less secure (nowadays collisions in MD5 are easy to generate).
170 +SHA-2 also exists (but is less popular than SHA-1) and can be used through
171 +the commands <c>sha224sum</c>, <c>sha256sum</c>, <c>sha384sum</c> and
172 +<c>sha512sum</c>.
173 +</p>
174 +
175 +<pre caption="Generating the SHA-1 sum of a file">
176 +~$ <i>sha1sum ~/Downloads/pastie-4301043.rb</i>
177 +6b9b4e0946044ec752992c2afffa7be103c2e748 /home/swift/Downloads/pastie-4301043.rb
178 +</pre>
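<p>
The generate-store-compare cycle that integrity tools perform can be sketched
in a few lines. This is only an illustration under simplifying assumptions (it
is not how tripwire or AIDE are implemented): the function names
<c>make_baseline</c> and <c>changed_files</c> are hypothetical, only file
content is hashed, and the baseline lives in memory instead of on read-only
media.
</p>

```python
import hashlib
import os
import tempfile


def sha1_of_file(path):
    """Return the SHA-1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_baseline(paths):
    """Record the current digest of every given file."""
    return {path: sha1_of_file(path) for path in paths}


def changed_files(baseline):
    """Recalculate the digests and list files that differ from the baseline."""
    return [path for path, old in sorted(baseline.items())
            if sha1_of_file(path) != old]


# Demonstration on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
demo = os.path.join(workdir, "demo.conf")
with open(demo, "w") as f:
    f.write("setting = 1\n")

baseline = make_baseline([demo])
unchanged = changed_files(baseline)   # nothing was modified yet

with open(demo, "a") as f:
    f.write("setting = 2\n")
modified = changed_files(baseline)    # the change is now detected

print(unchanged, modified == [demo])  # [] True
```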
179 +
180 +</body>
181 +</section>
182 +<section>
183 +<title>Hashes are a means, not a solution</title>
184 +<body>
185 +
186 +<p>
187 +Hashes, in the field of integrity validation, are a means to compare data and
188 +integrity in a relatively fast way. However, by themselves hashes cannot be used
189 +to provide integrity assurance towards the administrator. Take the use of
190 +<c>sha1sum</c> by itself for instance.
191 +</p>
192 +
193 +<p>
194 +You are not guaranteed that the <c>sha1sum</c> application behaves correctly
195 +(in other words, that it has not been tampered with). You can't use <c>sha1sum</c>
196 +against itself since a maliciously modified command can easily just
197 +return (print out) the expected SHA-1 sum rather than the real one. A way to
198 +thwart this is to provide the binary together with the hash values on read-only
199 +media.
200 +</p>
201 +
202 +<p>
203 +But then you're still not certain that it is that application that is executed:
204 +a modified system might have you think it is executing that application, but
205 +instead is using a different application. To provide this level of trust, you
206 +need to get assurance from a higher-positioned, trusted service that the right
207 +application is being run. Running with a trusted kernel helps here (but might
208 +not provide complete certainty) but you most likely need assistance from the
209 +hardware (we will talk about the Trusted Platform Module later).
210 +</p>
211 +
212 +<p>
213 +Likewise, you are not guaranteed that it is still your file with hash results
214 +that is being used to verify the integrity of a file. Another file (with
215 +modified content) may be bind-mounted on top of it. To support integrity
216 +validation with a trusted information source, some solutions use HMAC digests
217 +instead of plain hashes.
218 +</p>
219 +
220 +<p>
221 +Finally, checksums should be taken not only of a file's content, but also of its
222 +attributes (which are often used to provide access controls or even toggle
223 +particular security measures on/off on a file, such as is the case with PaX
224 +markings), of directories (which hold information about directory updates such
225 +as file additions or removals) and of privileges. These are things that a program
226 +like <c>sha1sum</c> doesn't offer (but tools like AIDE do).
227 +</p>
228 +
229 +</body>
230 +</section>
231 +</chapter>
232 +
233 +<chapter>
234 +<title>Hash-based Message Authentication Codes</title>
235 +<section>
236 +<title>Trusting the hash result</title>
237 +<body>
238 +
239 +<p>
240 +In order to trust a hash result, some solutions use HMAC digests instead. An
241 +HMAC digest combines a regular hash function (and its properties) with
242 +a secret cryptographic key. As such, the function generates the hash of the
243 +content of a file together with the secret cryptographic key. This not only
244 +provides integrity validation of the file, but also a signature telling the
245 +verification tool that the hash was made by a trusted application (one that
246 +knows the cryptographic key) in the past and has not been tampered with.
247 +</p>
248 +
249 +<p>
250 +By using HMAC digests, malicious users will find it more difficult to modify
251 +code and then present a "fake" hash results file, since they do not know
252 +the secret cryptographic key that is needed to generate the new hash
253 +results. When you see terms like <e>HMAC-SHA1</e> it means that a SHA-1 hash
254 +result is used together with a cryptographic key.
255 +</p>
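<p>
As a rough illustration of HMAC-SHA1, the sketch below uses Python's standard
<c>hmac</c> and <c>hashlib</c> modules. The key, the file contents and the
helper names <c>hmac_sha1</c> and <c>verify</c> are made up for this example;
a real deployment would need to store the key securely.
</p>

```python
import hashlib
import hmac


def hmac_sha1(key: bytes, content: bytes) -> str:
    """Return the HMAC-SHA1 digest of content under the given key."""
    return hmac.new(key, content, hashlib.sha1).hexdigest()


def verify(key: bytes, content: bytes, expected: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(hmac_sha1(key, content), expected)


# Hypothetical key and file contents, for illustration only.
secret_key = b"example-key-not-for-real-use"
original = b"#!/bin/sh\necho hello\n"
stored = hmac_sha1(secret_key, original)

# An attacker modifies the file and forges a digest without knowing the key.
tampered = b"#!/bin/sh\necho pwned\n"
forged = hmac_sha1(b"attacker-guess", tampered)

ok_original = verify(secret_key, original, stored)   # True
ok_tampered = verify(secret_key, tampered, stored)   # False: content changed
ok_forged = verify(secret_key, tampered, forged)     # False: wrong key used
print(ok_original, ok_tampered, ok_forged)
```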
256 +
257 +</body>
258 +</section>
259 +<section>
260 +<title>Managing the keys</title>
261 +<body>
262 +
263 +<p>
264 +Using keys to "protect" the hash results introduces another level of complexity:
265 +how do you properly, securely store the keys and access them only when needed?
266 +You cannot just embed the key in the hash list (since a tampered system might
267 +read it out when you are verifying the system, generate its own results file and
268 +have you check against that instead). Likewise you can't just embed the key in
269 +the application itself, because a tampered system might just read out the
270 +application binary to find the key (and once compromised, you might need to
271 +rebuild the application completely with a new key).
272 +</p>
273 +
274 +<p>
275 +You might be tempted to just provide the key as a command-line argument, but
276 +then again you cannot be certain that no malicious user is idling on your system,
277 +waiting to capture this valuable information from the output of <c>ps</c>, etc.
278 +</p>
279 +
280 +<p>
281 +Again, the need arises to trust a higher-level component. When you trust the
282 +kernel, you might be able to use the kernel key ring for this.
283 +</p>
284 +
285 +</body>
286 +</section>
287 +</chapter>
288 +
289 +<chapter>
290 +<title>Using private/public key cryptography</title>
291 +<section>
292 +<title>Validating integrity using public keys</title>
293 +<body>
294 +
295 +<p>
296 +One way to work around the vulnerability of a malicious user getting
297 +hold of the secret key is to not rely on the key for the authentication of the
298 +hash result in the first place when verifying the integrity of the system. This
299 +can be accomplished if, instead of using just an HMAC, you also encrypt the HMAC
300 +digest with a private key.
301 +</p>
302 +
303 +<p>
304 +During validation of the hashes, you decrypt the stored HMAC digest with the public
305 +key (not the private key) and compare it with the HMAC digest you generate again.
306 +</p>
307 +
308 +<p>
309 +In this approach, an attacker cannot forge a fake HMAC since forgery requires
310 +access to the private key, and the private key is never used on the system to
311 +validate signatures. And as long as no collisions occur, the attacker also cannot
312 +reuse the encrypted HMAC values (which you could consider to be a replay attack).
313 +</p>
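<p>
The idea of producing a value with a private key and checking it with only the
public key can be shown with a deliberately simplified toy. The sketch below is
textbook RSA over a plain SHA-256 digest, using two small, publicly known
Mersenne primes and no padding. It is an assumption-laden illustration and
<e>not secure</e>; a real setup would rely on a vetted cryptographic library.
All names and parameters are hypothetical.
</p>

```python
import hashlib
from math import gcd

# Toy key material (assumption for illustration): two known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                     # public modulus
E = 65537                     # public exponent
PHI = (P - 1) * (Q - 1)
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)           # private exponent (needs Python 3.8 or later)


def digest_as_int(content: bytes) -> int:
    """Hash the content and reduce the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Done once, away from the monitored system, with the private key D."""
    return pow(digest_as_int(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Done on the monitored system, using only the public values E and N."""
    return pow(signature, E, N) == digest_as_int(content)


signature = sign(b"kernel image")
valid = verify(b"kernel image", signature)
tampered = verify(b"kernel image with backdoor", signature)
print(valid, tampered)
```

<p>
Only <c>E</c> and <c>N</c> need to be present where verification happens, which
is why protecting the public key against replacement matters so much.
</p>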
314 +
315 +</body>
316 +</section>
317 +<section>
318 +<title>Ensuring the key integrity</title>
319 +<body>
320 +
321 +<p>
322 +Of course, this still requires that the public key is not modifiable by a
323 +tampered system: a fake list of hash results can be made using a different
324 +private key, and the moment the tool wants to decrypt the encrypted values, the
325 +tampered system replaces the public key with its own public key, and the system
326 +is again vulnerable.
327 +</p>
328 +
329 +</body>
330 +</section>
331 +</chapter>
332 +
333 +<chapter>
334 +<title>Trust chain</title>
335 +<section>
336 +<title>Handing over trust</title>
337 +<body>
338 +
339 +<p>
340 +As you've noticed from the methods and services above, you always need to have
341 +something you trust and that you can build on. If you trust nothing, you can't
342 +validate anything since nothing can be trusted to return a valid response. And
343 +to trust something means you also want to have confidence that that system
344 +itself uses trusted resources.
345 +</p>
346 +
347 +<p>
348 +For many users, the hardware level is something they trust. After all, as long
349 +as no burglar has come in the house and tampered with the hardware itself, it is
350 +reasonable to expect that the hardware is still the same. In effect, the users
351 +trust that the physical protection of their house is sufficient for them.
352 +</p>
353 +
354 +<p>
355 +For companies, the physical protection of the working environment is not
356 +sufficient for ultimate trust. They want to make sure that the hardware is not
357 +tampered with (or different hardware is suddenly used), specifically when that
358 +company uses laptops instead of (less portable) workstations.
359 +</p>
360 +
361 +<p>
362 +The less you trust, the more things you need to take care of in order to
363 +be confident that the system is not tampered with. In the Gentoo Hardened
364 +Integrity subproject we will use the following "order" of resources:
365 +</p>
366 +
367 +<ul>
368 + <li>
369 + <e>System root-owned files and root-running processes</e>. In most cases
370 + and most households, properly configured and protected systems will trust
371 + root-owned files and processes. Any request for integrity validation of
372 + the system is usually applied against user-provided files (to check that no one
373 + tampered with the user account or specific user files) and not against the
374 + system itself.
375 + </li>
376 + <li>
377 + <e>Operating system kernel</e> (in our case the Linux kernel). Although some
378 + precautions need to be taken, a properly configured and protected kernel can
379 + provide a higher trust level. Integrity validation on kernel level can offer
380 + a higher trust in the system's integrity, although you must be aware that
381 + most kernels still reside on the system itself.
382 + </li>
383 + <li>
384 + <e>Live environments</e>. A bootable (preferably read-only) medium can be
385 + used to boot up a validation environment that scans and verifies the
386 + integrity of the system-under-investigation. In this case, even tampered
387 + kernel boot images can be detected, and by taking proper precautions when
388 + running the validation (such as ensuring no network access is enabled from
389 + the boot up until the final compliance check has occurred) you can make
390 + yourself confident of the state of the entire system.
391 + </li>
392 + <li>
393 + <e>Hypervisor level</e>. Hypervisors are seen by many organizations as
394 + trusted resources (the isolation of a virtual environment is hard to break
395 + out of). Integrity validation on the hypervisor level can therefore provide
396 + confidence, especially when "chaining trusts": the hypervisor first
397 + validates the kernel to boot, and then boots this (now trusted) kernel which
398 + loads up the rest of the system.
399 + </li>
400 + <li>
401 + <e>Hardware level</e>. Whereas hypervisors are still "just software", you
402 + can lift trust up to the hardware level and use the hardware-offered
403 + integrity features to provide you with confidence that the system you are
404 + about to boot has not been tampered with.
405 + </li>
406 +</ul>
407 +
408 +<p>
409 +In the Gentoo Hardened Integrity subproject, we aim to eventually support all
410 +these levels (and perhaps more) to provide you as a user the tools and methods
411 +you need to validate the integrity of your system, up to the point that you
412 +trust. The less you trust, the more complex a trust chain might become to
413 +validate (and manage), but we will not limit our research and support to a
414 +single technology (or chain of technologies).
415 +</p>
416 +
417 +<p>
418 +Chaining trust is an important aspect to keep things from becoming too complex
419 +and unmanageable. It also allows users to just "drop in" at the level of trust
420 +they feel is sufficient, rather than requiring technologies for higher levels.
421 +</p>
422 +
423 +<p>
424 +For instance:
425 +</p>
426 +
427 +<ul>
428 + <li>
429 + A hardware component that you trust (like a <e>Trusted Platform Module</e>
430 + or a specific BIOS-supported functionality) verifies the integrity of the
431 + boot regions on your disk. When ok, it passes control over to the
432 + bootloader.
433 + </li>
434 + <li>
435 + The bootloader now validates the integrity of its configuration and of the
436 + files (kernel and initramfs) it is told to boot up. If it checks out, it
437 + boots the kernel and hands over control to this kernel.
438 + </li>
439 + <li>
440 + The kernel, together with the initial ram file system, verifies the
441 + integrity of the system components (and for instance SELinux policy) before
442 + the initial ram system changes to the real system and boots up the
443 + (verified) init system.
444 + </li>
445 + <li>
446 + The (root-running) init system validates the integrity of the services it
447 + wants to start before handing over control of the system to the user.
448 + </li>
449 +</ul>
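<p>
The hand-over in such a chain can be simulated with a short sketch. It makes
several simplifying assumptions: the components are stand-in byte strings, the
trusted digests sit in one table (in a real chain each stage holds the
reference values for the next one), and the function name <c>boot</c> is
hypothetical.
</p>

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the real components, in boot order.
components = [
    ("bootloader", b"bootloader code"),
    ("kernel", b"kernel and initramfs"),
    ("init", b"init system"),
    ("services", b"service binaries"),
]

# Digests recorded while the system was known to be in a good state.
trusted = {name: sha256(blob) for name, blob in components}


def boot(stages, trusted_digests):
    """Verify each stage before handing over control to it.

    Returns the names of the stages that received control; the chain
    stops at the first stage whose digest does not match.
    """
    started = []
    for name, blob in stages:
        if sha256(blob) != trusted_digests[name]:
            break
        started.append(name)
    return started


clean_boot = boot(components, trusted)

# Simulate a tampered kernel: the chain stops before the kernel runs.
tampered = list(components)
tampered[1] = ("kernel", b"kernel with rootkit")
tampered_boot = boot(tampered, trusted)

print(clean_boot)     # ['bootloader', 'kernel', 'init', 'services']
print(tampered_boot)  # ['bootloader']
```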
450 +
451 +<p>
452 +An even longer chain can be seen with hypervisors:
453 +</p>
454 +
455 +<ul>
456 + <li>
457 + Hardware validates boot loader
458 + </li>
459 + <li>
460 + Boot loader validates hypervisor kernel and system
461 + </li>
462 + <li>
463 + Hypervisor validates kernel(s) of the images (or the entire images)
464 + </li>
465 + <li>
466 + Hypervisor-managed virtual environment starts the image
467 + </li>
468 + <li>
469 + ...
470 + </li>
471 +</ul>
472 +
473 +</body>
474 +</section>
475 +<section>
476 +<title>Integrity on serviced platforms</title>
477 +<body>
478 +
479 +<p>
480 +Sometimes you cannot trust higher positioned components, but still want to be
481 +assured that your service is not tampered with. An example would be when you are
482 +hosting a system in a remote, non-accessible data center or when you manage an
483 +image hosted by a virtualized hosting provider (I don't want to say "cloud"
484 +here, but it fits).
485 +</p>
486 +
487 +<p>
488 +In these cases, you want a level of assurance that your own image has not been
489 +tampered with while being offline (you can imagine manipulating the guest image,
490 +injecting trojans or other backdoors, and then booting the image) or even while
491 +running the system. Instead of trusting the higher components, you try to deal
492 +with a level of distrust that you want to manage.
493 +</p>
494 +
495 +<p>
496 +Providing you with some confidence at this level too is our goal within the
497 +Gentoo Hardened Integrity subproject.
498 +</p>
499 +
500 +</body>
501 +</section>
502 +<section>
503 +<title>From measurement to protection</title>
504 +<body>
505 +
506 +<p>
507 +When dealing with integrity (and trust chains), the idea behind the top-down
508 +trust chain is that higher level components first measure the integrity of the
509 +next component, validate (and take appropriate action) and then hand over
510 +control to this component. This is what we call <e>protection</e> or
511 +<e>integrity enforcement</e> of resources.
512 +</p>
513 +
514 +<p>
515 +If the system cannot validate the integrity, or the system is too volatile to
516 +enforce this integrity from a higher level, it is necessary to provide a trusted
517 +method for other services to validate the integrity. In this case, the system
518 +<e>attests</e> the state of the underlying component(s) towards a third-party
519 +service, which <e>appraises</e> this state against a known "good" value.
520 +</p>
521 +
522 +<p>
523 +In the case of our HMAC-based checks, there is no enforcement of integrity of
524 +the files, but the tool itself attests the state of the resources by generating
525 +new HMAC digests and validating (appraising) them against the list of HMAC
526 +digests it took before.
527 +</p>
528 +
529 +</body>
530 +</section>
531 +</chapter>
532 +
533 +<chapter>
534 +<title>An implementation: the Trusted Computing Group functionality</title>
535 +<section>
536 +<title>Trusted Platform Module</title>
537 +<body>
538 +
539 +</body>
540 +</section>
541 +</chapter>
542 +
543 +</guide>