Hi all,

I just pushed the last parts of the bootstrap changes I announced almost a
year ago [1]. Parts of these changes have been live for a few weeks. These
changes have been pretty well tested and have solved many bootstrapping
problems, and they make the bootstrap procedure more future-proof. The main
practical consequence of this is that the bootstrap procedure no longer
depends on gcc 4.2 as a temporary component, which has resolved a lot of
minor problems and nasty bootstrap hacks. This mail serves as a summary of
what has changed, and why.

The old bootstrap procedure consists of four major steps:
- in stage1, tools that portage needs to do its job but which are not yet
present are compiled; examples are make, wget, and patch. These tools are
built by hand rather than by ebuild, and are installed in ${EPREFIX}/tmp.
They are compiled using the host compiler.
- in stage2, a version of portage is manually installed in ${EPREFIX}. This
version will later get forcibly overwritten by `emerge portage`, so to avoid
orphaned files it is essential that this is exactly the same version as the
version installed later.
- in the beginning of stage3, a toolchain (gcc, binutils, bash, and relevant
dependencies) is built by portage in ${EPREFIX}. All of this is built using
the host compiler.
- in the remainder of the bootstrap, consisting of the second part of stage3
and beyond, the native toolchain from the beginning of stage3 is used to
build (and, as necessary, rebuild) a complete system. Somewhere along the
way, the ${EPREFIX}/tmp tools are removed.
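
As an illustration only (this is not part of the bootstrap script), the four
steps above can be written down as a small Python table that records what is
built, with which compiler, and where it is installed. The names
OLD_PROCEDURE and host_built_in_prefix are made up for this sketch, and None
marks a detail this mail does not state.

```python
# Each entry: (step, what is built, built with, installed into).
# Hypothetical summary table; None means the mail does not say.
OLD_PROCEDURE = [
    ("stage1", "make, wget, patch, ...", "host compiler", "${EPREFIX}/tmp"),
    ("stage2", "portage (installed by hand)", None, "${EPREFIX}"),
    ("early stage3", "gcc, binutils, bash, dependencies",
     "host compiler", "${EPREFIX}"),
    ("rest of bootstrap", "complete system",
     "native toolchain", "${EPREFIX}"),
]


def host_built_in_prefix(steps):
    """Return the steps whose host-compiled output lands in the final prefix."""
    return [step for step, _, built_with, dest in steps
            if built_with == "host compiler" and dest == "${EPREFIX}"]


print(host_built_in_prefix(OLD_PROCEDURE))  # ['early stage3']
```

The query singles out early stage3: it is the one step where output of the
host compiler ends up in ${EPREFIX} itself rather than in ${EPREFIX}/tmp.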

A key property of this scheme is that the early stage3 tools are all built
using the host compiler. As long as all the components in early stage3
consist of only executables without any libraries, this is fine. Once you
want to use libraries in early stage3, however, things become problematic,
because code compiled with the host compiler need not be ABI-compatible with
code compiled with the prefix compiler: the host compiler might use a
different bit size (-m64 versus -m32), use softfloat instead of hardfloat, or
be an entirely different non-gcc compiler that isn't binary compatible with
gcc. A consequence of this incompatibility is that you can't upgrade those
libraries or the executables that link against them. For instance, if early
stage3 contained a bash linked against readline, then as soon as you rebuilt
either of them later, bash would stop working because it could no longer link
to readline.
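
The bash/readline example can be sketched as a toy model in Python. This is
a deliberate simplification, not how a linker actually decides
compatibility: the Abi class, its three fields, and the can_link rule are
assumptions made for illustration, and the concrete 32-bit/64-bit values are
invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Abi:
    bits: int        # 32 or 64, e.g. as selected by -m32 / -m64
    float_abi: str   # "soft" or "hard"
    compiler: str    # compiler family; treated as incompatible if different


def can_link(executable: Abi, library: Abi) -> bool:
    """Deliberately strict rule: every ABI property must match."""
    return executable == library


# Made-up example: a 32-bit host compiler and a 64-bit prefix compiler.
host_abi = Abi(bits=32, float_abi="hard", compiler="gcc")
prefix_abi = Abi(bits=64, float_abi="hard", compiler="gcc")

# bash and readline both built by the host compiler: they can link.
print(can_link(host_abi, host_abi))    # True
# readline rebuilt by the prefix compiler, bash still host-built: they cannot.
print(can_link(host_abi, prefix_abi))  # False
```

In this model, rebuilding only one of the pair with the prefix compiler is
exactly the case that returns False.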

As a consequence of the above, the old bootstrap procedure uses only early
stage3 components that do not need any libraries. In particular, this means
using gcc 4.2, because all later versions need the gmp and mpfr libraries. It
also means that bash needs to be built without readline support, which causes
nasty problems all around.

The new bootstrap procedure remedies this shortcoming by using the magic of
cross-EPREFIX building to keep a clear separation between tools built using
the host compiler and tools built using the prefix compiler. This separation
is enforced by making sure that everything built by the host compiler only
ever ends up in ${EPREFIX}/tmp, whereas everything built by the prefix
compiler only ever ends up in ${EPREFIX}. Thus, libraries from the two
ABI-incompatible worlds never end up fighting with each other.

With this split, portage can build a compiler that is itself built by the
host compiler, and thus lives in ${EPREFIX}/tmp, but which creates binaries
that live in ${EPREFIX}. To accomplish this, the bootstrap procedure needs an
extra step between the installation of a temporary portage and stage3. I have
called this step stage2, and moved the installation of a temporary portage
into stage1. This results in the following procedure:
- in stage1, portage dependencies (make, wget, patch, etc.) as well as portage
itself are hand-bootstrapped into ${EPREFIX}/tmp. They are all built using
the host compiler. In particular, portage is installed in ${EPREFIX}/tmp;
this portage will be used to emerge packages both in ${EPREFIX}/tmp and
${EPREFIX}, which it can do nowadays.
- in stage2, a toolchain is built that lives in ${EPREFIX}/tmp but creates
binaries in ${EPREFIX}. This toolchain is built by portage using the host
compiler.
- in early stage3, a native toolchain is built in ${EPREFIX}; that is, this is
a toolchain that both lives in and targets ${EPREFIX}. It is built by
portage, using the compiler from stage2.
- in the remainder of the bootstrap, the native compiler is used to build a
portage in ${EPREFIX}, followed by the rest of the system. ${EPREFIX}/tmp is
removed as soon as native portage is installed, at which point no further
temporary tools are necessary.
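
The separation rule behind this procedure (host-built output goes to
${EPREFIX}/tmp, everything else goes to ${EPREFIX}) can be checked
mechanically. The Python sketch below is only an illustration: the Step
class, the NEW_PROCEDURE table, and the violations function are hypothetical
names, and treating the stage2 compiler as a "prefix compiler" is an
interpretation of the text above.

```python
from dataclasses import dataclass

TMP = "${EPREFIX}/tmp"
PREFIX = "${EPREFIX}"


@dataclass(frozen=True)
class Step:
    stage: str
    builds: str
    compiler: str   # "host", "stage2" (cross-EPREFIX compiler), or "native"
    lives_in: str   # where the built files are installed


# Hypothetical summary of the four steps of the new procedure.
NEW_PROCEDURE = [
    Step("stage1", "make, wget, patch, portage", "host", TMP),
    Step("stage2", "toolchain that targets ${EPREFIX}", "host", TMP),
    Step("early stage3", "native toolchain", "stage2", PREFIX),
    Step("rest of bootstrap", "portage, rest of the system", "native", PREFIX),
]


def violations(steps):
    """Return the stages that break the host -> tmp, other -> prefix rule."""
    bad = []
    for step in steps:
        expected = TMP if step.compiler == "host" else PREFIX
        if step.lives_in != expected:
            bad.append(step.stage)
    return bad


print(violations(NEW_PROCEDURE))  # []
```

A step that installs host-built files directly into ${EPREFIX}, as early
stage3 did in the old procedure, would show up in the returned list.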

As a side effect of the above, some other minor things have changed that
warrant explanation:
- There is no longer a need for the manually bootstrapped portage to match the
version in the bootstrap snapshot. This used to be necessary to avoid
littering the final prefix with portage files that are no longer present in
the portage-tree version; but because the temporary portage is installed in
${EPREFIX}/tmp, it can litter as much as it likes.
- Most of the stage3 workarounds are no longer necessary, and for those that
remain I removed the kludges that relied on temporarily patching the portage
tree. What remains works entirely through environment variables rather than
by polluting the profiles. *Please* keep it that way, as this makes the
bootstrap process much more robust in case people need to sync the portage
tree halfway through for whatever reason.
- Due to the split-off of ${EPREFIX}/tmp, the parts of ${EPREFIX} that are
usually managed by portage are never hand-patched by the bootstrap script.
There are a few things that *are* hand-hacked, such as the
${EPREFIX}/bin/sh symlink, but those are allowed precisely because they are
outside the purview of portage. The contents of ${EPREFIX}/tmp, on the other
hand, can be hacked as much as necessary.
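
To illustrate why environment variables are more robust than patching files
in the tree, here is a small Python sketch that passes a setting to a single
command invocation without writing anything to disk. It is not taken from
the bootstrap script: run_with_overrides is a hypothetical helper and
EXAMPLE_WORKAROUND is a made-up variable name.

```python
import os
import subprocess
import sys


def run_with_overrides(cmd, overrides):
    """Run cmd with extra environment variables; no files are modified."""
    env = dict(os.environ)
    env.update(overrides)
    result = subprocess.run(cmd, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout


# EXAMPLE_WORKAROUND is a made-up name, used only for illustration.
out = run_with_overrides(
    [sys.executable, "-c",
     "import os; print(os.environ['EXAMPLE_WORKAROUND'])"],
    {"EXAMPLE_WORKAROUND": "1"},
)
print(out.strip())  # 1
```

Because the override lives only in the environment of the child process, a
tree sync in the middle of the bootstrap has nothing to overwrite.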

I hope this clears things up.
-- Ruud

References:
[1] http://article.gmane.org/gmane.linux.gentoo.alt/6867