Gentoo Archives: gentoo-alt

From: Ruud Koolen <redlizard@g.o>
To: gentoo-alt@l.g.o
Subject: [gentoo-alt] Bootstrap changes
Date: Sun, 14 Dec 2014 05:20:29
Message-Id: 201412140620.21503.redlizard@gentoo.org
Hi all,

I just pushed the last parts of the bootstrap changes I announced almost a
year ago [1]. Parts of these changes have been live for a few weeks. These
changes have been pretty well tested and have solved many bootstrapping
problems, and they make the bootstrap procedure more future-proof. The main
practical consequence is that the bootstrap procedure no longer depends on
gcc 4.2 as a temporary component, which has resolved a lot of minor problems
and made a number of nasty bootstrap hacks unnecessary. This mail serves as a
summary of what has changed, and why.

The old bootstrap procedure consisted of four major steps:
- in stage1, tools that portage needs to do its job but which are not yet
present are compiled; examples are make, wget, and patch. These tools are
built by hand rather than by ebuild, and are installed in ${EPREFIX}/tmp.
They are compiled using the host compiler.
- in stage2, a version of portage is manually installed in ${EPREFIX}. This
version will later get forcibly overwritten by `emerge portage`, so to avoid
orphaned files it is essential that this is exactly the same version as the
one installed later.
- in the beginning of stage3, a toolchain (gcc, binutils, bash, and relevant
dependencies) is built by portage in ${EPREFIX}. All of this is built using
the host compiler.
- in the remainder of the bootstrap, consisting of the second part of stage3
and beyond, the native toolchain from the beginning of stage3 is used to
build (and, as necessary, rebuild) a complete system. Somewhere along the
way, the ${EPREFIX}/tmp tools are removed.

A key property of this scheme is that the early stage3 tools are all built
using the host compiler. As long as all the components in early stage3
consist only of executables without any libraries, this is fine. Once you
want to use libraries in early stage3, however, things become problematic,
because code compiled with the host compiler need not be ABI-compatible with
code compiled with the prefix compiler: the host compiler might use a
different word size (-m64 versus -m32), use softfloat instead of hardfloat,
or be an entirely different non-gcc compiler that isn't binary compatible
with gcc, and so on. A consequence of this incompatibility is that you can't
upgrade those libraries or the executables that link against them. For
instance, if early stage3 contained a bash linked against readline, then as
soon as you rebuilt either of them later, bash would stop working because it
could no longer link against readline.
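To make the readline example concrete, here is a toy Python model (not part of the bootstrap scripts) that treats an ABI as a combination of the properties listed above and flags any executable whose library no longer shares its ABI. The `ABI`/`Package` records, the `broken_links` helper and the specific host/prefix ABI values are hypothetical; on a real system the breakage shows up as dynamic-linker errors rather than a tidy list.

```python
from dataclasses import dataclass

# Toy ABI descriptor built from the mismatch sources named above:
# compiler family, word size (-m64 vs -m32) and soft- vs hard-float.
@dataclass(frozen=True)
class ABI:
    compiler: str
    word_size: int
    float_abi: str

# Hypothetical values: a non-gcc 64-bit host compiler vs a 32-bit prefix gcc.
HOST_ABI = ABI(compiler="vendor-cc", word_size=64, float_abi="hard")
PREFIX_ABI = ABI(compiler="gcc", word_size=32, float_abi="hard")

@dataclass(frozen=True)
class Package:
    name: str
    abi: ABI
    links_against: tuple = ()  # names of libraries this package links to

def broken_links(installed):
    """Return (package, library) pairs whose ABIs no longer match."""
    broken = []
    for pkg in installed.values():
        for lib in pkg.links_against:
            if installed[lib].abi != pkg.abi:
                broken.append((pkg.name, lib))
    return broken

# Early stage3 in the old scheme: bash and readline both come from the host compiler.
system = {
    "readline": Package("readline", HOST_ABI),
    "bash": Package("bash", HOST_ABI, links_against=("readline",)),
}
print(broken_links(system))  # []

# Later, readline is rebuilt with the prefix compiler while bash is left alone.
system["readline"] = Package("readline", PREFIX_ABI)
print(broken_links(system))  # [('bash', 'readline')]
```

Rebuilding bash with the prefix compiler while leaving the host-built readline in place yields the same broken pair, which is the "either of them" case in the text.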

As a consequence of the above, the old bootstrap procedure used only early
stage3 components that do not need any libraries. In particular, this meant
using gcc 4.2, because all later versions need the gmp and mpfr libraries. It
also meant that bash had to be built without readline support, which caused
nasty problems all around.

The new bootstrap procedure remedies this shortcoming by using the magic of
cross-EPREFIX building to keep a clear separation between tools built using
the host compiler and tools built using the prefix compiler. This separation
is enforced by making sure that everything built by the host compiler only
ever ends up in ${EPREFIX}/tmp, whereas everything built by the prefix
compiler only ever ends up in ${EPREFIX} proper. Thus, libraries from the two
ABI-incompatible worlds never end up fighting with each other.

With this split, portage can use the host compiler to build a compiler that
lives in ${EPREFIX}/tmp but creates binaries that live in ${EPREFIX}. To
accomplish this, the bootstrap procedure needs an extra step between the
installation of a temporary portage and stage3. I have called this step
stage2, and moved the installation of a temporary portage into stage1. This
results in the following procedure:
- in stage1, portage dependencies (make, wget, patch, etc.) as well as portage
itself are hand-bootstrapped into ${EPREFIX}/tmp. They are all built using
the host compiler. In particular, portage is installed in ${EPREFIX}/tmp;
this portage will be used to emerge packages both in ${EPREFIX}/tmp and in
${EPREFIX}, which it can do nowadays.
- in stage2, a toolchain is built that lives in ${EPREFIX}/tmp but creates
binaries in ${EPREFIX}. This toolchain is built by portage using the host
compiler.
- in early stage3, a native toolchain is built in ${EPREFIX}; that is, this is
a toolchain that both lives in and targets ${EPREFIX}. It is built by
portage, using the compiler from stage2.
- in the remainder of the bootstrap, the native compiler is used to build a
portage in ${EPREFIX}, followed by the rest of the system. ${EPREFIX}/tmp is
removed as soon as native portage is installed, at which point no further
temporary tools are necessary.
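The difference between the old and new procedures comes down to one rule. As a rough illustration, not code from the bootstrap script, the toy Python model below records for each step which compiler built a component, where it lives and, for compilers, where their output is meant to live, and then checks that host-compiled code stays in ${EPREFIX}/tmp. The `Build` record, the `separation_violations` helper and the component labels are hypothetical, and non-compiled pieces such as the hand-installed temporary portage are left out.

```python
from dataclasses import dataclass
from typing import Optional

TMP = "${EPREFIX}/tmp"
PREFIX = "${EPREFIX}"

@dataclass(frozen=True)
class Build:
    stage: str
    component: str
    built_with: str                # "host", or the component name of a compiler built earlier
    lives_in: str                  # TMP or PREFIX
    targets: Optional[str] = None  # set only for compilers: where their output is meant to live

def separation_violations(builds: list[Build]) -> list[str]:
    """Return components breaking the rule: host-compiled code stays in TMP,
    and anything in PREFIX proper comes from a compiler that targets PREFIX."""
    compilers = {b.component: b for b in builds if b.targets is not None}
    bad = []
    for b in builds:
        if b.built_with == "host":
            ok = b.lives_in == TMP
        else:
            ok = compilers[b.built_with].targets == b.lives_in
        if not ok:
            bad.append(b.component)
    return bad

# Old procedure (compiled components only).
OLD = [
    Build("stage1", "make/wget/patch", "host", TMP),
    Build("stage3 (early)", "gcc-4.2/binutils/bash", "host", PREFIX, targets=PREFIX),
    Build("stage3+", "rest of system", "gcc-4.2/binutils/bash", PREFIX),
]

# New procedure, following the four steps above.
NEW = [
    Build("stage1", "make/wget/patch", "host", TMP),
    Build("stage2", "cross toolchain", "host", TMP, targets=PREFIX),
    Build("stage3 (early)", "native toolchain", "cross toolchain", PREFIX, targets=PREFIX),
    Build("stage3+", "portage and rest of system", "native toolchain", PREFIX),
]

print(separation_violations(OLD))  # ['gcc-4.2/binutils/bash']
print(separation_violations(NEW))  # []
```

In this model the stage2 toolchain is what makes the check pass: the host compiler only builds things that live in ${EPREFIX}/tmp, and the one of them that produces ${EPREFIX} binaries is a compiler, so nothing host-compiled ever lands in ${EPREFIX} itself.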

As a side effect of the above, a few other minor things have changed that
warrant explanation:
- There is no longer a need for the manually bootstrapped portage to match the
version in the bootstrap snapshot. This used to be necessary to avoid
littering the final prefix with portage files that are no longer present in
the portage-tree version; but because the temporary portage is installed in
${EPREFIX}/tmp, it can litter as much as it likes.
- Most of the stage3 workarounds are no longer necessary, and for those that
remain I removed the kludges that relied on temporarily patching the portage
tree. What remains works entirely through environment variables instead of
polluting the profiles. *Please* keep it that way, as this makes the
bootstrap process much more robust in case people need to sync the portage
tree halfway through for whatever reason.
- Due to the split-off of ${EPREFIX}/tmp, the parts of ${EPREFIX} that are
usually managed by portage are never hand-patched by the bootstrap script.
There are a few things that *are* hand-hacked, such as the ${EPREFIX}/bin/sh
symlink, but those are allowed precisely because they are not the purview of
portage. The contents of ${EPREFIX}/tmp, on the other hand, can be hacked as
much as necessary.

I hope this clears things up.
-- Ruud

References:
1. http://article.gmane.org/gmane.linux.gentoo.alt/6867