Gentoo Archives: gentoo-dev

From: Rich Freeman <rich0@g.o>
To: "vivo75@×××××.com" <vivo75@×××××.com>
Cc: gentoo-dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds
Date: Tue, 07 Aug 2012 18:06:39
Message-Id: CAGfcS_kPb=nCtK-vbNKjKaqJNeEbRitZwJYj7x0nw0SimOrOoA@mail.gmail.com
In Reply to: Re: [gentoo-dev] Portage FEATURE suggestion - limited-visibility builds by "vivo75@gmail.com"
On Tue, Jul 31, 2012 at 7:57 PM, vivo75@×××××.com <vivo75@×××××.com> wrote:
> On 31/07/2012 21:27, Michał Górny wrote:
>> I'd be more afraid about resources, and whether the kernel will
>> actually be able to handle a bazillion bind mounts. And if so, whether
>> it won't actually cause more overhead than copying the whole system
>> to some kind of tmpfs.
>
> If testing shows that bind mounts are too heavy, we could resort to
> LD_PRELOADing a library that filters access to the disk, or rework
> sandbox to also hide some files without errors; with an appropriate
> database (sys-apps/mlocate comes to mind), every access would have
> negligible additional cost compared to that of rotational disks.

So, while I suspect that bind mount overhead won't actually be that
bad, I'm also thinking that extending the role of sandbox, as has
already been suggested, might be the simpler solution (and it works on
other kernels as well). I'd still like a run-time solution some day,
but that would probably require SELinux and seems like a much more
ambitious project, and we'll probably get quite a bit of QA value out
of a sandbox solution.

I think the right solution is to not use external utilities unless
they can be linked in - at least not for anything running in sandbox.
We're most likely talking about a VERY high volume of file opens, and
we can't be spawning processes every time that happens, let alone
running bash scripts or whatever.

So, here is my design concept (which had a little help from my LUG - PLUG):

1. At the start of the build, portage generates a list of files that
are legitimate dependencies - anything in DEPEND or @system. This can
be done by parsing the /var/db/pkg files (I assume portage has some
internal API for this already).

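The VDB data step 1 would read is just text - each installed package's
CONTENTS file lists one entry per line. As a rough illustration of the
parsing involved (contents_path() is a made-up helper name, not anything
portage actually ships; portage has internal APIs for this), extracting
the installed path from one line might look like:

```c
#include <string.h>

/* Copy the path field of one CONTENTS line into buf; returns 0 on
 * success. Lines look like:
 *   dir /usr/bin
 *   obj /usr/bin/foo <md5> <mtime>
 *   sym /usr/bin/vi -> vim <mtime>
 * Paths may contain spaces, so for "obj" we strip the trailing md5 and
 * mtime fields from the right instead of splitting on whitespace. */
static int contents_path(const char *line, char *buf, size_t n)
{
    const char *start = strchr(line, ' ');
    const char *end;

    if (!start)
        return -1;
    start++;
    end = line + strlen(line);
    while (end > start && end[-1] == '\n')
        end--;
    if (strncmp(line, "sym ", 4) == 0) {
        const char *arrow = strstr(start, " -> ");
        if (!arrow)
            return -1;
        end = arrow;
    } else if (strncmp(line, "obj ", 4) == 0) {
        for (int i = 0; i < 2; i++) {   /* drop mtime, then md5 */
            while (end > start && end[-1] != ' ')
                end--;
            if (end > start)
                end--;
        }
    } else if (strncmp(line, "dir ", 4) != 0) {
        return -1;
    }
    if ((size_t)(end - start) >= n)
        return -1;
    memcpy(buf, start, end - start);
    buf[end - start] = '\0';
    return 0;
}
```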
2. Portage or a helper program (whatever is fastest) calls stat on
each file to obtain the device and inode IDs. Maybe long-term we
might consider caching these (but I'm not sure how stable they are).

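Step 2 amounts to a loop over the file list calling stat. A minimal
sketch - emit_dev_ino() is a hypothetical helper, and the one-pair-per-line
"dev ino" output format is just an assumption for the list file step 3
would consume:

```c
#include <stdio.h>
#include <sys/stat.h>

/* stat() one dependency file and print its device and inode IDs as a
 * "dev ino" pair on one line. Returns 0 on success, -1 if the path
 * can't be stat'd. */
static int emit_dev_ino(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)   /* note: stat() dereferences symlinks */
        return -1;
    printf("%llu %llu\n",
           (unsigned long long)st.st_dev,
           (unsigned long long)st.st_ino);
    return 0;
}
```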
3. The list of valid device/inode IDs is passed to sandbox somehow
(maybe in a file). Sandbox creates a data structure in memory
containing them for rapid access (a btree or such).

4. When sandbox intercepts a file open request, it checks the file's
device/inode pair against the list and allows/denies accordingly.

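Steps 3 and 4 could look roughly like the following - a sorted array
plus bsearch() standing in for the btree, and access_allowed() as a
hypothetical name for the check libsandbox would make inside its open()
wrapper; the real sandbox code is organized quite differently:

```c
#include <stdlib.h>
#include <sys/stat.h>

struct devino { unsigned long long dev, ino; };

static int devino_cmp(const void *a, const void *b)
{
    const struct devino *x = a, *y = b;

    if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
    return 0;
}

/* Step 3: keep the valid IDs in a sorted array; bsearch() then gives
 * O(log n) lookups, standing in for the btree mentioned above. */
static struct devino *table;
static size_t table_len;

static void load_table(struct devino *entries, size_t n)
{
    table = entries;
    table_len = n;
    qsort(table, table_len, sizeof(*table), devino_cmp);
}

/* Step 4: on an intercepted open, stat the path and check membership. */
static int access_allowed(const char *path)
{
    struct stat st;
    struct devino key;

    if (stat(path, &st) != 0)
        return 1;   /* missing file: let the real open() report the error */
    key.dev = st.st_dev;
    key.ino = st.st_ino;
    return bsearch(&key, table, table_len, sizeof(*table), devino_cmp) != NULL;
}
```

The extra stat on every open is the cost being weighed against canonical
filename lookups below.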
That said, after taking a quick pass at the sandbox source, it seems
like it is already designed to restrict read access, but it uses
canonical filenames to do so. I'm not sure those are going to be
reliable, especially if a filesystem contains bind mounts. Since it
is already checking a read list, if we thought that mechanism were
robust and fast we could just remove SANDBOX_READ="/" from
/etc/sandbox.d/00default and then load in whatever we want afterwards.
I need to spend more time grokking the current source. I'd think that
using inode numbers as a key would be faster than determining a
canonical file name on every file access, but if sandbox is already
doing the latter then obviously it isn't that much overhead.

The other thing I'm not sure about here is symlinks. If a symlink is
contained in a dependency, but the linked-to file is not, can that file
be used by a package? I suppose the reverse is also a concern - if a
file is accessed through a symlink that isn't part of a dependency,
but the file it points to is, is that a problem? I'm wondering
if there is any eselect logic that could cause problems here. When
calling stat we can choose whether to dereference symlinks.
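Concretely, that choice is stat() versus lstat(): stat() follows the
link and reports the target's device/inode, while lstat() reports the
link's own. A small sketch (helper names are made up for illustration):

```c
#include <sys/stat.h>

/* Returns 1 if path is itself a symlink, 0 if not, -1 on error.
 * lstat() does NOT dereference, so it sees the link object itself. */
static int is_symlink(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* Returns 1 if both paths resolve to the same (dev, ino) pair - i.e.
 * stat() on a link dereferences to its target - 0 if not, -1 on error. */
static int same_object(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino) ? 1 : 0;
}
```

So an inode allowlist built with stat() would effectively whitelist
whatever the link points at, while one built with lstat() would not -
which is exactly the ambiguity in the eselect case above.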