Gentoo Archives: gentoo-portage-dev

From: Sid Spry <sid@××××.us>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Speeding up Tree Verification
Date: Tue, 30 Jun 2020 21:51:31
Message-Id: c181c185-1c59-49c4-9e5f-ba0b389b34a0@www.fastmail.com
In Reply to: Re: [gentoo-portage-dev] Speeding up Tree Verification by "Michał Górny"
1 On Tue, Jun 30, 2020, at 2:29 PM, Michał Górny wrote:
2 > On Tue, 2020-06-30 at 12:50 -0500, Sid Spry wrote:
3 > > On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
4 > > > Dnia June 30, 2020 2:13:43 AM UTC, Sid Spry <sid@××××.us> napisał(a):
5 > > > > Hello,
6 > > > >
7 > > > > I have some runnable pseudocode outlining a faster tree verification
8 > > > > algorithm.
9 > > > > Before I create patches I'd like to see if there is any guidance on
10 > > > > making the
11 > > > > changes as unobtrusive as possible. If the radical change in algorithm
12 > > > > is
13 > > > > acceptable I can work on adding the changes.
14 > > > >
15 > > > > Instead of composing any kind of structured data out of the portage
16 > > > > tree my
17 > > > > algorithm just lists all files and then optionally batches them out to
18 > > > > threads.
19 > > > > There is a noticeable speedup by eliding the tree traversal operations
20 > > > > which
21 > > > > can be seen when running the algorithm with a single thread and
22 > > > > comparing it to
23 > > > > the current algorithm in gemato (which should still be discussed
24 > > > > here?).
25 > > >
26 > > > Without reading the code: does your algorithm correctly detect extraneous files?
27 > > >
28 > >
29 > > Yes and no.
30 > >
31 > > I am not sure why this is necessary. If the file does not appear in a manifest it is
32 > > ignored. It makes the most sense to me to put the burden of not including
33 > > untracked files on the publisher. If the user puts an untracked file into the tree it
34 > > will be ignored to no consequence; the authored files don't refer to it, after all.
35 >
36 > This is necessary because a malicious third party can MITM you an rsync
37 > tree with extraneous files (say, -r1 baselayout ebuild) that do horrible
38 > things on your system. If you don't reject files not in Manifest, you
39 > open a huge security hole.
40 >
41
42 Ok, I will refer to https://www.gentoo.org/glep/glep-0074.html and implement the
43 checks in detail, but will still need to spend some time looking for the best place
44 to insert the code.
45
46 I think it best to address this from two fronts. On one hand rejecting extra files
47 seems to have immediate benefit but the larger issue is portage exposing
48 untracked potentially malicious files to the user.
49
50 Has anything like a verity loopback filesystem been explored? It might reduce
51 duplication of work.
52
53 > > But it would be easy enough to build a second list of all files and compare it to
54 > > the list of files built from the manifests. If there are extras an error can be
55 > > generated. This is actually the first test I did on my manifest parsing code. I tried
56 > > to see if my tracked files roughly matched the total files in tree. That can be
57 > > repurposed for this check.
58 > >
59 > > > > Some simple tests like counting all objects traversed and verified
60 > > > > returns the
61 > > > > same(ish). Once it is put into portage it could be tested in detail.
62 > > > >
63 > > > > There is also my partial attempt at removing the brittle interface to
64 > > > > GnuPG
65 > > > > (it's not as if the current code is badly designed, just that parsing
66 > > > > the
67 > > > > output of GnuPG directly is likely not the best idea).
68 > > >
69 > > > The 'brittle interface' is well-defined machine-readable output.
70 > > >
71 > >
72 > > Ok. I was aware there was a machine interface, but the classes that manipulate
73 > > a temporary GPG home seemed like not the best solution. I guess that is all
74 > > due to GPG assuming everything is in ~/.gnupg and keeping its state as a
75 > > directory structure.
76 >
77 > A temporary home directory guarantees that user configuration does not
78 > affect the verification result.
79 >
80
81 Yes, I know why it is there. The temporary construction of the directory is what
82 stood out to me as messy but I guess there is no way around it.
83
84 > >
85 > > > > Needs gemato, dnspython, and requests. Slightly better than random code
86 > > > > because
87 > > > > I took inspiration from the existing gemato classes.
88 > > >
89 > > > The code makes a lot of brittle assumptions about the structure. The
90 > > > GLEP was specifically designed to avoid that and let us adjust the
91 > > > structure in the future to meet our needs.
92 > > >
93 > >
94 > > These same assumptions are built into the code that operates on the
95 > > tree structure. If the GLEP were changed the existing code would also
96 > > potentially need changing. This code just uses the structure in a different
97 > > way.
98 > >
99 >
100 > The code that predates the GLEP, yes. It will eventually be changed to
101 > be more flexible, especially when we can assume that we start removing
102 > backwards compatibility.
103 >
104
105 My comment was more on the general nature of the code having to track
106 what is intended. Even if parsing code (involving the filesystem?) was
107 generated from a specification now the specification is "code" and still may
108 not reflect what is intended.
109
110 Taking liberties with the programmatic representation of the manifests
111 seems appropriate. I also don't see anything in the GLEP that dictates an
112 in-memory representation (and such a thing would probably be
113 inappropriate).

Replies

Subject Author
Re: [gentoo-portage-dev] Speeding up Tree Verification "Michał Górny" <mgorny@g.o>