On Tue, 2020-06-30 at 12:50 -0500, Sid Spry wrote:
> On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
> > On June 30, 2020 2:13:43 AM UTC, Sid Spry <sid@××××.us> wrote:
> > > Hello,
> > >
> > > I have some runnable pseudocode outlining a faster tree verification
> > > algorithm. Before I create patches I'd like to see if there is any
> > > guidance on making the changes as unobtrusive as possible. If the
> > > radical change in algorithm is acceptable I can work on adding the
> > > changes.
> > >
> > > Instead of composing any kind of structured data out of the portage
> > > tree my algorithm just lists all files and then optionally batches
> > > them out to threads. There is a noticeable speedup by eliding the
> > > tree traversal operations which can be seen when running the
> > > algorithm with a single thread and comparing it to the current
> > > algorithm in gemato (which should still be discussed here?).
> >
> > Without reading the code: does your algorithm correctly detect extraneous files?
> >
>
> Yes and no.
>
> I am not sure why this is necessary. If the file does not appear in a manifest it is
> ignored. It makes the most sense to me to put the burden of not including
> untracked files on the publisher. If the user puts an untracked file into the tree it
> will be ignored to no consequence; the authored files don't refer to it, after all.

This is necessary because a malicious third party can MITM you an rsync
tree with extraneous files (say, a -r1 baselayout ebuild) that do horrible
things on your system. If you don't reject files not in Manifest, you
open a huge security hole.
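
To make the check concrete, here is a minimal sketch (not gemato's actual code) of rejecting files that no Manifest accounts for. The function name `find_extraneous` and the `tracked_paths` input are hypothetical; the sketch assumes the set of tracked paths has already been collected from the verified Manifest files, and it does not model any exemptions the Manifest format itself defines.

```python
import os


def find_extraneous(tree_root, tracked_paths):
    """Return files under tree_root that are not listed in tracked_paths.

    tracked_paths is assumed to hold paths relative to tree_root, gathered
    from the Manifest files beforehand (hypothetical input for this sketch).
    """
    tracked = {os.path.normpath(p) for p in tracked_paths}
    extraneous = []
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), tree_root)
            if rel not in tracked:
                # Present on disk but not covered by any Manifest entry.
                extraneous.append(rel)
    return sorted(extraneous)
```

A verifier built this way would treat any non-empty result as a failure, so an injected -r1 ebuild is rejected instead of silently ignored.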

> But it would be easy enough to build a second list of all files and compare it to
> the list of files built from the manifests. If there are extras an error can be
> generated. This is actually the first test I did on my manifest parsing code. I tried
> to see if my tracked files roughly matched the total files in tree. That can be
> repurposed for this check.
>
> > > Some simple tests like counting all objects traversed and verified
> > > return the same(ish). Once it is put into portage it could be tested
> > > in detail.
> > >
> > > There is also my partial attempt at removing the brittle interface to
> > > GnuPG (it's not as if the current code is badly designed, just that
> > > parsing the output of GnuPG directly is likely not the best idea).
> >
> > The 'brittle interface' is well-defined machine-readable output.
> >
>
> Ok. I was aware there was a machine interface, but the classes that manipulate
> a temporary GPG home seemed like not the best solution. I guess that is all
> due to GPG assuming everything is in ~/.gnupg and keeping its state as a
> directory structure.

A temporary home directory guarantees that user configuration does not
affect the verification result.
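
As a rough illustration (again, not gemato's actual implementation), the idea can be sketched as follows: GnuPG is pointed at a throwaway directory via `GNUPGHOME`, only the trusted key file is imported, and the result is read from the `--status-fd` lines rather than from human-readable output. The function names and the `keyring_file` parameter are hypothetical, and the good/bad keyword sets are a deliberate simplification.

```python
import os
import subprocess
import tempfile

STATUS_PREFIX = "[GNUPG:] "


def parse_status(output):
    """Split GnuPG --status-fd output into (keyword, args) tuples."""
    events = []
    for line in output.splitlines():
        if line.startswith(STATUS_PREFIX):
            keyword, _, rest = line[len(STATUS_PREFIX):].partition(" ")
            events.append((keyword, rest.split()))
    return events


def has_good_signature(output):
    """Simplified policy: a GOODSIG line and no failure keywords."""
    keywords = {keyword for keyword, _ in parse_status(output)}
    bad = {"BADSIG", "ERRSIG", "EXPKEYSIG", "REVKEYSIG"}
    return "GOODSIG" in keywords and not (keywords & bad)


def verify_in_clean_home(signed_file, keyring_file, gpg="gpg"):
    """Verify signed_file using only the keys in keyring_file."""
    # The temporary directory is created with owner-only permissions and
    # removed afterwards, so ~/.gnupg is never consulted.
    with tempfile.TemporaryDirectory() as home:
        env = dict(os.environ, GNUPGHOME=home)
        subprocess.run([gpg, "--batch", "--import", keyring_file],
                       env=env, check=True, capture_output=True)
        result = subprocess.run(
            [gpg, "--batch", "--status-fd", "1", "--verify", signed_file],
            env=env, capture_output=True, text=True)
        return has_good_signature(result.stdout)
```

Because the home directory starts out empty, no user-level gpg.conf, keyring, or trust database can change the outcome; only the imported key file matters.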

>
> > > Needs gemato, dnspython, and requests. Slightly better than random code
> > > because I took inspiration from the existing gemato classes.
> >
> > The code makes a lot of brittle assumptions about the structure. The
> > GLEP was specifically designed to avoid that and let us adjust the
> > structure in the future to meet our needs.
> >
>
> These same assumptions are built into the code that operates on the
> tree structure. If the GLEP were changed the existing code would also
> potentially need changing. This code just uses the structure in a different
> way.
>

The code that predates the GLEP, yes. It will eventually be changed to
be more flexible, especially once we can start removing backwards
compatibility.

--
Best regards,
Michał Górny