On Tue, Jun 30, 2020, at 2:28 AM, Michał Górny wrote:
> On June 30, 2020 2:13:43 AM UTC, Sid Spry <sid@××××.us> wrote:
> >Hello,
> >
> >I have some runnable pseudocode outlining a faster tree verification
> >algorithm. Before I create patches I'd like to see if there is any guidance
> >on making the changes as unobtrusive as possible. If the radical change in
> >algorithm is acceptable I can work on adding the changes.
> >
> >Instead of composing any kind of structured data out of the portage tree my
> >algorithm just lists all files and then optionally batches them out to
> >threads. There is a noticeable speedup by eliding the tree traversal
> >operations which can be seen when running the algorithm with a single
> >thread and comparing it to the current algorithm in gemato (which should
> >still be discussed here?).
>
> Without reading the code: does your algorithm correctly detect extraneous files?
>

Yes and no.

I am not sure why this is necessary. If a file does not appear in a manifest,
it is ignored. It makes the most sense to me to put the burden of not
including untracked files on the publisher. If the user puts an untracked
file into the tree, it will be ignored with no consequence; the authored
files don't refer to it, after all.
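
To make the flat approach concrete, here is a minimal sketch of verifying a flat list of tracked files in batches on a thread pool. It is a simplified illustration, not the code I attached and not gemato's API: it assumes the manifests have already been parsed into a dict mapping each path to an expected SHA512 hex digest, and `verify_tree`, `verify_batch`, and the thread/batch defaults are hypothetical. Files that are not in the dict are never opened, which is the "ignored with no consequence" behaviour described above.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha512_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_batch(batch: list[tuple[str, str]]) -> list[str]:
    """Return the paths in one batch that are missing or fail to match."""
    failures = []
    for path, expected in batch:
        try:
            if sha512_of(path) != expected:
                failures.append(path)
        except FileNotFoundError:
            failures.append(path)
    return failures


def verify_tree(expected: dict[str, str], threads: int = 4,
                batch_size: int = 256) -> list[str]:
    """Verify every tracked path; untracked files are never looked at."""
    items = sorted(expected.items())
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        failures = [p for result in pool.map(verify_batch, batches) for p in result]
    return sorted(failures)
```

Threads are worthwhile here in CPython because `hashlib` releases the GIL while hashing buffers larger than 2047 bytes, so several files can be hashed concurrently.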

But it would be easy enough to build a second list of all files in the tree
and compare it to the list of files built from the manifests. If there are
extras, an error can be generated. This was actually the first test I ran on
my manifest parsing code: I checked whether my tracked files roughly matched
the total files in the tree. That check can be repurposed here.
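
A minimal sketch of that extra check, assuming `tracked` already holds every path (relative to the tree root) named by any manifest, including the Manifest files themselves. `extraneous_files` and `ignored_prefixes` are hypothetical names; the latter is only a rough stand-in for GLEP 74 IGNORE entries, which are really relative to the Manifest that declares them.

```python
from __future__ import annotations

import os


def list_all_files(root: str) -> set[str]:
    """Walk the tree and return every non-directory entry relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def extraneous_files(root: str, tracked: set[str],
                     ignored_prefixes: tuple[str, ...] = ()) -> list[str]:
    """Return files on disk that no manifest entry accounts for."""
    extras = list_all_files(root) - set(tracked)

    def is_ignored(path: str) -> bool:
        # Treat each prefix as an ignored file or directory (simplified IGNORE).
        return any(path == pre or path.startswith(pre + os.sep)
                   for pre in ignored_prefixes)

    return sorted(p for p in extras if not is_ignored(p))
```

The caller can then turn a non-empty result into an error, so the check stays separate from the hashing pass.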

> >Some simple tests like counting all objects traversed and verified
> >returns the same(ish). Once it is put into portage it could be tested in
> >detail.
> >
> >There is also my partial attempt at removing the brittle interface to
> >GnuPG (it's not as if the current code is badly designed, just that
> >parsing the output of GnuPG directly is likely not the best idea).
>
> The 'brittle interface' is well-defined machine-readable output.
>

Ok. I was aware there was a machine-readable interface, but the classes that
manipulate a temporary GPG home did not seem like the best solution. I guess
that is all due to GPG assuming everything is in ~/.gnupg and keeping its
state as a directory structure.
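
For reference, the temporary-home approach boils down to pointing GnuPG at a throwaway directory through the `GNUPGHOME` environment variable and reading the `[GNUPG:]` status lines it writes to `--status-fd`. This is a hypothetical, simplified sketch, not gemato's classes: `parse_status` and `verify_in_temp_home` are made-up names, and it assumes a `gpg` binary on `PATH`.

```python
from __future__ import annotations

import os
import subprocess
import tempfile

STATUS_PREFIX = "[GNUPG:] "


def parse_status(output: str) -> list[tuple[str, list[str]]]:
    """Split --status-fd output into (keyword, arguments) records."""
    records = []
    for line in output.splitlines():
        if line.startswith(STATUS_PREFIX):
            fields = line[len(STATUS_PREFIX):].split()
            if fields:
                records.append((fields[0], fields[1:]))
    return records


def verify_in_temp_home(key_file: str, signed_file: str) -> list[tuple[str, list[str]]]:
    """Import keys into a throwaway home, verify, and return the status records."""
    with tempfile.TemporaryDirectory() as home:
        env = dict(os.environ, GNUPGHOME=home)
        subprocess.run(["gpg", "--batch", "--import", key_file],
                       env=env, check=True, capture_output=True)
        # Status lines go to fd 1 (stdout); human-readable messages go to stderr.
        result = subprocess.run(
            ["gpg", "--batch", "--status-fd", "1", "--verify", signed_file],
            env=env, capture_output=True, text=True)
        return parse_status(result.stdout)
```

A caller would then look for a `VALIDSIG` record and check its fingerprint instead of scraping human-readable text. GnuPG may also leave a `gpg-agent` running for the temporary home; this sketch does not clean that up.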

> >
> >Needs gemato, dnspython, and requests. Slightly better than random code
> >because I took inspiration from the existing gemato classes.
>
> The code makes a lot of brittle assumptions about the structure. The
> GLEP was specifically designed to avoid that and let us adjust the
> structure in the future to meet our needs.
>

These same assumptions are built into the existing code that operates on the
tree structure. If the GLEP were changed, the existing code would potentially
need changing as well. This code just uses the structure in a different way.
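
To illustrate using the structure without hard-coding it: a flat list of files can still be built by starting from the top-level Manifest and following its MANIFEST entries, instead of assuming where sub-Manifests live. This is a simplified, hypothetical sketch (`collect_entries` is a made-up name). It assumes GLEP 74 lines of the form `TAG path size HASH value ...` with paths relative to the Manifest's directory, handles only DATA and MANIFEST entries and gzip-compressed Manifests, and ignores path escaping, signatures, and timestamps.

```python
from __future__ import annotations

import gzip
import os


def open_manifest(path: str):
    """Open a Manifest as text, transparently handling gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def collect_entries(manifest_path: str, entries: dict | None = None) -> dict:
    """Map each tracked path to (size, {hash_name: hex_value}).

    MANIFEST entries are recorded like DATA entries (the sub-Manifest itself
    must verify) and then followed recursively.
    """
    if entries is None:
        entries = {}
    base = os.path.dirname(manifest_path)
    with open_manifest(manifest_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3 or fields[0] not in ("DATA", "MANIFEST"):
                continue  # TIMESTAMP, IGNORE, DIST, etc. are skipped here
            tag, rel_path, size = fields[0], fields[1], int(fields[2])
            path = os.path.normpath(os.path.join(base, rel_path))
            entries[path] = (size, dict(zip(fields[3::2], fields[4::2])))
            if tag == "MANIFEST":
                collect_entries(path, entries)
    return entries
```

The resulting dict is a flat list that can be batched out freely, while the Manifest entries themselves still decide where everything lives.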

I will admit that I only partially understand the GLEP. I made some
simplifications just to get something demonstrable done. However, please
consider removing some of the checks or moving them elsewhere. I don't have
complete suggestions right now, but there is the possibility of saving an
appreciable amount of time.