On Wed, Jul 1, 2020, at 1:40 AM, Fabian Groffen wrote:
> On 30-06-2020 13:13:29 -0500, Sid Spry wrote:
> > On Tue, Jun 30, 2020, at 1:20 AM, Fabian Groffen wrote:
> > > Hi,
> > >
> > > On 29-06-2020 21:13:43 -0500, Sid Spry wrote:
> > > > Hello,
> > > >
> > > > I have some runnable pseudocode outlining a faster tree verification algorithm.
> > > > Before I create patches I'd like to see if there is any guidance on making the
> > > > changes as unobtrusive as possible. If the radical change in algorithm is
> > > > acceptable I can work on adding the changes.
> > > >
> > > > Instead of composing any kind of structured data out of the portage tree my
> > > > algorithm just lists all files and then optionally batches them out to threads.
> > > > There is a noticeable speedup by eliding the tree traversal operations which
> > > > can be seen when running the algorithm with a single thread and comparing it to
> > > > the current algorithm in gemato (which should still be discussed here?).
> > >
> > > I remember something that gemato used to use multiple threads, but
> > > because it totally saturated disk-IO, it was brought back to a single
> > > thread. People were complaining about unusable systems.
> > >
> >
> > I think this is an argument for cgroups limits support on the portage process or
> > account as opposed to an argument against picking a better algorithm. That is
> > something I have been working towards, but I am only one man.
>
> But this requires a) cgroups support, and b) the privileges to use it.
> Shouldn't be a problem in the normal case, but just saying.
>
> > > In any case, can you share your performance results? What speedup did
> > > you see, on warm and hot FS caches? Which type of disk do you use?
> > >
> >
> > I ran all tests multiple times to make them warm off of a Samsung SSD, but
> > nothing very precise yet.
> >
> > % gemato verify --openpgp-key signkey.asc /var/db/repos/gentoo
> > [...]
> > INFO:root:Verifying /var/db/repos/gentoo...
> > INFO:root:/var/db/repos/gentoo verified in 16.45 seconds
> >
> > sometimes going higher, closer to 18s, vs.
> >
> > % ./veriftree.py
> > 4.763171965983929
> >
> > So roughly an order of magnitude speedup without batching to threads.
>
> That is kind of a change. Makes one wonder if you really did the same
> work.
>

That was my initial reaction. I attempted to ensure I was processing all of
the files that gemato processed. The full output of my script is something
closer to:

% ./veriftree.py
x.xxxxxxxxxx
192157
126237

The first number is the elapsed time in seconds, the second is the total number
of manifest directives, and the third is the number of real files in the tree.
If you prune the directives that correspond to no file, you end up with an exact
match, IIRC.
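
To make the comparison of the two counts concrete, here is a rough sketch of the pruning step. It is not the code from veriftree.py: the function names are made up for illustration, and it assumes the directives have already been reduced to tree-relative paths.

```python
import os
import tempfile


def list_real_files(root):
    """Return the set of tree-relative paths of all regular files under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def prune_directives(directive_paths, real_files):
    """Keep only the directives whose path names a file present in the tree."""
    return [p for p in directive_paths if p in real_files]


def demo():
    """Build a tiny throwaway tree and return (directives, real files, kept)."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join("cat", "pkg")
        os.makedirs(os.path.join(root, pkg_dir))
        ebuild = os.path.join(pkg_dir, "pkg-1.ebuild")
        metadata = os.path.join(pkg_dir, "metadata.xml")
        for rel in (ebuild, metadata):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("x")
        # The third directive names a file that is not in the tree.
        directives = [ebuild, metadata, "pkg-1.tar.gz"]
        real = list_real_files(root)
        kept = prune_directives(directives, real)
        return len(directives), len(real), len(kept)
```

In this toy tree the directive count is higher than the file count, and after pruning the two agree, which is the same check I did by hand on the real tree.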

However, you are right, and I think the code I sent is old. gemato times the
manifest file parsing as well as the verification, and it seems that change is
not in the code I provided. If I time both, I get:

% ./veriftree.py
11.708862617029808
192157
126237

The corresponding time for gemato (same system state, etc.) is ~20s. So it is
roughly a halving at worst, with an n-core speedup available for about half of
that time, and I am fairly confident I can speed up the manifest parsing further
as well.
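
For clarity about what is being timed, here is a minimal sketch that reports the parsing and verification phases separately and optionally spreads the verification over threads. It is an illustration only, not veriftree.py: the manifest format is simplified to one `DATA <path> <size> SHA512 <hex>` line per file (no paths with spaces, a single hash), and all function names are hypothetical.

```python
import hashlib
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def parse_manifest(text):
    """Parse simplified 'DATA <path> <size> SHA512 <hex>' lines into tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "DATA" and fields[3] == "SHA512":
            entries.append((fields[1], int(fields[2]), fields[4]))
    return entries


def verify_one(root, entry):
    """Check one file's size and SHA512 digest against its manifest entry."""
    rel, size, digest = entry
    try:
        with open(os.path.join(root, rel), "rb") as fh:
            data = fh.read()
    except OSError:
        return False
    return len(data) == size and hashlib.sha512(data).hexdigest() == digest


def verify_all(root, entries, workers=1):
    """Verify a flat list of entries, optionally spread over worker threads."""
    if workers <= 1:
        return all([verify_one(root, e) for e in entries])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return all(pool.map(lambda e: verify_one(root, e), entries))


def timed_run(root, manifest_text, workers=1):
    """Return (parse_seconds, verify_seconds, ok) so both phases are visible."""
    t0 = time.perf_counter()
    entries = parse_manifest(manifest_text)
    t1 = time.perf_counter()
    ok = verify_all(root, entries, workers)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, ok


def demo(workers=1):
    """Create a few files plus a matching manifest, then time both phases."""
    with tempfile.TemporaryDirectory() as root:
        lines = []
        for i in range(8):
            payload = ("file %d\n" % i).encode()
            name = "f%d.txt" % i
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(payload)
            digest = hashlib.sha512(payload).hexdigest()
            lines.append("DATA %s %d SHA512 %s" % (name, len(payload), digest))
        return timed_run(root, "\n".join(lines), workers)
```

Only the verification phase is handed to the thread pool here; the parsing stays single-threaded, which is why I would expect the n-core speedup to apply to only part of the total.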

> > > You could compare against qmanifest, which uses OpenMP-based
> > > parallelism while verifying the tree. On SSDs this does help.
> > >
> >
> > I lost my notes -- how do I specify to either gemato or qmanifest the GnuPG
> > directory? My code is partially structured as it is because I had problems doing
> > this. I rediscovered -K/--openpgp-key in gemato but am unsure for qmanifest.
>
> qmanifest doesn't do much magic out of the standard gnupg practices.
> (It is using gpgme.) If you want it to use a different gnupg dir, you
> may change HOME, or GNUPGHOME.
>

Alright, I will try setting that. I think I like the interface of gemato a little
more, but I will look at qmanifest and see how it performs.
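
For my benchmark script, setting it for a child process only would look roughly like the sketch below. The helper name and the directory are placeholders, and the demo runs a Python one-liner instead of qmanifest, since I have not checked what qmanifest's command line should look like.

```python
import os
import subprocess
import sys


def run_with_gnupghome(cmd, gnupg_dir):
    """Run cmd with GNUPGHOME set for the child only; our own env is untouched."""
    env = dict(os.environ)
    env["GNUPGHOME"] = gnupg_dir
    return subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


def demo():
    """Show that the child process sees the overridden GNUPGHOME."""
    child = [sys.executable, "-c", "import os; print(os.environ['GNUPGHOME'])"]
    return run_with_gnupghome(child, "/tmp/example-gnupg").stdout.strip()
```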