Gentoo Archives: gentoo-portage-dev

From: Sid Spry <sid@××××.us>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] Speeding up Tree Verification
Date: Wed, 01 Jul 2020 20:25:45
Message-Id: 1b2626f7-a171-4363-a6a6-966990246578@www.fastmail.com
In Reply to: Re: [gentoo-portage-dev] Speeding up Tree Verification by Fabian Groffen
1 On Wed, Jul 1, 2020, at 1:40 AM, Fabian Groffen wrote:
2 > On 30-06-2020 13:13:29 -0500, Sid Spry wrote:
3 > > On Tue, Jun 30, 2020, at 1:20 AM, Fabian Groffen wrote:
4 > > > Hi,
5 > > >
6 > > > On 29-06-2020 21:13:43 -0500, Sid Spry wrote:
7 > > > > Hello,
8 > > > >
9 > > > > I have some runnable pseudocode outlining a faster tree verification algorithm.
10 > > > > Before I create patches I'd like to see if there is any guidance on making the
11 > > > > changes as unobtrusive as possible. If the radical change in algorithm is
12 > > > > acceptable I can work on adding the changes.
13 > > > >
14 > > > > Instead of composing any kind of structured data out of the portage tree my
15 > > > > algorithm just lists all files and then optionally batches them out to threads.
16 > > > > There is a noticeable speedup by eliding the tree traversal operations which
17 > > > > can be seen when running the algorithm with a single thread and comparing it to
18 > > > > the current algorithm in gemato (which should still be discussed here?).
19 > > >
20 > > > I recall that gemato used to use multiple threads, but
21 > > > because it totally saturated disk-IO, it was brought back to a single
22 > > > thread. People were complaining about unusable systems.
23 > > >
24 > >
25 > > I think this is an argument for cgroups limits support on the portage process or
26 > > account as opposed to an argument against picking a better algorithm. That is
27 > > something I have been working towards, but I am only one man.
28 >
29 > But this requires a) cgroups support, and b) the privileges to use it.
30 > Shouldn't be a problem in the normal case, but just saying.
31 >
32 > > > In any case, can you share your performance results? What speedup did
33 > > > you see, on cold and hot FS caches? Which type of disk do you use?
34 > > >
35 > >
36 > > I ran all tests multiple times to make them warm off of a Samsung SSD, but
37 > > nothing very precise yet.
38 > >
39 > > % gemato verify --openpgp-key signkey.asc /var/db/repos/gentoo
40 > > [...]
41 > > INFO:root:Verifying /var/db/repos/gentoo...
42 > > INFO:root:/var/db/repos/gentoo verified in 16.45 seconds
43 > >
44 > > sometimes going higher, closer to 18s, vs.
45 > >
46 > > % ./veriftree.py
47 > > 4.763171965983929
48 > >
49 > > So roughly a 3.5x speedup without batching to threads.
50 >
51 > That is kind of a change. Makes one wonder if you really did the same
52 > work.
53 >
54
55 That was my initial reaction. I attempted to ensure I was processing all of
56 the files that gemato processed. The full output of my script is something
57 closer to:
58
59 % ./veriftree.py
60 x.xxxxxxxxxx
61 192157
62 126237
63
64 The first number is the elapsed time in seconds, the second is the total number of
65 manifest directives, and the third is the number of real files in the tree. If you
66 prune the directives that correspond to no file, you end up with an exact match, IIRC.
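The pruning step described above could look roughly like the sketch below. It is not taken from veriftree.py. The helper names `parse_manifest` and `prune_missing` are hypothetical, and it assumes the usual Manifest line layout of a tag followed by a path, with AUX paths living under files/ and DIST entries naming distfiles that are not in the tree.

```python
import os

# Assumption: these are the tags that name a path in a Gentoo Manifest.
PATH_TAGS = {"MANIFEST", "DATA", "EBUILD", "MISC", "AUX", "DIST"}

def parse_manifest(text):
    """Return (tag, path) pairs for every directive line with a known tag."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in PATH_TAGS:
            entries.append((fields[0], fields[1]))
    return entries

def prune_missing(entries, base_dir):
    """Keep only directives whose path is a regular file under base_dir."""
    kept = []
    for tag, path in entries:
        # Assumption: AUX paths are relative to the package's files/ directory.
        rel = os.path.join("files", path) if tag == "AUX" else path
        if os.path.isfile(os.path.join(base_dir, rel)):
            kept.append((tag, path))
    return kept
```

Comparing `len(prune_missing(...))` with a plain count of the files on disk is one way to check that both tools looked at the same set of files.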
67
68 However, you are right: gemato times the manifest file parsing as well as the
69 verification, and the code I provided is an old version that does not. If I time
70 the parsing as well, I get:
71
72 % ./veriftree.py
73 11.708862617029808
74 192157
75 126237
76
77 The corresponding times for gemato (same system state, etc.) are ~20s. So it is a
78 halving at worst, with an assured n-core speedup for roughly half of that time, and
79 I am fairly confident I can speed up the manifest parsing even more as well.
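For readers without the script at hand, here is a minimal sketch of the flat-list approach, assuming the Manifests have already been parsed into (path, algorithm, expected digest) tuples. It is not the code from veriftree.py. The names `file_matches`, `verify_flat` and `timed` are made up for illustration, and the algorithm names must be ones that Python's hashlib accepts (e.g. "sha512", "blake2b").

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def file_matches(job):
    """Hash one file and compare it with the expected hex digest."""
    path, algo, expected = job
    digest = hashlib.new(algo)
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files are not loaded whole.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return path, digest.hexdigest() == expected

def verify_flat(jobs, workers=1):
    """Verify a flat list of jobs; return the paths that failed."""
    if workers <= 1:
        results = map(file_matches, jobs)
    else:
        # Optional batching out to threads, no tree structure involved.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(file_matches, jobs))
    return [path for path, ok in results if not ok]

def timed(func, *args, **kwargs):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

To compare fairly with gemato, the timed region would have to wrap the Manifest parsing as well as the `verify_flat` call, not the hashing alone.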
80
81 > > > You could compare against qmanifest, which uses OpenMP-based
82 > > > parallelism while verifying the tree. On SSDs this does help.
83 > > >
84 > >
85 > > I lost my notes -- how do I specify to either gemato or qmanifest the GnuPG
86 > > directory? My code is partially structured as it is because I had problems doing
87 > > this. I rediscovered -K/--openpgp-key in gemato but am unsure for qmanifest.
88 >
89 > qmanifest doesn't do much magic out of the standard gnupg practices.
90 > (It is using gpgme.) If you want it to use a different gnupg dir, you
91 > may change HOME, or GNUPGHOME.
92 >
93
94 Alright, I will attempt to set that. I think I like the interface of gemato a little more
95 but will look at qmanifest and see how it performs.
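A small sketch of setting GNUPGHOME for a single child process, so the shell environment stays untouched. The helper name `run_with_gnupghome` is hypothetical and the command passed in is up to the caller. Nothing here relies on any particular qmanifest option.

```python
import os
import subprocess

def run_with_gnupghome(cmd, gnupg_dir):
    """Run cmd with GNUPGHOME pointing at gnupg_dir; return the CompletedProcess."""
    env = dict(os.environ)          # copy, so the parent environment is unchanged
    env["GNUPGHOME"] = gnupg_dir    # tools built on GnuPG/gpgme read this variable
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)
```

For example, `run_with_gnupghome(["gpg", "--list-keys"], "/path/to/gnupg")` would show which keys the child process sees in that directory; the path is a placeholder.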