Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev-announce@l.g.o
Cc: gentoo-dev@l.g.o
Subject: [gentoo-dev] Incoming NATTkA upgrade
Date: Fri, 16 Apr 2021 15:11:36
Message-Id: 192cac75b99ca81111f4714bff0490a3e0d5a047.camel@gentoo.org
Hello, everyone.

TL;DR:

1. There were a few NATTkA misfires around 2 PM UTC today.
   I'm sorry for the noise.

2. In the next hour, a major NATTkA + pkgcore upgrade should roll out.
   No problems are expected, but please contact me if you see weird
   behavior after the upgrade (especially incorrect sanity-check
   results).

3. A workaround has been added that should hopefully finally fix
   occasional misbehavior due to Bugzilla race conditions. As a side
   effect, NATTkA may be a bit slower in responding to new bugs (up to
   4 minutes of delay).

Full explanation follows.


Infra's been running an old version of NATTkA for quite some time.
The previous upgrade attempt (which involved an incompatible pkgcheck
API change) failed due to some cryptic bugs. A lot of stable/keywording
requests suddenly started failing -- and it seemed that pkgcheck was
checking keyworded ebuilds in the temporary against old dependencies
in /usr/portage.

I've been doing some new development in NATTkA today, and in order to
deploy it cleanly, I've finally decided to try figuring out what's wrong
with new NATTkA + pkgcore. I've installed the new versions on martin
(the Infra host that used to run NATTkA in the past), and started
testing them.

I didn't notice that Puppet had failed to remove the old NATTkA cronjob
from martin. So when NATTkA was installed again, the cronjob started
running the broken NATTkA version, and it started fighting with
the correct instance over bugs. As a result, a few bugs ping-ponged
between sanity-check+ and sanity-check- results. After noticing
the problem, I removed the old cronjob. I apologize for the bugspam
caused by this.

The good news is that I've discovered that upgrading to the latest
~arch pkgcore & co. (unmasked versions) resolves the problem in
question. Since NATTkA is run on a different host than the other
services requiring old pkgcore, I am going to deploy the full set of
new versions shortly. The initial testing run didn't yield any
suspicious results, so hopefully there will be no major problems this
time.

The new version also includes a workaround for weird NATTkA behavior --
you might have noticed in the past that NATTkA was re-adding arch teams
to fixed stabilization requests, or that today it reverted 'package
list' to an earlier state while expanding it. I've been trying to
figure out what's wrong with NATTkA's logic for a long time, and I've
finally come to the conclusion that the problem is actually in Bugzilla.

I haven't verified the exact cause, but it's most likely that Bugzilla
is executing multiple SELECT queries while performing the bug search,
and therefore could end up with a combination of bug properties from
before and after an update. This is the only way I can explain bug
#779535 [1]. In a single action, CC-ARCHES was added to the bug and the
package list was changed. However, NATTkA reverted to the old package
list while expanding -- which can happen only if the bug had CC-ARCHES
already. Both the keywords and the package list are fetched from
Bugzilla via a single REST API query, so my only explanation for this is
that the Bugzilla API returned the new keywords but the old package
list.

To avoid this, NATTkA now skips bugs that were updated less than 60
seconds before the search is run. These bugs will be deferred to
the next run (i.e. 4 minutes later), and Bugzilla should have synced up
by then. Of course, this is going to work only if the 'last change time'
field is updated no later than other bug data.
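The filter described above can be sketched roughly as follows. This is
only an illustration, not NATTkA's actual code: the Bug class and
the split_settled() function are hypothetical names, and the only
value taken from this mail is the 60-second grace period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Grace period from the message above; everything else is illustrative.
GRACE_PERIOD = timedelta(seconds=60)


@dataclass
class Bug:
    """Hypothetical stand-in for a bug record returned by a search."""
    id: int
    last_change_time: datetime


def split_settled(bugs, search_time, grace=GRACE_PERIOD):
    """Return (ready, deferred) lists of bugs.

    Bugs changed after `search_time - grace` are deferred, on the
    assumption that their data may not be consistent yet.
    """
    cutoff = search_time - grace
    ready = [b for b in bugs if b.last_change_time <= cutoff]
    deferred = [b for b in bugs if b.last_change_time > cutoff]
    return ready, deferred


if __name__ == "__main__":
    now = datetime(2021, 4, 16, 15, 0, 0, tzinfo=timezone.utc)
    bugs = [
        Bug(1, now - timedelta(seconds=10)),  # changed 10 s ago: deferred
        Bug(2, now - timedelta(minutes=5)),   # changed 5 min ago: ready
    ]
    ready, deferred = split_settled(bugs, now)
    print([b.id for b in ready], [b.id for b in deferred])
```

A deferred bug is not lost: it simply shows up again in the next
search, by which time its last change should be older than the grace
period.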

If you have any questions or problems, please do not hesitate to contact
me or report a bug (either on Gentoo Bugzilla, or on NATTkA's GitHub
issue tracker). That said, I realize there are quite a number of
problems reported already, and I hope I'll be able to start addressing
them ~next month.

[1] https://bugs.gentoo.org/779535#c8

--
Best regards,
Michał Górny