Hello, everyone.

TL;DR:

1. There have been a few NATTkA misfires around 2 PM UTC today.
   I'm sorry for the noise.

2. In the next hour, a major NATTkA + pkgcore upgrade should roll out.
   No problems are expected, but please contact me if you see weird
   behavior after the upgrade (especially incorrect sanity-check
   results).

3. A workaround has been added that should finally fix occasional
   misbehavior caused by Bugzilla race conditions. As a side effect,
   NATTkA may be a bit slower to respond to new bugs (up to 4 minutes
   of delay).

Full explanation follows.

Infra's been running an old version of NATTkA for quite some time.
The previous upgrade attempt (which involved an incompatible pkgcheck API
change) failed due to some cryptic bugs. A lot of stabilization/keywording
requests suddenly started failing, and it seemed that pkgcheck was
checking the keyworded ebuilds in the temporary repository against old
dependencies in /usr/portage.

I've been doing some new development in NATTkA today, and in order to
deploy it cleanly, I finally decided to try to figure out what was wrong
with the new NATTkA + pkgcore. I installed the new versions on martin
(the Infra host that used to run NATTkA in the past) and started
testing them.

I didn't notice that Puppet had failed to remove the old NATTkA cronjob
from martin. So when NATTkA was installed again, the cronjob started
running the broken NATTkA version, which began fighting with the correct
instance over bugs. As a result, a few bugs saw ping-pong between
sanity-check+ and sanity-check- results. After noticing the problem,
I removed the old cronjob. I apologize for the bugspam this caused.

The good news is that I discovered that upgrading to the latest ~arch
pkgcore & co. (unmasked versions) resolves the problem in question.
Since NATTkA runs on a different host than the other services that
require old pkgcore, I am going to deploy the full set of new versions
shortly. The initial test run didn't yield any suspicious results, so
hopefully there will be no major problems this time.

The new version also includes a workaround for weird NATTkA behavior.
You might have noticed in the past that NATTkA was re-adding arch teams
to fixed stabilization requests, or that today it reverted the
'package list' to an earlier state while expanding it. I've been trying
to figure out what's wrong with NATTkA's logic for a long time, and I've
finally come to the conclusion that the problem is actually in Bugzilla.

I haven't verified the exact cause, but most likely Bugzilla executes
multiple SELECT queries while performing the bug search, and can
therefore end up with a combination of bug properties from before and
after an update. This is the only way I can explain bug #779535 [1].
In a single action, CC-ARCHES was added to the bug and the package list
was changed. However, NATTkA reverted to the old package list while
expanding it, which can happen only if the bug already had CC-ARCHES.
Both the keywords and the package list are grabbed from Bugzilla via
a single REST API query, so my only explanation is that the Bugzilla API
returned the new keywords but the old package list.
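To make the suspected failure mode concrete, here is a toy Python model of a non-atomic read. It is only an illustration of the hypothesis above, not Bugzilla's actual code (which hasn't been verified); `BugRow`, `non_atomic_search`, `concurrent_update` and the `dev-foo/bar-*` package atoms are all hypothetical. The "search" reads the package list and the keywords in two separate steps, and a single update commits in between:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BugRow:
    """Hypothetical stand-in for the stored state of one bug."""
    package_list: str
    keywords: set[str] = field(default_factory=set)


def non_atomic_search(bug: BugRow, between_reads: Callable[[], None]) -> dict:
    """Read two fields in separate steps, without a shared snapshot.

    This mimics a search that issues more than one SELECT; it illustrates
    the suspected race and is not Bugzilla's implementation.
    """
    package_list = bug.package_list  # first read: still the old value
    between_reads()                  # a concurrent update commits here
    keywords = set(bug.keywords)     # second read: already the new value
    return {"package_list": package_list, "keywords": keywords}


def concurrent_update(bug: BugRow) -> None:
    """One user action: add CC-ARCHES and change the package list together."""
    bug.keywords.add("CC-ARCHES")
    bug.package_list = "dev-foo/bar-1.1"


bug = BugRow(package_list="dev-foo/bar-1.0")
result = non_atomic_search(bug, lambda: concurrent_update(bug))
print(result)  # {'package_list': 'dev-foo/bar-1.0', 'keywords': {'CC-ARCHES'}}
```

The returned state (new keywords, old package list) never existed in storage at any single point in time, yet it is exactly the combination the explanation above attributes to the Bugzilla API.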

To avoid this, NATTkA now skips bugs that were updated less than 60
seconds before the search is run. These bugs are deferred to the next
run (i.e. 4 minutes later), and Bugzilla should have synced up by then.
Of course, this works only if the 'last change time' field is updated
no later than the other bug data.
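The deferral rule can be sketched as a simple filter over the search results. This is a minimal Python sketch, not NATTkA's actual code: `split_settled_bugs`, `parse_bz_time`, `SETTLE_TIME` and the bug dicts are hypothetical, and it assumes the search returns each bug's `last_change_time` as a UTC timestamp of the form `2021-04-06T13:58:30Z`.

```python
from datetime import datetime, timedelta, timezone

# Bugs changed within this window before the search are skipped for now
# (the 60-second value is taken from the description above).
SETTLE_TIME = timedelta(seconds=60)


def parse_bz_time(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp (assumed format) as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )


def split_settled_bugs(bugs, search_time, settle_time=SETTLE_TIME):
    """Split bugs into (process_now, deferred) by their last change time.

    A bug is processed only if its last change happened at least
    `settle_time` before `search_time`; otherwise it is left for the next
    run, by which point Bugzilla is hoped to return consistent data.
    """
    cutoff = search_time - settle_time
    process_now, deferred = [], []
    for bug in bugs:
        if parse_bz_time(bug["last_change_time"]) <= cutoff:
            process_now.append(bug)
        else:
            deferred.append(bug)
    return process_now, deferred


# Hypothetical search results and search time.
search_time = datetime(2021, 4, 6, 14, 0, 0, tzinfo=timezone.utc)
bugs = [
    {"id": 1, "last_change_time": "2021-04-06T13:58:30Z"},  # 90 s old
    {"id": 2, "last_change_time": "2021-04-06T13:59:45Z"},  # 15 s old
]
ready, deferred = split_settled_bugs(bugs, search_time)
print([b["id"] for b in ready], [b["id"] for b in deferred])  # [1] [2]
```

The cutoff is computed from the time the search starts rather than from the time each bug is processed, so every bug in one run is judged against the same moment. The caveat above still applies: if Bugzilla updated a bug's data before bumping its last change time, a half-updated bug could look settled and slip through this filter.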

If you have any questions or problems, please do not hesitate to contact
me or report a bug (either on Gentoo Bugzilla or on NATTkA's GitHub
issue tracker). That said, I realize there are quite a number of
problems reported already, and I hope I'll be able to start addressing
them ~next month.

[1] https://bugs.gentoo.org/779535#c8

--
Best regards,
Michał Górny