Re: [gentoo-project] How to improve detection of unmaintained packages? - gentoo-project

From:	Raymond Jennings <shentino@×××××.com>
To:	gentoo-project@l.g.o
Subject:	Re: [gentoo-project] How to improve detection of unmaintained packages?
Date:	Sat, 23 Mar 2019 17:54:03
Message-Id:	`CAGDaZ_qR19s8V8ra0huV+SjdHqoAC8nLADMBMaz+Uyw=5H34eA@mail.gmail.com`
In Reply to:	Re: [gentoo-project] How to improve detection of unmaintained packages? by "Michał Górny"

On Sat, Mar 23, 2019 at 10:38 AM Michał Górny <mgorny@g.o> wrote:

> On Sat, 2019-03-23 at 10:05 -0700, Raymond Jennings wrote:
> > On Sat, Mar 23, 2019 at 7:18 AM Alec Warner <antarus@g.o> wrote:
> >
> > >
> > > On Sat, Mar 23, 2019 at 3:32 AM Michał Górny <mgorny@g.o> wrote:
> > >
> > > > Hi,
> > > >
> > > > Gentoo still has a major problem with unmaintained packages.
> > > > I'm not talking about pure 'maintainer-needed' here but packages that
> > > > have apparent maintainers and stay under the radar for long, harming
> > > > users in the process. I'd like to query potential solutions as to how
> > > > we could improve this and look for new maintainers sooner.
> > > >
> > > >
> > > > The current state
> > > > =================
> > > > The definition of an unmaintained package here is a bit blurry. For
> > > > our needs, let's say that an unmaintained package is a package that
> > > > is not getting the attention of any of its maintainers, whose bugs
> > > > are not looked at, that does not receive version bumps or simply
> > > > fails to build for a long time.
> > > >
> > > > This is especially the case with 'revived herds', i.e. projects that
> > > > were formed from old herds. Their main characteristic is that they
> > > > 'maintain' a large number of loosely-related packages, and their
> > > > developers take care of only a small subset of them. Sadly, we still
> > > > have people who cherish that model, and instead of taking packages
> > > > they care about themselves, they shove them into one of 'their' herds.
> > > >
> > > > So far we're rarely catching such cases directly. Sometimes it
> > > > happens when another developer tries to use the package and notices
> > > > the problem, then finds that it was reported a long time ago and
> > > > never received any attention.
> > > >
> > > > Sometimes, after retiring a developer we notice that he had
> > > > 'maintained' packages that were broken for years and never received
> > > > any attention. There are even real cases of developers taking over
> > > > broken packages just to prevent them from being lastrited but without
> > > > ever fixing them.
> > > >
> > > > Then, some of the packages are noticed as a result of major API
> > > > update trackers, such as the openssl-1.1+ tracker or ncurses[tinfo]
> > > > tracker. Those API changes provoke build failures, and while
> > > > investigating them we discover that some of the software hasn't seen
> > > > any upstream attention since 2000 (!), not to mention maintainers
> > > > that could actually patch the issues.
> > > >
> > > >
> > > > Version bump-based inactivity?
> > > > ==============================
> > > > One of the options would be to treat failure to bump packages as
> > > > a sign of inactivity. With euscan and/or repology, we are at least
> > > > able to partially monitor and report new versions of software
> > > > (I think someone used to do that but I don't see those reports
> > > > anymore). While this still requires some manual processing (esp.
> > > > given that repology results are sometimes mistaken), it would be
> > > > a step forward.
> > > >
> > > > The counterargument to doing this is that not all version bumps are
> > > > meaningful to Gentoo. We'd have to at least be able to filter out
> > > > development releases if maintainers are not doing them. Sometimes we
> > > > also skip releases if they don't introduce anything meaningful to
> > > > Gentoo users. Finally, some developers reject new versions of
> > > > software for various reasons.
> > > >
> > >
> > > I've also considered just using time.
> > >
> > > Many *packages* have not been touched in N time. While some software
> > > doesn't get updates often, even routine maintenance should require
> > > edits on a fairly regular basis.
> > >
> > >
> > > >
> > > > Bugzilla-based inactivity?
> > > > ==========================
> > > > I've noticed something interesting in Fedora lately. They have
> > > > a policy that if a package build failure is reported (note: they are
> > > > reporting them automatically) and the maintainer does not update it
> > > > from the 'NEW' state, it is automatically orphaned after 8 weeks.
> > > > Effectively, if the maintainer does not take care (or at least
> > > > pretend to take care) of the package, it is orphaned automatically.
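
For illustration only, the Fedora-style policy described in the quoted text above could be sketched roughly as follows. This is a minimal sketch, not Fedora's or Gentoo's actual tooling: the function name, its parameters and the plain status string are hypothetical, and only the 'NEW' state and the 8-week window come from the quoted description.

```python
from datetime import date, timedelta

# Assumption: the 8-week window is taken from the quoted description of
# Fedora's policy; everything else here is a hypothetical illustration.
ORPHAN_AFTER = timedelta(weeks=8)


def should_orphan(bug_status: str, reported_on: date, today: date) -> bool:
    """Return True if a build-failure bug has stayed in NEW for 8+ weeks."""
    still_untouched = bug_status == "NEW"
    old_enough = today - reported_on >= ORPHAN_AFTER
    return still_untouched and old_enough
```

Any status change away from 'NEW' resets the outcome in this sketch, which is exactly why the 'busywork' concern raised in the thread applies to it.
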
> > > >
> > > > I suppose we might be able to look for a similar policy in Gentoo.
> > > > However, there are two obvious counterarguments. Firstly, this would
> > > > create 'busywork' that people would be required to do in order to
> > > > prevent their packages from being orphaned. Secondly, a fair number
> > > > of developers would just do this 'busywork' on every new bug just to
> > > > avoid the problem, rendering the measure ineffective.
> > > >
> > >
> > > Avoid letting the perfect be the enemy of the good here. Any metric
> > > can be gamed by developers, but it turns out we must choose some
> > > metric to drive the organization. I'm fairly sure not *all* developers
> > > will automate this busywork, because *some* of us want to see the
> > > number of unmaintained packages reduced, resulting in a net win.
> > >
> > >
> > > >
> > > > What can we actually do?
> > > > ========================
> > > > Do you have any specific ideas how we could actually improve
> > > > the situation? I'm particularly looking for things we could do at
> > > > least semi-automatically, without having to spend tremendous effort
> > > > looking through thousands of unhandled bugs manually.
> > > >
> > >
> > > So I'd recommend avoiding a specific implementation, which means don't
> > > trigger off of a single specific signal.
> > >
> > > Signals:
> > > 1) euscan first, because it's the most accurate and plausibly already
> > > implemented.
> > > 2) Date-based scanning; it's trivial to implement.
> > >
> > > So now for each package, we have 2 straightforward signals: when was
> > > it last touched, and how many versions behind is it?
> > >
> > > Rules:
> > > A package is unmaintained if it:
> > >   - Has not been touched in 5 years
> > >   - Is behind 3 versions AND hasn't been touched in 2 years
> > >   - Is behind 5 versions AND hasn't been touched in 1 year
> > >
> > > As we add more signals (e.g. doesn't build, or unfixed bugs) we can add
> > > additional rules.
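
As a rough illustration, the three rules quoted above could be combined into a single check like the one below. This is only a sketch under stated assumptions: `PackageSignals` and its fields are hypothetical names, "behind N versions" is read as "at least N versions behind", and years are approximated as 365.25 days. How the two signals would actually be collected (euscan output, commit dates) is left open, as in the thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageSignals:
    """Hypothetical per-package record; field names are illustrative only."""
    name: str
    last_touched: date    # date of the last change to the package
    versions_behind: int  # upstream releases not yet packaged


def years_since(then: date, today: date) -> float:
    """Approximate elapsed time in years (365.25 days per year)."""
    return (today - then).days / 365.25


def is_unmaintained(pkg: PackageSignals, today: date) -> bool:
    """Apply the three quoted rules; any one of them is sufficient."""
    age = years_since(pkg.last_touched, today)
    return (
        age >= 5
        or (pkg.versions_behind >= 3 and age >= 2)
        or (pkg.versions_behind >= 5 and age >= 1)
    )
```

Keeping each rule as a separate clause means further signals (build failures, unfixed bugs) could be added as extra clauses without touching the existing ones.
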
> > >
> > > We could generate a QA report per package on the QA reports page.
> > > If there is an API for requesting the QA report, we could cross-link
> > > from p.g.o.
> > >
> > > -A
> > >
> > >
> > > > --
> > > > Best regards,
> > > > Michał Górny
> > > >
> > > >
> > As a side observation I'd like to exempt a package from being flagged as
> > unmaintained if there's nothing wrong with it. If upstream is idle and
> > the package is in a quiet state simply because there's no work needing
> > to be done, then the package should be left alone.
>
> This is the attitude that means that a few months later a single person
> is overburdened with a few dozen unmaintained packages all suddenly
> falling apart. Just like ncurses[tinfo]. Or openssl-1.1.
>

I wanted to point out that a package shouldn't be flagged as unmaintained
in the first place unless there is first a reason for it to be maintained.
Packages with nothing wrong should be weeded out as candidates under the
principle of "if it isn't broke don't fix it", since there's actually
nothing wrong with the package remaining at the status quo.

As it is, the phase 4 I proposed is meant to catch broken packages that
either a) don't have a maintainer at all, or b) have a maintainer who is
completely incommunicado, and not just busy.

To clarify the context though, could you give an example, however
hypothetical, of "all suddenly falling apart"? Perhaps you mean a
package that is a widespread dependency, and its revdeps all break at the
same time due to some sort of API change or the like? Is this what you
meant by ncurses and openssl-1.1?

>
> --
> Best regards,
> Michał Górny
>
>

Gentoo Archives: gentoo-project