Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] Killing herds, again
Date: Wed, 03 Apr 2019 22:52:23
Message-Id: CAAr7Pr93Wkqg2-_bdCePmchwbLubZzRYxgokobvpbE-xFz-qTA@mail.gmail.com
In Reply to: [gentoo-dev] Killing herds, again by "Michał Górny"
1 On Wed, Apr 3, 2019 at 1:36 PM Michał Górny <mgorny@g.o> wrote:
2
3 > Hello, everyone.
4 >
5 > Back in 2016, we killed the technical representation of herds. Some
6 > of them were disbanded completely, others merged with existing projects
7 > or converted into new projects. This solved some of the problems with
8 > maintainer declarations but it didn't solve the most important problem
9 > herds posed. Sadly, it seems that the spirit of herds survived along
10 > with those problems.
11 >
12 > Herds served as a method of grouping packages by a common topic,
13 > somewhat similar to (but usually broader than) categories. In their
14 > mature state, herds had either their specific maintainers, or were
15 > directly connected to projects (which in turn provided maintainers for
16 > the herds). Today, we still have many herds that are masked either
17 > as complete projects, or semi-projects (i.e. project entries without
18 > explicit lead, policies or anything else).
19 >
20 >
21 > What's wrong with herds?
22 > ------------------------
23 > The main problem with herds is that they represent an artificial
24 > relation between packages. The only common thing about them is topic,
25 > and there is no real reason why a group of people would maintain all
26 > packages regarding the same topic. In fact, it is absurd -- say, why
27 > would a single person maintain 10+ competing cron implementations?
28 > Surely, there is some common knowledge related to running cron,
29 > and it is entirely possible that a single person would use a few
30 > different cron implementations on different systems. But that doesn't
31 > justify creating an artificial project to maintain all cron
32 > implementations.
33 >
34 > Mapping this to reality, projects usually represent a few developers,
35 > each of them interested in a specific subset of packages maintained by
36 > the project. In some cases, this is explicitly noted as project member
37 > roles; in others, it is not stated clearly anywhere. In both cases,
38 > there is usually some group of packages that are assigned to
39 > the specific project but not maintained by any of the project members.
40 >
41 > Less structured projects often have problems tracking member activity.
42 > More than once a project effectively died when all members became
43 > inactive, yet effectively hid the fact that the relevant packages were
44 > unmaintained and sometimes discouraged more timid developers from fixing
45 > bugs.
46 >
47
48 I'm not sure I follow this logic.
49
50 1) We know who is in every project.
51 2) We know the state of every developer.
52
53 We should be able to detect if:
54
55 a) A project is empty.
56 b) A project has no active developers.
57
58 I don't see how this is markedly different from a package assigned to no
59 maintainer, or a package assigned to a maintainer who is not active.
60
61 So the solution here seems to be to fix the tools to detect this situation
62 and make it clearer that the package has no active maintainer?
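
A minimal sketch of the kind of check I mean, assuming we can export each project's member list and the set of currently active developers. The `Project` record, the function names, and the developer names in the usage note are hypothetical and do not correspond to any existing Gentoo tool:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Project:
    """Hypothetical record for a project and its member list."""
    name: str
    members: List[str] = field(default_factory=list)


def project_status(project: Project, active_devs: Set[str]) -> str:
    """Classify a project as 'empty', 'inactive', or 'active'."""
    if not project.members:
        return "empty"  # case a): nobody is listed at all
    if not any(member in active_devs for member in project.members):
        return "inactive"  # case b): members exist, but none are active
    return "active"


def unmaintained_projects(projects: List[Project], active_devs: Set[str]) -> List[str]:
    """Names of projects whose packages effectively have no active maintainer."""
    return [p.name for p in projects if project_status(p, active_devs) != "active"]
```

For example, with `active_devs = {"alice"}`, a project with no members is reported as empty and a project whose only member is "bob" is reported as inactive; both would be flagged by `unmaintained_projects`.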
63
64 (I tend to agree with your general thrust of the rest of the proposal, but
65 I think in general limiting how projects are used on a statutory basis
66 seems incorrect.)
67
68
69 >
70 >
71 > What kind of projects make sense?
72 > ---------------------------------
73 > If we are to fight herd-like projects, I think it is important to
74 > consider a bit what kind of projects make sense, and which ones cause herd-like
75 > trouble.
76 >
77 > The two projects maintaining the largest number of packages in Gentoo
78 > are respectively the Perl project and the Python project. Strictly
79 > speaking, both could be considered herd-like -- after all, they maintain
80 > a lot of packages belonging to the same category. To some degree, this
81 > is true. However, I believe those make sense because:
82 >
83 > a. They maintain a central group of packages, eclasses, policies etc.
84 > related to writing ebuilds using the specific programming language,
85 > and help other developers with it. The existence of such a project is
86 > really useful.
87 >
88 > b. The packages maintained by them have many common properties,
89 > frequently come from common sources (CPAN, pypi) and that makes it
90 > possible for a large number of developers to actually maintain all
91 > of them.
92 >
93 > The Python project I know better, so I'll add something. It does not
94 > accept all Python packages (although some developers insist on adding us
95 > to them without asking), and especially not random programs written in
96 > the Python language. It specifically focuses on Python module packages,
97 > i.e. resources generally useful to Python programmers. This is what
98 > makes it different from a common herd project.
99 >
100 > The third biggest project in Gentoo is -- in my opinion -- a perfect
101 > example of a problematic herd-project. The games project maintains
102 > a total of 877 packages, and sad to say many are in really bad shape.
103 > Even if we presumed all developers were active, this gives us 175
104 > packages per person, and I seriously doubt one person can actively
105 > maintain that many programs. Add to that the fact that many of them are
106 > proprietary and fetch-restricted, and only the people possessing a copy
107 > can maintain it, and you see how blurry the package mapping is.
108 >
109 > Let's look at the next projects on the list. Proxy-maint is very
110 > specific as it proxies contributors; however, it is technically valid
111 > since all project members can (and should) actively proxy for any
112 > maintainers we have. Though I have to admit the number of maintained
113 > packages simply overburdens us.
114 >
115 > Haskell, Java, Ruby are other examples of projects focused on
116 > programming languages. KDE and GNOME projects generally make sense
117 > since packages maintained by those projects have many common features,
118 > and the core set has common upstream and sometimes synced releases. It
119 > is reasonable to assume members of those projects will maintain all, or
120 > at least majority of those packages.
121 >
122 > The next project is Sound -- and in my experience, it involves a lot of
123 > poorly maintained or unmaintained packages. Again, the problem is that
124 > the packages maintained by the project have little in common -- why
125 > would any single person maintain a dozen audio players, converters,
126 > libraries, etc.? Having multiple people in a project may increase
127 > the chance that they would happen to cover a larger set of competing
128 > packages but that's really more incidental than expected.
129 >
130 > This is basically how I'd summarize a difference between a valid
131 > project, and a herd-project. A valid project maintains packages that
132 > have many common properties, where it really makes sense for
133 > an arbitrarily chosen project member to take care of an arbitrarily chosen
134 > package maintained by the project. A herd-project maintains packages
135 > that have only common topic, and usually means that an arbitrarily
136 > chosen project member maintains only a small subset of all packages
137 > maintained by the project.
138 >
139 > Looking further through the list, projects that seem to make sense
140 > include ROS, Emacs, maybe base-system, SELinux, ML, X11 (after all, it
141 > maintains core Xorg and nobody sets them as 'backup' maintainers for
142 > random X11 programs), PHP, vim...
143 >
144 > Projects that are herd-like include science (possibly with all its
145 > flavors), netmon, video, desktop-misc (this is a prime example of 'random
146 > programs'), graphics...
147 >
148 >
149 > What do I propose?
150 > ------------------
151 > I'd like to propose either disbanding herd-like projects entirely, or
152 > transforming them into more proper projects. Not only those that are
153 > clearly dysfunctional but also those that incidentally happen to work
154 > (e.g. because they maintain a few packages, or because they represent
155 > a single developer with wide interest).
156 >
157 > More specifically, I'd like each of the affected projects to choose
158 > between:
159 >
160 > a. disbanding the project entirely and finding individual maintainers
161 > for all packages,
162 >
163 > b. reducing the packages maintained by the project to a well-defined
164 > 'core set' whose maintenance by a group of developers makes sense,
165 > and finding individual maintainers for the remaining packages,
166 >
167 > c. splitting one or more smaller projects with well-defined scope from
168 > the project, and doing a. or b. for the remaining packages.
169 >
170 > Let's take a few examples. For a start, cron project. Previously, it
171 > maintained a number of different cron implementations (most having their
172 > individual maintainers by now), a cronbase package and cron.eclass.
173 > In this context, option a. means disbanding the project entirely. Some
174 > packages already have maintainers, others go maintainer-needed.
175 >
176 > Option b. would most likely involve leaving a cron project as small
177 > entity to provide policies for consistent cron handling, and maintain
178 > cronbase package and cron.eclass. Different cron implementations would
179 > go to individual maintainers anyway.
180 >
181 > A similar example can be made for the PAM project that maintained
182 > pambase, Linux-PAM, pam.eclass and some PAM modules. Here a. means
183 > giving all packages away, and b. means leaving a minimal project that
184 > maintains policies, pambase, Linux-PAM and the eclass. The individual
185 > modules (except maybe for very common ones, if there were any) would find
186 > individual maintainers.
187 >
188 > A good example for the c. option is the recently revived VoIP project.
189 > Again, this is an example of herd-project that tries to maintain
190 > an arbitrary set of loosely related packages. To some, it might make
191 > sense, especially since there's only a few VoIP packages left in Gentoo.
192 > Nevertheless, there is no reason why a single project member would
193 > maintain multiple competing VoIP stacks.
194 >
195 > Here, the c. option would mean creating project(s) for specific stacks
196 > of interest. For example, if there was specific project-level interest
197 > for maintaining Asterisk packages, an Asterisk project would make more
198 > sense than generic 'VoIP'.
199 >
200 >
201 > Why, again?
202 > -----------
203 > As I said before, the main problem with herds is that they introduce
204 > artificial and non-transparent relation between packages and package
205 > maintainers.
206 >
207
208 So back to this goal (which again I think is laudable.)
209
210
211 >
212 > Firstly, they usually tend to include packages that none of the project
213 > members is actually interested in maintaining. This also includes
214 > packages added by other developers (let's shove it in here, it matches
215 > their job description!) or packages leftover from other developers
216 > (where the project was backup maintainer). This means having a lot of
217 > packages that seem to have a maintainer but actually don't.
218 >
219
220 I have a lot of empathy for this point FWIW. Tooling can find empty /
221 abandoned projects, but we cannot do things like clearly say "This package
222 shouldn't be in this project"
223 or "This package is not actually maintained by a project".
224
225 One rule we might use here is that packages always need at least a single
226 human maintainer, and the project is just an annotation that doesn't affect
227 maintainer status.
228 So e.g. if there are 8 competing cron implementations, "cron-team" can't
229 maintain all 8, they have to find individual humans to vouch for each[0].
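
As a rough sketch of how such a rule could be checked, assuming the `<maintainer type="person">` / `<maintainer type="project">` distinction used in package metadata.xml files. The function name is hypothetical, and the email addresses in the usage note are placeholders:

```python
import xml.etree.ElementTree as ET
from typing import List


def human_maintainers(metadata_xml: str) -> List[str]:
    """Return the emails of maintainers declared with type="person".

    An empty list means the package is covered only by projects (or by
    nobody), which the rule above would treat as having no maintainer.
    """
    root = ET.fromstring(metadata_xml)
    return [
        maintainer.findtext("email", default="").strip()
        for maintainer in root.findall("maintainer")
        if maintainer.get("type") == "person"
    ]
```

A metadata.xml listing only `<maintainer type="project"><email>cron@example.org</email></maintainer>` yields an empty list and would be flagged, while adding a `type="person"` entry for `dev@example.org` yields `["dev@example.org"]`.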
230
231
232 >
233 > Secondly, they frequently lack proper structure and handling of leaving
234 > members. Therefore, whenever a member maintaining a specific set of
235 > packages leaves, it is possible that the number of not-really-maintained
236 > packages increases.
237 >
238 > Thirdly, they tend to degenerate and become defunct (much more than
239 > projects that make sense). Then, the number of not-really-maintained
240 > packages ends up being really high.
241 >
242 > My goal here is to make sure that we have clear and correct information
243 > about package maintainers. Most notably, if a package has no active
244 > maintainer, we really need to have 'up for grabs' issued and package
245 > marked as maintainer-needed, rather than hidden behind some project
246 > whose members may not even be aware of the fact that they're its
247 > maintainers.
248 >
249
250 >
251 > What do you think?
252 >
253 >
254 [0] This is itself a question the project needs to decide for itself; does
255 every package need to be maintained actively? Some might answer no, and
256 maybe running for months / years without a maintainer is OK for Gentoo. It's
257 not an opinion I personally hold, but I suspect some community members do
258 hold it. Herds / Projects help Gentoo scale and enable 160 humans to
259 maintain 19,600 packages. Taking this away will likely affect the number of
260 packages in the tree as maintainers scale down their stake in the tree.
261
262 -A
263
264
265 > --
266 > Best regards,
267 > Michał Górny
268 >
269 >
