On Wed, Apr 3, 2019 at 1:36 PM Michał Górny <mgorny@g.o> wrote:

> Hello, everyone.
>
> Back in 2016, we killed the technical representation of herds. Some
> of them were disbanded completely, others merged with existing projects
> or converted into new projects. This solved some of the problems with
> maintainer declarations but it didn't solve the most important problem
> herds posed. Sadly, it seems that the spirit of herds survived along
> with those problems.
>
> Herds served as a method of grouping packages by a common topic,
> somewhat like categories but usually broader. In their mature state,
> herds either had their own specific maintainers or were directly
> connected to projects (which in turn provided maintainers for the
> herds). Today, we still have many herds disguised either as complete
> projects or as semi-projects (i.e. project entries without an explicit
> lead, policies or anything else).
>
>
> What's wrong with herds?
> ------------------------
> The main problem with herds is that they represent an artificial
> relation between packages. The only thing they have in common is their
> topic, and there is no real reason why a group of people would maintain
> all packages regarding the same topic. In fact, it is absurd -- say, why
> would a single person maintain 10+ competing cron implementations?
> Surely, there is some common knowledge related to running cron,
> and it is entirely possible that a single person would use a few
> different cron implementations on different systems. But that doesn't
> justify creating an artificial project to maintain all cron
> implementations.
>
> Mapping this to reality, projects usually represent a few developers,
> each of them interested in a specific subset of packages maintained by
> the project. In some cases, this is explicitly noted as project member
> roles; in others, it is not stated clearly anywhere. In both cases,
> there is usually some group of packages that are assigned to
> the specific project but not maintained by any of the project members.
>
> Less structured projects often have problems tracking member activity.
> More than once, a project effectively died when all its members became
> inactive, yet it still hid the fact that the relevant packages were
> unmaintained, and sometimes discouraged more timid developers from
> fixing bugs.
>

I'm not sure I follow this logic.

1) We know who is in every project.
2) We know the state of every developer.

We should be able to detect if:

a) A project is empty.
b) A project has no active developers.

I don't see how this is markedly different from a package assigned to no
maintainer, or a package assigned to a maintainer who is not active.

So the solution here seems to be to fix the tools to detect this situation
and make it clearer that the package has no active maintainer?
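
A minimal sketch of the detection described above, assuming the tooling
already has two inputs: each project's member list and the set of
currently active developers. The Project class, the function names and all
example.org addresses are hypothetical, not part of any existing Gentoo
tool. A package maintained only by a project counts as unmaintained when
that project has no active members, which is the case being argued about.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project entry: contact email plus member emails."""
    email: str
    members: list = field(default_factory=list)


def classify_projects(projects, active_devs):
    """Map each project email to 'empty', 'no-active-members' or 'ok'."""
    report = {}
    for proj in projects:
        if not proj.members:
            report[proj.email] = "empty"  # case a): nobody in the project
        elif not any(m in active_devs for m in proj.members):
            report[proj.email] = "no-active-members"  # case b)
        else:
            report[proj.email] = "ok"
    return report


def packages_without_active_maintainer(packages, projects, active_devs):
    """Return packages whose maintainers (people or projects) are all inactive.

    packages maps a package name to a list of maintainer emails. An email
    that belongs to a project counts as active only if that project has at
    least one active member; a package with no maintainers is always listed.
    """
    members_of = {p.email: p.members for p in projects}

    def is_active(email):
        if email in members_of:
            return any(m in active_devs for m in members_of[email])
        return email in active_devs

    return sorted(pkg for pkg, maints in packages.items()
                  if not any(is_active(e) for e in maints))


if __name__ == "__main__":
    projects = [
        Project("cron@example.org"),
        Project("sound@example.org", ["alice@example.org"]),
        Project("python@example.org", ["bob@example.org", "carol@example.org"]),
    ]
    active = {"bob@example.org", "dave@example.org"}
    print(classify_projects(projects, active))
    packages = {
        "app-misc/orphan": [],
        "media-sound/player-a": ["sound@example.org"],
        "dev-python/lib-b": ["python@example.org"],
        "app-misc/tool-c": ["dave@example.org"],
        "sys-process/cron-d": ["cron@example.org"],
    }
    # prints ['app-misc/orphan', 'media-sound/player-a', 'sys-process/cron-d']
    print(packages_without_active_maintainer(packages, projects, active))
```

Treating "project with no active members" the same as "inactive person"
is what lets the report surface packages that look maintained but aren't.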

(I tend to agree with the general thrust of the rest of your proposal, but
I think that, in general, limiting how projects are used on a statutory
basis seems incorrect.)


>
>
> What kind of projects make sense?
> ---------------------------------
> If we are to fight herd-like projects, I think it is important to
> consider a bit what kind of projects make sense, and what kind
> constitutes herd-like trouble.
>
> The two projects maintaining the largest number of packages in Gentoo
> are, respectively, the Perl project and the Python project. Strictly
> speaking, both could be considered herd-like -- after all, they maintain
> a lot of packages belonging to the same category. To some degree, this
> is true. However, I believe those make sense because:
>
> a. They maintain a central group of packages, eclasses, policies, etc.
> related to writing ebuilds using the specific programming language,
> and help other developers with it. The existence of such a project is
> really useful.
>
> b. The packages maintained by them have many common properties and
> frequently come from common sources (CPAN, PyPI), which makes it
> possible for a large number of developers to actually maintain all
> of them.
>
> I know the Python project better, so I'll add something. It does not
> accept all Python packages (although some developers insist on adding us
> to them without asking), and especially not random programs written in
> the Python language. It specifically focuses on Python module packages,
> i.e. resources generally useful to Python programmers. This is what
> makes it different from a common herd project.
>
> The third biggest project in Gentoo is -- in my opinion -- a perfect
> example of a problematic herd-project. The games project maintains
> a total of 877 packages, and, sad to say, many are in really bad shape.
> Even if we presumed all developers were active, this gives us 175
> packages per person, and I seriously doubt one person can actively
> maintain that many programs. Add to that the fact that many of them are
> proprietary and fetch-restricted, so only the people possessing a copy
> can maintain them, and you see how blurry the package mapping is.
>
> Let's look at the next projects on the list. Proxy-maint is very
> specific as it proxies contributors; however, it is technically valid
> since all project members can (and should) actively proxy for any
> maintainers we have. That said, I have to admit the number of
> maintained packages simply overburdens us.
>
> Haskell, Java and Ruby are other examples of projects focused on
> programming languages. The KDE and GNOME projects generally make sense
> since packages maintained by those projects have many common features,
> and the core set has a common upstream and sometimes synced releases.
> It is reasonable to assume members of those projects will maintain all,
> or at least the majority, of those packages.
>
> The next project is Sound -- and in my experience, it involves a lot of
> poorly maintained or unmaintained packages. Again, the problem is that
> the packages maintained by the project have little in common -- why
> would any single person maintain a dozen audio players, converters,
> libraries, etc.? Having multiple people in the project may increase
> the chance that they happen to cover a larger set of competing
> packages, but that is incidental rather than something we can expect.
>
> This is basically how I'd summarize the difference between a valid
> project and a herd-project. A valid project maintains packages that
> have many common properties, where it really makes sense for
> an arbitrarily chosen project member to take care of an arbitrarily
> chosen package maintained by the project. A herd-project maintains
> packages that have only a common topic, which usually means that
> an arbitrarily chosen project member maintains only a small subset of
> all packages maintained by the project.
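
One hypothetical way to make this distinction measurable: given the
packages assigned to a project and, for each member, the packages that
member actually works on (for example, derived from commit history; that
input and every name below are assumptions), compute each member's share
of the project's packages and the packages nobody covers. Under this
heuristic, a valid project shows high per-member shares and few uncovered
packages, while a herd-project shows small shares and a long uncovered
list. No particular threshold is implied.

```python
def coverage_report(project_pkgs, touched_by_member):
    """Summarise how much of a project's package list each member covers.

    project_pkgs: packages assigned to the project.
    touched_by_member: member -> packages that member actually works on
    (a hypothetical input, e.g. derived from commit history).
    Returns (share of project packages per member, packages nobody covers).
    """
    pkgs = set(project_pkgs)
    if not pkgs:
        return {}, []  # nothing assigned, nothing to measure
    share = {member: len(pkgs & set(touched)) / len(pkgs)
             for member, touched in touched_by_member.items()}
    covered = set().union(*touched_by_member.values())
    uncovered = sorted(pkgs - covered)
    return share, uncovered


if __name__ == "__main__":
    pkgs = ["player-a", "player-b", "converter-c", "lib-d"]
    touched = {"alice": ["player-a", "lib-d"], "bob": ["player-a"]}
    # prints ({'alice': 0.5, 'bob': 0.25}, ['converter-c', 'player-b'])
    print(coverage_report(pkgs, touched))
```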
>
> Looking further through the list, projects that seem to make sense
> include ROS, Emacs, maybe base-system, SELinux, ML, X11 (after all, it
> maintains core Xorg and nobody sets it as a 'backup' maintainer for
> random X11 programs), PHP, vim...
>
> Projects that are herd-like include science (possibly with all its
> flavors), netmon, video, desktop-misc (this is a prime example of
> 'random programs'), graphics...
>
>
> What do I propose?
> ------------------
> I'd like to propose either disbanding herd-like projects entirely, or
> transforming them into more proper projects. This applies not only to
> those that are clearly dysfunctional but also to those that
> incidentally happen to work (e.g. because they maintain only a few
> packages, or because they represent a single developer with wide
> interests).
>
> More specifically, I'd like each of the affected projects to choose
> one of the following:
>
> a. disbanding the project entirely and finding individual maintainers
> for all packages,
>
> b. reducing the packages maintained by the project to a well-defined
> 'core set' whose maintenance by a group of developers makes sense,
> and finding individual maintainers for the remaining packages,
>
> c. splitting one or more smaller projects with a well-defined scope off
> the project, and doing a. or b. for the remaining packages.
>
> Let's take a few examples. For a start, the cron project. Previously,
> it maintained a number of different cron implementations (most having
> their individual maintainers by now), a cronbase package and
> cron.eclass. In this context, option a. means disbanding the project
> entirely. Some packages already have maintainers; the others go
> maintainer-needed.
>
> Option b. would most likely involve leaving the cron project as a small
> entity that provides policies for consistent cron handling and maintains
> the cronbase package and cron.eclass. The different cron implementations
> would go to individual maintainers anyway.
>
> A similar example is the PAM project, which maintained pambase,
> Linux-PAM, pam.eclass and some PAM modules. Here, a. means giving all
> packages away, and b. means leaving a minimal project that maintains
> policies, pambase, Linux-PAM and the eclass. The individual modules
> (except perhaps very common ones, if there were any) would find
> individual maintainers.
>
> A good example for the c. option is the recently revived VoIP project.
> Again, this is an example of a herd-project that tries to maintain
> an arbitrary set of loosely related packages. To some, it might make
> sense, especially since there are only a few VoIP packages left in
> Gentoo. Nevertheless, there is no reason why a single project member
> would maintain multiple competing VoIP stacks.
>
> Here, the c. option would mean creating project(s) for specific stacks
> of interest. For example, if there were specific project-level interest
> in maintaining Asterisk packages, an Asterisk project would make more
> sense than a generic 'VoIP' one.
>
>
> Why, again?
> -----------
> As I said before, the main problem with herds is that they introduce
> an artificial and non-transparent relation between packages and
> package maintainers.
>

So back to this goal (which, again, I think is laudable).


>
> Firstly, they usually tend to include packages that none of the project
> members is actually interested in maintaining. This also includes
> packages added by other developers (let's shove it in here, it matches
> their job description!) or packages left over from other developers
> (where the project was the backup maintainer). This means having a lot
> of packages that seem to have a maintainer but actually don't.
>

I have a lot of empathy for this point, FWIW. Tooling can find empty /
abandoned projects, but we cannot do things like clearly say "This package
shouldn't be in this project" or "This package is not actually maintained
by a project".

One rule we might use here is that packages always need at least a single
human maintainer, and the project is just an annotation that doesn't
affect maintainer status. So e.g. if there are 8 competing cron
implementations, "cron-team" can't maintain all 8; they have to find
individual humans to vouch for each[0].
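
The proposed rule could be checked per package against metadata.xml, where
each maintainer element carries type="person" or type="project". The
sketch below is a hypothetical check, not an existing QA tool: it assumes
the set of active developer emails is available from elsewhere, treats
project entries purely as annotations, and uses trimmed sample files
without the usual XML declaration and DOCTYPE. All addresses are made up.

```python
import xml.etree.ElementTree as ET


def human_maintainers(metadata_xml):
    """Return emails of type="person" maintainers in a metadata.xml string."""
    root = ET.fromstring(metadata_xml)
    return [m.findtext("email", default="").strip()
            for m in root.findall("maintainer")
            if m.get("type") == "person"]


def needs_human_maintainer(metadata_xml, active_devs):
    """True if no active person is listed; project entries do not count."""
    return not any(e in active_devs for e in human_maintainers(metadata_xml))


if __name__ == "__main__":
    project_only = """<pkgmetadata>
  <maintainer type="project"><email>cron@example.org</email></maintainer>
</pkgmetadata>"""
    with_person = """<pkgmetadata>
  <maintainer type="person"><email>alice@example.org</email></maintainer>
  <maintainer type="project"><email>cron@example.org</email></maintainer>
</pkgmetadata>"""
    active = {"alice@example.org"}
    print(needs_human_maintainer(project_only, active))  # True
    print(needs_human_maintainer(with_person, active))   # False
```

Under this rule each of the 8 cron implementations would be flagged until
an active person is listed in its metadata.xml, however many projects it
also names.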


>
> Secondly, they frequently lack proper structure and a way of handling
> departing members. Therefore, whenever a member maintaining a specific
> set of packages leaves, the number of not-really-maintained packages
> may increase.
>
> Thirdly, they tend to degenerate and become defunct (much more often
> than projects that make sense). When that happens, the number of
> not-really-maintained packages ends up being really high.
>
> My goal here is to make sure that we have clear and correct information
> about package maintainers. Most notably, if a package has no active
> maintainer, we really need to have an 'up for grabs' notice issued and
> the package marked as maintainer-needed, rather than hidden behind some
> project whose members may not even be aware that they're its
> maintainers.
>

>
> What do you think?
>

[0] This is itself a question the project needs to decide: does every
package need to be actively maintained? Some might answer no, and maybe
running for months / years without a maintainer is OK for Gentoo. It's
not an opinion I personally hold, but I suspect some community members do
hold it. Herds / projects help Gentoo scale and enable 160 humans to
maintain 19,600 packages. Taking this away will likely affect the number
of packages in the tree as maintainers scale down their stake in it.

-A

> --
> Best regards,
> Michał Górny