Gentoo Archives: gentoo-soc

From: "André Erdmann" <dywi@×××××××.de>
To: gentoo-soc@l.g.o
Cc: Denis Dupeyron <calchan@g.o>
Subject: [gentoo-soc] Proposal: R_Overlay: Automated overlay maintenance
Date: Thu, 02 May 2013 22:17:57
Message-Id: CAGrucu3EDCEp8ShbRGU-xefZGuTPksP0umjwZmyFFsKLYbuiXg@mail.gmail.com
Hello,

I've submitted a proposal today about extending last year's GSoC
project "Automatically generated overlay of R packages" [0] with a focus
on automated overlay maintenance. It's included in this mail for
public review. Feel free to comment ;)

Kind Regards,
André Erdmann

[0] http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git;a=summary

=== proposal starts here ===

--- Table of contents ---

1 Abstract
2 Objective
3 Implementation Ideas
4 Deliverables
5 Timeline
6 Biography / About me
7 Extra information

--- Table of contents ---

1 Abstract
---

roverlay is a program that creates ebuilds for R packages and makes
them available as an overlay. It's the result of last year's GSoC project
"Automatically generated overlay of R packages".

This project will extend overlay creation and add automated overlay maintenance.


2 Objective
---

To briefly review what has been done since the end of last year's
GSoC, the following features have been added:

* faster Manifest file creation using the portage libs directly (still
  experimental)
* creation of a package mirror directory ("overlay DISTDIR")
* package rules that allow controlling various aspects of package
  processing (i.e., ebuild creation)

Overall, roverlay's code size has increased by about 30% (+3800 lines),
and its user-targeted documentation by 22% (+500 lines).

There's still work to do in order to get a fully automated overlay. As
mentioned before, this project/proposal focuses on two major areas:
(a) extending ebuild and overlay creation and (b) adding automated
overlay maintenance, with the latter being more important.

The aim of automated overlay maintenance is to provide an overlay that
can be deployed to end users, ideally without requiring any interaction
from the overlay maintainer. This includes verification of individual
ebuilds and of the entire overlay (typically after running roverlay), as
well as a simple status web page. A tinderbox approach may also be
implemented in addition to structural testing.

As usual, proper documentation is considered essential.


3 Implementation Ideas
---

This section lists a few features and enhancements that I plan to
implement. It's not a definitive list of things that will be done, but
rather gives you an idea of what the result will look like.


3.1 Extending Ebuild/Overlay Creation


3.1.1 Control Flow

Currently, the package rules system can ignore an R package entirely
and set an ebuild's KEYWORDS variable. This will be extended by:

* "Relocating" packages

  -> change the category and/or name of an ebuild
  -> rename the (local) source files using arrows in SRC_URI

  This is necessary because R package names are sometimes too generic.

* Modifying ebuild variables at ebuild creation time via package rules
* Patching ebuilds after creating the overlay (but before writing
  Manifest files and testing it)
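
To illustrate the relocation idea, here is a rough Python sketch.
PackageInfo, RelocateRule and src_uri_value are hypothetical names, not
roverlay's actual package rules API, and the category and URL used below
are placeholders. The point is the SRC_URI arrow: once a package has been
renamed, its tarball is still fetched from the upstream URL but stored
under the new name in DISTDIR.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PackageInfo:
    """Hypothetical stand-in for roverlay's per-package information."""
    upstream_name: str  # name used by CRAN etc., never changed
    category: str       # ebuild category
    name: str           # ebuild name (PN)
    version: str
    src_uri: str        # upstream tarball URL


@dataclass(frozen=True)
class RelocateRule:
    """Hypothetical package rule: match by upstream name, then relocate."""
    match_name: str
    new_category: Optional[str] = None
    new_name: Optional[str] = None

    def apply(self, pkg: PackageInfo) -> PackageInfo:
        if pkg.upstream_name != self.match_name:
            return pkg  # rule does not match: leave the package untouched
        return replace(
            pkg,
            category=self.new_category or pkg.category,
            name=self.new_name or pkg.name,
        )


def src_uri_value(pkg: PackageInfo) -> str:
    """SRC_URI for the ebuild; use an arrow to rename the local distfile."""
    if pkg.name == pkg.upstream_name:
        return pkg.src_uri
    # R source tarballs are named <name>_<version>.tar.gz; store under the new name
    return f"{pkg.src_uri} -> {pkg.name}_{pkg.version}.tar.gz"
```

For example, relocating the upstream package graph to R-graph gives a
SRC_URI of the form "<upstream URL> -> R-graph_1.38.0.tar.gz", so the
generic upstream file name never ends up in DISTDIR.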


3.1.2 Misc Features

* Add SLOT handling to dependency resolution

* Replace the r_suggests USE flag with a USE_EXPAND variable so that
  users can select optional dependencies on a per-package basis

* Bypass ebuild generation: insert hand-written ebuilds into the overlay

  Doing that at overlay creation time has two advantages: the inserted
  ebuild safely replaces any generated one, and it is not excluded from
  overlay verification.

* Run incremental overlay creation for a given set of packages

  More importantly, regenerate ebuilds whenever a tarball's checksum
  changes, which happens when upstream (CRAN etc.) changes a package's
  content without renaming it. Regenerated ebuilds could also be
  revbumped, which allows easy end-user upgrades.

* Generate meaningful statistics for the web page / QA tools
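
A minimal sketch of checksum-based change detection and revbumping,
assuming that roverlay keeps a mapping from tarball name to SHA-256
digest between runs (how and where that mapping is stored is left
open). The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_changed(tarball: Path, known_sums: Dict[str, str]) -> bool:
    """Record the tarball's checksum; True if it differs from the last run.

    A tarball seen for the first time is not reported as changed.
    """
    current = sha256_of(tarball)
    previous = known_sums.get(tarball.name)
    known_sums[tarball.name] = current  # remembered for the next run
    return previous is not None and previous != current


def bump_revision(pvr: str) -> str:
    """Increment an ebuild revision: '1.2' -> '1.2-r1', '1.2-r1' -> '1.2-r2'."""
    match = re.fullmatch(r"(.+?)(?:-r(\d+))?", pvr)
    if match is None:
        raise ValueError(f"not a version: {pvr!r}")
    base, rev = match.group(1), int(match.group(2) or 0)
    return f"{base}-r{rev + 1}"
```

A run would call tarball_changed() for every package file and, if it
returns True, regenerate the ebuild under bump_revision(old_version),
so that e.g. 1.2-r1 becomes 1.2-r2 and end users get a regular upgrade.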


3.1.3 Console

roverlay already features the depres console, which can be used to
create and test dependency rules, albeit in a rather limited way.

A reimplementation of this console would allow controlling each step
of overlay creation interactively, for example:

* "forge" packages: create fake package information from user input
* create ebuilds for packages and add them to an overlay
* test subsystems like dependency resolution and package rules

The new console would also feature a more user-friendly interface,
e.g. readline support (-> tab completion).
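
One possible starting point for the new console is Python's standard
cmd module, which does command completion via readline when readline is
available. The sketch below is hypothetical: its forge/list/drop/quit
commands only manage an in-memory dictionary and are not connected to
roverlay's subsystems.

```python
import cmd
from typing import Dict


class RoverlayConsole(cmd.Cmd):
    """Hypothetical roverlay console built on the standard cmd module."""

    prompt = "roverlay> "

    def __init__(self) -> None:
        super().__init__()
        # Fake package information entered by the user: name -> version
        self.forged: Dict[str, str] = {}

    def do_forge(self, arg: str) -> None:
        """forge NAME VERSION: create fake package information."""
        parts = arg.split()
        if len(parts) != 2:
            print("usage: forge NAME VERSION", file=self.stdout)
            return
        name, version = parts
        self.forged[name] = version
        print(f"forged {name}-{version}", file=self.stdout)

    def do_list(self, arg: str) -> None:
        """list: show all forged packages."""
        for name, version in sorted(self.forged.items()):
            print(f"{name}-{version}", file=self.stdout)

    def do_drop(self, arg: str) -> None:
        """drop NAME: forget a forged package."""
        self.forged.pop(arg.strip(), None)

    def complete_drop(self, text, line, begidx, endidx):
        # Called by cmd on <Tab>: offer forged package names starting with `text`
        return [name for name in self.forged if name.startswith(text)]

    def do_quit(self, arg: str) -> bool:
        """quit: leave the console."""
        return True  # a true return value stops cmdloop()
```

RoverlayConsole().cmdloop() starts the interactive loop; each new
subsystem would then only need its own do_* and complete_* methods.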

This is a nice-to-have feature and may be dropped in favor of others.


3.2 Automated Overlay Maintenance

The main goal here is to require as little maintainer interaction as possible.


3.2.1 Overlay Verification

* Structural testing: verify the overlay and all of its ebuilds
  (possibly using repoman.checks etc.)

  -> Ensure that all dependencies of an ebuild are actually satisfiable

* Black-box testing: a tinderbox approach, i.e. (try to) build all
  packages (ebuilds)

An ebuild that doesn't pass structural testing will be removed from
the overlay. Test results will be logged and included in the status
web page.
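
The dependency check could work roughly as sketched below, under strong
simplifying assumptions: dependencies are plain package names (no
versions, SLOTs or USE conditionals), and the set of packages available
outside the overlay is given. A real check would have to use portage to
match dependency atoms. The sketch mainly shows why removal has to be
repeated: dropping one ebuild can make its reverse dependencies
unsatisfiable. prune_unsatisfiable is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Deps = Dict[str, Set[str]]


def prune_unsatisfiable(overlay: Deps, available_outside: Set[str]) -> Tuple[Deps, Deps]:
    """Drop ebuilds with unsatisfiable dependencies.

    `overlay` maps each package in the overlay to the package names it
    depends on; `available_outside` holds packages provided elsewhere
    (e.g. the main tree). Returns (remaining, removed), where `removed`
    maps each dropped package to its missing dependencies.
    """
    remaining = dict(overlay)
    removed: Deps = {}
    changed = True
    while changed:  # removing a package may break its reverse dependencies
        changed = False
        provided = available_outside | set(remaining)
        for pkg, deps in list(remaining.items()):
            missing = deps - provided
            if missing:
                removed[pkg] = missing
                del remaining[pkg]
                changed = True
    return remaining, removed
```

The `removed` mapping is exactly what the test log (and later the status
web page) would need in order to explain why an ebuild was dropped.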


3.2.2 Other QA Tools

The details are still to be figured out.

Currently, I plan to provide a script that reports the overlay's
status and provides easy access to overlay statistics and log
messages.

This script should support several output formats, at least
human-readable text and HTML, so that it can serve as the basis for a
status web page.
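
A possible shape for the status script's output layer, as a sketch:
render_status() is a hypothetical function that takes already-collected
statistics and log messages and renders them either as plain text or as
an HTML fragment, escaping log messages so that package data can't break
the page.

```python
import html
from typing import Dict, List


def render_status(stats: Dict[str, int], log: List[str], fmt: str = "text") -> str:
    """Render overlay statistics and log messages as text or as an HTML fragment."""
    items = sorted(stats.items())
    if fmt == "text":
        lines = [f"{key}: {value}" for key, value in items]
        lines.append("")  # blank line between statistics and log messages
        lines.extend(log)
        return "\n".join(lines)
    if fmt == "html":
        # Escape everything that comes from package data or log messages
        rows = "".join(
            f"<tr><td>{html.escape(key)}</td><td>{value}</td></tr>"
            for key, value in items
        )
        entries = "".join(f"<li>{html.escape(msg)}</li>" for msg in log)
        return f"<table>{rows}</table>\n<ul>{entries}</ul>"
    raise ValueError(f"unknown output format: {fmt!r}")
```

Keeping data collection separate from rendering means new output formats
only need another branch, while the web page can simply embed the HTML
fragment.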


3.2.3 Overlay Snapshots

The idea here is to add version control to the overlay, which allows
restoring a previous (known-to-work) state of the overlay.

A true "known-to-work" state would also have to include all R
packages, but that's not practical (though it could be solved with
hard links or file copies).

The main purpose of this feature is to recover from a failed
roverlay.py run, e.g. if the result doesn't meet one's expectations,
without having to regenerate the entire overlay (which could easily
take hours).

Possible solutions:

1. Use a full-featured version control system, e.g. git.

   The drawback of this solution is that git's history will needlessly
   grow (as mentioned above, there's no true "known-to-work" state, so a
   complete history is useless). This could be remedied by using
   git-filter-branch, but that would add a layer of complexity without
   any real advantage.

2. Create tarballs and keep a certain set of them, e.g. keep those of
   the last seven days (or roverlay.py runs) plus one tarball for each
   of the last 12 weeks.

I'd definitely opt for solution 2.
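
The tarball approach could be sketched as follows. The tarball naming
scheme and both function names are assumptions; "weeks" are counted as
seven-day periods before the current time rather than calendar weeks,
which keeps the retention logic simple.

```python
import tarfile
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable, Set


def create_snapshot(overlay_dir: Path, snapshot_dir: Path, now: datetime) -> Path:
    """Pack the overlay directory into a timestamped tarball."""
    target = snapshot_dir / f"overlay-{now:%Y%m%d-%H%M%S}.tar.gz"
    with tarfile.open(str(target), "w:gz") as tar:
        tar.add(str(overlay_dir), arcname="overlay")
    return target


def snapshots_to_keep(stamps: Iterable[datetime], now: datetime,
                      recent: int = 7, weeks: int = 12) -> Set[datetime]:
    """Select snapshots to keep: the newest `recent` ones plus the newest
    one from each of the last `weeks` seven-day periods before `now`."""
    newest_first = sorted(stamps, reverse=True)
    keep = set(newest_first[:recent])
    seen_periods = set()
    for stamp in newest_first:
        period = (now - stamp) // timedelta(weeks=1)  # 0 = last seven days
        if 0 <= period < weeks and period not in seen_periods:
            seen_periods.add(period)
            keep.add(stamp)
    return keep
```

Snapshots not returned by snapshots_to_keep() can be deleted after each
run; restoring means extracting the chosen tarball (e.g. with tarfile's
extractall()) in place of the overlay directory.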


3.3 User Tools

Generally, users are expected to simply add the overlay with layman
and use it like any other overlay, but certain cases might require user
interaction:

* upstream changes a package's content without renaming it (see 3.1.2)

  This invalidates package files already downloaded by the user.
  Rebuilding these packages (ebuilds) may be advantageous, too.

  Removing and/or refetching the files is rather easy: a postsync
  script could do the job, possibly with code-side support in
  roverlay.py to speed things up (i.e., so that the whole overlay
  doesn't have to be scanned after each sync).

  The tricky part is determining which packages should be rebuilt.
  Simply rebuilding every ebuild whose package file has been removed
  from ${DISTDIR} is not accurate, for at least two reasons: the package
  may not be installed, and the user may not keep distfiles at all. A
  possible solution is to revbump ebuilds in roverlay.py whenever
  upstream changes a package.
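
The refetch part of the user tool could look roughly like this (sketched
in Python here, although the actual tool may end up as a shell script),
assuming the overlay ships Manifest files with DIST entries of the form
"DIST <file> <size> <HASH> <value> ...". The sketch only verifies SHA256
and SHA512 entries and skips other hash types; the function names are
hypothetical, and how the postsync hook is installed is left open.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Manifest hash names this sketch knows how to verify; others are skipped.
HASHES = {"SHA256": "sha256", "SHA512": "sha512"}

Entries = Dict[str, Tuple[int, Dict[str, str]]]


def parse_dist_entries(manifest_lines: Iterable[str]) -> Entries:
    """Parse 'DIST <file> <size> <HASH> <value> ...' lines from a Manifest."""
    entries: Entries = {}
    for line in manifest_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "DIST":
            continue  # ignore EBUILD, AUX, MISC and malformed lines
        hashes = dict(zip(fields[3::2], fields[4::2]))
        entries[fields[1]] = (int(fields[2]), hashes)
    return entries


def stale_distfiles(distdir: Path, entries: Entries) -> List[Path]:
    """Return files in distdir whose size or checksum no longer matches."""
    stale = []
    for name, (size, hashes) in entries.items():
        path = distdir / name
        if not path.is_file():
            continue  # never fetched (or already removed): nothing to do
        if path.stat().st_size != size:
            stale.append(path)
            continue
        data = path.read_bytes()
        for manifest_name, algo in HASHES.items():
            expected = hashes.get(manifest_name)
            if expected and hashlib.new(algo, data).hexdigest() != expected:
                stale.append(path)
                break
    return stale
```

A postsync hook would then delete the returned files so that portage
refetches them the next time the package is emerged; deciding what to
rebuild remains the harder problem described above.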


4 Deliverables
---

Coding will be done in Python, as that's the language roverlay.py is
written in. Some tools may be written as shell scripts, e.g. the user
refetch tool. Documentation will continue to use reStructuredText.

4.1 Final (September 23)

ebuild/overlay creation (roverlay.py):
  all features as listed in 3.1.1 and 3.1.2
  roverlay console

automated overlay maintenance:
  overlay verification using both structural testing and tinderboxing
  snapshot create/restore functionality
  overlay status script and web page

user tools: refetch tool

documentation:
  roverlay.py's new features and automated overlay maintenance fully documented
  (the user refetch tool probably doesn't need much in-depth documentation)



4.2 Mid-term (July 29)

ebuild/overlay creation (roverlay.py):
  all features as listed in 3.1.1 and 3.1.2

automated overlay maintenance:
  overlay verification using structural testing
  snapshot create/restore functionality
  basic overlay status script

user tools: refetch tool

documentation: partially done, roverlay.py's new features documented
  (in doc/rst/usage.rst)



5 Timeline
---

* May 28 - Jun 16: implement control flow features as described in 3.1.1
* Jun 17 - Jun 30: misc features as listed in 3.1.2
* Jul 1 - Jul 14: add structural testing
* Jul 15 - Jul 21: write the user refetch and overlay snapshot/restore tools
* Jul 22 - Jul 28: basic QA script
* Jul 29 - Aug 4: mid-term evaluations / write documentation
* Aug 5 - Aug 18: add tinderboxing
* Aug 19 - Aug 25: extend the QA script, simple status web page
* Aug 26 - Sep 8: roverlay console (see 3.1.3)
* Sep 9 - Sep 23 (Mon): write/improve documentation


6 Biography / About me
---

I'm a twenty-one-year-old undergraduate student from Stuttgart,
Germany. My major is Computer Science, in which I'll get my bachelor's
degree in 2014.

I've been using Gentoo for four years now and I'm about to become an
official developer in the near future. As for other open source
activities, I have contributed to the TLP project
(https://github.com/linrunner/TLP) on a regular basis since mid-2011,
sometimes in the form of patches, but usually (and, in my opinion, more
importantly) by doing code reviews. I also maintain a small portage
overlay for TLP (https://github.com/dywisor/tlp-portage).


7 Extra information
---

7.1 Use the tools that you will use in your project to make changes to code

My bug tracker activity is low. I recently reported a minor build
issue and proposed a patch (bug #467728,
https://bugs.gentoo.org/show_bug.cgi?id=467728).


7.2 Participate in our development community

You can find a mailing list post of mine at <this mail>.


7.3 Contact Info

email : dywi at mailerd.de
irc   : dywi at irc.freenode.net

home mailing address: <removed>
phone number: <removed>


7.4 Working hours

Mon - Sat, 8 am - 9 pm UTC

My actual working time will add up to about 20 hours per week from
May 28 until Jul 20, and about 35 hours per week after that.