Hello,

I've submitted a proposal today about extending last year's GSoC
project "Automatically generated overlay of R packages" [0] with a focus
on automated overlay maintenance. It's included in this mail for
public review. Feel free to comment ;)

Kind Regards,
André Erdmann

[0] http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git;a=summary

=== proposal starts here ===

--- Table of contents ---

1 Abstract
2 Objective
3 Implementation Ideas
4 Deliverables
5 Timeline
6 Biography / About me
7 Extra information

--- Table of contents ---

1 Abstract
---

roverlay is a program that creates ebuilds for R packages and makes
them available as an overlay. It's the result of last year's GSoC project
"Automatically generated overlay of R packages".

This project will extend overlay creation and add automated overlay maintenance.


2 Objective
---

To give a short review of what has been done since the end of last
year's GSoC, these features have been added:

* faster Manifest file creation using the portage libs directly (still
  experimental)
* creation of a package mirror directory ("overlay DISTDIR")
* package rules that allow controlling various aspects of package
  processing (= ebuild creation)

Overall, roverlay's code size increased by about 30% (+3800 lines),
whereas its user-targeted documentation increased by 22% (+500 lines).

There's still work to do in order to get a fully automated overlay. As
mentioned before, this project/proposal focuses on two major areas,
(a) extending ebuild and overlay creation and (b) adding automated
overlay maintenance, with the latter being more important.

The aim of automated overlay maintenance is to provide an overlay that
can be deployed to end-users without requiring interaction by the
overlay maintainer (ideally). This includes verification of ebuilds
and the entire overlay (typically after running roverlay) as well as a
simple status web page. A tinderbox approach may also be implemented
(in addition to structural testing).

As usual, proper documentation is considered essential.


3 Implementation Ideas
---

This section lists a few features/enhancements that I plan to
implement. It's not a definitive list of things that will be done, but
rather gives you an idea of what the result will look like.


3.1 Extending Ebuild/Overlay Creation


3.1.1 Control Flow

Currently, the package rules system is able to ignore an R package
entirely and to set an ebuild's KEYWORDS variable. This feature will
extend it with the ability to:

* "Relocate" packages

  -> change the category and/or name of an ebuild
  -> rename the (local) src files using arrows in SRC_URI
  This is necessary because R package names are sometimes too generic.

* Modify ebuild variables at ebuild creation time via package rules
* Patch ebuilds after creating the overlay (but before writing
  Manifest files and testing it)
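
To illustrate the "arrows in SRC_URI" part of relocation, here is a
minimal sketch. It is not roverlay's actual code: the function names,
the "R_" prefix and the example.org URL are assumptions made up for
this example. The only real mechanism shown is the "uri -> name" form
of SRC_URI entries.

```python
def relocated_name(package_name, prefix="R_"):
    """Hypothetical relocation rule: prefix a (too generic) R package name."""
    return prefix + package_name


def src_uri_entry(upstream_uri, local_name=None):
    """Return one SRC_URI entry, renaming the local file if requested."""
    upstream_name = upstream_uri.rpartition("/")[2]
    if not local_name or local_name == upstream_name:
        # nothing to rename, a plain URI is sufficient
        return upstream_uri
    # "uri -> name" stores the downloaded file as 'name' in DISTDIR
    return "{} -> {}".format(upstream_uri, local_name)


if __name__ == "__main__":
    uri = "http://example.org/src/contrib/boot_1.3.tar.gz"
    print(src_uri_entry(uri, relocated_name("boot") + "_1.3.tar.gz"))
```

Keeping the upstream URI untouched and only changing the local file
name means that the download location stays valid while name clashes
in DISTDIR are avoided.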


3.1.2 Misc Features

* Add SLOT handling to dependency resolution

* Replace the r_suggests USE flag with a USE_EXPAND variable so that
  users can select optional dependencies on a per-package basis

* Bypass ebuild generation: insert hand-written ebuilds into the overlay

  Doing that at overlay creation time has two advantages: the inserted
  ebuild safely replaces any generated one and it won't be excluded from
  overlay verification.

* Run incremental overlay creation for a given set of packages

  More importantly, regenerate ebuilds on tarball (checksum) change,
  which happens when upstream (CRAN etc.) changes a package's content
  without renaming it. Regenerated ebuilds could also be revbumped,
  which allows easy end-user upgrades.

* Generate meaningful statistics for the web page / QA tools
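
Detecting a tarball (checksum) change could look roughly like the
following sketch. It assumes that a checksum was recorded when the
ebuild was created; where it is stored (e.g. the Manifest file) is left
open here, and both function names are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 checksum of a file without loading it at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_regeneration(path, recorded_checksum):
    """True if the file differs from what was seen at ebuild creation time."""
    return sha256_of(path) != recorded_checksum
```

A package file whose name is unchanged but for which
needs_regeneration() returns True is a candidate for ebuild
regeneration.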


3.1.3 Console

roverlay already features the depres console, which can be used to
create and test dependency rules in a rather limited way.

A reimplementation of this console would make it possible to control
each step of overlay creation interactively, for example:

* "forge" packages - create fake package information from user input
* create ebuilds for packages, add them to an overlay
* test subsystems like dependency resolution and package rules

The new console would also feature a more user-friendly interface,
e.g. readline support (-> tab completion).

This is a nice-to-have feature and may be dropped in favor of others.


3.2 Automated Overlay Maintenance

The main goal here is to require as little maintainer interaction as possible.


3.2.1 Overlay Verification

* Structural testing: verify the overlay and all of its ebuilds
  (possibly using repoman.checks etc)

  -> Ensure that all dependencies of an ebuild are actually satisfiable

* Black-box testing: tinderbox approach - (try to) build all packages (ebuilds)

An ebuild that doesn't pass structural testing will be removed from
the overlay. Test results will be logged and included in the status
web page.
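
The dependency check combined with ebuild removal can be sketched as
follows. This is a strongly simplified model and not the planned
implementation: packages and dependencies are plain names (no versions,
SLOTs or USE flags), and the function name and data layout are
assumptions made for this example. It shows why the check has to be
repeated: removing one ebuild can make others unsatisfiable.

```python
def prune_unsatisfiable(overlay, external):
    """Remove packages whose dependencies cannot be satisfied.

    overlay  -- dict that maps a package name to the set of names it depends on
    external -- set of names provided outside of the overlay (e.g. main tree)

    Returns a tuple (kept, removed).
    """
    kept = dict(overlay)
    removed = set()
    changed = True
    while changed:
        changed = False
        available = external | set(kept)
        for name, deps in list(kept.items()):
            if not deps <= available:
                # at least one dependency is missing
                del kept[name]
                removed.add(name)
                changed = True
    return kept, removed
```

The loop runs until a pass removes nothing, so a package that depends
on a removed package is removed as well.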


3.2.2 Other QA Tools

To be figured out.

Currently, I plan to provide a script that reports the overlay's
status and provides easy access to overlay statistics and log
messages.

This script should support several output formats, at least
human-readable text and HTML, so that it can serve as a base for
creating a status web page.
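
One possible structure for such a script is one render function per
output format, selected by name. The sketch below only illustrates that
idea; the statistics names ("ebuilds", "failed") and all function names
are made up, since the actual statistics are yet to be defined.

```python
import html


def render_text(stats):
    """Render statistics as aligned 'name : value' lines."""
    width = max((len(name) for name in stats), default=0)
    return "\n".join(
        "{:<{w}} : {}".format(name, value, w=width)
        for name, value in sorted(stats.items())
    )


def render_html(stats):
    """Render statistics as a simple HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(name)), html.escape(str(value))
        )
        for name, value in sorted(stats.items())
    )
    return "<table>{}</table>".format(rows)


RENDERERS = {"text": render_text, "html": render_html}


def report(stats, fmt="text"):
    """Render stats in the requested output format."""
    return RENDERERS[fmt](stats)


if __name__ == "__main__":
    print(report({"ebuilds": 42, "failed": 3}))
```

Adding another output format then only requires a new entry in
RENDERERS.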


3.2.3 Overlay Snapshots

The idea here is to add version control to the overlay, which makes it
possible to restore a previous (known-to-work) state of the overlay.

A real "known-to-work" state would also have to include all R
packages, but that's not practical (could be solved with hard links /
file copies, though).

The main purpose of this feature is to recover from roverlay.py
failure, e.g. if the result doesn't meet one's expectations, without
the need to regenerate the entire overlay (which could easily take
hours).

Possible solutions:

1. use a full-featured version control system, e.g. git.

   The drawback of this solution is that git's history will needlessly
   grow (as mentioned above, there's no real "known-to-work" state, so a
   complete history is useless). This could be remedied by use of
   git-filter-branch, but it would add a layer of complexity without any
   real advantage.

2. create tarballs and keep a certain set of them, e.g. keep the last
   seven days (or roverlay.py runs) plus one tarball for each of the last
   12 weeks.

I'd definitely opt for 2 here.
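
The retention policy of solution 2 can be sketched as below. It assumes
at most one snapshot per day, identified by its date, and it interprets
"one tarball for each of the last 12 weeks" as the newest snapshot of
each 7-day window counted back from today. The function name and its
parameters are made up for this example.

```python
import datetime


def snapshots_to_keep(snapshot_dates, today, daily=7, weekly=12):
    """Return the set of snapshot dates that should not be deleted."""
    keep = set()
    newest_per_week = {}
    for day in sorted(snapshot_dates):
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < daily:
            keep.add(day)
        week = age // 7
        if week < weekly:
            # dates are processed in ascending order, so the newest
            # snapshot of each 7-day window wins
            newest_per_week[week] = day
    keep.update(newest_per_week.values())
    return keep


if __name__ == "__main__":
    today = datetime.date(2013, 5, 3)
    dates = [today - datetime.timedelta(days=n) for n in range(100)]
    for day in sorted(snapshots_to_keep(dates, today)):
        print(day.isoformat())
```

Every snapshot that is not in the returned set can be removed.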


3.3 User Tools

Generally, users are expected to simply add the overlay with layman
and use it like any other, but certain cases might require user
interaction:

* upstream changes the package's content without renaming it (see 3.1.2)

  This invalidates package files downloaded by the user. Rebuilding
  these packages (ebuilds) may be advantageous, too.

  Removing and/or refetching the files is rather easy: a postsync script
  could do the job, possibly with code-side support in roverlay.py to
  speed things up (i.e., don't scan the whole overlay after each sync).

  The tricky part is to determine the list of packages that should be
  rebuilt. Simply rebuilding any ebuild whose package file has been
  removed from ${DISTDIR} is not accurate for at least two reasons
  (package not installed, user does not keep distfiles). A possible
  solution is to rev-bump ebuilds in roverlay.py whenever upstream
  changes a package.
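
Rev-bumping itself is simple, as the following sketch shows. It
operates on the version string of an ebuild, where an optional "-rN"
suffix denotes the revision. The function name is made up, and how
roverlay.py would keep track of revisions is not covered here.

```python
import re

# version, optionally followed by a revision suffix "-r<number>"
_PVR = re.compile(r"^(?P<version>.+?)(?:-r(?P<rev>[0-9]+))?$")


def revbump(pvr):
    """Return the version string with its revision increased by one."""
    match = _PVR.match(pvr)
    if match is None:
        raise ValueError("not a version string: {!r}".format(pvr))
    rev = int(match.group("rev") or 0)
    return "{}-r{}".format(match.group("version"), rev + 1)
```

Since a higher revision counts as an upgrade, users would get the
changed package with a normal system update and would not need a
separate list of packages to rebuild.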


4 Deliverables
---

Coding will be done in Python as that's the programming language in
which roverlay.py is written. Some tools may be written as shell
scripts, e.g. the user refetch tool. Documentation will continue to
use reStructuredText.

4.1 Final (September 23)

ebuild/overlay creation (roverlay.py):
  all features as listed in 3.1.1 and 3.1.2
  roverlay console

automated overlay maintenance:
  overlay verification using both structural testing and tinderboxing
  snapshot create/restore functionality
  overlay status script and web page

user tools: refetch tool

documentation:
  roverlay.py's new features and automated overlay maintenance fully documented
  The user refetch tool probably doesn't need much in-depth documentation


4.2 Mid-term (July 29)

ebuild/overlay creation (roverlay.py):
  all features as listed in 3.1.1 and 3.1.2

automated overlay maintenance:
  overlay verification using structural testing
  snapshot create/restore functionality
  basic overlay status script

user tools: refetch tool

documentation: partially done, roverlay.py's new features documented
(in doc/rst/usage.rst)


5 Timeline
---

* May 28 - Jun 16: implement control flow features as described in 3.1.1
* Jun 17 - Jun 30: misc features as listed in 3.1.2
* Jul 1 - Jul 14: add structural testing
* Jul 15 - Jul 21: write the user refetch and overlay snapshot/restore tools
* Jul 22 - Jul 28: basic QA script
* Jul 29 - Aug 4: Mid-term evaluations / write documentation
* Aug 5 - Aug 18: add tinderboxing
* Aug 19 - Aug 25: extend the QA script, simple status web page
* Aug 26 - Sep 8: roverlay console (see 3.1.3)
* Sep 9 - Sep 23 (Mo): write/improve documentation


6 Biography / About me
---

I'm a twenty-one-year-old undergraduate student from Stuttgart,
Germany. My major subject is Computer Science, in which I'll get my
bachelor's degree in 2014.

I've been using Gentoo for 4 years now and I'm about to become an
official dev in the near future. As for other open source activities, I
contribute to the TLP project (https://github.com/linrunner/TLP) on a
regular basis (since mid 2011), sometimes in the form of patches, but
normally (and more importantly, in my opinion) by doing code reviews. I
also maintain a small portage overlay for TLP
(https://github.com/dywisor/tlp-portage).


7 Extra information
---

7.1 Use the tools that you will use in your project to make changes to code

My bug tracker activity is low. I've recently reported a minor build
issue and proposed a patch (bug #467728,
https://bugs.gentoo.org/show_bug.cgi?id=467728).


7.2 Participate in our development community

You can find a mailing list entry from me at <this mail>.


7.3 Contact Info

email : dywi at mailerd.de
irc : dywi at irc.freenode.net

home mailing address: <removed>
phone number: <removed>


7.4 Working hours

Mo - Sa, 8 am - 9 pm UTC

Actual working time sums up to about 20 hours per week from May 28
until Jul 20 and then 35 hours/week.