Hi,

I've just submitted my proposal (included in this mail) for this
year's GSoC project idea 'Automatically generated overlay of R
packages'. Feel free to comment and criticize; it helps me a lot to
improve it.

Kind Regards,
Andre Erdmann

<<<
--- Table of contents ---
1 Abstract
2 Objective
3 Implementation ideas
3.1 General
3.1.1 Distributing and maintaining the overlay
3.1.2 Package mirror
3.1.3 Scripts

3.2 Getting packages & information from CRAN, Bioconductor
3.2.1 Fixing package metadata
3.2.2 Getting packages from BIOC
3.2.3 Getting packages from CRAN

3.3 Satisfying dependencies
3.3.1 R package deps
3.3.2 system deps
3.3.3 optional deps (sys or R-pkg)
3.3.4 Additional thoughts

3.4 Generating ebuilds

4 Deliverables
4.1 Final (August 20)
4.2 Mid-Term (July 9)

5 Timeline
6 Biography / About me
7 Extra information
7.1 Use the tools that you will use in your project to make changes to code
7.2 Participate in our development community
7.3 Contact info
7.4 Working hours

--- Table of contents ---

1 Abstract
---
The aim of this project is to create an overlay that contains
automatically generated ebuilds for R packages from CRAN and BIOC.

2 Objective
---
Creating an ebuild for an R package means
* checking for newer versions
* discovering all dependencies (system deps, too)
* fixing package metadata if required
* uploading the R package to a mirror
* writing the ebuild and creating/updating the Manifest file.
Doing that by hand for all R packages from CRAN/BIOC is not feasible
due to the high number of available packages. Using scripts to automate
the process addresses this issue and makes it possible to create an
overlay.

Clean code, proper logging and good documentation are key components
of this project since integration into the Gentoo infrastructure is
intended.


3 Implementation ideas
---
3.1 General

3.1.1 Distributing and maintaining the overlay

Using git offers easy updates/rollbacks via git push/pull/checkout/...
and would allow separating overlay creation/update from overlay
hosting.

The abstract overlay layout would be:
/dev-R/
/dev-R/<origin>-<package_name>/
/dev-R/<origin>-<package_name>/<origin>-<package_name>-<${PVR}>.ebuild
/dev-R/<origin>-<package_name>/{[metadata.xml,]ChangeLog,Manifest}
/eclass/
/eclass/<files that help, e.g. r_overlay-fix-metadata.eclass>
/profiles/...

example: /dev-R/CRAN-numDeriv/CRAN-numDeriv-2012.3.1.ebuild
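
To make the naming scheme concrete, here is a minimal sketch that builds
the overlay-relative ebuild path from origin, package name and version.
Python is used only for illustration, and the function name ebuild_path
is hypothetical; only the 'dev-R' category and the
'<origin>-<package_name>' pattern come from the layout above.

```python
from pathlib import PurePosixPath

CATEGORY = "dev-R"  # category name taken from the layout above


def ebuild_path(origin: str, package_name: str, pvr: str) -> PurePosixPath:
    """Return the overlay-relative path of an ebuild.

    Hypothetical helper: it only joins the name parts and does not
    validate the version string.
    """
    atom_base = f"{origin}-{package_name}"
    return PurePosixPath(CATEGORY) / atom_base / f"{atom_base}-{pvr}.ebuild"


print(ebuild_path("CRAN", "numDeriv", "2012.3.1"))
# dev-R/CRAN-numDeriv/CRAN-numDeriv-2012.3.1.ebuild
```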

3.1.2 Package mirror

To be figured out.
Probably nothing special; it could be a directory/file tree structure like
"<r-overlay_distfiles-root>/dev-R/<portage-atom_base>/<portage-atom_base>-<${PVR}>.tar.gz"

3.1.3 Scripts

The scripts will be implemented in bash or perl and kept/developed in
a separate git repo.
An ebuild that installs the whole script set will be part of the GSoC
project, which means that an overlay client could be used to maintain
the overlay (with proper write/push permissions on the server).
There will be an ebuild generator, a src_package-to-mirror uploader
(rsync or mv/cp, ...), a CRAN/BIOC importer ("syncer"), test scripts,
a dependency resolution map etc.
Logging will be done in a machine/parse-friendly way and a converter
into a more human-readable format will be supplied, too.
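
One possible shape for such logging, sketched under assumptions: each
log entry is written as one JSON object per line, and a small converter
turns it back into readable text. The JSON-lines format, the field
names and both function names are illustrative choices, not something
the proposal has settled on; Python is used for brevity although the
scripts themselves are planned in bash or perl.

```python
import json


def log_record(event: str, **fields: str) -> str:
    """Serialize one log entry as a single JSON line (machine-friendly)."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


def humanize(line: str) -> str:
    """Convert one JSON log line into a human-readable message."""
    record = json.loads(line)
    event = record.pop("event")
    details = ", ".join(f"{key}={value}" for key, value in sorted(record.items()))
    return f"{event}: {details}" if details else event


line = log_record("dep_resolved", package="examplePkg", dep="gnuplot",
                  atom="sci-visualization/gnuplot")
print(humanize(line))
# dep_resolved: atom=sci-visualization/gnuplot, dep=gnuplot, package=examplePkg
```

A real log would also carry a timestamp; it is left out here to keep
the output reproducible.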

3.2 Getting packages & information from CRAN, Bioconductor
(and possibly Omegahat etc., but this is not part of the GSoC project)

The DESCRIPTION file contains all(?) information that is required to
generate an ebuild for a package.
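
As an illustration of reading that file: DESCRIPTION uses a
"Field: value" layout in which continuation lines start with
whitespace. The sketch below parses such text and splits a
comma-separated dependency field into (name, version constraint)
pairs. The function names and the sample package are made up for this
example, and free-text fields such as SystemRequirements would need
different handling than the simple split shown here.

```python
import re
from typing import Dict, List, Tuple


def parse_description(text: str) -> Dict[str, str]:
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def split_dependency_field(value: str) -> List[Tuple[str, str]]:
    """Split e.g. 'R (>= 2.10.0), methods' into (name, constraint) pairs."""
    deps = []
    for item in value.split(","):
        match = re.match(r"^([A-Za-z0-9.]+)\s*(?:\(([^)]*)\))?$", item.strip())
        if match:
            deps.append((match.group(1), (match.group(2) or "").strip()))
    return deps


# Hypothetical DESCRIPTION content, not taken from a real package.
SAMPLE = """\
Package: examplePkg
Version: 1.0-2
Depends: R (>= 2.10.0), methods
Suggests: optA,
    optB
SystemRequirements: gnuplot
"""

fields = parse_description(SAMPLE)
print(split_dependency_field(fields["Depends"]))
# [('R', '>= 2.10.0'), ('methods', '')]
```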

3.2.1 Fixing package metadata

Package metadata could be fixed while emerging (using an eclass) or at
overlay creation/update time (which requires Gentoo-specific source
tarballs).
Further discussion is needed in order to choose the right option here;
I would choose the former and add flags for a metadata-fixer
eclass ("E_ROVERLAY_FIXMETADATA='this that etc'") to the generated
ebuild, leaving the package tarballs unchanged.

3.2.2 Getting packages from BIOC

BIOC offers all source packages in a tree structure (<pkg_name>/...)
via svn, making an update check as easy as 'svn status --show-updates'
(or similar).
The DESCRIPTION file can be parsed directly; tarball creation is
necessary for the distfiles mirror, so there are no specific pros/cons
regarding 3.2.1.

3.2.3 Getting packages from CRAN

CRAN source packages can be mirrored as tarballs (containing the
package files) using rsync. Each tarball has to be extracted (at least
partially, see 3.2.1).


3.3 Satisfying dependencies

Relevant fields in the DESCRIPTION file are 'Depends, [Suggests],
[SystemRequirements]', maybe more ('[Enhances]',...).

3.3.1 R package deps

a   Create all ebuilds blindly (don't check these deps) and perform
    a consistency check ("can satisfy") afterwards
b   Queue the creation of an ebuild E until all of its dependencies have
    been added (or no package is left to add, which would imply a
    dependency failure)
b.1 Could try to satisfy remaining dependencies as sys deps
    (would do it for "R" in 'Depends: ...' anyway).
=> would choose b here (would also allow 'stable' git commits per
   ebuild/R package).
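
Option b can be sketched as a simple fixed-point loop: a package is
"created" once all of its R dependencies are available, and whatever is
still queued when a full pass makes no progress is reported as a
dependency failure. This is only an illustration under assumptions: the
function name, the input format (package name mapped to a set of
dependency names) and the package names are hypothetical, and "R"
itself is treated as already provided.

```python
from typing import Dict, Iterable, List, Set, Tuple


def creation_order(
    deps: Dict[str, Set[str]], provided: Iterable[str] = ("R",)
) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Return (packages in creation order, unresolved package -> missing deps)."""
    available = set(provided)
    pending = {name: set(needed) for name, needed in deps.items()}
    created: List[str] = []
    progress = True
    while pending and progress:
        progress = False
        for name in sorted(pending):  # sorted() copies, so deleting is safe
            if pending[name] <= available:
                created.append(name)
                available.add(name)
                del pending[name]
                progress = True
    missing = {name: needed - available for name, needed in pending.items()}
    return created, missing


order, failed = creation_order({
    "pkgA": {"R"},
    "pkgB": {"pkgA"},
    "pkgC": {"pkgB", "notAvailable"},
    "pkgD": set(),
})
print(order)   # ['pkgA', 'pkgB', 'pkgD']
print(failed)  # {'pkgC': {'notAvailable'}}
```

The leftover entries are the candidates for b.1, i.e. for a second
attempt as system dependencies.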

3.3.2 system deps

The task is to find dependency equivalents in the portage tree, using

* a static dependency alias map (as file,...)
  "<this dependency>" means "<atom-base>",
  e.g. "gnuplot" => "sci-visualization/gnuplot"

  May be overridden by a per-R-package alias map since different
  dependencies could have the same name (varies by package author or
  even within portage - app-misc/screen vs app-vim/screen (just as an
  example))

* a dynamic dependency alias map ("satisfied before, but not confirmed")
  Created by...

* educated guessing
  by name; or try using string metrics (case sensitivity, Levenshtein, ...)

Proper logging of the dependency resolution is a crucial part here.
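
A minimal sketch of that lookup order, assuming the maps are plain
dictionaries: the per-package override wins, then the static alias map,
then a case-insensitive Levenshtein guess against known atoms. The
function names, the distance threshold and the atom list are
illustrative assumptions; the dynamic map is left out. An ambiguous
guess (several atoms at the same distance) is deliberately reported as
unresolved instead of picking one.

```python
from typing import Dict, Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def resolve_system_dep(
    dep: str,
    r_package: str,
    static_map: Dict[str, str],
    per_package_map: Dict[str, Dict[str, str]],
    known_atoms: Iterable[str],
    max_distance: int = 1,
) -> Tuple[Optional[str], str]:
    """Return (atom or None, how it was found) for one system dependency."""
    override = per_package_map.get(r_package, {}).get(dep)
    if override is not None:
        return override, "per-package map"
    if dep in static_map:
        return static_map[dep], "static map"
    wanted = dep.lower()
    # Compare against the package part of each 'category/package' atom.
    scored = [(levenshtein(wanted, atom.split("/", 1)[1].lower()), atom)
              for atom in known_atoms]
    if scored:
        best = min(distance for distance, _ in scored)
        winners = [atom for distance, atom in scored if distance == best]
        if best <= max_distance and len(winners) == 1:
            return winners[0], "guess"
    return None, "unresolved"


STATIC = {"gnuplot": "sci-visualization/gnuplot"}
PER_PACKAGE = {"examplePkg": {"screen": "app-misc/screen"}}  # hypothetical
ATOMS = ["sci-visualization/gnuplot", "app-misc/screen",
         "app-vim/screen", "sci-libs/gsl"]

print(resolve_system_dep("gnuplot", "anyPkg", STATIC, PER_PACKAGE, ATOMS))
print(resolve_system_dep("screen", "examplePkg", STATIC, PER_PACKAGE, ATOMS))
print(resolve_system_dep("GSL", "anyPkg", STATIC, PER_PACKAGE, ATOMS))
print(resolve_system_dep("screen", "otherPkg", STATIC, PER_PACKAGE, ATOMS))
```

The second element of each result is what would go into the resolution
log.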


3.3.3 optional deps (sys or R-pkg)

'Suggests:' could be controlled via USE flags (IUSE+=" <short name of
optional dep> <another one>")
'Enhances:' could be ignored or added as USE flags, too.
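
A sketch of how the 'Suggests:' mapping could be turned into ebuild
lines, assuming each optional dependency has already been resolved to
an atom. The flag naming rule (lowercase, unsupported characters
replaced by '_'), the function names and the atoms in the example are
assumptions made for illustration.

```python
import re
from typing import Dict, List


def use_flag_name(dep_name: str) -> str:
    """Derive a USE flag from a dependency name (assumed naming rule)."""
    return re.sub(r"[^a-z0-9+_@-]", "_", dep_name.lower())


def optional_dep_lines(suggests: Dict[str, str]) -> List[str]:
    """Return IUSE/RDEPEND lines for a mapping 'dependency name -> atom'."""
    flags = []
    conditionals = []
    for dep_name in sorted(suggests):
        flag = use_flag_name(dep_name)
        flags.append(flag)
        conditionals.append(f"{flag}? ( {suggests[dep_name]} )")
    if not flags:
        return []
    return [
        'IUSE+=" ' + " ".join(flags) + '"',
        'RDEPEND+=" ' + " ".join(conditionals) + '"',
    ]


# Hypothetical optional dependencies and atoms.
for line in optional_dep_lines({"optA": "dev-R/CRAN-optA",
                                "opt.b": "dev-R/CRAN-opt_b"}):
    print(line)
# IUSE+=" opt_b opta"
# RDEPEND+=" opt_b? ( dev-R/CRAN-opt_b ) opta? ( dev-R/CRAN-optA )"
```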

3.3.4 Additional thoughts

A simple web UI or script (for ssh or local usage) could be implemented
to ease such things (dependency mapping, maybe more). This is not part
of the GSoC project for now, but kept in mind as a nice-to-have feature.

3.4 Generating ebuilds

Ebuilds use generic eclasses that handle most of the installation process.
Extra per-package dependencies could be 'injected' where required
(using an extra dependency map,...). If an R package can be installed
from both CRAN and BIOC, choose the 'best' one (to be clarified) or
add both (and let them block each other in the ebuilds).
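
To show what a generated ebuild could look like, including the mutual
blocker for a package available from both origins, here is a rough
sketch. Everything specific in it is an assumption: the eclass name
r-overlay-package, the EAPI, SLOT and KEYWORDS values, the header
comment, the mirror URL and the function name are placeholders, not
decisions made in this proposal.

```python
from typing import Iterable

CATEGORY = "dev-R"
ECLASS = "r-overlay-package"  # hypothetical eclass name


def render_ebuild(
    origin: str,
    name: str,
    description: str,
    src_uri: str,
    rdepend: Iterable[str],
    other_origins: Iterable[str] = (),
) -> str:
    """Render a minimal ebuild; variants from other origins are blocked."""
    deps = list(rdepend)
    # '!category/package' is a portage blocker on the other variant.
    deps += [f"!{CATEGORY}/{other}-{name}" for other in other_origins]
    safe_description = description.replace('"', "'")
    lines = [
        f"# Automatically generated from {origin}; do not edit",
        "EAPI=4",
        "",
        f"inherit {ECLASS}",
        "",
        f'DESCRIPTION="{safe_description}"',
        f'SRC_URI="{src_uri}"',
        'SLOT="0"',
        'KEYWORDS="~amd64"',
        'RDEPEND="' + "\n\t".join(deps) + '"',
    ]
    return "\n".join(lines) + "\n"


print(render_ebuild(
    "CRAN", "examplePkg", 'An "example" package',
    "http://mirror.invalid/examplePkg_1.0.tar.gz",
    ["dev-lang/R"], other_origins=["BIOC"],
))
```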


4 Deliverables
---
4.1 Final (August 20)

scripts       : CRAN/BIOC importer,
                ebuild generator with dependency resolution,
                src_package-to-mirror uploader,
                log reader,
                consistency checkers etc. (see 3.1.3),
                ebuild for the overlay generation script set

documentation : man pages for the scripts,
                implementation documentation ("what the system
                does and how")
                as wiki or GuideXML page (needs to be discussed)

overlay       : functioning; filled with R package ebuilds, eclasses, etc.

4.2 Mid-Term (July 9)

scripts       : ebuild generator
documentation : implementation documentation 70-80% done
overlay       : eclasses


5 Timeline
---

* until May 1 (Tu): read documentation, concretize the implementation ideas

* May 2 - May 21 (Mo): write the (a priori) project documentation of
  what the system does and how

* from May 22: begin to discuss with the Gentoo infrastructure team
  whether they would agree to install the overlay system on Gentoo
  infrastructure

* May 24 - June 3 (Su): write eclasses and the ebuild generator
  without dependency resolution

* June 4 - July 1 (Su): add dependency resolution to the ebuild generator

* July 2 - July 9 (Mo): prepare for the Mid-term evaluations; improve
  documentation

* July 10 - July 30 (Mo): sync scripts (import from CRAN / BIOC, push
  R package source to mirror), test scripts

* July 31 - August 10 (Fr): write man pages, the log reader and an
  ebuild for the overlay generation script set

* August 11 - August 20 (Mo): improve documentation



6 Biography / About me
---
I'm a twenty-year-old undergraduate student from Stuttgart, Germany.
My major subject is Computer Science, in which I'll get my bachelor's
degree in 2013 or 2014.
I like to tinker with Linux distributions and computer devices
(Dreamplug, MIPS starter kit - just to name a few). I'm not afraid of
writing programs/scripts whenever required (or likely to be
helpful). I've seen and used several programming languages
(including different paradigms) such as C/C++, Haskell, Prolog,
Python, Perl, MIPS assembler (spim), Ada, Java and Shell
(bash/dash/hush), which gives me a basic overview of how to implement
things.

I've been using Gentoo for 3 years now and started a small
Gentoo-related project last year. Its main effort is called
tlp-gentoo [1], which automatically converts TLP's source code (a set
of scripts that implement power management) into a Gentoo-usable form
using regex, patches etc.
Ebuilds for the converted source are provided in an overlay (tlp-portage [2]).
Moreover, I'm doing code reviews for the upstream project on a regular basis.
I've chosen this project because I want to go further than that.
An automatically generated overlay is a thing I've never done before
and I'm really interested in realizing that. In addition, I'll be
using a few R packages in the near future, but my motivation is the
process of overlay creation. I'm also referring to the tlp project to
show you that I don't plan to bail out after the initial work has been
done.


7 Extra information
---
7.1 Use the tools that you will use in your project to make changes to code

fixed bugs at bugs.gentoo.org : none so far; to be done.

7.2 Participate in our development community

You can find a mailing list entry from me <in this mail>.

7.3 Contact info

email : dywi at mailerd.de
irc   : dywi at irc.freenode.net

home mailing address: <removed>

phone number: <removed>

7.4 Working hours

Mo - Sa, 8 am - 9 pm UTC
Actual working time may vary, but sums up to 30-40 hours per week.


---
[1] https://github.com/dywisor/tlp-gentoo
[2] https://github.com/dywisor/tlp-portage
---

>>>