Gentoo Archives: gentoo-soc

From: "André Erdmann" <andre.gentoo-ml@×××××××.de>
To: gentoo-soc@l.g.o
Subject: [gentoo-soc] Fwd: [soc proposal] Automatically generated overlay of R packages
Date: Fri, 06 Apr 2012 18:10:24
Message-Id: CAGrucu1n213VXyiTmnjA55qVn_6FccmwVNpahSPkbp8iJVmXbQ@mail.gmail.com
1 Hi,
2
3 I've just submitted my proposal (included in this mail) for this
4 year's gsoc project idea 'Automatically generated overlay of R
5 packages'. Feel free to comment and criticize, it helps me a lot to
6 improve it.
7
8 Kind Regards,
9 Andre Erdmann
10
11 <<<
12 --- Table of contents ---
13 1 Abstract
14 2 Objective
15 3 Implementation ideas
16  3.1 General
17  3.1.1 Distributing and maintaining the overlay
18  3.1.2 Package mirror
19  3.1.3 Scripts
20
21  3.2 Getting packages & information from CRAN, Bioconductor
22  3.2.1 Fixing package metadata
23  3.2.2 Getting packages from BIOC
24  3.2.3 Getting packages from CRAN
25
26  3.3 Satisfying dependencies
27  3.3.1 R package deps
28  3.3.2 system deps
29  3.3.3 optional deps (sys or R-pkg)
30  3.3.4 Additional thoughts
31
32  3.4 Generating ebuilds
33
34 4 Deliverables
35  4.1 Final (August 20)
36  4.2 Mid-Term (July 9)
37
38 5 Timeline
39 6 Biography / About me
40 7 Extra information
41  7.1 Use the tools that you will use in your project to make changes to code
42  7.2 Participate in our development community
43  7.3 Contact info
44  7.4 Working hours
45
46 --- Table of contents ---
47
48 1 Abstract
49 ---
50 The aim of this project is to create an overlay that contains
51 automatically generated ebuilds for R packages from CRAN and BIOC.
52
53 2 Objective
54 ---
55 Creating an ebuild for an R package means:
56 * checking for newer versions
57 * discovering all dependencies (system deps, too)
58 * fixing package metadata if required
59 * uploading the R package to a mirror
60 * writing the ebuild and creating/updating the Manifest file.
61 Doing that by hand for all R packages from CRAN/BIOC is not feasible due to
62 the high number of available packages. Using scripts to automate the
63 process addresses this issue and can be used to create an overlay.
64
65 Clean code, proper logging and good documentation are key components
66 of this project since integration into the Gentoo infrastructure is
67 intended.
68
69
70 3 Implementation ideas
71 ---
72 3.1 General
73
74 3.1.1 Distributing and maintaining the overlay
75
76 Using git offers easy updates/rollbacks via git push/pull/checkout/...
77 and would allow separating overlay creation/update from overlay
78 hosting.
79
80 The abstract overlay layout would be:
81  /dev-R/
82  /dev-R/<origin>-<package_name>/
83  /dev-R/<origin>-<package_name>/<origin>-<package_name>-<${PVR}>.ebuild
84  /dev-R/<origin>-<package_name>/{[metadata.xml,],ChangeLog,Manifest}
85  /eclass/
86  /eclass/<files that help, e.g. r_overlay-fix-metadata.eclass>
87  /profiles/...
88
89 example: /dev-R/CRAN-numDeriv/CRAN-numDeriv-2012.3.1.ebuild
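
A minimal sketch of how the layout above maps to an ebuild path. Python is used for illustration only (the scripts themselves are planned in bash or perl), and the function name `ebuild_path` is an assumption, not part of the proposal.

```python
from pathlib import PurePosixPath

CATEGORY = "dev-R"  # category name taken from the layout above


def ebuild_path(origin, package_name, pvr):
    """Build /dev-R/<origin>-<package_name>/<origin>-<package_name>-<PVR>.ebuild."""
    atom_base = f"{origin}-{package_name}"
    return PurePosixPath("/", CATEGORY, atom_base, f"{atom_base}-{pvr}.ebuild")


# Reproduces the example path given in the text.
print(ebuild_path("CRAN", "numDeriv", "2012.3.1"))
```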
90
91 3.1.2 Package mirror
92
93 To be figured out.
94 Probably nothing special, could be some directory/file tree structure like
95  "<r-overlay_distfiles-root>/dev-R/<portage-atom_base>/<portage-atom_base>-<${PVR}>.tar.gz"
96
97 3.1.3 Scripts
98
99 Will be implemented using bash or perl and kept/developed in an extra git repo.
100 An ebuild that installs the whole script set will be part of the gsoc
101 project, which means that an overlay client could be used to maintain
102 the overlay (with proper write/push permissions on the server).
103 There will be an ebuild-generator, src_package-to-mirror uploader
104 (rsync or mv/cp, ...), a cran/bioc importer ("syncer"), test scripts,
105 a dependency resolution map, etc.
106 Logging will be done in a machine/parse-friendly way and a converter
107 into a more human-readable format will be supplied, too.
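
One possible shape for the machine-friendly log and its converter, sketched under the assumption that each log entry is a single JSON line. The field names (`level`, `event`) and both function names are hypothetical.

```python
import json


def log_record(level, event, **fields):
    """Serialize one log entry as a single JSON line (easy to parse)."""
    return json.dumps({"level": level, "event": event, **fields}, sort_keys=True)


def humanize(line):
    """Convert one JSON log line into a more human-readable form."""
    rec = json.loads(line)
    level = rec.pop("level")
    event = rec.pop("event")
    details = ", ".join(f"{key}={value}" for key, value in sorted(rec.items()))
    return f"[{level.upper()}] {event}" + (f" ({details})" if details else "")


line = log_record("info", "dep_resolved", dep="gnuplot",
                  atom="sci-visualization/gnuplot")
print(line)
print(humanize(line))
```

Keeping one record per line means standard tools (grep, sort) still work on the raw log.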
108
109 3.2 Getting packages & information from CRAN, Bioconductor
110     (and possibly Omegahat etc. but this is not part of the gsoc project)
111
112 The DESCRIPTION file contains all(?) information that is required to
113 generate an ebuild for a package.
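
A DESCRIPTION file is a list of "Field: value" entries in which continuation lines start with whitespace. A minimal parser sketch follows; the function name and the sample values are made up for illustration, and a real implementation would need more error handling.

```python
def parse_description(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Hypothetical sample, not taken from a real package.
SAMPLE = """\
Package: examplePkg
Version: 1.0-2
Depends: R (>= 2.10),
    methods
Suggests: MASS
"""

print(parse_description(SAMPLE)["Depends"])
```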
114
115 3.2.1 Fixing package metadata
116
117 Package metadata could be fixed while emerging (using an eclass) or at
118 overlay creation/update-time (which requires gentoo-specific source
119 tarballs).
120 Further discussion is needed to choose the right option here;
121 I would choose the former one and add flags for a metadata-fixer
122 eclass ("E_ROVERLAY_FIXMETADATA='this that etc'") to the generated
123 ebuild, leaving the package tarballs unchanged.
124
125 3.2.2 Getting packages from BIOC
126
127 BIOC offers all source packages in a tree structure (<pkg_name>/...)
128 via svn, making an update check as easy as 'svn status --show-updates'
129 (or similar).
130 The DESCRIPTION file can be parsed directly; tarball creation is
131 necessary for the distfiles mirror anyway, so BIOC adds no specific
132 pros/cons regarding 3.2.1.
133
134 3.2.3 Getting packages from CRAN
135
136 CRAN source packages can be mirrored as tarballs (containing the
137 package files) using rsync. Each tarball has to be extracted (at least
138 partially, see 3.2.1).
139
140
141 3.3 Satisfying dependencies
142
143 Relevant fields in the DESCRIPTION file are 'Depends, [Suggests],
144 [SystemRequirements]', maybe more ('[Enhances]',...).
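
The 'Depends' and 'Suggests' fields are comma-separated lists of package names with optional version constraints, so they can be split mechanically. A sketch under that assumption (the function name is hypothetical); 'SystemRequirements' is free-form text and would not fit this pattern.

```python
import re

# name, optionally followed by "(<operator> <version>)"
DEP_RE = re.compile(
    r"^\s*([A-Za-z0-9._]+)\s*(?:\(\s*(>=|<=|==|!=|>|<)\s*([^)\s]+)\s*\))?\s*$"
)


def parse_dep_field(value):
    """Split a Depends/Suggests value into (name, operator, version) tuples."""
    deps = []
    for item in value.split(","):
        if not item.strip():
            continue  # tolerate trailing commas
        match = DEP_RE.match(item)
        if match is None:
            raise ValueError(f"cannot parse dependency: {item!r}")
        deps.append((match.group(1), match.group(2), match.group(3)))
    return deps


print(parse_dep_field("R (>= 2.10), methods, numDeriv"))
```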
145
146 3.3.1 R package deps
147
148 a Create all ebuilds blindly (don't check these deps) and perform
149    a consistency check ("can satisfy") afterwards
150 b Queue the creation of an ebuild E until all of its dependencies have been
151    added (or no package left to add which would imply dependency failure)
152 b.1 Could try to satisfy remaining dependencies as sys deps
153      (would do it for "R" in 'Depends: ...' anyway).
154 => would choose b here (would also allow 'stable' git commits per
155 ebuild/R package).
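
Option b could be sketched roughly as follows: a package is only "created" once all of its R package deps have been created, and processing stops when a full pass over the queue makes no progress. The function name and the input format (package name mapped to a set of dependency names) are assumptions.

```python
from collections import deque


def creation_order(deps_by_pkg):
    """Return (order, unresolved) for a mapping: package -> set of R package deps."""
    done = []
    done_set = set()
    queue = deque(sorted(deps_by_pkg))
    stalled = 0  # packages re-queued since the last successful creation
    while queue and stalled < len(queue):
        pkg = queue.popleft()
        if deps_by_pkg[pkg] <= done_set:
            done.append(pkg)  # all deps present: the ebuild can be created
            done_set.add(pkg)
            stalled = 0
        else:
            queue.append(pkg)  # try again later
            stalled += 1
    return done, sorted(queue)


order, unresolved = creation_order({"a": {"b"}, "b": set(), "c": {"missing"}})
print(order, unresolved)
```

Whatever remains in the queue has a dependency failure (or could be retried as a system dep, see b.1).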
156
157 3.3.2 system deps
158
159 The task is to find dependency equivalents in the portage tree, using
160
161
162 * a static dependency alias map (as file,...)
163    "<this dependency>" means "<atom-base>",
164      e.g. "gnuplot" => "sci-visualization/gnuplot"
165
166
167   May be overridden by a per-Rpackage alias map since different dependencies
168   could have the same name (varies by package author or
169   even within portage - app-misc/screen vs app-vim/screen (just as an example))
170
171 * a dynamic dependency alias map ("satisfied before, but not confirmed")
172    Created by...
173
174 * educated guessing
175    by name, or using string metrics (case-insensitive comparison, Levenshtein distance, ...)
176
177 Proper logging of the dependency resolution is a crucial part here.
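
A sketch of the lookup order described above: per-package alias map first, then the static alias map, then a guess by name. All names here are hypothetical; the guess uses Python's difflib similarity ratio as a stand-in for a string metric (it is not Levenshtein distance), and the dynamic alias map is left out.

```python
import difflib

STATIC_ALIASES = {"gnuplot": "sci-visualization/gnuplot"}  # example from the text


def resolve_sys_dep(name, package=None, static_map=None,
                    per_package_map=None, known_atoms=(), cutoff=0.8):
    """Return (atom, how); atom is None if the dependency stays unresolved."""
    key = name.strip().lower()
    static_map = STATIC_ALIASES if static_map is None else static_map
    overrides = (per_package_map or {}).get(package, {})
    if key in overrides:
        return overrides[key], "per-package"
    if key in static_map:
        return static_map[key], "static"
    # Educated guess: compare against the package-name part of known atoms.
    by_name = {}
    for atom in known_atoms:
        by_name.setdefault(atom.split("/", 1)[1].lower(), []).append(atom)
    matches = difflib.get_close_matches(key, list(by_name), n=1, cutoff=cutoff)
    if matches and len(by_name[matches[0]]) == 1:
        return by_name[matches[0]][0], "guess"
    return None, "unresolved"  # ambiguous or unknown: leave for manual mapping


print(resolve_sys_dep("gnuplot"))
print(resolve_sys_dep("screen", known_atoms=["app-misc/screen", "app-vim/screen"]))
```

Returning how a match was found makes it straightforward to log each resolution step.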
178
179
180 3.3.3 optional deps (sys or R-pkg)
181
182 'Suggests:' could be controlled via USE-Flags (IUSE+=" <short name of
183 optional dep> <another one>")
184 'Enhances:' could be ignored or added as USE-Flags, too.
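
A sketch of how 'Suggests:' entries might be turned into USE flags and USE-conditional dependencies. The flag naming rule (lowercase, other characters replaced by '_') and the function names are assumptions; the atoms follow the overlay layout from 3.1.1.

```python
import re


def use_flag(dep_name):
    """Derive a short USE flag name from a dependency name (assumed rule)."""
    return re.sub(r"[^a-z0-9_-]", "_", dep_name.lower())


def optional_dep_vars(suggests):
    """suggests: dependency name -> atom. Returns (IUSE line, RDEPEND entries)."""
    names = sorted(suggests)
    iuse = 'IUSE="' + " ".join(use_flag(name) for name in names) + '"'
    rdepend = [f"{use_flag(name)}? ( {suggests[name]} )" for name in names]
    return iuse, rdepend


iuse, rdepend = optional_dep_vars({
    "numDeriv": "dev-R/CRAN-numDeriv",
    "MASS": "dev-R/CRAN-MASS",
})
print(iuse)
print("\n".join(rdepend))
```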
185
186 3.3.4 Additional thoughts
187
188 A simple web ui or script (for ssh or local usage) could be implemented to
189 ease such things (dependency mapping, maybe more; not part of the gsoc project
190 for now, but kept in mind as a nice-to-have feature).
191
192 3.4 Generating ebuilds
193
194 Ebuilds use generic eclasses that handle most of the installation process.
195 Extra per-package dependencies could be 'injected' where required
196 (using an extra dependency map,...). If an R package can be installed
197 from both CRAN and BIOC, choose the 'best' one (to be clarified) or
198 add both (and have the ebuilds block each other).
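
A rough sketch of what the generator's output step could look like. The eclass name `r_overlay`, the function name, and the sample values are hypothetical, and the result is intentionally incomplete (no LICENSE, KEYWORDS, etc.); escaping is limited to backslashes and double quotes.

```python
def render_ebuild(description, src_uri, rdepend, eclass="r_overlay"):
    """Return minimal ebuild text; `eclass` is a hypothetical eclass name."""
    escaped = description.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "EAPI=4",
        "",
        f"inherit {eclass}",  # the generic eclass does the actual installation
        "",
        f'DESCRIPTION="{escaped}"',
        f'SRC_URI="{src_uri}"',
        'SLOT="0"',
        'RDEPEND="' + "\n\t".join(rdepend) + '"',
        "",
    ]
    return "\n".join(lines)


# "!<atom>" blocks the same package coming from the other origin.
print(render_ebuild(
    "Example package",
    "http://example.invalid/CRAN-foo-1.0.tar.gz",
    ["dev-lang/R", "!dev-R/BIOC-foo"],
))
```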
199
200
201 4 Deliverables
202 ---
203 4.1 Final (August 20)
204
205 scripts : cran/bioc importer,
206            ebuild generator with dependency resolution,
207            src_package-to-mirror uploader,
208            log reader,
209            consistency checkers etc. (see 3.1.3)
210            ebuild for the overlay generation script set
211
212 documentation : man pages for the scripts,
213                 implementation documentation
214                 ("what the system does and how")
215                 as wiki or GuideXML page (needs to be discussed)
216
217 overlay : functioning; filled with R package ebuilds, eclasses, etc.
218
219 4.2 Mid-Term (July 9)
220
221 scripts             : ebuild generator
222 documentation : implementation documentation 70-80% done
223 overlay            : eclasses
224
225
226 5 Timeline
227 ---
228
229 * until May 1 (Tu): read documentation, firm up the implementation ideas
230
231 * May 2 - May 21 (Mo): write the (a priori) project documentation of
232 what the system does and how
233
234 * May 22 onwards: begin discussing with the Gentoo infrastructure team
235 whether they would agree to install the overlay system on Gentoo
236 infrastructure
237
238 * May 24 - June 3 (Su): write eclasses and the ebuild generator
239 without dependency resolution
240
241 * June 4 - July 1 (Su): add dependency resolution to the ebuild generator
242
243 * July 2 - July 9 (Mo): prepare for the Mid-term evaluations; improve
244 documentation
245
246 * July 10 - July 30 (Mo): sync scripts (import from CRAN / BIOC, push
247 R package source to mirror), test scripts
248
249 * July 31 - August 10 (Fr): write man pages, the log reader and an
250 ebuild for the overlay generation script set
251
252 * August 11 - August 20 (Mo): improve documentation
253
254
255
256 6 Biography / About me
257 ---
258 I'm a twenty-year-old undergraduate student from Stuttgart, Germany.
259 My major subject is Computer Science in which I'll get my bachelor's
260 degree in 2013 or 2014.
261 I like to tinker with Linux distributions and computer devices
262 (Dreamplug, mips starter kit - just to name a few). I'm not afraid of
263 writing programs/scripts whenever they are required (or likely to be
264 helpful). I've seen and used several programming languages
265 (covering different paradigms) such as C/C++, Haskell, Prolog,
266 Python, Perl, mips-Assembler (spim), Ada, Java and Shell
267 (bash/dash/hush), which gives me a basic overview of how to implement
268 things.
269
270 I've been using Gentoo for 3 years now and started a small
271 Gentoo-related project last year. Its main effort is called
272 tlp-gentoo [1], which automatically converts TLP's source code (a set
273 of scripts that implement power management) into a Gentoo-usable form
274 using regexes, patches, etc.
275 Ebuilds for the converted source are provided in an overlay (tlp-portage [2]).
276 Moreover, I'm doing code reviews for the upstream project on a regular basis.
277 I've chosen this project because I want to go further than that.
278 An automatically generated overlay is a thing I've never done before
279 and I'm really interested in realizing that. In addition, I'll be
280 using a few R packages in the near future, but my motivation is the
281 process of overlay creation. I'm also referring to the tlp project to
282 show you that I don't plan to bail out after the initial work has been
283 done.
284
285
286 7 Extra information
287 ---
288 7.1 Use the tools that you will use in your project to make changes to code
289
290 fixed bugs at bugs.gentoo.org : none so far; to be done.
291
292 7.2 Participate in our development community
293
294 You can find a mailing list entry from me <in this mail>.
295
296 7.3 Contact info
297
298 email  : dywi at mailerd.de
299 irc      : dywi at irc.freenode.net
300
301 home mailing address: <removed>
302
303 phone number: <removed>
304
305 7.4 Working hours
306
307 Mo - Sa, 8 am - 9 pm UTC
308 actual working time may vary, but sums up to 30-40 hours per week.
309
310
311 ---
312 [1] https://github.com/dywisor/tlp-gentoo
313 [2] https://github.com/dywisor/tlp-portage
314 ---
315
316 >>>