Gentoo Archives: gentoo-soc

From: "André Erdmann" <dywi@×××××××.de>
To: gentoo-soc@l.g.o
Cc: Denis Dupeyron <calchan@g.o>
Subject: [gentoo-soc] Automatically generated overlay of R packages - final report
Date: Tue, 21 Aug 2012 18:14:25
Message-Id: CAGrucu2TXdLJufJGHstayfU2i0syEqv9DnqfAJeLvAdUVfP7cw@mail.gmail.com
Hi everyone,

== Brief summary of this project ==

The aim of this project is to create scripts that automate the process
of overlay creation/maintenance for R packages from repositories such
as CRAN and Bioconductor.

Longer:
To create an ebuild for a single package, one has to extract the
package, copy data from its DESCRIPTION file into the ebuild and look
up its dependencies, which is time-consuming. Although trivial for a
small number of packages, this is practically impossible to do by hand
for repositories like CRAN (> 3500 packages), especially because it
also requires tracking changes (new / updated / removed packages). The
solution is to automate that process, and this is what this project is
about.
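
As a rough illustration of the manual steps described above, the
following sketch parses the "Field: value" layout of an R DESCRIPTION
file and renders two ebuild variables from it. It is not roverlay's
actual code: the sample data, the function names and the choice of
Title/License as ebuild inputs are assumptions made for this example.

```python
# Hypothetical sample; real DESCRIPTION files use the same "Field: value"
# layout, with indented continuation lines.
SAMPLE = """\
Package: examplepkg
Version: 1.2
Title: An Example Package
Depends: R (>= 2.10), methods
License: GPL-2
Description: Does example things,
    spread over two lines.
"""


def parse_description(text):
    """Parse DESCRIPTION-style text into a dict of field name -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the previous field.
            fields[current] += " " + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                continue  # not a "Field: value" line, skip it
            current = name.strip()
            fields[current] = value.strip()
    return fields


def ebuild_header(fields):
    """Render a few ebuild variables from parsed fields (illustrative)."""
    return 'DESCRIPTION="{}"\nLICENSE="{}"\n'.format(
        fields.get("Title", ""), fields.get("License", ""))


if __name__ == "__main__":
    print(ebuild_header(parse_description(SAMPLE)))
```

Doing this by hand for every package and every update is exactly the
work that does not scale to a whole repository.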

The project's git repository is located at
http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git

== Current state of this project and future directions ==

The "Automatically generated overlay of R packages" project has now
reached the end of GSoC 2012's coding period. The result, briefly
described below, is roverlay, a script and a set of modules written in
Python. I tried to keep the code as extensible as possible, so that
future extensions such as other ways to get R packages (git, svn, ...)
are easy to add.

It has two user-accessible main parts:
* overlay creation, which accepts R packages as input and creates a
  Portage overlay for them
* repository management, which downloads R packages from remotes and
  uses them as input for overlay creation

The minimal requirement for downloading packages is that a remote
offers http access to its packages. The preferred way is rsync, which
is used for CRAN and Bioconductor (BIOC). The http support was added
later to include repos like R-Forge and Omegahat.
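
The rsync-first, http-as-fallback rule can be sketched as follows. This
is only an illustration under stated assumptions: the repo dict layout,
the function name and the chosen rsync options are hypothetical and not
taken from roverlay, and the example URIs are placeholders.

```python
from urllib.parse import urljoin


def plan_sync(repo, distdir, package_files=()):
    """Describe how to fetch a repo, without touching the network.

    `repo` is a hypothetical dict with a "name" plus either an
    "rsync_uri" or an "http_uri" entry.
    """
    dest = "{}/{}/".format(distdir.rstrip("/"), repo["name"])
    if repo.get("rsync_uri"):
        # Preferred: rsync transfers only new or changed files.
        argv = ["rsync", "--recursive", "--times", "--compress",
                "--delete", repo["rsync_uri"], dest]
        return ("rsync", argv)
    if repo.get("http_uri"):
        # Fallback: one plain http URL per known package file.
        base = repo["http_uri"].rstrip("/") + "/"
        return ("http", [urljoin(base, name) for name in package_files])
    raise ValueError("repo offers neither rsync nor http access")


if __name__ == "__main__":
    print(plan_sync({"name": "example",
                     "rsync_uri": "rsync://mirror.example.org/pkgs/"},
                    "/tmp/distfiles"))
```

The function only returns a plan (an argument list or a list of URLs),
so it can be inspected or tested without running rsync or downloading
anything.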

Overlay creation is able to work incrementally so that existing
ebuilds don't have to be recreated. It involves several tasks:
* reading R package metadata and fixing errors like misspelled data
  fields along the way, e.g. 'Depents' is read as 'Depends'. Package
  reading is configurable.
* ebuild creation, which tries to create an ebuild for an R package
  using its metadata
  -> dependency resolution that creates correct DEPEND/RDEPEND ebuild
     variables. It is implemented as a dictionary lookup extended by
     version-relative lookups.
* overlay writing
  -> per-package metadata.xml/Manifest creation
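
The first two tasks in the list above can be sketched roughly like this.
The sketch assumes a small alias table for field names and a plain
dictionary from R-side dependency names to Gentoo package names; both
tables, the regular expression and the function names are made up for
illustration and do not reflect roverlay's real data or code.

```python
import re

# Hypothetical alias table: lowercased field name -> canonical name.
FIELD_ALIASES = {"depents": "Depends", "depends": "Depends",
                 "suggests": "Suggests", "imports": "Imports"}

# Hypothetical dependency dictionary: R-side name -> Gentoo package.
DEP_MAP = {"R": "dev-lang/R", "methods": "dev-lang/R",
           "zlib": "sys-libs/zlib"}

# Matches "name" or "name (op version)", e.g. "R (>= 2.10)".
DEP_RE = re.compile(
    r"^\s*(?P<name>[^\s(]+)\s*"
    r"(?:\(\s*(?P<op>>=|<=|==|>|<)\s*(?P<ver>[^\s)]+)\s*\))?\s*$")


def canonical_field(name):
    """Map a possibly misspelled field name to its canonical form."""
    return FIELD_ALIASES.get(name.strip().lower(), name.strip())


def resolve(dep_str):
    """Return a Gentoo dependency atom for one R dependency, or None."""
    match = DEP_RE.match(dep_str)
    if match is None:
        return None
    package = DEP_MAP.get(match.group("name"))
    if package is None:
        return None  # unresolvable dependency
    op, ver = match.group("op"), match.group("ver")
    if op is None:
        return package
    # Version-relative lookup: same dictionary entry, plus the version.
    gentoo_op = "=" if op == "==" else op
    return "{}{}-{}".format(gentoo_op, package, ver)


if __name__ == "__main__":
    print(canonical_field("Depents"), resolve("R (>= 2.10)"))
```

A dependency that resolve() cannot map would correspond to the
"dependency unresolvable" case, where no ebuild is created for the
package.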

Currently, the ebuild creation success rate is slightly higher than
95%. About 900 out of 32000 creations fail for various reasons: OS
type not supported, dependency unresolvable, R package format not
supported (.Z-compressed tarballs, ...).

Extensive documentation is available at [0] and covers usage,
configuration, installation, what to expect and how roverlay works.

All in all, I accomplished most objectives of my proposal. Some have
been dropped, like getting packages via svn; some have been added,
like getting packages via http and the version-relative dependency
resolution. What's really missing is the integration into Gentoo's
infrastructure so that end-users can add the resulting overlay using
Layman. This will hopefully happen in the near future. As for the
future, I'll focus on adding features based on real-world/production
usage needs.

Finally, I'd like to thank Denis (Calchan), my mentor, for his guidance
over the last months. I don't tend to ask many questions, but
whenever I had one, he was able to answer it ;) Overall, taking part
in GSoC for Gentoo has been a good experience.


[0] http://git.overlays.gentoo.org/gitweb/?p=proj/R_overlay.git;a=blob_plain;f=doc/html/usage.html;hb=HEAD

--
Regards,
André E.