As the subject says, this report is pretty long. It's intended for
those who haven't closely followed my work up until now and would like
to catch up, so go grab a cup of coffee if you really want to read
this to the end.

Subjects in this report (in order):
-intro of the project
-what I have been up to last week
-instructions on installing packages from Bioconductor and CRAN
-g-common, the interface (or actually lack of interface) this project will have
-plans for the coming week and next week

Perhaps an introduction to the circumstances is in order. R is a
language for statisticians. With statistics being such a wide topic,
there are thousands of additional packages you can install to further
analyze data, and the Bioconductor project adds another field to R by
introducing genomics. My job is to cleanly enable Gentoo users to
install the latest versions of these packages systemwide, as opposed
to directly calling R's package installers and ending up with dangling
files. Last week, I had reached the point where some packages installed
correctly, but there were some rough edges too. For packages not
relying on external (non-R) libraries, this should all be smoothed out
now.

I've spent a lot of time communicating with several parties last week.
There was a minor issue with the Bioconductor repositories; I've
spoken to some people about g-common, talked a bit with the CRAN
maintainers and had some technical discussions with rafaelmartins,
who's a GSoC student working on g-octave, as you may know.

Then there are some helpful dependency resolution changes.
Dependencies on R packages now work perfectly fine, and external
dependencies are going to be tackled soon (but it won't be pretty).

So why is this helpful? It means that most Bioconductor packages,
namely those that only depend on other R packages, should now install
cleanly.

As promised in an earlier email to the gentoo-science ML, here are
some instructions. Please note that this will of course not be the way
you'll eventually use g-cran, since I'm still working on the interface
(more on that later).

First, create two overlays. I'm simply calling them bioconductor_1 and
bioconductor_2. The first primarily contains code, the second
consists primarily of gene databases.
# mkdir -p /usr/local/portage/bioconductor_1/profiles
# mkdir -p /usr/local/portage/bioconductor_2/profiles
Now we need to set the repo_name and categories of these overlays, too.
# echo "bioconductor_1" > /usr/local/portage/bioconductor_1/profiles/repo_name
# echo "bioconductor_2" > /usr/local/portage/bioconductor_2/profiles/repo_name
# echo "dev-R" >> /usr/local/portage/bioconductor_1/profiles/categories
# echo "dev-R" >> /usr/local/portage/bioconductor_2/profiles/categories
It's time to actually get the tree. Make sure you've installed g-cran
(it's in the science overlay), sync the repositories and then generate
the tree:
# g-cran /usr/local/portage/bioconductor_1 sync http://www.bioconductor.org/packages/devel/bioc
# g-cran /usr/local/portage/bioconductor_2 sync http://www.bioconductor.org/packages/devel/data/annotation
# g-cran /usr/local/portage/bioconductor_1 generate-tree
# g-cran /usr/local/portage/bioconductor_2 generate-tree

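To check that the steps above produced something sensible, a small
helper along these lines might be handy. This is only a sketch: the
function name check_overlay is made up for illustration, and it assumes
that generate-tree puts one directory per package under dev-R/ in the
overlay.

```shell
# check_overlay DIR: report whether DIR looks like a generated overlay.
# Hypothetical helper, not part of g-cran. Assumes packages end up in
# DIR/dev-R/<package>/.
check_overlay() {
    overlay=$1
    # Both profile files must exist and be non-empty.
    for f in profiles/repo_name profiles/categories; do
        if [ ! -s "$overlay/$f" ]; then
            echo "missing or empty: $overlay/$f" >&2
            return 1
        fi
    done
    # Count the package directories below dev-R.
    count=0
    for pkg in "$overlay"/dev-R/*/; do
        if [ -d "$pkg" ]; then
            count=$((count + 1))
        fi
    done
    echo "$(cat "$overlay/profiles/repo_name"): $count packages in dev-R"
}
```

After defining it, you would call it as check_overlay
/usr/local/portage/bioconductor_1 and likewise for bioconductor_2. A
count of zero suggests that generate-tree did not run or did not
finish.
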
You can now add the overlays to your favorite package manager and
start emerging (*ahem* - installing) packages. If all is well, you
should be able to install, for example, dev-R/zebrafishdb (this is a
bioconductor_2 database package that pulls in several bioconductor_1
packages). I have absolutely no clue as to what you can do with these
packages, but I suppose some biology fans out there can clarify that.

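How you add an overlay depends on your package manager. For Portage,
one way is to append the two directories to PORTDIR_OVERLAY. The
fragment below is a sketch, assuming a Portage setup that reads
PORTDIR_OVERLAY from /etc/make.conf and the overlay paths used above;
adjust both to your system.

```shell
# Fragment for /etc/make.conf (assumed location).
# Keeps any overlays that are already configured and adds the two new ones.
PORTDIR_OVERLAY="${PORTDIR_OVERLAY} /usr/local/portage/bioconductor_1 /usr/local/portage/bioconductor_2"
```

With that in place, emerge --ask dev-R/zebrafishdb should be able to
find the package.
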
Now, it may be that portage complains about missing Manifest files. If
that's the case, then also run:
# for x in /usr/local/portage/bioconductor_{1,2}/dev-R/*; do touch "${x}/Manifest"; done
I hope that does the trick. Please tell me whether it works, and
whether it's needed at all. Once you've done this, and provided the
trick actually works, you should be able to install dev-R/zebrafishdb.

If you don't need no stinkin' databases of deoxyribonucleic acid, but
are interested in CRAN, just create a cran overlay as we did for
bioconductor_1 and bioconductor_2, but use http://cran.r-project.org
as the source repository, and 'cran' for the overlay name. Better yet,
find a mirror close to you at http://cran.r-project.org/mirrors.html
and use that instead.

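Spelled out, the CRAN variant could look like the sketch below. The
function name setup_cran_overlay is made up for illustration; the
g-cran calls simply follow the same pattern as the Bioconductor
commands earlier in this mail. So that the sketch can be tried without
g-cran installed, it only prints the g-cran commands when g-cran is not
found on the PATH.

```shell
# setup_cran_overlay DIR [MIRROR]
# Hypothetical helper: creates the overlay skeleton for CRAN, then syncs
# and generates the tree. MIRROR defaults to the main CRAN site.
setup_cran_overlay() {
    overlay=$1
    mirror=${2:-http://cran.r-project.org}
    mkdir -p "$overlay/profiles" || return 1
    echo "cran" > "$overlay/profiles/repo_name"
    echo "dev-R" > "$overlay/profiles/categories"
    if command -v g-cran >/dev/null 2>&1; then
        # Same invocation pattern as for the Bioconductor overlays.
        g-cran "$overlay" sync "$mirror" && g-cran "$overlay" generate-tree
    else
        # g-cran not installed: show what would have been run.
        echo "g-cran $overlay sync $mirror"
        echo "g-cran $overlay generate-tree"
    fi
}
```

As root, you would then run setup_cran_overlay /usr/local/portage/cran,
optionally followed by the URL of a mirror near you.
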
Okay, so that was quite a journey to get a simple sqlite database of
gene data. g-common is what will be making all this easier.
Unfortunately I haven't heard much lately from the other two students
I was cooperating with before, so I'm going to invent something of
my own. The plan has remained roughly the same, but time after time
I struggle to explain it, so please bear with me as you read this.

[start explanation of g-common]
Current projects to install non-ebuild packages generate ebuild files
on request, put them in an overlay and tell portage to install them.
The problem with this approach is that the ebuilds are only generated
once you know what you want to install, i.e. the overlay doesn't get
fully populated upfront. This implies that you cannot search for
packages in such repositories, you cannot depend on packages in such
repositories, and you can't trivially update packages in such
repositories. I'd like to generate a full package tree at sync time,
no matter whether you want to use it or not. Further, this syncing
should work like that of any other overlay: ideally, support for
non-ebuild repositories is transparent to the users. I'm going to do
this via an abstraction layer called g-common, for which support needs
to be written in every package manager. But once that support is
written, and the non-ebuild repository reading code is adjusted to
work with g-common, there is nothing stopping you from using a
non-ebuild repository like a regular ebuild overlay.
How this works is not exactly trivial to explain, but the important
part is that even though tools like g-cran are doing the actual work,
the package manager thinks it's dealing with a regular PMS-worthy tree.
At sync time, the package manager simply calls the g-common method for
syncing a tree, which in turn calls the appropriate repository driver
to fetch the new package listing from the true remote repository. To
integrate this well, some patching is needed. At install time, all the
various src_unpack, src_install, etc. phases result in calls to
g-common, and again those result in calls to the appropriate
repository driver, which then executes the phase, but all this is sort
of PMS-compliant. Call it over-engineering, but it'll feel like magic
and I'm going to prove it.
[end explanation of g-common]

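The call chain described above (package manager, then g-common, then
repository driver) can be sketched roughly as follows. None of this is
taken from actual g-common code: the function names and the idea of
recording the driver name in a file called "driver" inside the overlay
are assumptions made purely for illustration. The only thing borrowed
from the real tools is the calling convention of g-cran shown earlier
(overlay first, then the action).

```shell
# Hypothetical sketch of the g-common dispatch idea.
# A "driver" is any command that accepts: <overlay> <action> [args...]

# Look up the driver for an overlay. Assumption: the overlay names its
# driver in a one-line file called "driver" at its top level.
g_common_driver() {
    cat "$1/driver"
}

# Entry point the package manager would call, at sync time or for each
# phase at install time. It only forwards the call to the driver.
g_common() {
    overlay=$1
    action=$2
    shift 2
    driver=$(g_common_driver "$overlay") || return 1
    case $action in
        sync|generate-tree|src_unpack|src_install)
            "$driver" "$overlay" "$action" "$@"
            ;;
        *)
            echo "g-common: unknown action: $action" >&2
            return 1
            ;;
    esac
}
```

With a driver file containing g-cran, a call such as g_common
/usr/local/portage/cran sync http://cran.r-project.org would end up
running g-cran with the same arguments. The point of the indirection
is that the package manager never needs to know which driver sits
behind an overlay.
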
The plan for this week is to /finally/ get some work done on g-common
and perhaps prepare the code for external dependency resolution. On
Saturday I'm leaving for vacation, so unfortunately you won't see me
doing much then. After that vacation, first of all there's GUADEC 2010,
which I'm going to attend, but of course I'm also going to continue
developing g-common and finish external dependency resolution.

Now, if you've come to this point in my email, I'd really like to
thank you, because I know how easy it is to simply mark an email as
read and move on. You are why I'm developing this. Thanks a lot!

The next weekly report will be in two weeks,
Auke Booij / tulcod.