Hi everybody!

euscan is available in portage as a dev package
(app-portage/euscan-9999). This tool checks whether a given
package/ebuild has new upstream versions. It uses several
heuristics to scan upstream and collect new versions and related urls.

euscan can use either custom "handlers" for well-known upstreams (github,
pypi, cpan, sourceforge, google-code, etc.) or directory scanning
based on SRC_URI. If the directory scan fails for some reason, euscan
falls back to brute force (generating possible next version numbers and
trying to fetch those packages).

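As a rough illustration of the brute-force fallback, here is a minimal
Python sketch that generates candidate next versions by bumping each
numeric component. The helper names, the depth parameter and the
${PV} placeholder in the URL template are made up for this example;
euscan's real heuristics may differ.

```python
def candidate_versions(version, depth=2):
    """Return plausible next versions for a dotted numeric version.

    Hypothetical helper: each component is bumped by 1..depth and the
    components to its right are reset to 0.
    """
    parts = [int(p) for p in version.split(".")]
    candidates = []
    for index in range(len(parts)):
        for step in range(1, depth + 1):
            bumped = parts[:index] + [parts[index] + step]
            bumped += [0] * (len(parts) - index - 1)
            candidates.append(".".join(str(p) for p in bumped))
    return candidates


def candidate_urls(url_template, version, depth=2):
    """Build the URLs that a brute-force pass would try to fetch."""
    return [url_template.replace("${PV}", candidate)
            for candidate in candidate_versions(version, depth)]


# example.org is a placeholder host, not a real upstream.
urls = candidate_urls("http://example.org/foo-${PV}.tar.gz", "1.2.3", depth=1)
# A real scanner would now probe each URL and keep the ones that exist.
```

The number of probes grows with depth times the number of version
components, which is why brute force is only a last resort.
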
The problem we're facing with euscan is that some upstream packages
use strange version numbers, or the list of available versions
is in a location that is totally different from SRC_URI.

Examples:
- MySQL: most MySQL mirrors are not browsable (euscan always falls back
to brute force)
- webalizer uses strange version numbers upstream
(ftp://ftp.mrunix.net/pub/webalizer/). In this case euscan should be
aware that 2.21-02 is the upstream version number and scan the ftp
directory for webalizer-(\d+).(\d+)-(\d+).tar.gz. The latest
version of webalizer, 2.23.05, is not recognized by euscan and is not
available in gentoo.
- Authen-SASL-Cyrus uses "-server" in its upstream version numbers:
http://www.cpan.org/authors/id/P/PB/PBOETTCH/
- XML-Tidy uses strange letters in its version numbers

We thought about how to solve this issue and agreed that the best way
to handle each specific case is to add some more information to
metadata.xml.

In Debian, uscan uses information from the debian/watch file inside
Debian packages. Since so much work has already been done there, we
thought about taking this information from the watch files and saving it
in metadata.xml so that euscan can use it.

I wrote a simple script that patches metadata.xml, adding an experimental
<watch> tag with data from Debian packages:
https://github.com/volpino/euscan/blob/master/bin/euscan_patch_metadata

Basic watch data contains a base url to scan and a pattern to search
for in it:
Example:
base: http://icedtea.classpath.org/download/source/
pattern: icedtea-([\d\.]+).tar.gz
which means "open that url and search for the links that match that
pattern".
This is useful, for example, when it is not possible to derive the base
url from SRC_URI (icedtea's SRC_URI is
http://icedtea.classpath.org/hg/release/icedtea7-forest-2.2/hotspot/archive/889dffcf4a54.tar.gz).

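To make the base/pattern idea concrete, here is a small Python sketch
of the "search the links that match the pattern" step. It works on an
HTML string so that it runs without network access; the class and
function names and the sample page are invented for illustration and
are not taken from euscan or uscan.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_versions(page_html, pattern):
    """Return group 1 of the pattern for every link whose file name matches."""
    collector = LinkCollector()
    collector.feed(page_html)
    regex = re.compile(pattern)
    versions = []
    for link in collector.links:
        file_name = link.rsplit("/", 1)[-1]
        # fullmatch keeps "icedtea-2.2.tar.gz.sig" from counting as a release.
        match = regex.fullmatch(file_name)
        if match:
            versions.append(match.group(1))
    return versions


# Invented sample of what the page at the base url might contain.
SAMPLE_PAGE = """
<a href="icedtea-2.1.tar.gz">icedtea-2.1.tar.gz</a>
<a href="icedtea-2.2.tar.gz">icedtea-2.2.tar.gz</a>
<a href="icedtea-2.2.tar.gz.sig">signature</a>
<a href="README">README</a>
"""

versions = find_versions(SAMPLE_PAGE, r"icedtea-([\d\.]+).tar.gz")
```

In a real scan the page would be fetched from the base url first; only
the matching step is shown here.
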
Advanced usage with a directory pattern:
Example:
base: http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)
pattern: mysql-([\d\.]+).tar.gz
This scans all directories that match the base pattern, looking for
links that match the file pattern.

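One possible way to handle a base url that contains a pattern is to
split it into a fixed prefix and the path segments that are regexes,
then list the prefix and descend into each matching directory. The
sketch below only shows the splitting and matching; the function names
and the directory listing are assumptions made for this example, and it
assumes the regex segments contain no "/" themselves.

```python
import re


def split_base(base):
    """Split a base url into (fixed prefix, list of regex path segments)."""
    scheme, rest = base.split("://", 1)
    segments = rest.split("/")
    for index, segment in enumerate(segments):
        if "(" in segment:  # first segment that holds a capture group
            prefix = scheme + "://" + "/".join(segments[:index]) + "/"
            return prefix, segments[index:]
    return base, []


def matching_dirs(dir_names, dir_pattern):
    """Keep only the directory names that fully match the pattern."""
    regex = re.compile(dir_pattern)
    return [name for name in dir_names if regex.fullmatch(name)]


prefix, dir_patterns = split_base(
    r"http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)")

# Invented listing of the prefix directory, for illustration only.
listing = ["MySQL-5.1", "MySQL-5.5", "Connector-J", "MySQL-Cluster-7.2"]
dirs = matching_dirs(listing, dir_patterns[0])
```

Each directory in dirs would then be scanned with the file pattern,
exactly as in the basic case.
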
We also need some options for mangling versions and download urls. These
options can contain regexps or names of mangling rules (e.g. "cpan"
means: apply the mangling rules for CPAN versions).

Version mangling example:
As mentioned above, webalizer uses both dots and hyphens in version
numbers, so an option like this is required: versionmangle="s/-/./"

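A mangling option of this form could be applied along these lines.
This is only a sketch: apply_mangle is a hypothetical name, it handles
a single sed-like s/pattern/replacement/ rule with an optional g flag,
and it does not support escaped delimiters or named rules such as
"cpan".

```python
import re


def apply_mangle(rule, value):
    """Apply one sed-like s/pattern/replacement/[g] rule to a string."""
    if len(rule) < 4 or rule[0] != "s":
        raise ValueError("unsupported mangling rule: %r" % rule)
    delimiter = rule[1]
    fields = rule[2:].split(delimiter)
    if len(fields) != 3:
        raise ValueError("malformed mangling rule: %r" % rule)
    pattern, replacement, flags = fields
    # Perl writes backreferences as $1, Python's re module as \1.
    replacement = re.sub(r"\$(\d)", r"\\\1", replacement)
    # Without the g flag only the first match is replaced, as in sed.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, value, count=count)


mangled = apply_mangle("s/-/./", "2.21-02")
stripped = apply_mangle(r"s/(\d+)((\.\d+)*).*/$1$2/", "1.8.B59BrZ")
```

Here mangled is the webalizer version with the hyphen turned into a
dot, and stripped is a version with its trailing letters removed.
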
Download url mangling example:
A page scan on berlios returns a url like this:
http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz, which should be
mangled to get a working download url with an option like
downloadurlmangle="s/prdownload/download/"

(for more info see the uscan manpage)

Another example: dev-perl/Math-BaseCnv and XML-Tidy use strange upstream
version numbers like 1.8.B59BrZ, which should be mangled to 1.8.

To summarize, we need:
- a base url and a file pattern to search for new upstream versions when
SRC_URI is not suitable
- some options for mangling the data retrieved from the upstream scan,
whether the scan uses the base url and pattern or the remote-id
information

So our problem is: how can we store this data in a flexible and
efficient way?
Proposed solutions:

1) Add a euscan tag with a custom namespace
Example:
<euscan xmlns="http://euscan.iksaif.net">
  <transformation>
    <regexp><from>a</from><to>b</to></regexp>
    <cpan-mangle/>
    <gentoo-mangle/>
  </transformation>
</euscan>
Which means: apply the regex s/a/b/, then apply the cpan mangling rules,
and then the gentoo mangling rules.

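For proposal 1, reading the transformation steps in order could look
roughly like this with Python's xml.etree.ElementTree. The read_steps
function and the tuple layout it returns are assumptions for this
sketch; only the element names and the namespace come from the example
above.

```python
import xml.etree.ElementTree as ET

NS = "http://euscan.iksaif.net"

SNIPPET = """<euscan xmlns="http://euscan.iksaif.net">
  <transformation>
    <regexp><from>a</from><to>b</to></regexp>
    <cpan-mangle/>
    <gentoo-mangle/>
  </transformation>
</euscan>"""


def read_steps(xml_text):
    """Return the transformation steps in document order."""
    root = ET.fromstring(xml_text)
    transformation = root.find("{%s}transformation" % NS)
    steps = []
    if transformation is None:
        return steps
    for child in transformation:
        # ElementTree reports tags as "{namespace}name"; keep the name only.
        name = child.tag.split("}", 1)[-1]
        if name == "regexp":
            steps.append((name,
                          child.findtext("{%s}from" % NS),
                          child.findtext("{%s}to" % NS)))
        else:
            steps.append((name,))
    return steps


steps = read_steps(SNIPPET)
```

Keeping the steps as an ordered list matters, because the rules are
meant to be applied one after another.
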
2) Change the remote-id tag quite heavily:
- add versionmangling and downloadmangling options that contain
regexes
- add a new remote-id type called, for example, url; that tag will
contain the base url and the pattern

3) Add a watch tag to <upstream> with versionmangling and
downloadmangling options. This tag can either have a type (in which case
the data from remote-id is used) or contain the base url and the file
pattern. (This is what is currently implemented for our tests.)

So before going further, we would like some feedback from you on these
approaches.
What do you think about them? Which do you prefer? Do you think there's
a better approach, or that some steps could be done more efficiently?

Other examples:

dev-perl/XML-Tidy: # We have to strip trailing letters in the version and
then apply cpan mangling rules
<upstream>
  <remote-id type="cpan">XML-Tidy</remote-id>
  <remote-id type="cpan-module">XML::Tidy</remote-id>
  <watch type="cpan" versionmangle="s/(\d+)((\.\d+)*).*/$1$2/;cpan">
  </watch>
</upstream>

sys-fs/dfc: # Download hosting sucks and has a download id in the url
<upstream>
  <watch version="3">
    http://projects.gw-computing.net/projects/dfc/files
    /attachments/download/[0-9]+/dfc-(.*)\.tar\.gz
  </watch>
</upstream>

sys-devel/gcc: # Tons of files in SRC_URI, let's be more efficient

media-plugins/vdr-cpumon: # 0.0.6a == 0.0.6_p1, so it needs version
mangling

app-admin/webalizer:
<upstream>
  <watch version="3" versionmangle="s/-/./">
    http://www.mrunix.net/webalizer/download.html
    webalizer-(.*)-src\.tgz
  </watch>
</upstream>

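Reading such an experimental watch tag back out of metadata.xml could
be sketched as follows. The read_watch function and the keys of the
dictionary it returns are made up for this example, and it assumes the
tag body is "url, whitespace, pattern" as in the webalizer entry above.

```python
import xml.etree.ElementTree as ET

SNIPPET = r"""<upstream>
  <watch version="3" versionmangle="s/-/./">
    http://www.mrunix.net/webalizer/download.html
    webalizer-(.*)-src\.tgz
  </watch>
</upstream>"""


def read_watch(upstream_xml):
    """Extract url, pattern and mangling options from a <watch> tag."""
    upstream = ET.fromstring(upstream_xml)
    watch = upstream.find("watch")
    if watch is None:
        return None
    # The body holds the page url and the link pattern, split by whitespace.
    fields = (watch.text or "").split()
    return {
        "url": fields[0] if fields else None,
        "pattern": fields[1] if len(fields) > 1 else None,
        "versionmangle": watch.get("versionmangle"),
        "downloadurlmangle": watch.get("downloadurlmangle"),
    }


info = read_watch(SNIPPET)
```

Entries that put the url and the pattern in a single string, like the
okular one, would leave "pattern" empty here and need extra handling.
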
kde-base/okular:
<upstream>
  <watch version="3">ftp://ftp.kde.org/pub/kde/stable/([\d\.]*)/src/okular-([\d\.]*).tar.xz</watch>
  <watch version="3">ftp://ftp.kde.org/pub/kde/stable/((?:\d\.)+\d)/src/okular-((?:\d\.)+\d).tar.xz</watch>
</upstream>

sci-geosciences/grass:
<upstream>
  <watch version="3">http://grass.osgeo.org/grass64/source/grass-([\d\.]*(?:RC\d){0,1}).tar.gz</watch>
</upstream>

--
f.

"Always code as if the guy who ends up maintaining your code will be a
violent psychopath who knows where you live."
(Martin Golding)