Gentoo Archives: gentoo-dev

From: Federico "fox" Scrinzi <fox91@×××××.no>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] [RFC] euscan: Need to add more upstream info in metadata.xml
Date: Fri, 10 Aug 2012 11:13:18
Message-Id: 5024EC6C.3070300@anche.no
1 Hi everybody!
2
3 euscan is available in portage as a dev package
4 (app-portage/euscan-9999). This tool checks whether a given
5 package/ebuild has new upstream versions. It uses several
6 heuristics to scan upstream and collect new versions and their urls.
7
8 euscan can use either custom "handlers" for well-known upstreams (github,
9 pypi, cpan, sourceforge, google-code, etc.) or directory scanning
10 based on SRC_URI. If the directory scan fails for some reason, euscan
11 falls back to brute force (generating possible next version numbers and
12 trying to fetch those packages).
13
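To make the brute-force fallback concrete, here is a minimal sketch of
"generate possible next version numbers and try to fetch them". It is not
euscan's actual code: the function names, the `depth` parameter and the
example.org url are assumptions made up for illustration, and it only
handles purely numeric dotted versions.

```python
def candidate_versions(version, depth=2):
    """Yield guessed successors of a dotted, purely numeric version."""
    parts = [int(p) for p in version.split(".")]
    for i in range(len(parts)):
        for step in range(1, depth + 1):
            # Bump component i and reset everything after it to zero.
            bumped = parts[:i] + [parts[i] + step] + [0] * (len(parts) - i - 1)
            yield ".".join(str(p) for p in bumped)


def candidate_urls(src_uri, version, depth=2):
    """Substitute each guessed version into the known download url."""
    return [src_uri.replace(version, v) for v in candidate_versions(version, depth)]


if __name__ == "__main__":
    # Hypothetical url; a real scanner would now try to fetch each candidate.
    for url in candidate_urls("http://example.org/foo-1.2.3.tar.gz", "1.2.3", depth=1):
        print(url)
```

Every guess costs a network request, which is why a browsable directory or
an explicit hint in metadata.xml is preferable whenever one is available.
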
14 The problem that we're facing with euscan is that some packages in
15 upstream use strange version numbers or the list of available versions
16 is placed in a location that is totally different from SRC_URI.
17
18 Examples:
19 - MySQL: most MySQL mirrors are not browsable (euscan always falls back
20 to brute force)
21 - webalizer uses strange version numbers upstream
22 (ftp://ftp.mrunix.net/pub/webalizer/). In this case euscan should be
23 aware that 2.21-02 is the upstream version number and scan the ftp
24 directory for webalizer-(\d+).(\d+)-(\d+).tar.gz. The latest
25 version of webalizer, 2.23.05, is not recognized by euscan and is not
26 available in Gentoo.
27 - Authen-SASL-Cyrus uses "-server" in its upstream version numbers:
28 http://www.cpan.org/authors/id/P/PB/PBOETTCH/
29 - XML-Tidy uses strange letters in its version numbers
30
31
32 We thought about how to solve this issue and we agreed that the best way
33 to handle each specific case is to add some more
34 information to metadata.xml.
35
36 In Debian, uscan uses information from debian/watch inside Debian
37 packages. Since so much work has already been done, we thought about taking
38 this info from the watch files and saving it in metadata.xml so that euscan
39 can use it.
40
41 I wrote a simple script that patches metadata.xml, adding an experimental
42 <watch> tag with data from Debian packages:
43 https://github.com/volpino/euscan/blob/master/bin/euscan_patch_metadata
44
45 A basic watch entry contains a base url to scan and a pattern to search
46 for in it:
47 Example:
48 base: http://icedtea.classpath.org/download/source/
49 pattern: icedtea-([\d\.]+).tar.gz
50 This means "open that url and search for the links that match that
51 pattern".
52 This is useful, for example, when it is not possible to derive the base url
53 from SRC_URI (icedtea’s SRC_URI is
54 http://icedtea.classpath.org/hg/release/icedtea7-forest-2.2/hotspot/archive/889dffcf4a54.tar.gz)
55
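A rough sketch of "open that url and search for the links that match that
pattern", using only the Python standard library. The page content below is
a made-up stand-in for the real directory listing, and `LinkCollector` and
`scan_page` are hypothetical names, not euscan functions. Fetching the page
is left out on purpose so the example runs offline.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scan_page(html, pattern):
    """Return (version, link) pairs for links matching the watch pattern."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    found = []
    for href in collector.links:
        match = regex.search(href)
        if match:
            found.append((match.group(1), href))
    return found


# Made-up listing standing in for the page behind the base url.
SAMPLE_PAGE = """
<a href="icedtea-2.2.tar.gz">icedtea-2.2.tar.gz</a>
<a href="icedtea-2.2.1.tar.gz">icedtea-2.2.1.tar.gz</a>
<a href="README">README</a>
"""

if __name__ == "__main__":
    for version, link in scan_page(SAMPLE_PAGE, r"icedtea-([\d\.]+).tar.gz"):
        print(version, link)
```

The first capture group of the pattern is taken as the upstream version,
which is the same convention the watch examples in this mail rely on.
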
56 Advanced usage with directory pattern:
57 Example:
58 base: http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)
59 pattern: mysql-([\d\.]+).tar.gz
60 This scans all directories that match the base, looking for links that
61 match the pattern.
62
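The directory-pattern case could work roughly as sketched below. This
assumes that only the last path segment of the base carries a regex, which
holds for the MySQL example but is a simplification. `FAKE_LISTINGS` is an
invented in-memory stand-in for the remote directory listings (its entries
are sample data, not real mirror contents), and `split_base` and
`scan_directories` are hypothetical names.

```python
import re

# Invented listings keyed by directory url; a real scanner would fetch these.
FAKE_LISTINGS = {
    "http://ftp.gwdg.de/pub/misc/mysql/Downloads/": [
        "MySQL-5.1/", "MySQL-5.5/", "Contrib/",
    ],
    "http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-5.1/": [
        "mysql-5.1.65.tar.gz",
    ],
    "http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-5.5/": [
        "mysql-5.5.27.tar.gz", "mysql-5.5.27.zip",
    ],
}


def split_base(base):
    """Split the base into a literal prefix and a regex for the last segment."""
    prefix, _, last = base.rstrip("/").rpartition("/")
    return prefix + "/", last


def scan_directories(base, pattern, list_dir):
    """Visit every directory matching the base and collect matching files."""
    prefix, dir_regex = split_base(base)
    results = []
    for entry in list_dir(prefix):
        name = entry.rstrip("/")
        if not re.fullmatch(dir_regex, name):
            continue
        directory = prefix + name + "/"
        for filename in list_dir(directory):
            match = re.fullmatch(pattern, filename)
            if match:
                results.append((match.group(1), directory + filename))
    return results


if __name__ == "__main__":
    found = scan_directories(
        r"http://ftp.gwdg.de/pub/misc/mysql/Downloads/MySQL-([\d\.]+)",
        r"mysql-([\d\.]+).tar.gz",
        lambda url: FAKE_LISTINGS.get(url, []),
    )
    for version, url in found:
        print(version, url)
```

Passing the listing function in as an argument keeps the matching logic
separate from the network code, so it can be tested without a mirror.
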
63 We also need some options for mangling versions and download urls. These
64 options can contain regexps or names of mangling rules (e.g. "cpan"
65 means "apply the mangling rules for CPAN versions").
66
67 Version mangling example:
68 As mentioned above, webalizer uses both dots and hyphens in version
69 numbers, so an option like this is required: versionmangle="s/-/./"
70
71 Download url mangling example:
72 A page scan on berlios returns a url like this:
73 http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz which should be
74 mangled to get a working download url, with an option like
75 downloadurlmangle="s/prdownload/download/"
76
77 (for more info see uscan manpage)
78
79 Another example: dev-perl/Math-BaseCnv and XML-Tidy use strange upstream
80 version numbers like 1.8.B59BrZ, which should be mangled to 1.8.
81
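The three mangling examples above can be tried out with a small helper that
applies a sed-style rule. This is only a sketch: `apply_sed_rule` is a
hypothetical name, it assumes the delimiter never appears inside the
pattern or replacement, it supports only the "g" flag, and it translates
Perl-style $1 references into Python's syntax. The regex in the last call
is the one used for XML-Tidy later in this mail.

```python
import re


def apply_sed_rule(rule, text):
    """Apply a simple 's/pattern/replacement/[g]' rule to text."""
    if len(rule) < 2 or rule[0] != "s":
        raise ValueError("not a substitution rule: %r" % rule)
    delimiter = rule[1]
    # Naive split: assumes the delimiter is never escaped inside the rule.
    fields = rule[2:].split(delimiter)
    if len(fields) != 3:
        raise ValueError("expected s/pattern/replacement/flags: %r" % rule)
    pattern, replacement, flags = fields
    # Turn Perl-style $1 into Python's \g<1>.
    replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    # Without the 'g' flag only the first match is replaced, as in sed.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, text, count=count)


if __name__ == "__main__":
    print(apply_sed_rule("s/-/./", "2.21-02"))
    print(apply_sed_rule(
        "s/prdownload/download/",
        "http://prdownload.berlios.de/mirageiv/mirage-0.9.tar.gz",
    ))
    print(apply_sed_rule(r"s/(\d+)((\.\d+)*).*/$1$2/", "1.8.B59BrZ"))
```

Named rules such as "cpan" would need a separate lookup table; they are
left out here because their exact behaviour is not specified in this mail.
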
82 To summarize, we need:
83 - a base url and a file pattern to search for new upstream versions when
84 SRC_URI is not suitable
85 - some options for mangling the data retrieved from scanning upstream,
86 whether via base url and pattern or via remote-id information
87
88 So our problem is: how can we store this data in a very flexible and
89 efficient way?
90 Proposed solutions:
91
92 1) Add a euscan tag with a custom namespace
93 Example:
94 <euscan xmlns="http://euscan.iksaif.net">
95 <transformation>
96 <regexp><from>a</from><to>b</to></regexp>
97 <cpan-mangle/>
98 <gentoo-mangle/>
99 </transformation>
100 </euscan>
101 This means: apply the regex s/a/b/, then the cpan mangling rules, and then
102 the gentoo mangling rules.
103
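To show how the ordered <transformation> list in proposal 1 might be
consumed, here is a sketch that turns it into a chain of functions. The
names `build_pipeline` and `run` are hypothetical, and the two named rules
passed in at the bottom are dummy placeholders that only tag the value so
the order of application is visible. They are not the real cpan or gentoo
mangling rules.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://euscan.iksaif.net}"

SNIPPET = """
<euscan xmlns="http://euscan.iksaif.net">
  <transformation>
    <regexp><from>a</from><to>b</to></regexp>
    <cpan-mangle/>
    <gentoo-mangle/>
  </transformation>
</euscan>
"""


def build_pipeline(xml_text, named_rules):
    """Turn the children of <transformation> into an ordered list of callables."""
    root = ET.fromstring(xml_text)
    steps = []
    for node in root.find(NS + "transformation"):
        name = node.tag[len(NS):]
        if name == "regexp":
            pattern = node.findtext(NS + "from")
            replacement = node.findtext(NS + "to")
            # Bind the current values as defaults so each step keeps its own.
            steps.append(
                lambda v, p=pattern, r=replacement: re.sub(p, r, v, count=1)
            )
        else:
            steps.append(named_rules[name])
    return steps


def run(steps, version):
    """Apply each step in document order."""
    for step in steps:
        version = step(version)
    return version


if __name__ == "__main__":
    # Placeholder rules, for illustration only.
    rules = {
        "cpan-mangle": lambda v: v + "|cpan",
        "gentoo-mangle": lambda v: v + "|gentoo",
    }
    print(run(build_pipeline(SNIPPET, rules), "a1"))
```

Because the steps are child elements rather than attributes, their order in
the file is the order of application, which is the main thing this proposal
offers over a single attribute string.
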
104 2) Change the remote-id tag quite heavily:
105 - adding versionmangling and downloadmangling options that contain
106 regexes
107 - adding a new remote-id type called, for example, url; that tag would
108 contain the base url and the pattern
109
110 3) Add a watch tag to <upstream> with versionmangling and
111 downloadmangling options. This tag can either have a type (in which case
112 the data from remote-id is used) or contain the base url and the file
113 pattern. (This is what is currently implemented for our tests.)
114
115
116 So before going further, we would like some feedback from you on these
117 approaches.
118 What do you think about them? Which do you prefer? Do you think there’s
119 a better approach, or that some steps could be done more efficiently?
120
121
122
123 Other examples:
124
125 dev-perl/XML-Tidy: # We have to strip the trailing letters from the version
126 and then apply the cpan mangling rules
127 <upstream>
128 <remote-id type="cpan">XML-Tidy</remote-id>
129 <remote-id type="cpan-module">XML::Tidy</remote-id>
130 <watch type="cpan" versionmangle="s/(\d+)((\.\d+)*).*/$1$2/;cpan">
131 </watch>
132 </upstream>
133
134 sys-fs/dfc: # Download hosting sucks and puts a download id in the url
135 <upstream>
136 <watch version="3">
137 http://projects.gw-computing.net/projects/dfc/files
138 /attachments/download/[0-9]+/dfc-(.*)\.tar\.gz
139 </watch>
140 </upstream>
141
142 sys-devel/gcc: # Tons of files in SRC_URI, let’s be more efficient
143
144 media-plugins/vdr-cpumon: # 0.0.6a == 0.0.6_p1, so it needs version
145 mangling
146
147 app-admin/webalizer:
148 <upstream>
149 <watch version="3" versionmangle="s/-/./">
150 http://www.mrunix.net/webalizer/download.html
151 webalizer-(.*)-src\.tgz
152 </watch>
153 </upstream>
154
155 kde-base/okular:
156 <upstream>
157 <watch
158 version="3">ftp://ftp.kde.org/pub/kde/stable/([\d\.]*)/src/okular-([\d\.]*).tar.xz</watch>
159 <watch
160 version="3">ftp://ftp.kde.org/pub/kde/stable/((?:\d\.)+\d)/src/okular-((?:\d\.)+\d).tar.xz</watch>
161 </upstream>
162
163 sci-geosciences/grass:
164 <upstream>
165 <watch
166 version="3">http://grass.osgeo.org/grass64/source/grass-([\d\.]*(?:RC\d){0,1}).tar.gz</watch>
167 </upstream>
168
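As a sketch of how the <watch> examples above could be read back: the body
is split on whitespace into a base url and a pattern, and when only one
field is present the last path component is treated as the pattern. That
splitting rule is my reading of the uscan watch format and should be
treated as an assumption. `parse_watch` is a hypothetical helper, not part
of euscan.

```python
import xml.etree.ElementTree as ET


def parse_watch(xml_text):
    """Extract base url, pattern and mangling options from a <watch> element."""
    node = ET.fromstring(xml_text)
    fields = (node.text or "").split()
    if len(fields) >= 2:
        base, pattern = fields[0], fields[1]
    elif len(fields) == 1:
        # Single field: assume the last path component is the file pattern.
        base, _, pattern = fields[0].rpartition("/")
        base += "/"
    else:
        # Empty body, e.g. <watch type="cpan" .../> relying on remote-id.
        base = pattern = None
    return {
        "type": node.get("type"),
        "base": base,
        "pattern": pattern,
        "versionmangle": node.get("versionmangle"),
    }


WEBALIZER = r"""<watch version="3" versionmangle="s/-/./">
http://www.mrunix.net/webalizer/download.html
webalizer-(.*)-src\.tgz
</watch>"""

GRASS = (
    r'<watch version="3">http://grass.osgeo.org/grass64/source/'
    r'grass-([\d\.]*(?:RC\d){0,1}).tar.gz</watch>'
)

if __name__ == "__main__":
    print(parse_watch(WEBALIZER))
    print(parse_watch(GRASS))
```
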
169 --
170 f.
171
172 "Always code as if the guy who ends up maintaining your code will be a
173 violent psychopath who knows where you live."
174 (Martin Golding)


Replies