Gentoo Archives: gentoo-soc

From: Arne Babenhauserheide <arne_bab@×××.de>
To: gentoo-soc@l.g.o
Cc: Jeremy Olexa <darkside@g.o>
Subject: Re: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database
Date: Mon, 15 Jun 2009 06:24:14
Message-Id: 200906150823.35289.arne_bab@web.de
In Reply to: Re: [gentoo-soc] [GSoC-status] Tree-wide collision checking and files database by Jeremy Olexa
On Friday, 12 June 2009 17:05:33, Jeremy Olexa wrote:
> > It seems that whole process could be sped up by hosting binary
> > packages on one central server (Binary host). Obviously various versions
> > of the same package would be created and therefore unique names could be
> > created by using some metadata to create hash part of filename. On a
> > first thought I would use USE flags and DEPEND as metadata to hash.
>
> This is a cool aspect of the project, I hope you can work with solar
> and zmedico to improve binpkgs. USE flags seem to be the trouble spot
> of binpkgs.

Maybe this bug could help with that:

"Include more info about a binpkg"
- http://bugs.gentoo.org/show_bug.cgi?id=150031#c7

I wanted to tackle that myself some time ago, but I got stuck trying to find
my way around portage, so I postponed it a few times...

It would only need three things:


* Find SLOT and USE for any ebuild (here SLOT means everything that affects
  all packages on one machine; USE means the *active* USE flags of the package).

* Save binpackages in a directory structure that contains a hash over the
  SLOT and a hash over the active USE flags.

* Locate binpackages via the same hashes.


The path to a binpackage would then look like this:

$PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/python-2.5.2-r8.tbz2


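As a rough sketch of how such a path could be put together (not portage code): the helper names `b32_sha1` and `binpkg_path`, the example PKGDIR, the use of CHOST as the machine-wide "SLOT" information, and sorting the flags before hashing are all assumptions made for illustration. For simplicity this version always hashes both parts.

```python
import hashlib
import os.path
from base64 import b32encode


def b32_sha1(text):
    """Return the base32-encoded SHA-1 digest of text (always 32 characters)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return b32encode(digest).decode("ascii")


def binpkg_path(pkgdir, category, pn, pf, slot_info, use_flags):
    """Build $PKGDIR/$CATEGORY/$PN/$SLOTS_HASH/$USE_HASH/<pf>.tbz2.

    Hypothetical helper, not part of portage. Sorting the flags is an
    assumption, made so that flag order does not change the hash.
    """
    slots_hash = b32_sha1(slot_info)
    use_hash = b32_sha1(" ".join(sorted(use_flags)))
    return os.path.join(pkgdir, category, pn, slots_hash, use_hash, pf + ".tbz2")


# Example values only; the CHOST string stands in for the machine-wide settings.
path = binpkg_path("/usr/portage/packages", "dev-lang", "python",
                   "python-2.5.2-r8", "x86_64-pc-linux-gnu",
                   ["threads", "ssl", "readline"])
```

Saving and locating would then use the same function: whoever builds the package writes to `path`, and whoever wants it recomputes `path` from its own settings and checks whether the file exists.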
You can make the USE hash more user-friendly by hashing the USE string only if
the unhashed string would be longer than the hashed one.

For the hash you can simply use SHA-1, maybe in base32 encoding to make it
URL- and filename-safe.

--- !python
# hashing a USE string
# (hashlib.sha1 replaces the old sha module, which is gone in Python 3)
from hashlib import sha1
from base64 import b32encode

use_string = "STRING OF THE ACTIVE USE FLAGS"
# if the string is longer than a base32 encoded sha1 hash
# (32 characters), hash it to have an upper bound on its length
if len(use_string) > 32:
    digest = sha1(use_string.encode("utf-8")).digest()
    use_string = b32encode(digest).decode("ascii")
...

One missing piece is a nice human-readable delimiter for the USE flags in the
USE string that can safely be used in URLs. If there is none, always hashing
the USE flags is a simple fallback.

Since "-" and "_" are already used in USE flags, I don't know of any other
possible delimiters, except just using the space and escaping it in URLs as
%20.

Bash won't like that (the space requires escaping), but since portage is
written in Python and ebuilds don't need to access binpackages directly, that
should be a minor problem.

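To illustrate the space-delimiter idea, here is a small sketch using only the Python standard library. The example flags are made up, and nothing here is taken from portage itself.

```python
import shlex
from urllib.parse import quote, unquote

use_string = "readline ssl threads"  # example flags, chosen for illustration

# URL form: the space delimiter becomes %20, "-" and "_" stay as they are
url_part = quote(use_string)

# Shell form: shlex.quote wraps the string in single quotes for bash
shell_part = shlex.quote(use_string)

# The URL form can be turned back into the readable string
assert unquote(url_part) == use_string
```

So the readable string would only need escaping at the two boundaries (URLs and shell commands), while Python code can pass it around unchanged.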
Best wishes,
Arne

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
- singing a part of the history of free software -
http://infinite-hands.draketo.de
