Gentoo Archives: gentoo-dev

From: "Robin H. Johnson" <robbat2@××××××××××××××.net>
To: George Shapovalov <george@g.o>
Cc: gentoo-dev@g.o
Subject: Re: [gentoo-dev] Portage programming question
Date: Sun, 30 Mar 2003 10:47:07
Message-Id: 20030330104705.GA20613@cherenkov.orbis-terrarum.net
In Reply to: Re: [gentoo-dev] Portage programming question by George Shapovalov
1 On Sun, Mar 30, 2003 at 01:13:24AM -0800, George Shapovalov wrote:
2 > On Saturday 29 March 2003 23:51, Robin H. Johnson wrote:
3 > > I'm going to write some C++ or Java instead for what I want. Python
4 > > seems to lack the Sets and other sequence datatypes that would be
5 > > greatly useful.
6 > Well, not sure what are you referring to, but python sure does have sequence
7 > types and sets. In fact all sequence types provide set abilities.
8
9 > You might want to take a look here:
10 > http://www.python.org/doc/current/lib/typesseq.html
11 > and here:
12 > http://www.python.org/doc/current/tut/node7.html
13 I looked at those originally when learning python properly earlier
14 today.
15
16 My definitions:
17 A LIST is a group of unordered and possibly duplicate items. As in
18 LINKED LIST. Commonly O(n) time to access any item. O(n/2) or better if
19 cleverly implemented. O(n) time for add/remove/contains/size. O(1) for
20 add/remove (after search).
21
22 An ARRAY is a LIST with additional pointers to each element allowing
23 constant time O(1) access to each element. O(n) time for arbitrary
24 add/remove.
25
26 A SET is a group of items that are all unique and are accessed in
27 naturally sorted order. O(1) time for access/add/remove/contains/size.
28
29 A MAP is defined as a SET with each element being the key to another
30 data structure. (Python provides this as a dictionary). Same big-O as
31 SET.
32
33 A BIMAP is best explained as a pair of SETs doubly linked in MAPs. It is
34 MAP that can be used in both directions, with significent memory, speed
35 and consistancy improvements, over just using two MAPs.
36 An explaination of BIMAP:
37 A is the set of (A1,A2,A3)
38 B is the set of (B1,B2,B3)
39 BIMAP.parents() might be: {A1=>(B1,B3),A2=>(B2),A3=(B2,B3)}
40 BIMAP.children() would then be: {B1=>(A1),B2=>(A2,A3),B3=(A1,A3)}
41 The actual data of A and B is stored seperately and references to them
42 are used since each item will appear at least twice.
43 BIMAPs offer the major advantage that an element from set of data can be
44 used as a key to obtain related items in constant amortized time.
45 O(1) for almost all operations.
46
47 Hashtable implementation of SET/MAP/BIMAP is cruicial for high
48 performance.
49
50 Python's implementation of a MAP (dictionary) implemented via hash
51 tables, but I can't find anything on tweaking the load factor and
52 initial capacity, which makes a significent difference on data sets of
53 large but known sizes.
54
55 For these structures, various iterators are required as well, as this is
56 the best way to traverse the data. Functional programming methods to
57 apply to the data is also required, such as (Python provides these first
58 three) filter, map, reduce, (Python doesn't seem to have these last
59 three) set union, set difference, set intersection.
60
61 For a SET, I considered using MAP and just ignoring the data field, but
62 that seems like a terrible waste of cpu and memory. I find it really
63 strange that Python doesn't have a SET while it does have a MAP, esp.
64 given that MAPs are a logical extension of SETs.
65
66 It seems Python did at one stage consider adding SETS:
67 http://www.python.org/peps/pep-0218.html
68 But I can't make much sense of the outcome.
69
70 It seems I would have to implement BIMAP myself, since there isn't one
71 in Python that I can find at all. I would admit that it is significently
72 less common data structure than a list/map/set. I did have to write my
73 own Java implementation of it, but it is available in other functional
74 programming languages (Mercury provides a very good example).
75
76 BIMAP is the main structure that I would be using, but SET and MAP also
77 to a slightly lesser degree. I want to create a BIMAP containing the
78 relative paths to each ebuild as one key, and the IUSE data from that
79 ebuild as the other key.
80
81 --
82 Robin Hugh Johnson
83 E-Mail : robbat2@××××××××××××××.net
84 Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
85 ICQ# : 30269588 or 41961639
86 GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85

Replies

Subject Author
Re: [gentoo-dev] Portage programming question George Shapovalov <george@g.o>