Gentoo Archives: gentoo-dev

From: Evan Powers <powers.161@×××.edu>
To: gentoo-dev@g.o
Subject: Re: [gentoo-dev] updated gentoolkit with echangelog modification
Date: Mon, 28 Apr 2003 23:27:47
Message-Id: 200304281927.41956.powers.161@osu.edu
In Reply to: Re: [gentoo-dev] updated gentoolkit with echangelog modification by Nicholas Wourms
1 On Monday 28 April 2003 09:29 am, Nicholas Wourms wrote:
2 > do things. Later, I realized the other way, but I became
3 > convinced that the "new" style made more sense.
4 ...
5 > Can you provide some arguments why
6 > the old style is more proper than the new style? Does it
7 > scale well? Which makes more sense in the short-term until
8 > a more permanent, extensible format can be developed? I'd
9 > like to hear your views on readability.
10
11 The more I think about it, the more I think the "old" style is ultimately a
12 better native storage format, and the less I agree with you about the
13 scalability of the "new" format. I'll argue my point by discussing how to
14 perform the different operations one performs on a change log.
15
16 The short summary of the following is that with the "old" style:
17 1) you can leverage existing software or write simple software to match the
18 conveniences of the "new" format
19 2) with the "new" format, you have to write more complicated software to
20 match the conveniences of the "old" format; existing software is not much
21 help
22 3) as the log grows, the time required to perform operations on the "new"
23 style is at least O(n) worse (assuming O(n^3)/O(n^2) = O(n)) than the same
24 operation on the "old" style
25
26 The operations are:
27 1) adding a new entry which touches one ebuild
28 2) adding a new entry which touches several ebuilds
29 3) extracting the most recent changes of a particular ebuild
30 4) extracting the most recent changes across all ebuilds
31 5) more complicated operations:
32 a) changes made by a developer
33 b) changes made within a time range
34 c) ...
35
36 Let's assume that we're willing to use software to help us, and write it if
37 necessary, in order to hit the optimal native format.
38
39 OK, sort-of-big-O analysis of operations on the "old" format (n is the number
40 of entries in the changelog, e the number of ebuilds):
41 1) O(1) [just add it to the top]
42 2) O(1)
43 3) O(n) [iterate over the n entries and print those against the specified
44 ebuild]
45 4) O(1) [just take the top few]
46 5)
47 a) O(n) [iterate over the n entries and print those by the developer]
48 b) on average, O(n/2) [find the beginning of the range and print it]
49
50 Note that, for each query operation:
51 1) "less ChangeLog" followed by a / search will work
52 2) "grep -A nnn" almost works
53 3) only need simple awk/perl to extract the result
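
The "old" format operations listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual ChangeLog syntax or any gentoolkit code: the Entry fields, the function names, and the idea of holding the log in memory as a newest-first deque are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date
from itertools import islice


@dataclass(frozen=True)
class Entry:
    """Hypothetical in-memory form of one changelog entry."""
    when: date
    developer: str
    ebuilds: tuple  # one entry may touch several ebuilds
    message: str


def add_entry(log, entry):
    """Operations 1 and 2: put the entry on top, however many ebuilds it touches."""
    log.appendleft(entry)


def recent(log, count):
    """Operation 4: the newest entries are simply the first ones."""
    return list(islice(log, count))


def for_ebuild(log, ebuild):
    """Operation 3: one pass over the log, order already newest first."""
    return [e for e in log if ebuild in e.ebuilds]


def by_developer(log, developer):
    """Operation 5a: one pass over the log, no sorting needed."""
    return [e for e in log if e.developer == developer]


def in_range(log, start, end):
    """Operation 5b: skip entries newer than `end`, stop at the first one older than `start`.

    Assumes the log really is in newest-first date order.
    """
    found = []
    for e in log:
        if e.when > end:
            continue
        if e.when < start:
            break
        found.append(e)
    return found
```

Every query here is a single filter over a sequence that is already in date order, which is why plain grep or a short awk/perl script gets most of the way there.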
54
55 Sort-of-big-O analysis of operations on the "new" format:
56 1) on average, O(n/2) [have to find the ebuild section]
57 2) O(n) [not cleanly possible, unless you duplicate the log in each section or
58 split it apart and tag each piece with something unique to the original
59 changeset]
60 3) on average, O(n/2) [have to find the ebuild section]
61 4) O(n*log(n)) [have to re-sort the entries; assume best known sorting algo.]
62 5)
63 a) O(n^2*log(n)) [have to iterate over entries to find those by a particular
64 developer (number presumably proportional to n), then sort them]
65 b) about O(n^2*log(n)) [similar to 5a]
66
67 Note that the new format is never more efficient by more than a constant
68 factor, and is in general significantly worse. Further:
69 1) / searches in "less" are useless
70 2) "grep | sort" will give you the line numbers
71 3) awk/perl software doubles in complexity (as measured by L.O.C.)
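
For comparison, here is a sketch of the same operations on the "new" format, modelled as one section per ebuild. As before, this is an assumption-laden illustration: the Entry fields, the function names, and the dict-of-lists layout are hypothetical, and a dict hides the search for the ebuild section that a flat text file would require.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """Hypothetical in-memory form of one changelog entry."""
    when: date
    developer: str
    message: str


def add_entry(sections, ebuilds, entry):
    """Operations 1 and 2: the entry is duplicated into every section it touches."""
    for ebuild in ebuilds:
        sections.setdefault(ebuild, []).insert(0, entry)


def for_ebuild(sections, ebuild, count):
    """Operation 3: cheap once the right section has been found."""
    return sections.get(ebuild, [])[:count]


def _all_entries(sections):
    """Collapse the duplicated copies back into single entries.

    Entries identical in every field are treated as the same change.
    """
    return {e for entries in sections.values() for e in entries}


def recent(sections, count):
    """Operation 4: gather every section, drop duplicates, then re-sort by date."""
    merged = sorted(_all_entries(sections), key=lambda e: e.when, reverse=True)
    return merged[:count]


def by_developer(sections, developer):
    """Operation 5a: filter across all sections, then re-sort."""
    mine = [e for e in _all_entries(sections) if e.developer == developer]
    return sorted(mine, key=lambda e: e.when, reverse=True)
```

Compared with the single-list layout, each cross-ebuild query needs an extra merge, de-duplication, and sort step before it can return anything in date order.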
72
73 Evan
74
75 --
76 gentoo-dev@g.o mailing list