Gentoo Archives: gentoo-dev

From:	Evan Powers <powers.161@×××.edu>
To:	gentoo-dev@g.o
Subject:	Re: [gentoo-dev] updated gentoolkit with echangelog modification
Date:	Mon, 28 Apr 2003 23:27:47
Message-Id:	`200304281927.41956.powers.161@osu.edu`
In Reply to:	Re: [gentoo-dev] updated gentoolkit with echangelog modification by Nicholas Wourms

On Monday 28 April 2003 09:29 am, Nicholas Wourms wrote:
> do things. Later, I realized the other way, but I became
> convinced that the "new" style made more sense.
...
> Can you provide some arguments why
> the old style is more proper than the new style? Does it
> scale well? Which makes more sense in the short-term until
> a more permanent, extensible format can be developed? I'd
> like to hear your views on readability.

The more I think about it, the more I think the "old" style is ultimately a
better native storage format, and the less I agree with you about the
scalability of the "new" format. I'll argue my point by discussing how to
perform the different operations one performs on a change log.

The short summary of the following is:
1) with the "old" style, you can leverage existing software or write simple
software to recover the conveniences of the "new" format
2) with the "new" format, you have to write more complicated software to
recover the conveniences of the "old" format; existing software is not much
help
3) as the log grows, the time required to perform operations on the "new"
style is at least a factor of O(n) worse (assuming O(n^3)/O(n^2) = O(n)) than
the same operation on the "old" style

The operations are:
1) adding a new entry which touches one ebuild
2) adding a new entry which touches several ebuilds
3) extracting the most recent changes of a particular ebuild
4) extracting the most recent changes across all ebuilds
5) more complicated operations:
   a) changes made by a developer
   b) changes made within a time range
   c) ...

Let's assume that we're willing to use software to help us, and write it if
necessary, in order to hit the optimal native format.

OK, sort-of-big-O analysis of operations on the "old" format (n is the number
of entries in the changelog, e the number of ebuilds):
1) O(1) [just add it to the top]
2) O(1)
3) O(n) [iterate over the n entries and print those that touch the specified
ebuild]
4) O(1) [just take the top few]
5)
   a) O(n) [iterate over the n entries and print those by the developer]
   b) on average, O(n/2) [find the beginning of the range and print it]
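
To make the operations above concrete, here is a minimal Python sketch of the "old" style as one chronological list with the newest entry on top. The Entry fields and all function names are my own assumptions for illustration, not anything echangelog or portage provides, and an in-memory deque only approximates a text file.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date
from itertools import islice
from typing import Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical model of one changelog entry (not a real Gentoo format)."""
    day: date
    developer: str
    ebuilds: Tuple[str, ...]
    message: str


def add_entry(log, entry):
    """Operations 1 and 2: put the entry on top, however many ebuilds it touches."""
    log.appendleft(entry)


def recent_for_ebuild(log, ebuild, limit=None):
    """Operation 3: walk the log top-down and keep entries touching the ebuild."""
    matches = (e for e in log if ebuild in e.ebuilds)
    return list(islice(matches, limit))


def recent(log, count):
    """Operation 4: the top few entries are already the most recent ones."""
    return list(islice(log, count))


def by_developer(log, developer):
    """Operation 5a: a single pass; the result is already in date order."""
    return [e for e in log if e.developer == developer]


def in_range(log, start, end):
    """Operation 5b: a single pass keeping entries dated within [start, end]."""
    return [e for e in log if start <= e.day <= end]
```

Every query is a single top-down pass, and nothing ever needs to be sorted, because the storage order is already the order you want to read.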

Note that, for each query operation:
1) opening the file with "less ChangeLog" and running a / search will work
2) "grep -A nnn" almost works
3) only simple awk/perl is needed to extract the result
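
As a rough idea of how little code the "simple awk/perl" amounts to, here is the same kind of filter sketched in Python. It assumes, purely for illustration, that entries are separated by blank lines and that the first line of each entry names the files it touches; the function name is hypothetical.

```python
import re


def entries_mentioning(changelog_text, needle):
    """Return whole entries, in file order, whose first line contains needle.

    Assumes blank-line-separated entries; this is an illustrative
    simplification, not a parser for any real ChangeLog format.
    """
    blocks = re.split(r"\n\s*\n", changelog_text.strip())
    return [b for b in blocks if b and needle in b.splitlines()[0]]
```

Since the file is already in date order, the output needs no further processing.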

Sort-of-big-O analysis of operations on the "new" format:
1) on average, O(n/2) [have to find the ebuild section]
2) O(n) [not cleanly possible, unless you duplicate the log in each section or
split it apart and tag each piece with something unique to the original
changeset]
3) on average, O(n/2) [have to find the ebuild section]
4) O(n*log(n)) [have to resort the entries; assume best known sorting algo.]
5)
   a) O(n^2*log(n)) [have to iterate over entries to find those by a particular
   developer (number presumably proportional to n), then sort them]
   b) about O(n^2*log(n)) [similar to 5a]
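
For comparison, here is a minimal Python sketch of the "new" style as one section per ebuild. Again, the Entry fields, the dict-of-lists layout, and the function names are assumptions made up for illustration. A dict lookup hides the cost of finding a section in a text file, so this only shows which extra steps the format forces on you, not how long they take.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical model of one changelog entry (not a real Gentoo format)."""
    day: date
    developer: str
    ebuilds: Tuple[str, ...]
    message: str


def add_entry(sections, entry):
    """Operations 1 and 2: the entry is duplicated into every section it touches."""
    for ebuild in entry.ebuilds:
        sections[ebuild].insert(0, entry)


def recent_for_ebuild(sections, ebuild, count):
    """Operation 3: find the section, then take its top few entries."""
    return sections.get(ebuild, [])[:count]


def recent(sections, count):
    """Operation 4: collect every section, drop duplicates, then re-sort by date."""
    unique = {e for entries in sections.values() for e in entries}
    return sorted(unique, key=lambda e: e.day, reverse=True)[:count]


def by_developer(sections, developer):
    """Operation 5a: same collect, de-duplicate, and re-sort dance, plus a filter."""
    unique = {
        e
        for entries in sections.values()
        for e in entries
        if e.developer == developer
    }
    return sorted(unique, key=lambda e: e.day, reverse=True)
```

Anything that cuts across ebuilds has to undo the grouping first: gather the sections, throw away the duplicates created by multi-ebuild entries, and sort by date again.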

Note that the new format is never more efficient by more than a constant
coefficient, and is in general significantly worse. Further:
1) searches in less are useless
2) "grep | sort" will give you the line numbers
3) awk/perl software doubles in complexity (as measured by L.O.C.)

Evan

--
gentoo-dev@g.o mailing list
