On Monday 28 April 2003 09:29 am, Nicholas Wourms wrote:
> do things. Later, I realized the other way, but I became
> convinced that the "new" style made more sense.
...
> Can you provide some arguments why
> the old style is more proper then the new style? Does it
> scale well? Which makes more sense in the short-term until
> a more permanent, extensible format can be developed? I'd
> like to hear your views on readability.
|
The more I think about it, the more I think the "old" style is ultimately a
better native storage format, and the less I agree with you about the
scalability of the "new" format. I'll argue my point by discussing how to
perform the different operations one performs on a change log.
|
The short summary of the following is:
1) with the "old" style, you can leverage existing software or write simple
software to recover the conveniences of the "new" format
2) with the "new" format, you have to write more complicated software to
recover the conveniences of the "old" format; existing software is not much
help
3) as the log grows, the time required to perform operations on the "new"
style is at least a factor of O(n) worse (assuming O(n^3)/O(n^2) = O(n)) than
the same operation on the "old" style
|
The operations are:
1) adding a new entry which touches one ebuild
2) adding a new entry which touches several ebuilds
3) extracting the most recent changes of a particular ebuild
4) extracting the most recent changes across all ebuilds
5) more complicated operations:
   a) changes made by a developer
   b) changes made within a time range
   c) ...
|
Let's assume that we're willing to use software to help us, and write it if
necessary, in order to hit the optimal native format.
|
OK, sort-of-big-O analysis of operations on the "old" format (n is the number
of entries in the changelog, e the number of ebuilds):
1) O(1) [just add it to the top]
2) O(1)
3) O(n) [iterate over the n entries and print those against the specified
ebuild]
4) O(1) [just take the top few]
5)
   a) O(n) [iterate over the n entries and print those by the developer]
   b) on average, O(n/2) [find the beginning of the range and print it]
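
To make the list above concrete, here is a minimal sketch of the "old" style
as an in-memory structure. Everything in it is an assumption for illustration:
the Entry fields, the OldStyleLog class and its method names are hypothetical,
and it presumes entries are added in chronological order so that the newest is
always on top.

```python
from collections import deque
from dataclasses import dataclass
from datetime import date
from itertools import islice


@dataclass(frozen=True)
class Entry:
    """One changelog entry (hypothetical fields, not a real ChangeLog schema)."""
    when: date
    author: str
    ebuilds: tuple  # every ebuild this change touched
    text: str


class OldStyleLog:
    """Flat log, newest entry first; the left end of the deque is the top."""

    def __init__(self):
        self.entries = deque()

    def add(self, entry):
        # Operations 1 and 2: one entry goes on top, however many ebuilds
        # it touches. deque.appendleft is O(1).
        self.entries.appendleft(entry)

    def recent(self, count):
        # Operation 4: just take the top few.
        return list(islice(self.entries, count))

    def for_ebuild(self, name):
        # Operation 3: one pass over all n entries.
        return [e for e in self.entries if name in e.ebuilds]

    def by_author(self, author):
        # Operation 5a: one pass over all n entries.
        return [e for e in self.entries if e.author == author]

    def between(self, start, end):
        # Operation 5b: skip entries newer than the range, collect the
        # range, and stop at the first entry older than it.
        result = []
        for e in self.entries:
            if e.when > end:
                continue
            if e.when < start:
                break
            result.append(e)
        return result
```

Every query comes back newest first without any sorting, because the storage
order already is the chronological order.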
|
Note that, for each query operation:
1) "less ChangeLog" followed by a / search will work
2) "grep -A nnn" almost works
3) only simple awk/perl is needed to extract the result
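
As a rough illustration of how little software the "old" format needs, the
sketch below does the same job as a small awk/perl filter. The text layout is
an assumption made up for the example (a "date; author; files:" header line,
a body, and a blank line between entries), and the function names are
hypothetical.

```python
# Hypothetical flat ChangeLog text, newest entry first.
SAMPLE = """\
28 Apr 2003; alice; bar-2.0.ebuild:
Version bump.

27 Apr 2003; bob; foo-1.0.ebuild, foo-1.1.ebuild:
Fix dependencies.

26 Apr 2003; alice; foo-1.0.ebuild:
Initial import.
"""


def split_entries(text):
    """Split the log into entries, assuming blank lines separate them."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def entries_matching(text, needle):
    """Return whole entries whose header line mentions needle, in file order."""
    return [e for e in split_entries(text) if needle in e.splitlines()[0]]
```

Calling entries_matching(SAMPLE, "foo-1.0.ebuild") or
entries_matching(SAMPLE, "alice") returns complete entries, already in
newest-first order, with no sorting step.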
|
Sort-of-big-O analysis of operations on the "new" format:
1) on average, O(n/2) [have to find the ebuild section]
2) O(n) [not cleanly possible, unless you duplicate the log in each section or
split it apart and tag each piece with something unique to the original
changeset]
3) on average, O(n/2) [have to find the ebuild section]
4) O(n*log(n)) [have to re-sort the entries; assume best known sorting
algorithm]
5)
   a) O(n^2*log(n)) [have to iterate over entries to find those by a particular
   developer (number presumably proportional to n), then sort them]
   b) about O(n^2*log(n)) [similar to 5a]
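
For comparison, a minimal sketch of the "new" style under the same kind of
assumptions: the Entry fields, the NewStyleLog class and its method names are
hypothetical. A dictionary stands in for the per-ebuild sections, so the cost
of scanning a text file for the right section does not show up here; what
does show up is the duplication in operation 2 and the extra de-duplicate and
sort steps in operations 4 and 5.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """One changelog entry (hypothetical fields, not a real ChangeLog schema)."""
    when: date
    author: str
    ebuilds: tuple  # every ebuild this change touched
    text: str


class NewStyleLog:
    """One section per ebuild; each section lists its entries newest first."""

    def __init__(self):
        self.sections = defaultdict(list)

    def add(self, entry):
        # Operation 2: a change touching several ebuilds is copied into
        # every section it belongs to.
        for name in entry.ebuilds:
            self.sections[name].insert(0, entry)

    def for_ebuild(self, name):
        # Operation 3: return the section for this ebuild.
        return list(self.sections.get(name, []))

    def _unique_entries(self):
        # Undo the duplication from add(); frozen dataclasses are hashable.
        seen = set()
        unique = []
        for entries in self.sections.values():
            for e in entries:
                if e not in seen:
                    seen.add(e)
                    unique.append(e)
        return unique

    def recent(self, count):
        # Operation 4: gather every section, then re-sort by date.
        ordered = sorted(self._unique_entries(), key=lambda e: e.when, reverse=True)
        return ordered[:count]

    def by_author(self, author):
        # Operation 5a: filter across all sections, then sort the matches.
        matches = [e for e in self._unique_entries() if e.author == author]
        return sorted(matches, key=lambda e: e.when, reverse=True)
```

Only for_ebuild() gets away without a sort; every cross-ebuild query has to
rebuild the chronological order that the "old" style stores for free.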
|
Note that the new format is never more efficient by more than a constant
factor, and is in general significantly worse. Further:
1) "less" searches are useless
2) "grep | sort" will give you the line numbers
3) awk/perl software doubles in complexity (as measured by L.O.C.)
|
Evan
|
--
gentoo-dev@g.o mailing list