Gentoo Archives: gentoo-dev

From: Roman Gaufman <hackeron@×××××.com>
To: Ciaran McCreesh <ciaranm@g.o>
Cc: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] A few modest suggestions regarding tree size
Date: Tue, 12 Oct 2004 21:59:19
Message-Id: 921ad39e04101214595d147cc9@mail.gmail.com
In Reply to: [gentoo-dev] A few modest suggestions regarding tree size by Ciaran McCreesh
I think it's wrong for a maintainer to troll like this on the dev
mailing list -- but then again it's ciaranm - he's a joker :)

But seriously, there were some nice ideas in the last few messages,
and they weren't 50Kb improvements, but a major drop in file count and
size, like a 25% drop; that's pretty major -- also, it doesn't affect
developers much in terms of productivity.

That includes my idea of syncing based on a changelog, which would reduce
sync time to literally less than a second without affecting ebuild
maintainers at all.

On Tue, 12 Oct 2004 22:37:25 +0100, Ciaran McCreesh <ciaranm@g.o> wrote:
> It has come to my attention that, during recent weeks, a small number of
> users have been complaining about the size of the rsync tree.
> My august colleagues have proposed many ingenious solutions, but
> misfortunately they are all complicated and involve a lot of manual
> work. I believe the following small changes (which can mostly be
> automated) would prove of much larger benefit to the community for a
> vastly reduced cost.
>
> To begin with, I'd like to draw your attention to comments in ebuilds.
> It is an oft-forgotten fact that these items provide absolutely no
> benefit to the end user. "Surely", I hear you say, "it is not worth
> getting hung up over such an insignificant triviality! What harm do a
> few trifling little remarks do?". Yet, when actually measured, these
> 'innocent minutiae' (as you might call them had you a penchant for
> obsolete vocabulary or a predilection for pomposity) account for
> approximately 20% of the total ebuild content in the tree. It is obvious
> that an immediate ban upon these silly things, alongside a small script
> to remove them from the tree, would provide a very large gain for our
> users without having to remove any existing code. Adding in a repoman
> check to error out if such lines were present would clearly be a good
> start.
>
> Next up are blank lines, which, as all the world knows, are of no use at
> all to anyone. These account for a staggering 150KBytes of data in the
> main tree, which, over a 9600 dialup line, would save us over two
> minutes on an emerge sync. Again, removing these pointless wastes of
> space via a bash script is trivial.
>
> Staying with the blank spaces thing, leading whitespaces (which serve no
> practical purpose and are only used to make the code "look pretty" --
> although how a bash script could ever be considered "pretty" is beyond
> my limited mind) account for nearly half a megabyte of data. Clearly
> these should immediately be removed and any developer using them in the
> future should have their cvs access suspended pending a review of their
> status within the project -- as devrel and our managers will tell you,
> being nice to the users is our number one priority.
>
> There are other trivial ways to save space too. The commonly used helper
> function "emake", for example, is a shocking five bytes in length.
> Replacing this with a much more helpfully named "e", and likewise
> replacing "econf" with "c", would gain something like 50KBytes. If we
> also replace src_unpack, src_compile and src_install with more
> appropriate alternatives we could shave off a further 300KBytes. I have
> no doubt that the reader could extend this logic to the other portage
> internals and common function names, bringing the total up to half a
> megabyte or more.
>
> This can be extended to other functions, of course. In particular I'd
> like to draw your attention to the absurdly named "flag-o-matic.eclass".
> Merely inheriting this eclass adds at least thirteen bytes (that's over
> a hundred bits!) of bloat to an ebuild, and that's before we start on
> the ridiculously verbose function names. What's all this "replace-flags"
> nonsense I ask you? Any educated programmer can see that "rf" is a far
> more useful name. Even those who are not convinced that space needs to
> be saved must surely notice how much developer time would be saved
> through reduced typing.
>
> It remains a mystery to me how anyone could possibly have overlooked the
> following suggestion. Currently, we install 'dependency information'
> inside ebuilds. This is blatantly pointless -- as RedHat have so ably
> demonstrated with their 'rpm' installer (and, albeit in a non-Linux
> environment, I am assured that Microsoft are in the same boat), there is
> no need for automatic dependency tracking and resolution. Our users are
> more than capable of working this out for themselves. Similarly, the
> HOMEPAGE variable is entirely pointless and has been superseded by Google
> [1].
>
> Oh, and then we come to metadata.xml. As all the world knows, xml is a
> massive waste of space, and (as a data interchange format, not a data
> storage format) utterly unsuited for configuration files. A typical
> metadata.xml file is 95%+ noise. By replacing these with flat text files
> listing the maintainers, we could save somewhere in the region of one
> and a half megabytes.
>
> Also, no-one has yet considered all the useless fluff in the tree that
> nobody actually uses. By removing all ebuilds and eclasses related to
> emacs, kde, gnome, php, gaim or java from the tree, as well as
> anything which is only supplied as a binary, we could save... Well, I'll
> let you do the calculations yourselves. Although mathematics is not the
> main focus of my degree, I believe I understand enough to know that the
> result is a very big number.
>
> Similarly, all those "compile fix" patches we supply are obviously
> worthless. If anyone has any doubt, I suggest they just look at how
> many users are using broken CFLAGS and compilers -- clearly, working
> code is not a major concern. We should of course leave in security
> patches, since security is our number one priority.
>
> ChangeLogs are the next thing to fall under my scrutiny. Clearly these
> are entirely worthless, since anyone who cares can just read the cvs
> logs and use diff. Kiss goodbye to 14MBytes of junk. Hang on? Did I just
> say 14MBytes? Yes. Fourteen Megabytes. That's a one, then a four, then
> six zeros. That's fourteen million bytes, or over one hundred and ten
> million bits. When syncing my GPRS phone whilst sitting inside a large
> metal cage in north Yorkshire, that could save me over TWELVE HOURS on
> sync time.
>
> I understand that my previous point may cause a small amount of disquiet
> amongst a small proportion of our userbase. After all, how are they
> supposed to decide whether to update if they do not know what an update
> will change? To them, I must point out that whilst such an attitude is
> appropriate for a small hobbyist distribution aimed at skilled users, it
> is utterly at odds with what enterprise users require. For them, it is
> important that they can perform updates without having to know what they
> are doing -- remember that in a corporate environment, any information
> is too much information, and time spent reading ChangeLogs is time not
> spent doing useful work. Please do not forget that better enterprise
> support is our number one priority.
>
> Finally, I must draw KEYWORDS to your scrutiny, and in particular the
> misguided choice of ~ to indicate unstable. In ASCII, the tilde
> character is represented by the octet 0x7E (hexadecimal), or, in binary,
> 01111110. A cursory glance at this will show that it contains
> significantly more 1 bits than 0 bits. As anyone who has had a basic
> schooling in the field of compression can tell you, 1 bits do not
> compress as well as 0 bits (they don't have as much empty space in the
> middle), so clearly we would be better off picking something else. I
> propose the ( character, which has only one 1 bit for every three 0 bits.
> Also, I suggest we drop the amd64 keyword and just use x86 to save
> space, since we all know fine well that amd64 is just like x86 with a
> few extra bits stuck onto the end. Or rather, the start, since x86 gets
> its bytes backwards...
>
> Gentlemen, ladies, jforman, I believe those remedies outlined herein are
> a far more sensible solution than any other current proposal. I eagerly
> await the implementation.
>
> [1]: http://www.google.ca/
>
> --
> Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
> Mail : ciaranm at gentoo.org
> Web : http://dev.gentoo.org/~ciaranm

--
gentoo-dev@g.o mailing list

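The byte and bit figures quoted in the message above are easy to sanity-check. What follows is a minimal Python sketch that is not part of the original thread. It assumes 1 KByte = 1024 bytes, an ideal 9600 bit/s dialup line with no protocol overhead, and 14 MBytes = 14,000,000 bytes, as the message itself spells out. The helper names `transfer_seconds` and `count_bits` are hypothetical.

```python
# Sanity-check the arithmetic in the quoted "modest suggestions" message.
# Assumptions (not from the original message): 1 KByte = 1024 bytes, a
# "9600 dialup line" moves 9600 bits per second with no protocol overhead,
# and 14 MBytes = 14,000,000 bytes, as the message spells out.


def transfer_seconds(num_bytes: int, bits_per_second: int) -> float:
    """Idealised time to move num_bytes over a link of the given bit rate."""
    return num_bytes * 8 / bits_per_second


def count_bits(ch: str) -> tuple[int, int]:
    """Return (ones, zeros) in the 8-bit ASCII encoding of ch."""
    ones = bin(ord(ch)).count("1")
    return ones, 8 - ones


if __name__ == "__main__":
    # 150 KBytes of blank lines over a 9600 bit/s line.
    secs = transfer_seconds(150 * 1024, 9600)
    print(f"150 KBytes at 9600 bit/s: {secs:.0f} s ({secs / 60:.2f} min)")

    # 14 MBytes of ChangeLogs, expressed in bits.
    changelogs = 14_000_000
    print(f"14 MBytes = {changelogs * 8:,} bits")

    # Highest link rate at which 14 MBytes still takes over twelve hours.
    twelve_hours = 12 * 60 * 60
    print(f"'over twelve hours' implies < {changelogs * 8 / twelve_hours:.0f} bit/s")

    # Bit patterns of the current and proposed "unstable" markers.
    for ch in "~(":
        ones, zeros = count_bits(ch)
        print(f"{ch!r}: 0x{ord(ch):02X} = {ord(ch):08b}, {ones} ones, {zeros} zeros")
```

Under these assumptions the 150 KBytes of blank lines take 128 seconds, a little over two minutes. The 14 MBytes of ChangeLogs come to 112 million bits. The `(` character (0x28, 00101000) carries two 1 bits against six 0 bits, while `~` (0x7E, 01111110) carries six against two.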
Replies

Subject Author
Re: [gentoo-dev] A few modest suggestions regarding tree size Jason Rhinelander <jason@××××××××××××××××.com>