Gentoo Archives: gentoo-dev

From: Ciaran McCreesh <ciaranm@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] A few modest suggestions regarding tree size
Date: Tue, 12 Oct 2004 21:42:12
Message-Id: 20041012223725.6c42698a@snowdrop.home
1 It has come to my attention that, during recent weeks, a small number of
2 users have been complaining about the size of the rsync tree.
3 My august colleagues have proposed many ingenious solutions, but
4 misfortunately they are all complicated and involve a lot of manual
5 work. I believe the following small changes (which can mostly be
6 automated) would prove of much larger benefit to the community for a
7 vastly reduced cost.
8
9 To begin with, I'd like to draw your attention to comments in ebuilds.
10 It is an oft-forgotten fact that these items provide absolutely no
11 benefit to the end user. "Surely", I hear you say, "it is not worth
12 getting hung up over such an insignificant triviality! What harm do a
13 few trifling little remarks do?". Yet, when actually measured, these
14 'innocent minutiae' (as you might call them had you a penchant for
15 obsolete vocabulary or a predilection for pomposity) account for
16 approximately 20% of the total ebuild content in the tree. It is obvious
17 that an immediate ban upon these silly things, alongside a small script
18 to remove them from the tree, would provide a very large gain for our
19 users without having to remove any existing code. Adding in a repoman
20 check to error out if such lines were present would clearly be a good
21 start.
22
23 Next up are blank lines, which, as all the world knows, are of no use at
24 all to anyone. These account for a staggering 150KBytes of data in the
25 main tree, which, over a 9600 dialup line, would save us over two
26 minutes on an emerge sync. Again, removing these pointless wastes of
27 space via a bash script is trivial.
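
A minimal sketch of the arithmetic behind the two-minute figure, assuming 1 KByte = 1024 bytes, a raw 9600 bit/s line, and no compression, framing or rsync protocol overhead (the post specifies none of these). The name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_bytes: int, line_bps: int) -> float:
    """Seconds needed to push size_bytes through a line_bps link.

    Assumes every bit on the wire is payload (no start/stop bits,
    no compression, no protocol overhead).
    """
    return size_bytes * 8 / line_bps


# "150KBytes" of blank lines, read here as 150 * 1024 bytes (an assumption).
blank_line_bytes = 150 * 1024
seconds = transfer_seconds(blank_line_bytes, 9600)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 128 s (~2.1 min)
```

Under those assumptions the saving comes to 128 seconds, which is consistent with "over two minutes".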
28
29 Staying with the blank spaces thing, leading whitespaces (which serve no
30 practical purpose and are only used to make the code "look pretty" --
31 although how a bash script could ever be considered "pretty" is beyond
32 my limited mind) account for nearly half a megabyte of data. Clearly
33 these should immediately be removed and any developer using them in the
34 future should have their cvs access suspended pending a review of their
35 status within the project -- as devrel and our managers will tell you,
36 being nice to the users is our number one priority.
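
The three removals proposed so far (comments, blank lines, leading whitespace) could be sketched roughly as below. This is only an illustration in Python rather than the bash script the post has in mind; `shrink_ebuild` is a hypothetical helper, and it deliberately drops only full-line comments so that a `#` inside a string or a `${var#prefix}` expansion is left alone.

```python
def shrink_ebuild(text: str) -> str:
    """Drop full-line comments, blank lines and leading whitespace.

    Hypothetical helper for illustration only. It does not understand
    bash syntax, so here-documents and multi-line strings whose content
    depends on indentation would be altered.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()       # remove leading spaces and tabs
        if not stripped:               # blank (or whitespace-only) line
            continue
        if stripped.startswith("#"):   # full-line comment, header included
            continue
        kept.append(stripped)
    return "\n".join(kept) + ("\n" if kept else "")


sample = "# Copyright 1999-2004\n\nsrc_compile() {\n\temake || die\n}\n"
print(shrink_ebuild(sample), end="")
```

Comparing `len(sample)` with `len(shrink_ebuild(sample))` gives the per-file saving; summing that over the tree would be one way to check the percentages quoted above, which the sketch itself does not attempt to verify.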
37
38 There are other trivial ways to save space too. The commonly used helper
39 function "emake", for example, is a shocking five bytes in length.
40 Replacing this with a much more helpfully named "e", and likewise
41 replacing "econf" with "c", would gain something like 50KBytes. If we
42 also replace src_unpack, src_compile and src_install with more
43 appropriate alternatives we could shave off a further 300KBytes. I have
44 no doubt that the reader could extend this logic to the other portage
45 internals and common function names, bringing the total up to half a
46 megabyte or more.
47
48 This can be extended to other functions, of course. In particular I'd
49 like to draw your attention to the absurdly named "flag-o-matic.eclass".
50 Merely inheriting this eclass adds at least thirteen bytes (that's over
51 a hundred bits!) of bloat to an ebuild, and that's before we start on
52 the ridiculously verbose function names. What's all this "replace-flags"
53 nonsense I ask you? Any educated programmer can see that "rf" is a far
54 more useful name. Even those who are not convinced that space needs to
55 be saved must surely notice how much developer time would be saved
56 through reduced typing.
57
58 It remains a mystery to me how anyone could possibly have overlooked the
59 following suggestion. Currently, we install 'dependency information'
60 inside ebuilds. This is blatantly pointless -- as RedHat have so ably
61 demonstrated with their 'rpm' installer (and, albeit in a non-Linux
62 environment, I am assured that Microsoft are in the same boat), there is
63 no need for automatic dependency tracking and resolution. Our users are
64 more than capable of working this out for themselves. Similarly, the
65 HOMEPAGE variable is entirely pointless and has been superseded by Google
66 [1].
67
68 Oh, and then we come to metadata.xml. As all the world knows, xml is a
69 massive waste of space, and (as a data interchange format not a data
70 storage format) utterly unsuited for configuration files. A typical
71 metadata.xml file is 95%+ noise. By replacing these with flat text files
72 listing the maintainers, we could save somewhere in the region of one
73 and a half megabytes.
74
75 Also, no-one has yet considered all the useless fluff in the tree that
76 nobody actually uses. By removing all ebuilds and eclasses related to
77 emacs, kde, gnome, php, gaim or java from the tree, as well as
78 anything which is only supplied as a binary, we could save... Well, I'll
79 let you do the calculations yourselves. Although mathematics is not the
80 main focus of my degree, I believe I understand enough to know that the
81 result is a very big number.
82
83 Similarly, all those "compile fix" patches we supply are obviously
84 worthless. If anyone has any doubt, I suggest they just look at how
85 many users are using broken CFLAGS and compilers -- clearly, working
86 code is not a major concern. We should of course leave in security
87 patches, since security is our number one priority.
88
89 ChangeLogs are the next thing to fall under my scrutiny. Clearly these
90 are entirely worthless, since anyone who cares can just read the cvs
91 logs and use diff. Kiss goodbye to 14MBytes of junk. Hang on? Did I just
92 say 14MBytes? Yes. Fourteen Megabytes. That's a one, then a four, then
93 six zeros. That's fourteen million bytes, or over one hundred and ten
94 million bits. When syncing my GPRS phone whilst sitting inside a large
95 metal cage in north Yorkshire, that could save me over TWELVE HOURS on
96 sync time.
97
98 I understand that my previous point may cause a small amount of disquiet
99 amongst a small proportion of our userbase. After all, how are they
100 supposed to decide whether to update if they do not know what an update
101 will change? To them, I must point out that whilst such an attitude is
102 appropriate for a small hobbyist distribution aimed at skilled users, it
103 is utterly at odds with what enterprise users require. For them, it is
104 important that they can perform updates without having to know what they
105 are doing -- remember that in a corporate environment, any information
106 is too much information, and time spent reading ChangeLogs is time not
107 spent doing useful work. Please do not forget that better enterprise
108 support is our number one priority.
109
110 Finally, I must draw KEYWORDS to your scrutiny, and in particular the
111 misguided choice of ~ to indicate unstable. In ASCII, the tilde
112 character is represented by the octet 0x7E (hexadecimal), or, in binary,
113 01111110. A cursory glance at this will show that it contains
114 significantly more 1 bits than 0 bits. As anyone who has had a basic
115 schooling in the field of compression can tell you, 1 bits do not
116 compress as well as 0 bits (they don't have as much empty space in the
117 middle), so clearly we would be better off picking something else. I
118 propose the ( character, which has only one 1 bit for every three 0 bits.
119 Also, I suggest we drop the amd64 keyword and just use x86 to save
120 space, since we all know fine well that amd64 is just like x86 with a
121 few extra bits stuck onto the end. Or rather, the start, since x86 gets
122 its bytes backwards...
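
For anyone wishing to audit the bit counts, a small sketch follows. It assumes each character is stored as a single 8-bit octet, as in the 01111110 example above; `bit_counts` is a name invented here.

```python
def bit_counts(ch: str) -> tuple:
    """Return (ones, zeros) for a single character stored as one octet."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("character does not fit in one octet")
    ones = bin(code).count("1")
    return ones, 8 - ones


for ch in "~(":
    ones, zeros = bit_counts(ch)
    print(f"{ch!r} = {ord(ch):08b}: {ones} one bits, {zeros} zero bits")
# '~' = 01111110: 6 one bits, 2 zero bits
# '(' = 00101000: 2 one bits, 6 zero bits
```

The sketch only counts bits; it makes no claim about how well either character compresses.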
123
124 Gentlemen, ladies, jforman, I believe those remedies outlined herein are
125 a far more sensible solution than any other current proposal. I eagerly
126 await the implementation.
127
128 [1]: http://www.google.ca/
129
130 --
131 Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
132 Mail : ciaranm at gentoo.org
133 Web : http://dev.gentoo.org/~ciaranm
