It has come to my attention that, during recent weeks, a small number of
users have been complaining about the size of the rsync tree.
My august colleagues have proposed many ingenious solutions, but
misfortunately they are all complicated and involve a lot of manual
work. I believe the following small changes (which can mostly be
automated) would prove of much larger benefit to the community for a
vastly reduced cost.

To begin with, I'd like to draw your attention to comments in ebuilds.
It is an oft-forgotten fact that these items provide absolutely no
benefit to the end user. "Surely", I hear you say, "it is not worth
getting hung up over such an insignificant triviality! What harm do a
few trifling little remarks do?". Yet, when actually measured, these
'innocent minutiae' (as you might call them had you a penchant for
obsolete vocabulary or a predilection for pomposity) account for
approximately 20% of the total ebuild content in the tree. It is obvious
that an immediate ban upon these silly things, alongside a small script
to remove them from the tree, would provide a very large gain for our
users without having to remove any existing code. Adding in a repoman
check to error out if such lines were present would clearly be a good
start.

Next up are blank lines, which, as all the world knows, are of no use at
all to anyone. These account for a staggering 150KBytes of data in the
main tree, which, over a 9600bps dialup line, would save us over two
minutes on an emerge sync. Again, removing these pointless wastes of
space via a bash script is trivial.
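
For anyone wishing to check the two-minute figure, the arithmetic can be
sketched as below. It assumes the 150KBytes are decimal kilobytes and
that the line really delivers its nominal rate; the function name and
the 10-bits-per-byte serial framing case are my own assumptions, not
anything measured.

```python
def transfer_seconds(size_bytes: int, line_bps: int, bits_per_byte: int = 8) -> float:
    """Seconds needed to push size_bytes down a line running at line_bps."""
    return size_bytes * bits_per_byte / line_bps


# The 150KBytes quoted above, read as decimal kilobytes (an assumption).
blank_line_bytes = 150 * 1000

# Nominal rate with no framing overhead.
raw = transfer_seconds(blank_line_bytes, 9600)
# Assumed async serial framing: one start and one stop bit per byte.
framed = transfer_seconds(blank_line_bytes, 9600, bits_per_byte=10)

print(f"{raw:.0f}s raw, {framed:.0f}s with start/stop bits")
```

Either way the result lands a little over two minutes, and compression
on the wire would of course change it.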

Staying with the blank spaces thing, leading whitespace (which serves no
practical purpose and is only used to make the code "look pretty" --
although how a bash script could ever be considered "pretty" is beyond
my limited mind) accounts for nearly half a megabyte of data. Clearly
it should immediately be removed and any developer using it in the
future should have their cvs access suspended pending a review of their
status within the project -- as devrel and our managers will tell you,
being nice to the users is our number one priority.
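
The figures quoted for comments, blank lines and leading whitespace
could be measured along the following lines. This is only a rough
sketch: `tally` and `SAMPLE` are names invented for illustration, only
whole-line comments are counted, and a '#' inside a string or after
code is ignored.

```python
def tally(ebuild_text: str) -> dict:
    """Count bytes spent on comments, blank lines and leading whitespace."""
    counts = {"comment": 0, "blank": 0, "indent": 0, "total": 0}
    for line in ebuild_text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        counts["total"] += size
        stripped = line.lstrip(" \t")
        if stripped.strip() == "":
            # Nothing but whitespace: the whole line counts as blank.
            counts["blank"] += size
        elif stripped.startswith("#"):
            # Whole-line comment, including any indentation before it.
            counts["comment"] += size
        else:
            # Ordinary code: only the leading spaces and tabs are counted.
            counts["indent"] += len(line) - len(stripped)
    return counts


# Made-up ebuild fragment, purely for demonstration.
SAMPLE = (
    "# Copyright header\n"
    "\n"
    "src_compile() {\n"
    "\temake || die\n"
    "}\n"
)

print(tally(SAMPLE))
```

Run over every ebuild in a checkout and summed, this would give numbers
comparable to the ones above, within the limits of the crude comment
detection.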

There are other trivial ways to save space too. The commonly used helper
function "emake", for example, is a shocking five bytes in length.
Replacing this with a much more helpfully named "e", and likewise
replacing "econf" with "c", would gain something like 50KBytes. If we
also replace src_unpack, src_compile and src_install with more
appropriate alternatives we could shave off a further 300KBytes. I have
no doubt that the reader could extend this logic to the other portage
internals and common function names, bringing the total up to half a
megabyte or more.
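
A saving of this kind could be estimated roughly as follows. The
function `bytes_saved` and the sample text are invented for
illustration; only the "e" and "c" replacements come from the proposal
itself, and the word-boundary match is crude (it would also count
"emake" inside a hyphenated word or a comment).

```python
import re

# Only these two renames are named in the proposal; add others as desired.
RENAMES = {"emake": "e", "econf": "c"}


def bytes_saved(ebuild_text: str, renames: dict = RENAMES) -> int:
    """Bytes saved by replacing each whole-word old name with its new name."""
    saved = 0
    for old, new in renames.items():
        hits = len(re.findall(rf"\b{re.escape(old)}\b", ebuild_text))
        saved += hits * (len(old) - len(new))
    return saved


# Made-up ebuild fragment, purely for demonstration.
sample = "src_compile() {\n\teconf --disable-nls || die\n\temake || die\n}\n"
print(bytes_saved(sample))
```

Each call site of "emake" or "econf" yields four bytes, so the total
depends entirely on how many call sites the tree contains.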

This can be extended to other functions, of course. In particular I'd
like to draw your attention to the absurdly named "flag-o-matic.eclass".
Merely inheriting this eclass adds at least thirteen bytes (that's over
a hundred bits!) of bloat to an ebuild, and that's before we start on
the ridiculously verbose function names. What's all this "replace-flags"
nonsense, I ask you? Any educated programmer can see that "rf" is a far
more useful name. Even those who are not convinced that space needs to
be saved must surely notice how much developer time would be saved
through reduced typing.

It remains a mystery to me how anyone could possibly have overlooked the
following suggestion. Currently, we install 'dependency information'
inside ebuilds. This is blatantly pointless -- as RedHat have so ably
demonstrated with their 'rpm' installer (and, albeit in a non-Linux
environment, I am assured that Microsoft are in the same boat), there is
no need for automatic dependency tracking and resolution. Our users are
more than capable of working this out for themselves. Similarly, the
HOMEPAGE variable is entirely pointless and has been superseded by Google
[1].

Oh, and then we come to metadata.xml. As all the world knows, xml is a
massive waste of space, and (as a data interchange format, not a data
storage format) utterly unsuited for configuration files. A typical
metadata.xml file is 95%+ noise. By replacing these with flat text files
listing the maintainers, we could save somewhere in the region of one
and a half megabytes.
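
What such a flat file might look like can be sketched as below. The
sample metadata.xml is made up (the herd, address and name are
placeholders), `flatten` is a hypothetical helper, and keeping only
herd names and maintainer addresses is my assumption about what
"listing the maintainers" would mean.

```python
import xml.etree.ElementTree as ET

# Made-up metadata.xml in the usual pkgmetadata layout.
SAMPLE_METADATA = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <herd>vim</herd>
  <maintainer>
    <email>someone@example.org</email>
    <name>Some One</name>
  </maintainer>
</pkgmetadata>
"""


def flatten(metadata_xml: str) -> str:
    """Reduce a metadata.xml document to one herd or address per line."""
    root = ET.fromstring(metadata_xml.encode("utf-8"))
    herds = [h.text.strip() for h in root.iter("herd") if h.text]
    emails = [e.text.strip() for e in root.iter("email") if e.text]
    return "".join(f"{entry}\n" for entry in herds + emails)


flat = flatten(SAMPLE_METADATA)
print(flat, end="")
print(f"{len(SAMPLE_METADATA)} bytes of xml -> {len(flat)} bytes of text")
```

The ratio for this one invented file says nothing about the tree as a
whole; the same function would have to be run over every metadata.xml
to test the megabyte-and-a-half estimate.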

Also, no-one has yet considered all the useless fluff in the tree that
nobody actually uses. By removing all ebuilds and eclasses related to
emacs, kde, gnome, php, gaim or java from the tree, as well as
anything which is only supplied as a binary, we could save... Well, I'll
let you do the calculations yourselves. Although mathematics is not the
main focus of my degree, I believe I understand enough to know that the
result is a very big number.

Similarly, all those "compile fix" patches we supply are obviously
worthless. If anyone has any doubt, I suggest they just look at how
many users are using broken CFLAGS and compilers -- clearly, working
code is not a major concern. We should of course leave in security
patches, since security is our number one priority.

ChangeLogs are the next thing to fall under my scrutiny. Clearly these
are entirely worthless, since anyone who cares can just read the cvs
logs and use diff. Kiss goodbye to 14MBytes of junk. Hang on? Did I just
say 14MBytes? Yes. Fourteen Megabytes. That's a one, then a four, then
six zeros. That's fourteen million bytes, or over one hundred and ten
million bits. When syncing my GPRS phone whilst sitting inside a large
metal cage in north Yorkshire, that could save me over TWELVE HOURS on
sync time.
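
The ChangeLog arithmetic can be checked in a few lines. The sketch
assumes decimal megabytes, as the "six zeros" suggest, and works
backwards from the twelve hours to the sustained throughput that
figure implies; the variable names are mine.

```python
changelog_bytes = 14_000_000            # a one, then a four, then six zeros
changelog_bits = changelog_bytes * 8    # eight bits to the byte

twelve_hours = 12 * 60 * 60             # in seconds
implied_bps = changelog_bits / twelve_hours

print(f"{changelog_bits:,} bits")
print(f"{implied_bps:.0f} bit/s sustained over twelve hours")
```

So "over one hundred and ten million bits" holds, and the twelve-hour
figure corresponds to a link averaging somewhat under 2600 bit/s.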

I understand that my previous point may cause a small amount of disquiet
amongst a small proportion of our userbase. After all, how are they
supposed to decide whether to update if they do not know what an update
will change? To them, I must point out that whilst such an attitude is
appropriate for a small hobbyist distribution aimed at skilled users, it
is utterly at odds with what enterprise users require. For them, it is
important that they can perform updates without having to know what they
are doing -- remember that in a corporate environment, any information
is too much information, and time spent reading ChangeLogs is time not
spent doing useful work. Please do not forget that better enterprise
support is our number one priority.

Finally, I must draw KEYWORDS to your scrutiny, and in particular the
misguided choice of ~ to indicate unstable. In ASCII, the tilde
character is represented by the octet 0x7E (hexadecimal), or, in binary,
01111110. A cursory glance at this will show that it contains
significantly more 1 bits than 0 bits. As anyone who has had a basic
schooling in the field of compression can tell you, 1 bits do not
compress as well as 0 bits (they don't have as much empty space in the
middle), so clearly we would be better off picking something else. I
propose the ( character, which has only one 1 bit for every three 0 bits.
Also, I suggest we drop the amd64 keyword and just use x86 to save
space, since we all know fine well that amd64 is just like x86 with a
few extra bits stuck onto the end. Or rather, the start, since x86 gets
its bytes backwards...
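
The bit counts for the two candidate characters are easy to verify by
counting directly. `bit_counts` is a helper invented for this sketch,
and it treats each character as a full eight-bit octet, as the
paragraph above does.

```python
def bit_counts(ch: str) -> tuple:
    """Return (ones, zeros) for the 8-bit encoding of an ASCII character."""
    bits = format(ord(ch), "08b")
    return bits.count("1"), bits.count("0")


for ch in "~(":
    # Show the character, its octet in binary, and the (ones, zeros) pair.
    print(repr(ch), format(ord(ch), "08b"), bit_counts(ch))
```

Whether the ratio of ones to zeros has any bearing on how well a file
compresses is left as an exercise for the reader.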

Gentlemen, ladies, jforman, I believe those remedies outlined herein are
a far more sensible solution than any other current proposal. I eagerly
await the implementation.

[1]: http://www.google.ca/

--
Ciaran McCreesh : Gentoo Developer (Vim, Fluxbox, Sparc, Mips)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm