Gentoo Archives: gentoo-dev

From: Steve Long <slong@××××××××××××××××××.uk>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: Re: Re: Default src_install for EAPI-2 or following EAPI
Date: Mon, 22 Sep 2008 00:44:38
Message-Id: gb6pop$997$1@ger.gmane.org
In Reply to: Re: [gentoo-dev] Re: Re: Default src_install for EAPI-2 or following EAPI by Vaeth
[Sorry for length]
Vaeth wrote:

> Steve Long wrote:
>
>> Vaeth wrote:
>>
>> > let me remark that the more clever way to do this is
>> >
>> > [ -n "${DOCS}" ] && eval "dodoc ${DOCS}"
>> >
>> eval is _not_ clever. Try: /msg greybot eval
>> ..or check http://wooledge.org:8000/BashFAQ/048
>
> This is not at all related to my remark:
> We were speaking about the variable DOCS, which is supposed to be
> defined by a package author, not by an unreliable source.
> Of course, unreliable data here may allow execution of arbitrary code,
> but the package author can execute what he wants anyway.
>
My point wasn't about security so much as the fact that the author has to
worry about how the filenames will be interpreted. You state that
saying "it will be eval'ed" is enough. I disagree, as it makes the
variable trickier to handle.
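A minimal bash sketch of what "trickier to handle" means in practice. It assumes a hypothetical show_args function as a stand-in for dodoc (it only prints each argument it receives in angle brackets), and the file names are made up. With eval, a name containing an apostrophe needs a second, hand-written layer of quoting; in an array it is written once, with ordinary quoting.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for dodoc: print each argument on its own line, in <>.
show_args() { printf '<%s>\n' "$@"; }

# eval style: the string is parsed as shell code a second time.
DOCS="'filename with spaces' plain"
eval "show_args ${DOCS}"    # prints <filename with spaces> then <plain>

# An apostrophe in a name must be escaped by hand as '\'' (inside double
# quotes the backslash is kept, so eval gets to see it).
DOCS="'Tom'\''s notes' plain"
eval "show_args ${DOCS}"    # prints <Tom's notes> then <plain>

# Forget that escaping and the quotes no longer pair up: eval reports a
# syntax error. The subshell keeps this demo running even in shells that
# would exit on such an error.
DOCS="'Tom's notes' plain"
( eval "show_args ${DOCS}" ) 2>/dev/null || echo "eval failed"

# Array style: one level of ordinary quoting, names stored verbatim.
DOCS_ARR=("Tom's notes" 'filename with spaces' plain)
show_args "${DOCS_ARR[@]}"  # three arguments, unchanged
```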

>> > This way, people can simply quote as they like:
>> >
>> > DOCS="'filename with spaces' filename_without_space doc/*"
>> >
>> Yeuch.
>
> Well: DOCS=('filename with spaces' filename_without_space doc/*)
> I cannot see much difference: ( ) vs. " " would IMHO not be a reason
> to argue on visual grounds, but the former works only in bash, the latter
> practically everywhere (and so shell programmers should be used to the
> latter notation anyway).
>
That's the thing though; most Gentoo devs don't appear to be shell
programmers, and certainly not POSIX sh ones. BASH is simply much more
convenient to work with, especially if you are used to another language
(one that has arrays, for example). That convenience adds up to saved time
and cleaner code.

Again, your formulation only works with eval. It doesn't work easily as a
generic thing; it requires thought and mental effort from devs who are
already overstretched. I guess it comes down to the debate over saving
programmer time vs CPU time.

>> > or also
>> >
>> > DOCS="just_one_filename_without_special_characters"
>> >
>> You don't need quotes there.
>
> This is true, but I wanted to show the way most people will use it.
>
Sure, but people should also be learning when quotes are needed and when
not; that is fundamental to shell scripting, after all.

>> > or also, when Push from /usr/bin/functions-eix.sh is used
>> > (which might be implemented more simply without using other functions):
>> >
>> > Push DOCS 'filename with spaces' filename_without_space "${S}"/doc/*
>> >
>> Or just do DOCS+=(foo/* someFile 'some other File') at any point.
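A short bash illustration of the append syntax (+= on arrays needs bash 3.1 or later); the file names are made up. A glob such as foo/* inside ( ) would be expanded at assignment time, against the current directory, so it is left out here to keep the result predictable.

```bash
#!/usr/bin/env bash
DOCS=(README 'filename with spaces')

# Later, e.g. inside a USE-flag conditional, append more entries; each
# quoted word stays one element, just as on any other command line.
DOCS+=(ChangeLog 'some other File')

echo "${#DOCS[@]}"             # 4
printf '[%s]\n' "${DOCS[@]}"   # one line per element, spaces kept
```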
>
> So the difference is saving two tokens. Is this worth cementing a
> bash dependency forever in many scripts?
>
No, my point was that it's part of basic BASH syntax, so anyone looking at
it who knows BASH knows exactly what it does, without having to dig through
an eclass or the like to make sure. It's cleaner to work with in the lib
code too.

>> BASH arrays will cope with *any* character apart from NUL, which isn't
>> allowed in filenames. Can you _guarantee_ the same?
>
> Yes, Push does _guarantee_ the same. It is actually rather simple to
> implement: it puts each argument in '...', separated by spaces,
> but first replaces every ' in the arguments by '\'' (the last part is a bit
> tricky to do in POSIX [although not really hard; only in functions-eix.sh
> is this lengthy, because a more general replacement function is used
> there]. For the time being, I would not even argue against implementing
> Push in bash in a sourced script: that is only one place to change if one
> wants more compatibility later on).
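Vaeth's description of Push can be sketched in plain POSIX sh roughly as follows. This is a hypothetical reimplementation for illustration only, not the code from functions-eix.sh, and the _push_* helper variables are made up. Each argument is wrapped in '...' after every ' inside it has been replaced by '\'', so that a later eval "set -- $DOCS" hands back the original arguments unchanged.

```sh
#!/bin/sh
# Push VAR ARG... : append each ARG to the variable named VAR, single-quoted
# so that eval "set -- $VAR" later recovers the exact original arguments.
Push() {
  _push_var=$1
  shift
  for _push_arg in "$@"; do
    _push_out=
    # Replace every ' by '\'' using only POSIX parameter expansion.
    while :; do
      case $_push_arg in
        *\'*)
          _push_out=$_push_out${_push_arg%%\'*}\'\\\'\'
          _push_arg=${_push_arg#*\'}
          ;;
        *)
          _push_out=$_push_out$_push_arg
          break
          ;;
      esac
    done
    # Read the current value, then append a space and 'quoted-arg'.
    eval "_push_cur=\${$_push_var}"
    if [ -n "$_push_cur" ]; then
      eval "$_push_var=\"\$_push_cur '\$_push_out'\""
    else
      eval "$_push_var=\"'\$_push_out'\""
    fi
  done
}

DOCS=
Push DOCS 'filename with spaces' "it's" plain
printf '%s\n' "$DOCS"   # 'filename with spaces' 'it'\''s' 'plain'
eval "set -- $DOCS"
printf '<%s>\n' "$@"    # <filename with spaces>, <it's>, <plain>, one per line
```

Since everything ends up inside single quotes, characters such as $, backslash or a newline in a file name need no special treatment; only the ' itself has to be rewritten.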
>
Cool, I've seen that trick in makefiles (the kernel uses it for echoing
commands, iirc). If you're stuck with a shell that only implements a
"stone-age" standard, designed as a lowest common denominator 15 or 20
years ago, fair enough ;p

>> Ebuilds require BASH; get over it.
>
> My remark concerning arrays was meant to be general, not specific to
> ebuilds/portage only (although I couldn't find a passage in the bible
> where god claimed that ebuilds have to require bash.
Yes, hyperbole aside: ebuilds have been built on BASH from the start.

> Actually, 99% of
> the ebuilds would not need bash if they were modified in a completely
> trivial way (for the remaining 1% it would need a bit more work)).
> If one encourages people to write compatible ebuilds, maybe some day a
> change is realistic even for portage (although I am completely aware
> that this is not a reasonable project for the near future).
>
The thing is, those changes make the code harder to read and maintain, which
matters for the target scripters. It's important to be able to look at the
script and tell what it does quickly; it's also important to be able to
write and update it quickly and relatively easily.

>> BASH is as portable as GNU make is, and you clearly have no issue
>> depending on that, or on Python or C++.
>
> Do you know which shell might be preferable in 5 years or 10 years?
> Bets are good that this shell will at least support POSIX;
> bets are worse that this shell will support the bash-specific
> treatment of arrays.
>
ksh, zsh and bash all have arrays. Since POSIX came along, most
next-generation shells (i.e. not those aimed at the embedded space, but
those for general use) have included arrays; development moves forward,
remember.

Put it another way: do you believe the GNU shell in 5 or 10 years' time will
not support arrays?

>> BTW, POSIX sh doesn't need ${DOCS} or ${S} either, you're just wasting
>> characters.
>
> Yes, but that's the gentoo-recommended way to write variables;
> no need to change the style just for changing it.
>
Well OK, but imo there's no need to use it, since repoman deals fine with
variables without braces. Changing the style to make it easier to work with
strikes me as a good idea. (Especially when so many beginners think the
braces mean you don't have to quote; it's just a distraction from learning
what really matters.)

>> > the array-less solution is also much simpler to
>> > implement, easy to understand from the source, and clearer in usage.
>>
>> Not to me it's not; it looks awful, to read and to type, as well as being
>> fragile.
>
> Yes, two symbols to type more is a nightmare :)
> "Fragile" is not the case, as I showed above.
>
Again, it's not the two symbols. It's having to parse or write that string.

>> Furthermore you're bringing eval into the script new people are going to
>> look at to learn from (it's core functionality, fulfilling a basic task)
>
> So why should people learn bashisms instead of compatible shell
> programming?

Precisely because bashisms are the features that have been added by people
who really know Unix to make their system administration easier. These are
the people who really know scripting in an environment where the scarcest
resource is human time.

>> Actually if you factor out that isArr is a utility function (exactly like
>> Push) that code is very easy to follow
>
> Maybe my explanation was unclear here: I am not speaking about the code.
> I am speaking about the way it behaves.
> DOCS='"a b"' -> two files `"a' and `b"'
> DOCS=('"a b"') -> one file `"a b"'
> this is just creating confusion by special cases.

No, it's providing two ways to specify a config variable. One is the
backward-compatible manner, so that old ebuilds won't break, and people can
continue to use the method they're used to for simple things. The other is
the way for the ebuild author to specify more complex cases.

I know for a fact that users like having both. It's providing mechanism, and
not enforcing policy. "You must make sure your variables are in a fit state
to be eval'ed" is the opposite; it both takes away an option and restricts
what the user can easily do.

And you said yourself above you couldn't see much difference (although the
BASH version is a bit cleaner). All I'll say is that BASH arrays mean you
always know what you're quoting; if you use 'a b' it's always one
parameter, exactly like all the other quoting you do.

> If you say instead the argument is eval'ed, everybody who knows any shell
> knows what is going on and that you have to quote correspondingly.
> And the case distinction is necessary, since for arrays you cannot
> take a shortcut (i.e. you can _never_ avoid the ( ) part); for variables
> you can (as you mentioned, in most cases you can even avoid the " " part).
>
You can only avoid quotes (and I prefer '' unless I want variable expansion)
when it's a single token with no characters like < > ( ) & | or ; which
affect tokenisation (a $ obviously affects things too, as do quotes,
backslashes and backticks). The glob characters [ ? and * don't actually
matter, since pathname expansion doesn't happen in a scalar assignment.

You can say "everyone knows what is going on" but beginners simply don't,
and even advanced sh scripters sometimes get their eval strings wrong.
Demanding that extra headspace when you're just trying to get a bug fixed,
or your first ebuild written, runs counter to maintaining a distribution,
imo.

As for the case distinction, the ebuild author or maintainer doesn't need to
make it. It's only relevant for an eclass or base function which actually
handles the variable in question, either using it to carry out a task for
the ebuild author, or manipulating it.

It would be easy enough to convert it to an array once after sourcing the
ebuild so that all functions could rely on it being an array, if that's
desired, so that the test would only be run once. Granted, it would be a bit
more complex if it had to operate on a list of those variables, but it
wouldn't need eval, since BASH has syntax designed to obviate the need for
eval in nearly all cases.

>> I'm willing to bet your sh scripts aren't really as portable as you
>> think. If you want to see how portable sh is done, read:
>> http://sources.redhat.com/autobook/autobook/autobook_210.html#SEC210
>> (all of it) and then try to persuade us that we should be writing ebuilds
>> like that.
>
> This is an old rhetorical trick (I don't know its name in English):
> you impute to me claims which I never made; of course, then it
> is easy for you to prove that these claims are wrong.
What, like saying my point was only about saving two tokens?

> I _never_ suggested using stone-age code for ebuilds
You did as far as I am concerned.

> (I went further
> for the eix scripts, and I think that I have meanwhile succeeded for all
> architectures supported by gentoo, but I never suggested this for
> everybody.
I see; so you, a competent and knowledgeable sh scripter, are not even sure
whether your sh code works on every arch supported by Gentoo? Whereas BASH
is running on every single one of those and clearly ebuilds run on all of
them, or they wouldn't be supported. That reinforces my point about BASH
portability, which was actually why I posted the link to that doc.

> BTW: even on these architectures only very few deviations
> from POSIX arose; the really old shells which lack even
> functions, or have other odd bugs, seem to be truly extinct. But this is a
> different topic).
>
> However, I strongly suggest avoiding bashisms unless absolutely
> necessary and reasonable. There are scripts where this is reasonable,
> but far too many scripts which use them do not belong to this category.

You seem to be mixing up reasonable and necessary in the last sentence.
Granted, ebuilds don't need bashisms in many cases; many could indeed be
rewritten to only use sh. Nonetheless, it's not about getting absolutely the
most efficient use of the processor, but about making it easy for people to
write and maintain ebuilds and eclasses.

Given things like the awkwardness and loss of flexibility[1] in only using [
(rather than [[), it's entirely reasonable to specify that Gentoo ebuilds
use BASH.

> Using arrays to pass parameters is one of the cases of unnecessary usage
> (although this is not widely known; that's the main reason why I posted
> the remark).

Thanks for the discussion, although I do feel we're covering old ground.[2]
Given that ebuilds need BASH, have always needed BASH, and will continue to
do so, can we get on with actually using BASH and not BASHiSH?

[1] http://wooledge.org:8000/BashFAQ/031
[2] http://thread.gmane.org/gmane.linux.gentoo.devel/52102
258 > the remark).
259
260 Thanks for the discussion, although I do feel we're covering old ground.[2]
261 Given that ebuilds need BASH, have always needed BASH, and will continue to
262 do so, can we get on with actually using BASH and not BASHiSH?
263
264 [1] http://wooledge.org:8000/BashFAQ/031
265 [2] http://thread.gmane.org/gmane.linux.gentoo.devel/52102
