[Sorry for length]
Vaeth wrote:

> Steve Long wrote:
>
>> Vaeth wrote:
>>
>> > let me remark that the more clever way to do this is
>> >
>> > [ -n "${DOCS}" ] && eval "dodoc ${DOCS}"
>> >
>> eval is _not_ clever. Try: /msg greybot eval
>> ..or check http://wooledge.org:8000/BashFAQ/048
>
> This is not at all related to my remark:
> We were speaking about the variable DOCS, which is supposed to be
> defined by a package author, not by an unreliable source.
> Of course, unreliable data here may allow execution of arbitrary code,
> but the package author can execute what he wants anyway.
>
My point wasn't about security so much as the fact that the author has to
worry about how the filenames will be interpreted. You state that
saying "it will be eval'ed" is enough. I disagree, as it makes the variable
trickier to handle.

>> > This way, people can simply quote as they like:
>> >
>> > DOCS="'filename with spaces' filename_without_space doc/*"
>> >
>> Yeuch.
>
> Well: DOCS=('filename with spaces' filename_without_space doc/*)
> I cannot see much difference: ( ) vs. " " would optically IMHO not be a
> reason to discuss, but the former works only in bash, the latter
> practically everywhere (and so shell programmers should be used to the
> latter notation anyway).
>
That's the thing though; most Gentoo devs don't appear to be shell
programmers, and certainly not POSIX sh ones. BASH is simply much more
convenient to work with, especially if you are used to another language
(one that has arrays, for example). That convenience adds up to saved time
and cleaner code.

Again, your formulation only works with eval. It doesn't work easily as a
generic thing; it requires thought and mental effort from devs who are
already overstretched. I guess it comes down to the debate over saving
programmer time vs CPU time.

>> > or also
>> >
>> > DOCS="just_one_filename_without_special_characters"
>> >
>> You don't need quotes there.
>
> This is true, but I wanted to show the way most people will use it.
>
Sure, but people should also be learning when quotes are needed and when
not; that is fundamental to shell-scripting after all?

>> > or also - when Push from /usr/bin/functions-eix.sh is used
>> > (which might be implemented simpler without using other functions):
>> >
>> > Push DOCS 'filename with spaces' filename_without_space "${S}"/doc/*
>> >
>> Or just do DOCS+=(foo/* someFile 'some other File') at any point.
>
> So the difference is saving two tokens. Is this worth cementing
> bash-dependency forever in many scripts?
>
No, my point was that it's part of basic BASH syntax, so anyone looking at
it who knows BASH knows exactly what it does, without having to dig through
an eclass or the like to make sure. It's cleaner to work with in the lib
code too.

>> BASH arrays will cope with *any* character apart from NUL, which isn't
>> allowed in filenames. Can you _guarantee_ the same?
>
> Yes, Push does _guarantee_ the same. It is actually rather simple to
> implement: It puts its argument in '...', separated by spaces,
> but replaces ' in the arguments before by '\'' (the last part is a bit
> tricky to do in POSIX [although not really hard - only in functions-eix.sh
> this is lengthy, because a more general replacement function is used
> there]. For the time being, I would not even argue against implementing
> Push in a sourced script in bash: This is only one place to change if one
> wants more compatibility later on).
>
Cool, I've seen that trick in makefiles (kernel uses it for echoing cmds
iirc.) If you're stuck with a shell that only implements a "stone-age"
standard, designed to allow a base common-denominator 15 or 20 years ago,
fair enough ;p
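
For readers who have not seen the trick, here is a rough sketch of the quoting
scheme described in the quoted text above. It is not the Push from
functions-eix.sh: the name push_quoted and the pq_* variables are made up for
illustration, and only the '...' wrapping and the '\'' replacement are taken
from that description.

```shell
#!/bin/sh
# push_quoted VAR ARG...: append each ARG to the string held in VAR, wrapped in
# single quotes, so that a later eval sees every ARG as exactly one word.
# (Hypothetical helper; POSIX sh has no local, so the pq_* names are globals.)
push_quoted() {
    pq_var=$1
    shift
    eval "pq_acc=\${$pq_var-}"          # current value of the named variable
    for pq_arg in "$@"; do
        pq_out=
        # Replace every ' in the argument by '\'' (close quote, escaped
        # quote, reopen quote), one occurrence per loop iteration.
        while :; do
            case $pq_arg in
                *\'*)
                    pq_out=$pq_out${pq_arg%%\'*}"'\\''"
                    pq_arg=${pq_arg#*\'}
                    ;;
                *)
                    pq_out=$pq_out$pq_arg
                    break
                    ;;
            esac
        done
        pq_acc="${pq_acc:+$pq_acc }'$pq_out'"
    done
    eval "$pq_var=\$pq_acc"             # store the result back
}

DOCS=
push_quoted DOCS 'filename with spaces' "it's a file" plain
eval "set -- $DOCS"     # re-parse: each original argument is one word again
printf '%s\n' "$#"      # prints 3
```

The string only round-trips through eval; expanding it any other way gives
back the literal quote characters.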

>> Ebuilds require BASH; get over it.
>
> My remark concerning arrays was meant to be general, not specific for
> ebuilds/portage only (although I couldn't find a passage in the bible
> where god claimed that ebuilds have to require bash.
Yes, hyperbole aside: ebuilds have been built on BASH from the start.

> Actually, 99% of
> the ebuilds would not need bash, if they would be modified in completely
> trivial ways (for the remaining 1% it would need a bit more work)).
> If one encourages people to write ebuilds compatible, maybe even for
> portage some day a change is realistic (although I am completely aware
> that this is not a reasonable project for the near future).
>
The thing is those changes make the code harder to read and maintain, which
matters for the target scripters. It's important to be able to look at the
script and tell what it does quickly; it's also important to be able to
write and update it quickly and relatively easily.

>> BASH is as portable as GNU make is, and you clearly have no issue
>> depending on that, and Python or C++.
>
> Do you know which shell might be preferable in 5 years or 10 years?
> Bets are good that that shell will at least support POSIX;
> bets are worse that this shell will support the bash-specific
> treatment of arrays.
>
ksh, zsh and bash all have arrays. Since POSIX came along, development
(which moves forward, remember) of most next-generation shells (i.e. not
those aiming for the embedded space, but those for general use) has included
arrays.

Put it another way: do you believe the GNU shell in 5 or 10 years' time will
not support arrays?

>> BTW, POSIX sh doesn't need ${DOCS} or ${S} either, you're just wasting
>> characters.
>
> Yes, but that's the gentoo-recommended way to write variables -
> no need to change the style just for changing it.
>
Well OK, but imo no need to use it, since repoman deals fine with variables
without braces. Changing the style to make it easier to work with strikes
me as a good idea. (Especially when so many beginners think the braces mean
you don't have to quote; it's just a distraction from learning what really
matters.)

>> > the array-less solution is also much simpler to
>> > implement, easy to understand from the source, and clearer in usage.
>>
>> Not to me it's not, it looks awful, to read and to type, as well as being
>> fragile.
>
> Yes, two symbols to type more is a nightmare :)
> "Fragile" is not the case as I showed above.
>
Again, it's not the two symbols. It's having to parse or write that string.

>> Furthermore you're bringing eval into the script new people are going to
>> look at to learn from (it's core functionality, fulfilling a basic task)
>
> So why should people learn bashisms instead of compatible shell
> programming?

Precisely because bashisms are the features that have been added by people
who really know Unix to make their system administration easier. These are
the people who really know scripting in an environment where the scarcest
resource is human time.

>> Actually if you factor out that isArr is a utility function (exactly like
>> Push) that code is very easy to follow
>
> Maybe my explanation was unclear here: I am not speaking about the code.
> I am speaking about the way it behaves.
> DOCS='"a b"' -> two files `"a' and `b"'
> DOCS=('"a b"') -> one file `"a b"'
> this is just creating confusion by special cases.

No, it's providing two ways to specify a config variable. One is the
backward compatible manner, so that old ebuilds won't break, and people can
continue to use the method they're used to for simple things. The other is
the way for the ebuild author to specify more complex cases.

I know for a fact that users like having both. It's providing mechanism, and
not enforcing policy. "You must make sure your variables are in a fit state
to be eval'ed" is the opposite; it both takes away an option and restricts
what the user can easily do.

And you said yourself above you couldn't see much difference (although the
BASH version is a bit cleaner.) All I'll say is the BASH arrays mean you
always know what you're quoting; if you use 'a b' it's always one
parameter, exactly like all the other quoting you do.
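
A minimal sketch of that difference, for anyone who wants to try it. The
function count_args is a hypothetical stand-in for dodoc that just reports how
many arguments it was handed; the variable names are likewise only for
illustration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for dodoc: it prints its argument count.
count_args() { echo "$#"; }

# Array: 'a b' is one element, and "${docs[@]}" passes it on as one argument.
docs=('a b' plain)
array_count=$(count_args "${docs[@]}")

# String, expanded unquoted: split on whitespace, the inner quotes stay literal.
docs_string="'a b' plain"
split_count=$(count_args $docs_string)

# String passed through eval: re-parsed as shell code, so the inner quotes count.
eval_count=$(eval "count_args $docs_string")

echo "array=$array_count split=$split_count eval=$eval_count"
# prints: array=2 split=3 eval=2
```

With the array, the quoting written at assignment time is the whole story. With
the string, the result depends on whether the consumer evals it or merely
expands it.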

> If you say instead the argument is eval'ed, everybody who knows any shell
> knows what is going on and that you have to quote correspondingly.
> And the case distinction is necessary, since for arrays you cannot
> shortcut (i.e. you can _never_ avoid the ( ) part) - for variables
> you can (as you mentioned, in most cases you can even avoid the " " part).
>
You can only avoid quotes (and I prefer '' unless I want variable expansion)
when it's a single token with no characters like < > ( ) & | or ; which
affect tokenisation (a $ obviously affects things too). The glob characters
[ ? and * don't actually matter, since pathname expansion doesn't happen in
assignment.
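
The point about assignment can be checked with a short sketch like the one
below. The scratch directory and the file names are assumptions chosen only
for the demonstration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory containing two files that match *.txt.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a.txt b.txt

pattern=*.txt       # scalar assignment: no pathname expansion, value stays *.txt
files=(*.txt)       # array assignment: words inside ( ) are pathname-expanded

echo "$pattern"         # prints *.txt
echo "${#files[@]}"     # prints 2
```

Note that the unquoted pattern is still expanded later if the variable is used
unquoted, which is where the old-style DOCS string gets its globbing from.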

You can say "everyone knows what is going on" but beginners simply don't,
and even advanced sh scripters sometimes get their eval strings wrong.
Demanding that extra headspace when you're just trying to get a bug fixed, or
your first ebuild written, is just counter to maintaining a distribution
imo.

As for the case distinction, the ebuild author or maintainer doesn't need to
make it. It's only relevant for an eclass or base function which actually
handles the variable in question, either using it to carry out a task for
the ebuild author, or manipulating it.

It would be easy enough to convert it to an array once after sourcing the
ebuild so that all functions could rely on it being an array, if that's
desired, so that the test would only be run once. Granted it would be a bit
more complex if it had to operate on a list of those variables, but it
wouldn't need eval, since BASH has syntax designed to obviate the need for
eval in nearly all cases.
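
One way such a one-off conversion could look is sketched below. The isArr
helper mentioned earlier is not shown in this thread, so is_array and to_array
here are assumed names and an assumed implementation, not the code under
discussion.

```shell
#!/usr/bin/env bash
# is_array NAME: succeed if NAME is currently an indexed array.
is_array() {
    [[ $(declare -p "$1" 2>/dev/null) == "declare -a"* ]]
}

# to_array NAME: leave arrays alone; split a plain string on whitespace.
to_array() {
    is_array "$1" && return 0
    local value=${!1-}
    # read -a replaces NAME with an array. With -d '' it reads the whole
    # (possibly multi-line) value, hits end of input and returns non-zero,
    # hence the || true.
    read -r -d '' -a "$1" <<< "$value" || true
}

DOCS='README doc/notes.txt'     # old-style string
to_array DOCS
EXTRA=('a b' c)                 # already an array, left untouched
to_array EXTRA
echo "${#DOCS[@]} ${#EXTRA[@]}" # prints 2 2
```

No eval is involved, but this sketch only splits on whitespace: unlike an
unquoted ${DOCS}, it does not expand globs in the old-style string, so a real
version would have to decide how to treat those.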

>> I'm willing to bet your sh scripts aren't really as portable as you
>> think. If you want to see how portable sh is done, read:
>> http://sources.redhat.com/autobook/autobook/autobook_210.html#SEC210
>> (all of it) and then try to persuade us that we should be writing ebuilds
>> like that.
>
> This is an old rhetorical trick (I don't know its name in English):
> You impute that I claimed things which I never said - of course, then it
> is easy for you to prove that these things are wrong.
What, like saying my point was only about saving two tokens?

> I _never_ suggested to use code from stone-age for ebuilds
You did as far as I am concerned.

> (I did more
> for the eix scripts, and I think that I succeeded meanwhile for all
> architectures supported by gentoo, but I did never suggest this for
> everybody.
I see; so you, a competent and knowledgeable sh scripter, are not even sure
whether your sh code works on every arch supported by Gentoo? Whereas BASH
is running on every single one of those and clearly ebuilds run on all of
them, or they wouldn't be supported. That reinforces my point about BASH
portability, which was actually why I posted the link to that doc.

> BTW: Even for these architectures only very few differences
> from POSIX arose - these really old shells which do not have even
> functions or other odd bugs seem to have really gone extinct. But this is a
> different topic).
>
> However, I strongly suggest to avoid bashisms unless absolutely
> necessary and reasonable. There are scripts where this is reasonable,
> but far too many scripts which use it do not belong to this category.

You seem to be mixing up reasonable and necessary in the last sentence.
Granted ebuilds don't need bashisms in many cases; many could indeed be
rewritten to only use sh. Nonetheless, it's not about getting absolutely the
most efficient use of the processor, but about making it easy for people to
write and maintain ebuilds and eclasses.

Given things like the awkwardness and loss of flexibility[1] in only using [
it's entirely reasonable to specify that Gentoo ebuilds use BASH.
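
As one small illustration of that awkwardness (an example chosen here, not one
taken from the FAQ entry), compare a suffix test written with [[ against the
portable equivalent. The file name is made up.

```shell
#!/usr/bin/env bash
name='release notes.txt'

# [[ ]] does not word-split $name and treats the right-hand side as a pattern.
if [[ $name == *.txt ]]; then bash_result=match; else bash_result=nomatch; fi

# Plain [ has no pattern matching; the portable route is a case statement.
case $name in
    *.txt) posix_result=match ;;
    *)     posix_result=nomatch ;;
esac

echo "$bash_result $posix_result"   # prints match match
```

Both work, but the [[ form reads as a condition and can be combined with other
tests in the same expression.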

> Using arrays to pass parameters is one of the cases of unnecessary usage
> (although this is not widely known - that's the main reason why I posted
> the remark).

Thanks for the discussion, although I do feel we're covering old ground.[2]
Given that ebuilds need BASH, have always needed BASH, and will continue to
do so, can we get on with actually using BASH and not BASHiSH?

[1] http://wooledge.org:8000/BashFAQ/031
[2] http://thread.gmane.org/gmane.linux.gentoo.devel/52102