Gentoo Archives: gentoo-dev

From: Ionen Wolkens <ionen@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] [PATCH 1/2] esed.eclass: new eclass
Date: Fri, 03 Jun 2022 09:26:34
Message-Id: YpnTvp1jMnF5Loy6@eversor
In Reply to: Re: [gentoo-dev] [PATCH 1/2] esed.eclass: new eclass by Ulrich Mueller
1 On Fri, Jun 03, 2022 at 09:14:05AM +0200, Ulrich Mueller wrote:
2 > >>>>> On Tue, 31 May 2022, Ionen Wolkens wrote:
3 >
4 > > +# @FUNCTION: esed
5 > > +# @USAGE: <sed-argument>...
6 > > +# @DESCRIPTION:
7 > > +# sed(1) wrapper that dies if the expression(s) did not modify any files.
8 > > +# sed's -i/--in-place is forced, and so stdin/out cannot be used.
9 >
10 > This sounds like a simple enough task ...
11 >
12 > > +esed() {
13 > > + local -i i
14 > > +
15 > > + if [[ ${esedexps@a} =~ a ]]; then
16 > > + # expression must be before -- but after the rest for e.g. -E to work
17 > > + local -i pos
18 > > + for ((pos=1; pos<=${#}; pos++)); do
19 > > + [[ ${!pos} == -- ]] && break
20 > > + done
21 > > +
22 > > + for ((i=0; i<${#esedexps[@]}; i++)); do
23 > > + [[ ${esedexps[i]} ]] &&
24 > > + esedexps= esed "${@:1:pos-1}" -e "${esedexps[i]}" "${@:pos}"
25 > > + done
26 > > +
27 > > + unset esedexps
28 > > + return 0
29 > > + fi
30 > > +
31 > > + # Roughly attempt to find files in arguments by checking if it's a
32 > > + # readable file (aka s/// is not a file) and does not start with -
33 > > + # (unless after --), then store contents for comparing after sed.
34 > > + local contents=() endopts files=()
35 > > + for ((i=1; i<=${#}; i++)); do
36 > > + if [[ ${!i} == -- && ! -v endopts ]]; then
37 > > + endopts=1
38 > > + elif [[ ${!i} =~ ^(-i|--in-place)$ && ! -v endopts ]]; then
39 > > + # detect rushed sed -i -> esed -i, -i also silently breaks enewsed
40 > > + die "passing ${!i} to ${FUNCNAME[0]} is invalid"
41 > > + elif [[ ${!i} =~ ^(-f|--file)$ && ! -v endopts ]]; then
42 > > + i+=1 # ignore script files
43 > > + elif [[ ( ${!i} != -* || -v endopts ) && -f ${!i} && -r ${!i} ]]; then
44 > > + files+=( "${!i}" )
45 > > +
46 > > + # eval 2>/dev/null to silence \0 warnings if sed binary files
47 > > + eval 'contents+=( "$(<"${!i}")" )' 2>/dev/null \
48 > > + || die "failed to read: ${!i}"
49 > > + fi
50 > > + done
51 > > + (( ${#files[@]} )) || die "no readable files found from '${*}' arguments"
52 > > +
53 > > + local verbose
54 > > + [[ ${ESED_VERBOSE} ]] && type diff &>/dev/null && verbose=1
55 > > +
56 > > + local changed newcontents
57 > > + if [[ -v _esed_output ]]; then
58 > > + [[ -v verbose ]] &&
59 > > + einfo "${FUNCNAME[0]}: sed ${*} > ${_esed_output} ..."
60 > > +
61 > > + sed "${@}" > "${_esed_output}" \
62 > > + || die "failed to run: sed ${*} > ${_esed_output}"
63 > > +
64 > > + eval 'newcontents=$(<${_esed_output})' 2>/dev/null \
65 > > + || die "failed to read: ${_esed_output}"
66 > > +
67 > > + local IFS=$'\n' # sed concats with newline even if none at EOF
68 > > + contents=${contents[*]}
69 > > + unset IFS
70 > > +
71 > > + [[ ${contents} != "${newcontents}" ]] && changed=1
72 > > +
73 > > + [[ -v verbose ]] &&
74 > > + diff -u --color --label="${files[*]}" --label="${_esed_output}" \
75 > > + <(echo "${contents}") <(echo "${newcontents}")
76 > > + else
77 > > + [[ -v verbose ]] && einfo "${FUNCNAME[0]}: sed -i ${*} ..."
78 > > +
79 > > + sed -i "${@}" || die "failed to run: sed -i ${*}"
80 > > +
81 > > + for ((i=0; i<${#files[@]}; i++)); do
82 > > + eval 'newcontents=$(<"${files[i]}")' 2>/dev/null \
83 > > + || die "failed to read: ${files[i]}"
84 > > +
85 > > + if [[ ${contents[i]} != "${newcontents}" ]]; then
86 > > + changed=1
87 > > + [[ -v verbose ]] || break
88 > > + fi
89 > > +
90 > > + [[ -v verbose ]] &&
91 > > + diff -u --color --label="${files[i]}"{,} \
92 > > + <(echo "${contents[i]}") <(echo "${newcontents}")
93 > > + done
94 > > + fi
95 > > +
96 > > + [[ -v changed ]] \
97 > > + || die "no-op: ${FUNCNAME[0]} ${*}${_esed_command:+ (from: ${_esed_command})}"
98 > > +}
99 >
100 > ... but then it's almost 100 lines of shell code, including very
101 > convoluted parsing of parameters. The code for detection whether a
102 > parameter is actually a file is a heuristic at best and looks rather
103
104 Even if it does pickup sed -e "s_im-a_real-file_", it's a semi
105 non-issue given the file won't change before and after and it'll
106 evaluate the others for difference instead.
107
108 Where it gets a bit more complicated is enewsed given this is valid:
109 sed s/// file1 file2 > file3
110 Of course, could always ban multiple files on enewsed if felt really
111 needed, albeit fwiw it's a limitation for a very unlikely scenario.
112
113 "--" support is probably what adds the most complexity here, but
114 will have a problem if any ebuild ever needs to sed a -file
115 (-- is also useful in esedfind fwiw)
116
117 Also I did write tests so the minor complexity hopefully doesn't
118 lead to regressions.
119
120 iwdevtools' qa-sed uses getopt(long) instead, but even that is not
121 perfect given sed has special rules (plus wouldn't want to depend
122 on || ( getopt util-linux ) here).
123
124 > brittle. Also, don't use eval because it is evil.
125
126 This is static evals, they're just evaluating a flat string and it's
127 no different than.. well just running it as-is except it works around
128 a noise issue (the 2>/dev/null doesn't register without it).
129
130 eval is mostly evil when it's:
131 eval "${expanded_variable}=${what_is_this_even}"
132
133 Seeing eval "var=\${not_expanded}" as "evil" makes no sense to me,
134 it's a harmless warning silencer. I feel this is just overreaction
135 to the eval keyword (also in other dev ML post).
136
137 To reproduce the warning:
138 $ var=$(printf "\0") # var=$(<file-with-null-bytes)
139 bash: warning: command substitution: ignored null byte in input
140
141 doesn't work: var=$(printf "\0") 2>/dev/null
142 works: eval 'var=$(printf "\0")' 2>/dev/null
143
144 >
145 > So IMHO this isn't the right approach to the problem. If anything, make
146 > it a simple function with well defined arguments, e.g. exactly one
147 > expression and exactly one file, and call "sed -i" on it.
148
149 I was hoping it wouldn't be a limitation to using sed, aka just add
150 checks and take very little away. Would've even liked to do 100% compat
151 like qa-sed (transparent stdin/out support), but that one is more
152 complicated and drew the line.
153
154 Lacking multiple files on esed (not enewsed) would be kinda annoying
155 I think. esedfind also makes use of the fact it can take multiple
156 files for when people need things like:
157
158 find . -name '*.c' -exec sed -i s/__AVX__/__AVX2__/ {} + || die
159 (esedfind would die if not one macro was replaced)
160
161 I tried to use esed in several ebuilds and see what felt needed or
162 is annoying without hoping people wouldn't mind using it over sed.
163
164 Goal here is not to replace sed nor encourage using it, but just add
165 a guard to ensure changes happened if it does get used. I've seen
166 plenty of ebuilds with a sed that's been broken for >5-10 years.
167
168 >
169 > Then again, we had something similar in the past ("dosed" as a package
170 > manager command) but later banned it for good reason.
171
172 Well, dosed /used/ to be kinda useful to avoid needing:
173
174 mv file "${T}"/tmpfile || die
175 sed s/// "${T}"/tmpfile > file || die
176
177 It's only later that dosed started using -i itself and mostly didn't
178 have a purpose anymore given ebuilds could now use -i too, see:
179
180 https://bugs.gentoo.org/152017
181
182 >
183 > Detection whether the file contents changed also seems complicated.
184 > Why not compute a hash of the file before and after sed operated on it?
185 > md5sum or even cksum should be good enough, since security isn't an
186 > issue here.
187
188 I don't think this would simplify beside now calling more external
189 commands twice then comparing the outputs (capturing command output vs
190 capturing file is not particularly different). May be faster on large
191 amount of files but probably not meaningful for single/few files.
192
193 This may make sed concatenation more complicated to handle too (cat
194 doesn't concatenate the same way sed does wrt newlines at EOF). Not
195 that couldn't ban that on enewsed, although it's more limitations
196 over not much.
197
198 >
199
200 All this aside, I'm not adamant about adding this. If not enough
201 interest then I'll just drop the idea. May try to implement multiple
202 expression support in iwdevtools' qa-sed instead, albeit being a
203 separate QA thing that doesn't know the maintainer's intend (aka
204 which seds are deterministic or not) and may also be missed, it
205 doesn't really ensure ebuilds' seds are kept up to date.
206
207 --
208 ionen

Attachments

File name MIME type
signature.asc application/pgp-signature

Replies

Subject Author
Re: [gentoo-dev] [PATCH 1/2] esed.eclass: new eclass Ionen Wolkens <ionen@g.o>