Gentoo Archives: gentoo-commits

From: William Hubbs <williamh@g.o>
To: gentoo-commits@l.g.o
Subject: [gentoo-commits] proj/openrc:master commit in: /
Date: Tue, 09 Jan 2018 02:04:20
Message-Id: 1515441552.c2bd33e4838eb56bebe2707f6ca6bd05e9df5b24.williamh@OpenRC
commit: c2bd33e4838eb56bebe2707f6ca6bd05e9df5b24
Author: Michael Orlitzky <michael <AT> orlitzky <DOT> com>
AuthorDate: Mon Sep 4 21:58:09 2017 +0000
Commit: William Hubbs <williamh <AT> gentoo <DOT> org>
CommitDate: Mon Jan 8 19:59:12 2018 +0000
URL: https://gitweb.gentoo.org/proj/openrc.git/commit/?id=c2bd33e4

service-script-guide.md: new guide for service script authors.

This fixes #162.

service-script-guide.md | 381 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 381 insertions(+)

diff --git a/service-script-guide.md b/service-script-guide.md
new file mode 100644
index 00000000..5806b808
--- /dev/null
+++ b/service-script-guide.md
@@ -0,0 +1,381 @@
This document is aimed at upstream and distribution developers who
write OpenRC service scripts, either for their own projects, or for
the packages they maintain. It contains advice, suggestions, tips,
tricks, hints, and counsel; cautions, warnings, heads-ups,
admonitions, proscriptions, enjoinders, and reprimands.

It is intended to prevent common mistakes that are found "in the wild"
by pointing out those mistakes and suggesting alternatives. Each
practice that you should (or should not) follow has a section devoted
to it. We don't consider anything exotic, and assume that you will use
start-stop-daemon to manage a fairly typical long-running UNIX
process.

# Don't write your own start/stop functions

OpenRC is capable of stopping and starting most daemons based on the
information that you give it. For a well-behaved daemon that
backgrounds itself and writes its own PID file by default, the
following OpenRC variables are likely all that you'll need:

 * command
 * command_args
 * pidfile

Given those three pieces of information, OpenRC will be able to start
and stop the daemon on its own. The following is taken from an
[OpenNTPD](http://www.openntpd.org/) service script:

```sh
command="/usr/sbin/ntpd"

# The special RC_SVCNAME variable contains the name of this service.
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-p ${pidfile}"
```

If the daemon runs in the foreground by default but has options to
background itself and to create a pidfile, then you'll also need:

 * command_args_background

That variable should contain the flags needed to background your
daemon, and to make it write a PID file. Take for example the
following snippet of an
[NRPE](https://github.com/NagiosEnterprises/nrpe) service script:

```sh
command="/usr/bin/nrpe"
command_args="--config=/etc/nagios/nrpe.cfg"
command_args_background="--daemon"
pidfile="/run/${RC_SVCNAME}.pid"
```

Since NRPE runs as *root* by default, it needs no special permissions
to write to `/run/nrpe.pid`. OpenRC takes care of starting and
stopping the daemon with the appropriate arguments, even passing the
`--daemon` flag during startup to force NRPE into the background (NRPE
knows how to write its own PID file).
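
To make "OpenRC takes care of starting the daemon" a little more concrete, here is a rough sketch of how those variables might be assembled into a start-stop-daemon command line. The `print_ssd_start` function is hypothetical and greatly simplified (OpenRC's real default `start()` handles more variables, such as `chroot` and `start_stop_daemon_args`); it only prints the command instead of running it.

```sh
#!/bin/sh
# Hypothetical helper, NOT OpenRC's real code: a simplified sketch of
# how the service-script variables end up on a start-stop-daemon
# command line. It only prints the command; it never runs it.
print_ssd_start() {
	set -- start-stop-daemon --start --exec "${command}"
	# Each optional variable adds its flag only when it is non-empty.
	[ -n "${procname}" ] && set -- "$@" --name "${procname}"
	[ -n "${pidfile}" ] && set -- "$@" --pidfile "${pidfile}"
	[ -n "${command_user}" ] && set -- "$@" --user "${command_user}"
	# command_background=true makes start-stop-daemon fork the daemon
	# and write the PID file itself (yes/true/on/1 all count as true).
	case "${command_background}" in
		[Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1)
			set -- "$@" --background --make-pidfile ;;
	esac
	# Everything after "--" is passed to the daemon. The variables are
	# deliberately unquoted so that they split into separate words.
	set -- "$@" -- ${command_args} ${command_args_background}
	echo "$@"
}

# The NRPE example from above:
command="/usr/bin/nrpe"
command_args="--config=/etc/nagios/nrpe.cfg"
command_args_background="--daemon"
pidfile="/run/nrpe.pid"
print_ssd_start
```

Run as-is, the sketch prints `start-stop-daemon --start --exec /usr/bin/nrpe --pidfile /run/nrpe.pid -- --config=/etc/nagios/nrpe.cfg --daemon`; everything after `--` is handed to NRPE unchanged.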

But what if the daemon isn't so well behaved? What if it doesn't know
how to background itself or create a pidfile? If it can do neither,
then use

 * command_background=true

which tells start-stop-daemon to put the daemon into the background
itself (`--background`) and additionally passes `--make-pidfile`,
causing start-stop-daemon to create the `$pidfile` for you (rather
than the daemon itself being responsible for creating the PID file).

If your daemon doesn't know how to change its own user or group, then
you can tell start-stop-daemon to launch it as an unprivileged user
with

 * command_user="user:group"
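
Putting `command_background` and `command_user` together, a service script for a daemon that stays in the foreground, never writes a PID file, and cannot drop privileges on its own might look something like the following. The daemon `foo`, its path, its `--config` flag, and the `foo:foo` account are all hypothetical.

```sh
# Hypothetical daemon "foo" that stays in the foreground, never writes
# a PID file, and cannot switch users by itself. The path, the
# --config flag, and the foo:foo account are made up for this example.
command="/usr/bin/foo"
command_args="--config /etc/foo/foo.conf"

# start-stop-daemon forks foo into the background and writes the PID
# file on its behalf (--background --make-pidfile).
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"

# start-stop-daemon also launches foo as the unprivileged foo user/group.
command_user="foo:foo"
```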

Finally, if your daemon always forks into the background but fails to
create a PID file, then your only option is to use

 * procname

With `procname`, OpenRC will try to find the running daemon by
matching the name of its process. That's not so reliable, but daemons
shouldn't background themselves without creating a PID file in the
first place. The next example is part of the [CA NetConsole
Daemon](https://oss.oracle.com/projects/cancd/) service script:

```sh
command="/usr/sbin/cancd"
command_args="-p ${CANCD_PORT}
              -l ${CANCD_LOG_DIR}
              -o ${CANCD_LOG_FORMAT}"
command_user="cancd"

# cancd daemonizes itself, but doesn't write a PID file and doesn't
# have an option to run in the foreground. So, the best we can do
# is try to match the process name when stopping it.
procname="cancd"
```

To recap, in order of preference:

 1. If the daemon backgrounds itself and creates its own PID file, use
    `pidfile`.
 2. If the daemon does not background itself (or has an option to run
    in the foreground) and does not create a PID file, then use
    `command_background=true` and `pidfile`.
 3. If the daemon backgrounds itself and does not create a PID file,
    use `procname` instead of `pidfile`. But, if your daemon has the
    option to run in the foreground, then you should do that instead
    (that would be the case in the previous item).
 4. The last case, where the daemon does not background itself but
    does create a PID file, doesn't make much sense. If there's a way
    to disable the daemon's PID file (or, to write it straight into the
    garbage), then do that, and use `command_background=true`.

# Reloading your daemon's configuration

Many daemons will reload their configuration files in response to a
signal. Suppose your daemon will reload its configuration in response
to a `SIGHUP`. It's possible to add a new "reload" command to your
service script that performs this action. First, tell the service
script about the new command.

```sh
extra_started_commands="reload"
```

We use `extra_started_commands` as opposed to `extra_commands` because
the "reload" action is only valid while the daemon is running (that
is, started). Now, start-stop-daemon can be used to send the signal to
the appropriate process (assuming you've defined the `pidfile`
variable elsewhere):

```sh
reload() {
	ebegin "Reloading ${RC_SVCNAME}"
	start-stop-daemon --signal HUP --pidfile "${pidfile}"
	eend $?
}
```

# Don't restart/reload with a broken config

Often, users will start a daemon, make some configuration change, and
then attempt to restart the daemon. If the recent configuration change
contains a mistake, the result will be that the daemon is stopped but
then cannot be started again (due to the configuration error). It's
possible to prevent that situation with a function that checks for
configuration errors, and a combination of the `start_pre` and
`stop_pre` hooks.

```sh
checkconfig() {
	# However you want to check this... Note that a shell function
	# needs at least one command in its body, so this placeholder
	# simply succeeds until you replace it with a real check.
	return 0
}

start_pre() {
	# If this isn't a restart, make sure that the user's config isn't
	# busted before we try to start the daemon (this will produce
	# better error messages than if we just try to start it blindly).
	#
	# If, on the other hand, this *is* a restart, then the stop_pre
	# action will have ensured that the config is usable and we don't
	# need to do that again.
	if [ "${RC_CMD}" != "restart" ] ; then
		checkconfig || return $?
	fi
}

stop_pre() {
	# If this is a restart, check to make sure the user's config
	# isn't busted before we stop the running daemon.
	if [ "${RC_CMD}" = "restart" ] ; then
		checkconfig || return $?
	fi
}
```

To prevent a *reload* with a broken config, keep it simple:

```sh
reload() {
	checkconfig || return $?
	ebegin "Reloading ${RC_SVCNAME}"
	start-stop-daemon --signal HUP --pidfile "${pidfile}"
	eend $?
}
```
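
What goes inside `checkconfig` depends on the daemon. Many daemons have a mode that only validates the configuration and exits; OpenSSH's `sshd -t` is one example. The sketch below assumes that `command` points at such a daemon and that `-t` is its test flag (adjust both for yours). It also lists `checkconfig` in `extra_commands` rather than `extra_started_commands`, so that users can run it whether or not the daemon is started; `eerror` is one of the output helpers that openrc-run provides.

```sh
# Let users run "rc-service <service> checkconfig" at any time, so list
# it in extra_commands rather than extra_started_commands.
extra_commands="checkconfig"

checkconfig() {
	# Assumption: ${command} has a "check the configuration and exit"
	# mode selected by -t, as OpenSSH's sshd does. Adjust for your daemon.
	if ! "${command}" -t ; then
		# eerror is provided by openrc-run.
		eerror "${RC_SVCNAME} has detected an error in its configuration"
		return 1
	fi
}
```

Because the daemon's own parser does the checking, the service script never has to understand the configuration format itself.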

# PID files should be writable only by root

PID files must be writable only by *root*, which means additionally
that they must live in a *root*-owned directory.

Some daemons run as an unprivileged user account, and create their PID
files (as the unprivileged user) in a path like
`/run/foo/foo.pid`. That can usually be exploited by the unprivileged
user to kill *root* processes, since when a service is stopped, *root*
usually sends a SIGTERM to whatever PID is listed in the PID file
(which is controlled by the unprivileged user). The main warning sign
for that problem is using `checkpath` to set ownership on the
directory containing the PID file. For example,

```sh
# BAD BAD BAD BAD BAD BAD BAD BAD
start_pre() {
	# Ensure that the pidfile directory is writable by the foo user/group.
	checkpath --directory --mode 0700 --owner foo:foo "/run/foo"
}
# BAD BAD BAD BAD BAD BAD BAD BAD
```

If the *foo* user owns `/run/foo`, then he can put whatever he wants
in the `/run/foo/foo.pid` file. Even if *root* owns the PID file, the
*foo* user can delete it and replace it with his own. To avoid
security concerns, the PID file must be created as *root* and live in
a *root*-owned directory. If your daemon is responsible for forking
and writing its own PID file but the PID file is still owned by the
unprivileged runtime user, then the daemon itself probably needs to be
fixed upstream.

Once the PID file is being created as *root* (before dropping
privileges), it can be written directly to a *root*-owned
directory. Typically this will be `/run` on Linux, and `/var/run`
elsewhere. For example, the *foo* daemon might write
`/run/foo.pid`. No calls to checkpath are needed. Note: there is
nothing technically wrong with using a directory structure like
`/run/foo/foo.pid`, so long as *root* owns the PID file and the
directory containing it.
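
If you want to double-check an existing setup, the ownership rule above is easy to test from a shell. The `root_only` and `pidfile_is_safe` functions below are hypothetical helpers (not part of OpenRC); they use only POSIX `find`, look at the PID file and its immediate parent directory, and ignore ACLs.

```sh
#!/bin/sh
# Hypothetical helpers, not part of OpenRC. root_only succeeds when the
# given path exists, is owned by root, and is neither group- nor
# world-writable. Only POSIX find(1) is used; ACLs are not examined.
root_only() {
	# -prune stops find from descending into a directory, and the path
	# is printed only if every test after it is true.
	[ -n "$(find "$1" -prune -user root ! -perm -020 ! -perm -002 2>/dev/null)" ]
}

# pidfile_is_safe checks the PID file and its immediate parent
# directory (directories further up the tree are not checked).
pidfile_is_safe() {
	root_only "$1" && root_only "$(dirname "$1")"
}
```

For example, `pidfile_is_safe /run/foo/foo.pid` fails after the bad `start_pre` above has run, because `/run/foo` then belongs to the *foo* user.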

Ideally (see "Upstream your service scripts"), your service script
will be integrated upstream and the build system will determine
which of `/run` or `/var/run` is appropriate. For example,

```sh
pidfile="@piddir@/${RC_SVCNAME}.pid"
```

A decent example of this is the [Nagios core service
script](https://github.com/NagiosEnterprises/nagioscore/blob/master/openrc-init.in),
where the full path to the PID file is specified at build time.

# Don't let the user control the PID file location

It's usually a mistake to let the end user control the PID file
location through a conf.d variable, for a few reasons:

 1. When the PID file path is controlled by the user, you need to
    ensure that its parent directory exists and is writable. This
    adds unnecessary code to the service script.

 2. If the PID file path changes while the service is running, then
    OpenRC will look for the PID in the new location, and you'll find
    yourself unable to stop the service.

 3. The directory that should contain the PID file is best determined
    by the upstream build system (see "Upstream your service scripts").
    On Linux, the preferred location these days is `/run`. Other systems
    still use `/var/run`, though, and a `./configure` script is the
    best place to decide which one you want.

 4. Nobody cares where the PID file is located, anyway.

Since OpenRC service names must be unique, a value of

```sh
pidfile="/run/${RC_SVCNAME}.pid"
```

guarantees that your PID file has a unique name.
# Upstream your service scripts (for distribution developers)

The ideal place for an OpenRC service script is **upstream**. Much like
systemd service files, a well-crafted OpenRC service script should be
distribution-agnostic. Keeping it upstream has two advantages. First,
there is a single authoritative source for improvements. Second, a few
paths in every service script depend on flags passed to the build
system. For example,

```sh
command=/usr/bin/foo
```

in an autotools-based build system should really be

```sh
command=@bindir@/foo
```

so that the user's value of `--bindir` is respected. If you keep the
service script in your own distribution's repository, then you have to
keep the command path and package synchronized yourself, and that's no
fun.
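
In an autotools project, the substitution itself can be a small `sed` step run from a Makefile rule. The Autoconf manual recommends substituting installation directories such as `bindir` at `make` time rather than through `AC_CONFIG_FILES`, because at configure time `@bindir@` still expands to the literal string `${exec_prefix}/bin`. The `render_service_script` function, its default values, and the `@piddir@` placeholder (which, unlike `@bindir@`, your own `configure` script would have to define) are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical build step: fill in the @bindir@ and @piddir@
# placeholders of a service script template. A real build would take
# these values from configure; the defaults here are only examples.
bindir="${bindir:-/usr/bin}"
piddir="${piddir:-/run}"

render_service_script() {
	# $1 is the template (e.g. foo.in), $2 the finished service script.
	# "|" is the sed delimiter so that the "/" in paths needs no escaping.
	sed -e "s|@bindir@|${bindir}|g" \
	    -e "s|@piddir@|${piddir}|g" \
	    "$1" > "$2"
}
```

A Makefile rule would then call something like `render_service_script foo.in foo` with `bindir` and `piddir` exported from the values that `configure` chose.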

# Be wary of "need net" dependencies

There are two things you need to know about "need net" dependencies:

 1. They are not satisfied by the loopback interface, so "need net"
    requires some *other* interface to be up.

 2. Depending on the value of `rc_depend_strict` in `rc.conf`, a
    "need net" dependency will be satisfied when either *any*
    non-loopback interface is up, or when *all* non-loopback
    interfaces are up.
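
The setting mentioned in the second item is an ordinary variable in `/etc/rc.conf`. The snippet below shows it set to `YES` purely as an example; check the comments in your own `rc.conf` for the default on your system.

```sh
# In /etc/rc.conf. With "YES", "need net" is satisfied only when all of
# the net.* services in the runlevel have started; with "NO", any one
# of them is enough.
rc_depend_strict="YES"
```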

The first item means that "need net" is wrong for daemons that are
happy with `0.0.0.0`, and the second means that "need net" is wrong
for daemons that need one particular interface (for example, the
WAN). We'll consider the two most common users of "need net": network
clients that access some network resource, and network servers that
provide them.

## Network clients

Network clients typically want the WAN interface to be up. That may
tempt you to depend on the WAN interface; but first, you should ask
yourself a question: does anything bad happen if the WAN interface is
not available? In other words, if the administrator wants to disable
the WAN, should the service be stopped? Usually the answer to that
question is "no," and in that case, you should forgo the "net"
dependency entirely.

Suppose, for example, that your service retrieves virus signature
updates from the internet. In order to do its job correctly, it needs
a (working) internet connection. However, the service itself does not
require the WAN interface to be up: if it is, great; otherwise, the
worst that will happen is that a "server unavailable" warning will be
logged. The signature update service will not crash, and, perhaps more
importantly, you don't want it to terminate if the administrator turns
off the WAN interface for a second.

## Network servers

Network servers are generally easier to handle than their client
counterparts. Most server daemons listen on `0.0.0.0` (all addresses)
by default, and are therefore satisfied to have the loopback interface
present and operational. OpenRC ships with the loopback service in the
*boot* runlevel, and therefore most server daemons require no further
network dependencies.

The exceptions to this rule are those daemons that produce negative
side-effects when the WAN is unavailable. For example, the Nagios
server daemon will generate "the sky is falling" alerts for as long as
your monitored hosts are unreachable. So in that case, you should
require some other interface (often the WAN) to be up. A "need"
dependency would be appropriate, because you want Nagios to be
stopped before the network is taken down.
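
For the Nagios case, that dependency could be declared in the service script's `depend()` function. The service name `net.wan` is hypothetical: it stands for whichever `net.*` service brings up the interface that reaches your monitored hosts.

```sh
depend() {
	# "net.wan" is a hypothetical service name: substitute the net.*
	# service for the interface your monitored hosts are reached
	# through. With "need", OpenRC stops this service before it stops
	# net.wan.
	need net.wan
}
```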

If your daemon can optionally be configured to listen on a particular
interface, then please see the "Depending on a particular interface"
section.

## Depending on a particular interface

If you need to depend on one particular interface, usually it's not
easy to determine programmatically what that interface is. For
example, if your *sshd* daemon listens on `192.168.1.100` (rather than
`0.0.0.0`), then you have two problems:

 1. Parsing `sshd_config` to figure that out; and

 2. Determining which network service name corresponds to the
    interface for `192.168.1.100`.

It's generally a bad idea to parse config files in your service
scripts, but the second problem is the harder one. Instead, the most
robust (i.e. the laziest) approach is to make the user specify the
dependency when he makes a change to `sshd_config`. Include something
like the following in the service configuration file:

```sh
# Specify the network service that corresponds to the "bind" setting
# in your configuration file. For example, if you bind to 127.0.0.1,
# this should be set to "net.lo" which provides the loopback interface.
rc_need="net.lo"
```

This is a sensible default for daemons that are happy with `0.0.0.0`,
but lets the user specify something else, like `rc_need="net.wan"` if
he needs it. The burden is on the user to determine the appropriate
service whenever he changes the daemon's configuration file.