commit: c2bd33e4838eb56bebe2707f6ca6bd05e9df5b24
Author: Michael Orlitzky <michael <AT> orlitzky <DOT> com>
AuthorDate: Mon Sep 4 21:58:09 2017 +0000
Commit: William Hubbs <williamh <AT> gentoo <DOT> org>
CommitDate: Mon Jan 8 19:59:12 2018 +0000
URL: https://gitweb.gentoo.org/proj/openrc.git/commit/?id=c2bd33e4

service-script-guide.md: new guide for service script authors.

This fixes #162.

service-script-guide.md | 381 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 381 insertions(+)

diff --git a/service-script-guide.md b/service-script-guide.md
new file mode 100644
index 00000000..5806b808
--- /dev/null
+++ b/service-script-guide.md
@@ -0,0 +1,381 @@

This document is aimed at upstream and distribution developers who
write OpenRC service scripts, either for their own projects or for
the packages they maintain. It contains advice, suggestions, tips,
tricks, hints, and counsel; cautions, warnings, heads-ups,
admonitions, proscriptions, enjoinders, and reprimands.

It is intended to prevent common mistakes that are found "in the wild"
by pointing out those mistakes and suggesting alternatives. Each
thing that you should (or should not) do has a section devoted to it.
We don't consider anything exotic, and assume that you will use
start-stop-daemon to manage a fairly typical long-running UNIX
process.

# Don't write your own start/stop functions

OpenRC is capable of stopping and starting most daemons based on the
information that you give it. For a well-behaved daemon that
backgrounds itself and writes its own PID file by default, the
following OpenRC variables are likely all that you'll need:

 * command
 * command_args
 * pidfile

Given those three pieces of information, OpenRC will be able to start
and stop the daemon on its own. The following is taken from an
[OpenNTPD](http://www.openntpd.org/) service script:

```sh
command="/usr/sbin/ntpd"

# The special RC_SVCNAME variable contains the name of this service.
pidfile="/run/${RC_SVCNAME}.pid"
command_args="-p ${pidfile}"
```

If the daemon runs in the foreground by default but has options to
background itself and to create a PID file, then you'll also need

 * command_args_background

That variable should contain the flags needed to background your
daemon and to make it write a PID file. Take for example the
following snippet of an
[NRPE](https://github.com/NagiosEnterprises/nrpe) service script:

```sh
command="/usr/bin/nrpe"
command_args="--config=/etc/nagios/nrpe.cfg"
command_args_background="--daemon"
pidfile="/run/${RC_SVCNAME}.pid"
```

Since NRPE runs as *root* by default, it needs no special permissions
to write to `/run/nrpe.pid`. OpenRC takes care of starting and
stopping the daemon with the appropriate arguments, even passing the
`--daemon` flag during startup to force NRPE into the background (NRPE
knows how to write its own PID file).

But what if the daemon isn't so well behaved? What if it doesn't know
how to background itself or create a PID file? If it can do neither,
then use

 * command_background=true

which will pass `--background` and `--make-pidfile` to
start-stop-daemon, causing it to background the daemon and create the
`$pidfile` for you (rather than the daemon itself being responsible
for creating the PID file).

If your daemon doesn't know how to change its own user or group, then
you can tell start-stop-daemon to launch it as an unprivileged user
with

 * command_user="user:group"

Finally, if your daemon always forks into the background but fails to
create a PID file, then your only option is to use

 * procname

With `procname`, OpenRC will try to find the running daemon by
matching the name of its process. That's not so reliable, but daemons
shouldn't background themselves without creating a PID file in the
first place. The next example is part of the [CA NetConsole
Daemon](https://oss.oracle.com/projects/cancd/) service script:

```sh
command="/usr/sbin/cancd"
command_args="-p ${CANCD_PORT}
	-l ${CANCD_LOG_DIR}
	-o ${CANCD_LOG_FORMAT}"
command_user="cancd"

# cancd daemonizes itself, but doesn't write a PID file and doesn't
# have an option to run in the foreground. So, the best we can do
# is try to match the process name when stopping it.
procname="cancd"
```

To recap, in order of preference:

 1. If the daemon backgrounds itself and creates its own PID file, use
    `pidfile`.
 2. If the daemon does not background itself (or has an option to run
    in the foreground) and does not create a PID file, then use
    `command_background=true` and `pidfile`.
 3. If the daemon backgrounds itself and does not create a PID file,
    use `procname` instead of `pidfile`. But if your daemon has the
    option to run in the foreground, then you should do that instead
    (that would be the case in the previous item).
 4. The last case, where the daemon does not background itself but
    does create a PID file, doesn't make much sense. If there's a way
    to disable the daemon's PID file (or to write it straight into the
    garbage), then do that, and use `command_background=true` and
    `pidfile`.

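Putting the recap together, a complete service script for the second case might look like the minimal sketch below. The daemon *foo*, its path, its `--config` flag, and the *foo* user and group are hypothetical. Only the OpenRC variable names come from OpenRC itself, plus `description`, OpenRC's standard one-line summary of a service.

```sh
#!/sbin/openrc-run
# Minimal sketch for a hypothetical daemon "foo" that stays in the
# foreground and never writes a PID file (case 2 of the recap).

description="Hypothetical foo daemon"

command="/usr/bin/foo"
# Hypothetical flag telling foo where its configuration lives.
command_args="--config /etc/foo/foo.conf"

# start-stop-daemon backgrounds foo and writes this PID file for it.
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"

# foo cannot drop privileges on its own, so start it as foo:foo.
command_user="foo:foo"
```
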
# Reloading your daemon's configuration

Many daemons will reload their configuration files in response to a
signal. Suppose your daemon will reload its configuration in response
to a `SIGHUP`. It's possible to add a new "reload" command to your
service script that performs this action. First, tell the service
script about the new command.

```sh
extra_started_commands="reload"
```

We use `extra_started_commands` as opposed to `extra_commands` because
the "reload" action is only valid while the daemon is running (that
is, started). Now, start-stop-daemon can be used to send the signal to
the appropriate process (assuming you've defined the `pidfile`
variable elsewhere):

```sh
reload() {
	ebegin "Reloading ${RC_SVCNAME}"
	start-stop-daemon --signal HUP --pidfile "${pidfile}"
	eend $?
}
```

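The sketch below shows, in plain POSIX `sh`, roughly what the `reload()` function asks start-stop-daemon to do. It is illustrative only. The function name `signal_pidfile` is hypothetical, and the real start-stop-daemon does more than this (for example, it can also match the process by executable name). The sketch reads the PID from the file, refuses anything that is not a plain positive number, and sends the requested signal with `kill -s`.

```sh
#!/bin/sh
# Hypothetical helper, for illustration only: roughly what
# "start-stop-daemon --signal SIG --pidfile FILE" does.
signal_pidfile() {
	sig="$1"
	file="$2"
	# A missing or empty PID file means there is nothing to signal.
	[ -s "${file}" ] || return 1
	pid=$(cat "${file}")
	# Refuse anything that is not a plain number ("-1", garbage, ...),
	# and refuse 0, which kill(1) would treat as "my process group".
	case "${pid}" in
		''|*[!0-9]*) return 1 ;;
	esac
	[ "${pid}" -gt 0 ] || return 1
	kill -s "${sig}" "${pid}"
}

# Example:
#   signal_pidfile HUP /run/foo.pid
```
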
# Don't restart/reload with a broken config

Often, users will start a daemon, make some configuration change, and
then attempt to restart the daemon. If the recent configuration change
contains a mistake, the result will be that the daemon is stopped but
then cannot be started again (due to the configuration error). It's
possible to prevent that situation with a function that checks for
configuration errors, and a combination of the `start_pre` and
`stop_pre` hooks.

```sh
checkconfig() {
	# However you want to check this... Many daemons have a "test the
	# configuration" option; call it here and return its exit status.
	# A shell function body cannot consist of comments alone, so this
	# placeholder simply succeeds.
	return 0
}

start_pre() {
	# If this isn't a restart, make sure that the user's config isn't
	# busted before we try to start the daemon (this will produce
	# better error messages than if we just try to start it blindly).
	#
	# If, on the other hand, this *is* a restart, then the stop_pre
	# action will have ensured that the config is usable and we don't
	# need to do that again.
	if [ "${RC_CMD}" != "restart" ] ; then
		checkconfig || return $?
	fi
}

stop_pre() {
	# If this is a restart, check to make sure the user's config
	# isn't busted before we stop the running daemon.
	if [ "${RC_CMD}" = "restart" ] ; then
		checkconfig || return $?
	fi
}
```

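You can exercise the two hooks outside of OpenRC to confirm two things: the check runs exactly once per restart, and a broken config aborts the restart before anything is stopped. The following is a plain POSIX `sh` simulation, not OpenRC itself. `RC_CMD` is set by hand, and the hypothetical `simulate_restart` calls the hooks in the order a restart would. `checkconfig` is replaced by a hypothetical stand-in that counts its calls and fails when `BROKEN_CONFIG` is non-empty.

```sh
#!/bin/sh
# Plain POSIX sh simulation of the start_pre/stop_pre hooks above; it
# does not need OpenRC. checkconfig is a hypothetical stand-in that
# counts its calls and fails whenever BROKEN_CONFIG is non-empty.
checks=0
checkconfig() {
	checks=$((checks + 1))
	[ -z "${BROKEN_CONFIG}" ]
}

start_pre() {
	if [ "${RC_CMD}" != "restart" ] ; then
		checkconfig || return $?
	fi
}

stop_pre() {
	if [ "${RC_CMD}" = "restart" ] ; then
		checkconfig || return $?
	fi
}

# Hypothetical driver: a restart runs the stop hooks, then the start
# hooks. If stop_pre fails, the running daemon is never stopped.
simulate_restart() {
	RC_CMD="restart"
	stop_pre || return $?
	# (OpenRC would stop and then start the daemon here.)
	start_pre
}

# Example:
#   BROKEN_CONFIG=yes; simulate_restart   # fails inside stop_pre
```
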
To prevent a *reload* with a broken config, keep it simple:

```sh
reload() {
	checkconfig || return $?
	ebegin "Reloading ${RC_SVCNAME}"
	start-stop-daemon --signal HUP --pidfile "${pidfile}"
	eend $?
}
```

# PID files should be writable only by root

PID files must be writable only by *root*, which means additionally
that they must live in a *root*-owned directory.

Some daemons run as an unprivileged user account, and create their PID
files (as the unprivileged user) in a path like
`/run/foo/foo.pid`. That can usually be exploited by the unprivileged
user to kill *root* processes. When a service is stopped, *root*
usually sends a `SIGTERM` to the process whose ID is stored in the PID
file, and that ID is controlled by the unprivileged user. The main
warning sign for that problem is using `checkpath` to set ownership on
the directory containing the PID file. For example,

```sh
# BAD BAD BAD BAD BAD BAD BAD BAD
start_pre() {
	# Ensure that the pidfile directory is writable by the foo user/group.
	checkpath --directory --mode 0700 --owner foo:foo "/run/foo"
}
# BAD BAD BAD BAD BAD BAD BAD BAD
```

If the *foo* user owns `/run/foo`, then he can put whatever he wants
in the `/run/foo/foo.pid` file. Even if *root* owns the PID file, the
*foo* user can delete it and replace it with his own. To avoid these
security problems, the PID file must be created as *root* and live in
a *root*-owned directory. If your daemon is responsible for forking
and writing its own PID file but the PID file is still owned by the
unprivileged runtime user, then you may have an upstream issue.

Once the PID file is being created as *root* (before dropping
privileges), it can be written directly to a *root*-owned
directory. Typically this will be `/run` on Linux, and `/var/run`
elsewhere. For example, the *foo* daemon might write
`/run/foo.pid`. No calls to `checkpath` are needed. Note: there is
nothing technically wrong with using a directory structure like
`/run/foo/foo.pid`, so long as *root* owns the PID file and the
directory containing it.

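When reviewing an existing service, you can check both halves of that rule from the shell. The sketch below is illustrative only, and the helper names `pidpath_is_safe` and `pidfile_is_safe` are hypothetical. It parses the output of POSIX `ls -ldn`, whose first field is the permission string and whose third field is the numeric owner. A path counts as safe only if UID 0 owns it and neither its group nor others can write to it. ACLs and other extended permissions are not examined.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the path in $1 is owned by
# root (UID 0) and is writable by neither its group nor others.
# Extended permissions such as ACLs are not examined.
pidpath_is_safe() {
	info=$(ls -ldn "$1" 2>/dev/null) || return 1
	mode=$(printf '%s\n' "${info}" | awk '{ print $1 }')
	owner=$(printf '%s\n' "${info}" | awk '{ print $3 }')
	[ "${owner}" = "0" ] || return 1
	# In a mode string like "drwxr-xr-x", character 6 is the group
	# write bit and character 9 is the "others" write bit.
	case "${mode}" in
		?????w*|????????w*) return 1 ;;
	esac
	return 0
}

# Hypothetical helper: check a PID file and the directory holding it.
pidfile_is_safe() {
	pidpath_is_safe "$1" && pidpath_is_safe "$(dirname "$1")"
}

# Example:
#   pidfile_is_safe /run/foo.pid || echo "unsafe PID file location"
```
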
Ideally (see "Upstream your service scripts"), your service script
will be integrated upstream and the build system will determine
which of `/run` or `/var/run` is appropriate. For example,

```sh
pidfile="@piddir@/${RC_SVCNAME}.pid"
```

A decent example of this is the [Nagios core service
script](https://github.com/NagiosEnterprises/nagioscore/blob/master/openrc-init.in),
where the full path to the PID file is specified at build time.

# Don't let the user control the PID file location

It's usually a mistake to let the end user control the PID file
location through a conf.d variable, for a few reasons:

 1. When the PID file path is controlled by the user, you need to
    ensure that its parent directory exists and is writable. This
    adds unnecessary code to the service script.

 2. If the PID file path changes while the service is running, then
    you'll find yourself unable to stop the service.

 3. The directory that should contain the PID file is best determined
    by the upstream build system (see "Upstream your service scripts").
    On Linux, the preferred location these days is `/run`. Other systems
    still use `/var/run`, though, and a `./configure` script is the
    best place to decide which one you want.

 4. Nobody cares where the PID file is located, anyway.

Since OpenRC service names must be unique, a value of

```sh
pidfile="/run/${RC_SVCNAME}.pid"
```

guarantees that your PID file has a unique name.

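This also covers multiple instances of one service. OpenRC lets an administrator run an extra instance by symlinking a service script under a new name (for example, `ln -s foo /etc/init.d/foo.second`), and `RC_SVCNAME` then holds the link's name. The sketch below only mimics that expansion in plain `sh`. The function `pidfile_for` and the service names are hypothetical.

```sh
#!/bin/sh
# Hypothetical illustration: how pidfile="/run/${RC_SVCNAME}.pid"
# expands for two instances of a service named "foo".
pidfile_for() {
	RC_SVCNAME="$1"
	pidfile="/run/${RC_SVCNAME}.pid"
	printf '%s\n' "${pidfile}"
}

pidfile_for foo          # prints /run/foo.pid
pidfile_for foo.second   # prints /run/foo.second.pid
```
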
# Upstream your service scripts (for distribution developers)

The ideal place for an OpenRC service script is **upstream**. Much like
a systemd unit file, a well-crafted OpenRC service script should be
distribution-agnostic. Why upstream? For two reasons. First, having it
upstream means that there's a single authoritative source for
improvements. Second, a few paths in every service script depend on
flags passed to the build system. For example,

```sh
command=/usr/bin/foo
```

in an autotools-based build system should really be

```sh
command=@bindir@/foo
```

so that the user's value of `--bindir` is respected. If you keep the
service script in your own distribution's repository, then you have to
keep the command path and package synchronized yourself, and that's no
fun.

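In an autotools project, `config.status` may substitute installation directories such as `@bindir@` with unexpanded references like `${exec_prefix}/bin`. A common approach is therefore to fill in the final paths with `sed` from a Makefile rule at build time. The shell function below is a minimal sketch of that substitution step. The name `expand_initd`, the template name `foo.initd.in`, and the `@piddir@` placeholder (which a project would have to define itself) are assumptions for illustration.

```sh
#!/bin/sh
# Minimal sketch (hypothetical names): expand the @bindir@ and @piddir@
# placeholders in a service script template read from standard input.
# In a Makefile these arguments would typically be $(bindir) and
# whatever variable the project uses for its PID directory.
# Note: "|" is the sed delimiter, so the paths must not contain "|".
expand_initd() {
	sed -e "s|@bindir@|$1|g" \
	    -e "s|@piddir@|$2|g"
}

# Example:
#   expand_initd /usr/local/bin /run < foo.initd.in > foo.initd
```
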
# Be wary of "need net" dependencies

There are two things you need to know about "need net" dependencies:

 1. They are not satisfied by the loopback interface, so "need net"
    requires some *other* interface to be up.

 2. Depending on the value of `rc_depend_strict` in `rc.conf`, the
    "need net" will be satisfied when either *any* non-loopback
    interface is up, or when *all* non-loopback interfaces are up.

The first item means that "need net" is wrong for daemons that are
happy with `0.0.0.0`, and the second item means that "need net" is
wrong for daemons that need a particular (for example, the WAN)
interface. We'll consider the two most common users of "need net":
network clients, which access some network resource, and network
servers, which provide one.

## Network clients

Network clients typically want the WAN interface to be up. That may
tempt you to depend on the WAN interface; but first, you should ask
yourself a question: does anything bad happen if the WAN interface is
not available? In other words, if the administrator wants to disable
the WAN, should the service be stopped? Usually the answer to that
question is "no," and in that case, you should forgo the "net"
dependency entirely.

Suppose, for example, that your service retrieves virus signature
updates from the internet. In order to do its job correctly, it needs
a (working) internet connection. However, the service itself does not
require the WAN interface to be up: if it is, great; otherwise, the
worst that will happen is that a "server unavailable" warning will be
logged. The signature update service will not crash and, perhaps more
importantly, you don't want it to terminate if the administrator turns
off the WAN interface for a second.

## Network servers

Network servers are generally easier to handle than their client
counterparts. Most server daemons listen on `0.0.0.0` (all addresses)
by default, and are therefore satisfied to have the loopback interface
present and operational. OpenRC ships with the loopback service in the
*boot* runlevel, and therefore most server daemons require no further
network dependencies.

The exceptions to this rule are those daemons that produce negative
side effects when the WAN is unavailable. For example, the Nagios
server daemon will generate "the sky is falling" alerts for as long as
your monitored hosts are unreachable. So in that case, you should
require some other interface (often the WAN) to be up. A "need"
dependency would be appropriate, because you want Nagios to be
stopped before the network is taken down.

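For a server like Nagios, that dependency is a single line in the service script's `depend()` function, as in the minimal sketch below. Whether to need `net` (any or all non-loopback interfaces, depending on `rc_depend_strict`) or one specific interface service depends on the installation.

```sh
depend() {
	# Start only once a non-loopback interface is up, and have OpenRC
	# stop this service before the network is taken down.
	need net
}
```
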
If your daemon can optionally be configured to listen on a particular
interface, then please see the "Depending on a particular interface"
section.

## Depending on a particular interface

If you need to depend on one particular interface, usually it's not
easy to determine programmatically what that interface is. For
example, if your *sshd* daemon listens on `192.168.1.100` (rather than
`0.0.0.0`), then you have two problems:

 1. Parsing `sshd_config` to figure that out; and

 2. Determining which network service name corresponds to the
    interface for `192.168.1.100`.

It's generally a bad idea to parse config files in your service
scripts, but the second problem is the harder one. Instead, the most
robust (i.e., the laziest) approach is to make the user specify the
dependency when he makes a change to `sshd_config`. Include something
like the following in the service configuration file:

```sh
# Specify the network service that corresponds to the "bind" setting
# in your configuration file. For example, if you bind to 127.0.0.1,
# this should be set to "net.lo" which provides the loopback interface.
rc_need="net.lo"
```

This is a sensible default for daemons that are happy with `0.0.0.0`,
but lets the user specify something else, like `rc_need="net.wan"` if
he needs it. The burden is on the user to determine the appropriate
service whenever he changes the daemon's configuration file.