Services that use Gentoo init scripts often report a status of [started]
or [OK] even though they fail to start. The most recent bug like this
that I've found is with snort. If you have a bad rule, snort will
initialize, the rc-scripts will give it an [OK] status, and then it will
die once it parses the rules.

The real problem is not that the daemons don't return errors, but that
our init scripts do not make reasonable attempts to verify service
startup. If a Gentoo init script claims that a service started, it
should make an effort to check that the processes are actually running
shortly after the script is run, even if start-stop-daemon says the
parent process initialized. Relying on the return value of
start-stop-daemon is simply insufficient for some services.

I am aware that there are services that can monitor the status of other
services (app-admin/mon?) but I think this issue is a little different.
If an ebuild developer is aware that an error condition can commonly
occur shortly after a daemon initializes, why not attempt to catch those
errors? Most of them could probably be caught by simply checking to see
if the process is still running shortly after the script is run.
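
To make the "is it still running shortly afterwards" check concrete,
here is a minimal shell sketch. The function name still_running and the
default two-second delay are assumptions of mine for illustration, not
something the rc-scripts provide. It uses kill -0, which sends no signal
and only tests whether the PID exists and can be signalled.

```shell
#!/bin/sh
# Hypothetical helper: wait DELAY seconds, then succeed only if PID
# still refers to a live process that we are allowed to signal.
still_running() {
    pid=$1
    delay=${2:-2}    # assumed default; tune per service
    sleep "$delay"
    # kill -0 delivers nothing; it only performs the existence check.
    kill -0 "$pid" 2>/dev/null
}
```

An init script could call this with the PID read from the daemon's
pidfile right after start-stop-daemon returns, and report a failure
instead of [OK] when it returns non-zero.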

I propose increasing developer awareness of this problem, perhaps
through some formal guidelines for ebuild developers. At the very least,
I would like to see these bugs acknowledged on bugs.gentoo.org instead
of getting the same old "upstream / it's not our fault" response. We are
responsible for our init scripts, and they are important to our users.

I have 2 ideas for the actual implementation:

1) Some kind of check() function in the init.d script, or a generic
check() function that just checks with ps | grep. This might typically
be called after having the init script sleep for a certain amount of
time.

2) Some kind of special init script that checks registered daemons after
all services have started (i.e. it depends on all daemons, or they are
put into its config file). With this scheme we could avoid excessive
sleeping during startup (to keep it fast), and perhaps even keep using
service-specific check() functions.
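
A rough sketch of how the second idea might look, under some loud
assumptions: the registry format (one "name pidfile" pair per line) and
the function names pid_alive and check_registered are made up for this
example, and it checks pidfiles with kill -0 rather than using
ps | grep, since that avoids matching unrelated processes by name.

```shell
#!/bin/sh
# Hypothetical post-startup checker; nothing here is an existing
# rc-scripts interface.

# Succeed if the pidfile is readable and names a live process.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Read "name pidfile" pairs from the registry file, print one line per
# service that is not running, and return non-zero if any is down.
check_registered() {
    registry=$1
    failed=0
    while read -r name pidfile; do
        # Skip blank lines and comments.
        case $name in ''|'#'*) continue ;; esac
        if ! pid_alive "$pidfile"; then
            echo "$name: not running"
            failed=1
        fi
    done < "$registry"
    return "$failed"
}
```

A per-service check() function from the first idea could reuse
pid_alive after a short sleep, so the two approaches are not mutually
exclusive.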

Does anyone else think this idea is worth looking into?