On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> Services that use Gentoo init scripts often report a status of [started] or
> [OK] even though they fail to start. The most recent bug like this that I've
> found is with snort. If you have a bad rule, snort will initialize, the
> rc-scripts will give it an [OK] status, and then it will die once it parses the
> rules.
|
So snort shouldn't be giving the OK until it really is OK.
>
> The real problem is not that the daemons don't return errors, but that our init
> scripts do not make reasonable attempts to verify service startup. If a Gentoo
> init script claims that a service started, it should make an effort to check
> that the processes are actually running shortly after the script is run, even if
> start-stop-daemon says the parent process initialized. Relying on the return
> value of start-stop-daemon is simply insufficient for some services.
|
Not really. An init script is simply a script. It doesn't guarantee
anything other than what the service told it. If a service returns a
success status when it really hasn't completed its initialization, that is
a bug in that service, not in the init script code. While code might
need to be adjusted in the init script, this will most likely require
patches to the upstream sources.
>
> I am aware that there are services that can monitor the status of other services
> (app-admin/mon?) but I think this issue is a little different. If an ebuild
> developer is aware of an error condition that can commonly occur shortly after a
> daemon initializes, why not attempt to catch those errors? Most of them could
> probably be caught by simply checking to see if the process is still running
> shortly after the script is run.
|
I agree with you that we should catch the errors, but running another
check is simply a waste of time. The service should never report a
completed state until it has actually completed startup. It shouldn't
ever be like "Yes, snort worked... oh wait, no it didn't." That is even
more confusing for users.
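
For reference, the kind of post-start check being proposed could look
roughly like the sketch below. It is plain POSIX sh, not a real Gentoo
init script; check_still_running, its grace-period argument, and the
stand-in "daemon" are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Gentoo's rc-scripts: succeed only if the
# process with the given PID is still alive after a grace period.
check_still_running() {
    pid=$1
    grace=${2:-2}                 # seconds to wait before checking (assumed default)
    sleep "$grace"
    kill -0 "$pid" 2>/dev/null    # signal 0 only tests that the process exists
}

# Stand-in for a daemon that dies shortly after startup, the way snort does
# when it hits a bad rule.
sh -c 'sleep 1; exit 1' &

if check_still_running $! 2; then
    echo "[ ok ]"
else
    echo "[ !! ] service exited shortly after start"
fi
```

Note the cost: every service checked this way adds its grace period to
the startup time, whether or not it ever fails.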
|
> I propose increasing developer awareness of this problem, perhaps through some
> formal guidelines for ebuild developers. At the very least, I would like to see
> these bugs being acknowledged in bugs.gentoo.org instead of getting the same old
> upstream/it's not our fault response. We are responsible for our init scripts,
> and they are important to our users.
|
You really need to take this up with the developers in question, as this
is not a global matter, but really a matter with specific packages.
Those are bugs in those packages. If the ebuild maintainers are
refusing to resolve issues in the init scripts, which are definitely
Gentoo's own work, please take it up with user relations or attempt to
provide a fix for the problem.
>
> I have 2 ideas for the actual implementation:
>
> 1) Some kind of check() function in the init.d script, or a generic check() function
> that just checks with ps | grep. This might typically be called after having the
> init script sleep for a certain amount of time.
|
I would object to this. Requiring a status-check function for every
service, when only a few show this error, is a bad idea. It adds extra
load on every developer who maintains an init script, and it is
unnecessary in most cases.
>
> 2) Some kind of special init script that checks registered daemons after all services
> have started (i.e. it depends on all daemons, or they are put into its config file).
> With this scheme we could avoid excessive sleeping during startup (to keep it fast),
> and perhaps even keep using service-specific check() functions.
|
This would require much more knowledge on the end-user's part. Plus, it
would need to be aware of init script dependencies. All in all, it
sounds like a bad patch for the situation.
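
To make the second idea concrete, a checker of that kind might be
sketched as below. The registry file (one "name pidfile" pair per line)
and the check_registered function are assumptions for illustration;
nothing like this ships with the rc-scripts, and the sketch knows
nothing about init script dependencies.

```shell
#!/bin/sh
# Hypothetical post-boot checker. The registry format ("name pidfile" per
# line) is an assumption for illustration, not an existing Gentoo file.
check_registered() {
    registry=$1
    failed=0
    while read -r name pidfile; do
        case $name in ''|\#*) continue ;; esac    # skip blank lines and comments
        # A service counts as running if its pidfile is readable and the
        # PID stored in it still refers to a live process.
        if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            failed=1
        fi
    done < "$registry"
    return "$failed"    # non-zero if any registered service is down
}
```

It would be called once after everything else has started, for example
as check_registered /path/to/registry, and the user would have to keep
that registry in sync with the services they actually run.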
|
--
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux