Gentoo Archives: gentoo-dev

From: "Paul B. Henson" <henson@×××.org>
To: gentoo-dev@l.g.o
Subject: RE: [gentoo-dev] logging in openntpd 20080406-r3+
Date: Fri, 22 Nov 2013 23:45:04
Message-Id: 000901cee7dc$d7481590$85d840b0$@acm.org
In Reply to: Re: [gentoo-dev] logging in openntpd 20080406-r3+ by Dirkjan Ochtman
> From: Dirkjan Ochtman [mailto:djc@g.o]
> Sent: Friday, November 22, 2013 12:30 PM
>
> - Without -s, it can take a *very* long time to get close to an
> acceptable time error, whereas my initial expectation was that
> "starting my ntpd should fix the time error fairly quickly". But for
> me this, this is partly about starting ntpd while the machine is
> online, not just at boot.

In general, ntpd tries not to violate the presumption that time is monotonically increasing. Rather than stepping the clock, it adjusts the clock rate so that your system time gradually approaches real time: if your clock is behind, it runs faster and each "second" takes less than a second; if your clock is ahead, each "second" takes more than a second. However, if your time is very far off, that will take a considerable amount of time (heh heh) to synchronize.

The -s option makes ntpd simply set the clock at startup to the time it gets from its servers, regardless of what the system clock happens to say. This could be a huge jump, possibly into the "past" from the perspective of the system. Generally, this is only done at boot, typically before other processes that might need the correct time are started. Depending on what services your system is running, something might be quite unhappy if it is suddenly eight minutes earlier than it appeared to be when the process started.

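To put a rough number on "a considerable amount of time", here is a minimal sketch of the arithmetic. It assumes the kernel slews the clock by at most 500 ppm (a figure typical of adjtime() on Linux; it is my assumption, not something taken from the ntpd source), and slew_duration_s is a hypothetical helper name used only for illustration.

```c
#include <assert.h>

/*
 * Wall-clock seconds needed to slew away an error of offset_s seconds
 * when the clock rate can be adjusted by at most max_slew_ppm parts
 * per million. The 500 ppm used in the discussion is an assumed value,
 * not one read from ntpd or the running kernel.
 */
double slew_duration_s(double offset_s, double max_slew_ppm)
{
    assert(max_slew_ppm > 0.0);

    /* Only the magnitude matters; ahead or behind takes equally long. */
    if (offset_s < 0.0)
        offset_s = -offset_s;

    /* ppm -> fraction of a second corrected per elapsed second. */
    return offset_s / (max_slew_ppm / 1e6);
}
```

Under those assumptions, slew_duration_s(480.0, 500.0) comes to about 960000 seconds, or roughly eleven days to absorb an eight-minute error without ever stepping the clock.
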
> - Second, with -s, the boot delays can be quite long. I'm pretty sure
> I've seen delays that are quite a bit longer than 15s, probably in the
> case where there's no network or maybe where DNS doesn't resolve well;

I've tested a variety of scenarios (network interface down or unplugged, invalid NTP servers configured, etc.) and I haven't seen a delay longer than 15 seconds. If you look at the source code in ntpd.c:

    while ((ch = getopt(argc, argv, "df:nsSv")) != -1) {
    [...]
        case 's':
            lconf.settime = 1;

If you supply the -s option, lconf.settime is set, and:

    if (!lconf.settime) {
        log_init(lconf.debug);
        if (!lconf.debug)
            if (daemon(1, 0))
                fatal("daemon");
    } else
        timeout = SETTIME_TIMEOUT * 1000;

so rather than immediately daemonizing, it sets a 15-second timeout (SETTIME_TIMEOUT is defined as 15 in ntpd.h).

It then enters the main loop, where, if no response is received within 15 seconds:

    if ((nfds = poll(pfd, 1, timeout)) == -1)
    [...]
    if (nfds == 0 && lconf.settime) {
        lconf.settime = 0;
        timeout = INFTIM;
        log_init(lconf.debug);
        log_debug("no reply received in time, skipping initial "
            "time setting");
        if (!lconf.debug)
            if (daemon(1, 0))
                fatal("daemon");
    }

It backgrounds anyway and skips the initial time setting. I'm not saying there isn't some bug or scenario that would result in a longer delay, but if there is, that is a bug in ntpd, not an issue to be worked around in the startup script.

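For anyone who wants to see the timeout behaviour in isolation, here is a minimal standalone sketch of the same poll() pattern. It is not code from ntpd: wait_for_reply is a hypothetical helper, and it watches whatever descriptor you hand it rather than ntpd's internal pipe.

```c
#include <assert.h>
#include <poll.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if data is available, 0 if the timeout expired first
 * (the case in which ntpd gives up on the initial time setting),
 * and -1 if poll() itself failed.
 */
int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int nfds;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    nfds = poll(&pfd, 1, timeout_ms);
    if (nfds == -1)
        return -1;

    /* poll() returns 0 when nothing became ready within the timeout. */
    return nfds == 0 ? 0 : 1;
}
```

Calling it on the read end of an empty pipe with a short timeout returns 0 once the timeout expires, which is the branch shown above; with ntpd's value the equivalent call would pass SETTIME_TIMEOUT * 1000, i.e. 15000 ms.
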
> in any case, when you're trying to debug issues in a data center
> environment, waiting for a bunch of machines to come up is not much
> fun. (Or when you've had a machine go down and you're waiting to see
> if it comes up again.)

In my data center, NTP is considered a critical service and provisioned with fault tolerance. If a box trying to boot cannot reach an NTP server, that is as much of a problem as whatever is wrong with the box booting. I don't believe I've ever seen a boot delay caused by NTP on any of our production systems. If you can provide a reproducible failure mode where ntpd takes longer than 15 seconds to start, I'd be willing to take a look and see what's going on. Ideally, though, this should be reproducible on a system that is already running, not something that only happens during boot, as it would be more difficult to debug the process in that state.

> Now, for my use case, it is not all that important that the time error
> is minimized before resuming the boot process, but I really wanted to
> minimize boot delays.

Then I advise you not to use the -s option, in which case there will never be a delay, no matter what.

> Also, I'm really not sure how the change to logging to stderr/file and
> running in debug mode helps with the boot delays.

Basically, the new startup script does something like:

    /usr/sbin/ntpd -d [-s] 2>/var/log/ntpd.log &

The process is immediately put into the background and the startup sequence continues. This eliminates the boot delay, but at the cost of not actually setting the time before other processes are started (if the -s option is provided), using really kludgy logging, and always running the process in debug mode.

Personally, I think it should all be put back the way it was to begin with, which was perfectly functional. If you want the time set at boot, use -s and live with whatever delay that might cause in an environment with unreliable NTP service; if you don't want any boot delays, don't use -s and live with a potentially inaccurate clock. Or, if you want the clock set more or less at startup but don't want a delay, then rather than starting ntpd in the runlevel, start it from a local.d script after everything else has started, where any delay won't be an issue.

Perhaps a note could be added to the default conf.d file pointing out that adding the -s option will cause the boot to be delayed until the time is set or a timeout occurs...
