Gentoo Archives: gentoo-user

From: Martin Vaeth <martin@×××××.de>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: Recommendations for scheduler
Date: Mon, 04 Aug 2014 10:12:17
Message-Id: slrnltun2t.8io.martin@bois.imp.fu-berlin.de
In Reply to: Re: [gentoo-user] Re: Recommendations for scheduler by "J. Roeleveld"
J. Roeleveld <joost@××××××××.org> wrote:
>
> These schedules then also can't be restarted from the beginning
> when they stop halfway through without risking massive consistency
> problems in the final data.

So you have a command which might break due to a hardware error
and which cannot be rerun. I cannot see how any general-purpose scheduler
could help you here: either you must be able to split your command
into several (sequential) commands, or you need something adapted
to your particular command.
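
To illustrate the first option (splitting one long command into
sequential steps), here is a minimal Python sketch. It is not part of
"schedule" or of any existing scheduler; the function name run_steps,
the step names and the JSON state file are all hypothetical. Each step
is recorded as done only after it has succeeded, so a rerun skips the
finished steps instead of starting from the beginning.

```python
import json
import subprocess
from pathlib import Path


def run_steps(steps, state_file):
    """Run commands in order, skipping steps already recorded as done.

    `steps` is a list of (name, argv) pairs; `state_file` is a
    hypothetical JSON file holding the names of completed steps.
    """
    state_path = Path(state_file)
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    for name, argv in steps:
        if name in done:
            continue  # finished in an earlier run, do not repeat it
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
        done.append(name)
        # Write to a temporary file and rename it, so that a crash
        # cannot leave a half-written state file behind.
        tmp = state_path.with_name(state_path.name + ".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(state_path)
    return done
```

A crash between the end of a step and the update of the state file
still causes that one step to be repeated, so the individual steps
must either be safe to rerun or be small enough to check by hand.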

> And then multiple of those starting at random times with
> occasionally a whole bunch of the same schedule put into the
> queue with dependencies to the previous run.

That's not a problem. It becomes a problem only if the granularity
of a single command is not fine enough.

> If, during that time, one of the machines has a hardware failure
> or the scheduling process crashes on one or more of the servers,
> the last state needs to be recoverable.

One must distinguish two cases:

1. The machine running "schedule-server" has a hardware failure.
(Let us assume that "schedule-server" does not have a software failure -
otherwise, you have problems anyway.)
2. Some other machine has a hardware failure.

Case 2. is not bad (as far as the scheduling is concerned): Of course, the
machine will not report that it completed the job, and you will
have to think about how to complete the job. But it is clear that in
such exceptional cases you have to intervene manually in some way.

In order to deal with case 1., you can regularly (e.g. every minute)
dump the output of "schedule list" (possibly suppressing unimportant
data through the options to keep it short).
One could add a logging option to shrink the possible race window of 1 minute,
but in case of hardware failure a possible race cannot be excluded anyway.
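
The periodic dump described above could be sketched in Python roughly
as follows. Only the command "schedule list" comes from the text; the
function names, the target path and the 60-second default are
assumptions, and no options of "schedule list" are used because they
are not spelled out here.

```python
import subprocess
import time
from pathlib import Path


def dump_output(argv, target):
    """Run `argv` and atomically store its standard output in `target`."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    target_path = Path(target)
    tmp = target_path.with_name(target_path.name + ".tmp")
    tmp.write_text(result.stdout)
    tmp.replace(target_path)  # atomic rename on POSIX file systems
    return result.stdout


def dump_forever(argv, target, interval=60.0):
    """Repeat the dump every `interval` seconds until interrupted."""
    while True:
        dump_output(argv, target)
        time.sleep(interval)
```

It would be started with something like
dump_forever(["schedule", "list"], "/some/path/schedule-list.dump"),
where the path is a placeholder. The dump is only useful if the target
lies on storage that survives the failure of the machine running
"schedule-server".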

In case 1. you have to re-queue the jobs manually and decide what to do
with the jobs that had already started. However, I cannot imagine that this
happens so frequently that this exceptional case becomes something
one should seriously worry about.
