Gentoo Archives: gentoo-user

From: Martin Vaeth <martin@×××××.de>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: Recommendations for scheduler
Date: Mon, 04 Aug 2014 13:32:18
Message-Id: slrnltv2q1.93r.martin@lounge.imp.fu-berlin.de
In Reply to: Re: [gentoo-user] Re: Recommendations for scheduler by "J. Roeleveld"
J. Roeleveld <joost@××××××××.org> wrote:
>>
>> So you have a command which might break due to hardware error
>> and cannot be rerun. I cannot see how any general-purpose scheduler
>> might help you here: You either need to be able to split your command
>> into several (sequential) commands or you need something adapted
>> for your particular command.
>
> A general-purpose scheduler can work, as they do exist.

I doubt that they can solve your problem.
Let me repeat: You have a single program which accesses the database
in a complex way and somewhere in the course of accessing it, the
machine (or program) crashes.
No general-purpose program can recover from this: You need
particular knowledge of the database and the program if you even
want to have a *chance* to recover from such a situation.
A program with such particular knowledge can hardly be called
"general-purpose".

> If, during one of these steps, the database or ETL process suffers a
> crash, the activities of the ETL process need to be rolled back to
> the point where you can restart it.

I agree, but you need particular knowledge of the database and
your tasks to do this, which is far beyond the job of a scheduler.
As already mentioned by someone in this thread, your problem needs
to be solved on the level of the database (using
snapshot capabilities etc.)
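
To make "on the level of the database" a bit more concrete, here is a minimal sketch. It uses a plain transaction rather than snapshots, and SQLite only because it ships with Python; the table `target`, the function names and the `fail_at` parameter are made up for illustration. The failure is simulated with an exception; after a real machine crash the same guarantee would have to come from the database's own recovery.

```python
import sqlite3


def run_etl_step(conn, rows, fail_at=None):
    """Load all rows in one transaction: either every row lands or none does."""
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        for i, (key, value) in enumerate(rows):
            if i == fail_at:
                # Hypothetical stand-in for a crash in the middle of the step.
                raise RuntimeError("simulated crash in the middle of the step")
            conn.execute(
                "INSERT INTO target (key, value) VALUES (?, ?)", (key, value)
            )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key TEXT PRIMARY KEY, value INTEGER)")
conn.commit()

rows = [("a", 1), ("b", 2), ("c", 3)]

try:
    run_etl_step(conn, rows, fail_at=2)  # fails after two rows were inserted
except RuntimeError:
    pass
rows_after_crash = count_rows(conn)  # the partial work has been rolled back

run_etl_step(conn, rows)  # rerunning the whole step is now safe
rows_after_rerun = count_rows(conn)
```

The point is that the rollback is done by the database, which knows what the step touched. A scheduler would only have to rerun the step.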

>> In order to deal with case 1., you can regularly (e.g. each minute)
>> dump the output of "schedule list" (possibly suppressing non-important
>> data through the options to keep it short).
>
> Or all the necessary information is kept in-sync on persistent storage.
> This would then also allow easy fail-over if the master-schedule-node
> fails

No, it wouldn't, since jobs just finishing and wanting to report their
status cannot do this when there is no server. You would need a rather
involved protocol to deal with such situations dynamically.
It can certainly be done, but it is not something which can
easily be "added" as a feature: If this is required, it has to be the
fundamental concept from the very beginning and everything else has to
follow this first aim. You need different protocols than TCP sockets,
to start with; something like "dbus over IP" with servers being able
to announce their new presence, etc.
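
For the regular dump of "schedule list" suggested in the quoted text above, something along the following lines might do. This is only a sketch: the function names, the dump path and the 60 second interval are assumptions, and no options are passed to "schedule list" because the ones for shortening the output are not spelled out here. The dump is written to a temporary file and then renamed, so a crash in the middle of a dump leaves the previous one intact. It covers only the loss of the job list, not the fail-over problem discussed above.

```python
import os
import subprocess
import tempfile
import time


def dump_command_output(command, target_path):
    """Run command and atomically replace target_path with its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so readers never see a half-written dump.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(result.stdout)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return result.stdout


def dump_forever(command, target_path, interval=60):
    """Refresh the dump every interval seconds until interrupted."""
    while True:
        try:
            dump_command_output(command, target_path)
        except (OSError, subprocess.CalledProcessError):
            pass  # command missing or failed: keep the last good dump
        time.sleep(interval)


# Hypothetical usage (the path is an assumption):
# dump_forever(["schedule", "list"], "/var/tmp/schedule-list.dump")
```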
