J. Roeleveld <joost@××××××××.org> wrote:
>>
>> So you have a command which might break due to hardware error
>> and cannot be rerun. I cannot see how any general-purpose scheduler
>> might help you here: You either need to be able to split your command
>> into several (sequential) commands or you need something adapted
>> for your particular command.
>
> A general-purpose scheduler can work, as they do exist.

I doubt that they can solve your problem.
Let me repeat: you have a single program which accesses the database
in a complex way, and at some point while it is doing so, the
machine (or the program) crashes.
No general-purpose program can recover from this: you need specific
knowledge of both the database and the program if you want even a
*chance* of recovering from such a situation.
A program with that kind of specific knowledge can hardly be called
"general-purpose".

> If, during one of these steps, the database or ETL process suffers a
> crash, the activities of the ETL process need to be rolled back to
> the point where you can restart it.

I agree, but doing this requires specific knowledge of the database
and of your tasks, which is far beyond the job of a scheduler.
As someone in this thread already mentioned, your problem needs
to be solved at the level of the database (using its snapshot
capabilities, etc.).

>> In order to deal with case 1., you can regularly (e.g. each minute)
>> dump the output of "schedule list" (possibly suppressing non-important
>> data through the options to keep it short).
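The periodic dump from the quoted suggestion could look roughly like the following Python sketch. It assumes only that `schedule list` writes its state to standard output; it passes no options (which ones are worth using depends on what you need to keep), and the file name and the 60-second interval are placeholders. Each dump is written to a temporary file and renamed over the old one, so a crash during the write leaves the previous dump in place.

```python
import os
import subprocess
import tempfile
import time


def dump_once(cmd, path):
    """Run cmd once and atomically replace the file at path with its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first, then rename it
    # over the old dump: readers never see a half-written file, and a crash
    # before the rename leaves the previous dump untouched.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".dump-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(result.stdout)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def dump_forever(cmd=("schedule", "list"), path="schedule-list.txt", interval=60.0):
    """Refresh the dump every interval seconds (placeholder file name)."""
    while True:
        try:
            dump_once(cmd, path)
        except (OSError, subprocess.CalledProcessError):
            pass  # keep the previous dump if this run failed
        time.sleep(interval)
```

Calling dump_once from a cron job once a minute would work just as well as the long-running loop in dump_forever.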
>
> Or all the necessary information is kept in-sync on persistent storage.
> This would then also allow easy fail-over if the master-schedule-node
> fails

No, it wouldn't: jobs that are just finishing and want to report their
status cannot do so while there is no server. You would need a rather
involved protocol to handle such situations dynamically.
It can certainly be done, but it is not something that can
easily be "added" as a feature: if it is required, it has to be the
fundamental concept from the very beginning, and everything else has to
follow from that first aim. To start with, you need protocols other
than plain TCP sockets; something like "dbus over IP", with servers
being able to announce their new presence, etc.