> >> a) Why does scire have a daemon mode and a normal mode? Why isn't
> >> daemon the normal mode?
> >
> > My own view of this is that you don't always want a program running in
> > daemon mode. Maybe you want to call the client from a crontab, as an
> > alternative.

The initial design goal was to allow both push and pull methods by the
clients.
So in daemon mode, you maintain the connection, and the scire server can
tell you what to do, whereas in fetch mode (or non-daemon mode, or
whatever you want to call it) the scire client is run, fetches its
jobs, runs them, and exits. The reasons for doing this are security
related, in terms of firewalls and open ports. Fewer open ports ==
fewer ways to compromise a box.
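To make the two modes concrete, here is a minimal sketch, not the actual scire client code. The function names `run_once` and `run_daemon` are hypothetical, and the long-lived server connection is stood in for by a plain `queue.Queue` on which the "server" pushes jobs, with `None` meaning the connection was closed.

```python
import queue
from typing import Callable, Iterable, List, Optional


def run_once(fetch_jobs: Callable[[], Iterable[str]],
             run_job: Callable[[str], None]) -> int:
    """Pull (fetch) mode: ask for pending jobs, run them, then return."""
    count = 0
    for job in fetch_jobs():
        run_job(job)
        count += 1
    return count


def run_daemon(connection: "queue.Queue[Optional[str]]",
               run_job: Callable[[str], None]) -> int:
    """Push (daemon) mode: keep the connection and run jobs as they arrive."""
    count = 0
    while True:
        job = connection.get()   # blocks until the server pushes something
        if job is None:          # assumption: None signals a closed connection
            break
        run_job(job)
        count += 1
    return count


done: List[str] = []

# Pull mode, e.g. started from a crontab entry: one pass, then exit.
run_once(lambda: ["update-world", "sync-tree"], done.append)

# Push mode: the server decides when work arrives.
conn: "queue.Queue[Optional[str]]" = queue.Queue()
conn.put("restart-apache")
conn.put(None)
run_daemon(conn, done.append)

print(done)
```

The job names are made up for the demo. The point is only that pull mode returns after a single pass, while push mode stays alive for as long as the connection does.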
> >> c) server/modules/job.py : gen_summary(client_digest, jobs) what is
> >> this function for? The comment says:
> Let's say the client downloads the jobs and starts to execute them;
> meanwhile it could happen that the client (because of the loop interval)
> queries again for jobs... will this function avoid a duplication of jobs
> on the client so it doesn't do the same job again?

Exactly. The client can be in the middle of executing a bunch of long
jobs and, say, it's run from cron every 5 minutes. It'll run and fetch
the summary list of jobs to run, look in its own summary to see if it
already has those jobs, and not download the ones it already has.
Just a simple efficiency thing here, nothing deeper than that.
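A rough sketch of that check, assuming the summary is nothing more than a list of job ids. This is not the real gen_summary code, and the helper name `jobs_to_download` is made up for illustration.

```python
from typing import Iterable, List, Set


def jobs_to_download(server_summary: Iterable[int],
                     local_jobids: Set[int]) -> List[int]:
    """Return the job ids from the server summary that the client lacks."""
    return [jobid for jobid in server_summary if jobid not in local_jobids]


# The client already holds jobs 101 and 102 (still running, say), so a
# second cron run only needs to download job 103.
missing = jobs_to_download([101, 102, 103], {101, 102})
print(missing)
```

If the client already has everything, the result is an empty list and nothing is downloaded.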
>
> >
> >> If you remember, a few days ago I asked on IRC about the origjob column
> >> and codeman told me that it was discussed and it was decided to kick it
> >> out. The wiki says:
> >>
> >> Ideally the multi-client table in the database will be very
> >> lightweight and have a row per client with a multi-client jobid and
> >> then a clientid for each client receiving the job. The rest of the
> >> information needed can be gathered by looking at any of the individual
> >> jobs' information. We had discussed only allowing multi-client jobs
> >> to be applied to a "group".
> Just for the record: are these groups we are talking about the EBMs |
> MTAs | image servers that appear on the screenshot tour made by codeman?

Yes. The jobs_clients table was designed to map jobs to clients. For
single-client jobs it's one row that says this job goes to this
client. For multi-client jobs, there is a row per client. The UI
will take the group and make a row per client. That's a simple
solution; I'm not sure it's the best, but I think it should work.
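As an illustration of the row-per-client mapping, here is a small sketch using an in-memory SQLite database. The jobs_clients columns (jobid, clientid) follow the names used in this thread, but the groups_clients table and both helper functions are assumptions made for the example, not the real scire.sql schema.

```python
import sqlite3
from typing import Iterable, List


def assign_job(db: sqlite3.Connection, jobid: int,
               clientids: Iterable[int]) -> None:
    """Map one job to a set of clients: one jobs_clients row per client."""
    db.executemany(
        "INSERT INTO jobs_clients (jobid, clientid) VALUES (?, ?)",
        [(jobid, clientid) for clientid in clientids],
    )


def group_members(db: sqlite3.Connection, groupid: int) -> List[int]:
    """Expand a group into its client ids (hypothetical groups_clients table)."""
    rows = db.execute(
        "SELECT clientid FROM groups_clients WHERE groupid = ? ORDER BY clientid",
        (groupid,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs_clients ("
    "jobid INTEGER, clientid INTEGER, PRIMARY KEY (jobid, clientid))"
)
db.execute("CREATE TABLE groups_clients (groupid INTEGER, clientid INTEGER)")
db.executemany("INSERT INTO groups_clients VALUES (?, ?)",
               [(1, 10), (1, 11), (1, 12)])

assign_job(db, 500, [42])                    # single-client job: one row
assign_job(db, 501, group_members(db, 1))    # group job: one row per member
assign_job(db, 502, [10, 42])                # any set of clients, group or not

for row in db.execute(
        "SELECT jobid, COUNT(*) FROM jobs_clients GROUP BY jobid ORDER BY jobid"):
    print(row)
```

Expanding the group at assignment time keeps jobs_clients a pure job-to-client mapping, so a job aimed at an arbitrary set of clients needs no special case.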
> >> I have an idea on this ... since we only have multi-jobs per group,
> >> why not treat all jobs as multijobs?

I do NOT intend to make multi-client jobs exclusive to groups. Jobs
can be assigned to a set of clients, whether they are in a group or
not. Just like a permission can be assigned to any set of clients, so
can a job. I guess I need to fix that wiki up.
On the rest below, I don't really have a comment because it is a bit
confusing to me. The jobs_clients table is a mapping, but the
jobs_conditions table is designed to hold extended fields that affect a
job, not the clients. I don't see how it is useful to combine these
tables when they serve different purposes. One is designed to scale
out to one row per job per client; the other is meant for one row per
job. I will quote the rest here so that people on gentoo-scire can
see the conversation we've had. Maybe someone has a better idea.
-Preston
> >
> >> In scire.sql there are jobs_client and jobs_conditions tables. I
> >> suggest merging both into a jobs_details table.
> >
> > If we consider all jobs to be multi-jobs, merging jobs_clients and
> > job_conditions makes sense. Right now the only column job_condition is
> > missing from jobs_clients is the groupid. So in general I like this
> > idea but I don't like some of the other new columns you suggest; I'll
> > explain below.
> >
> >
> >> Here we can have BOTH clientid and groupid set. This situation would
> >> mean that this is a multijob but has been modified for this particular
> >> client. That way we can handle both ends of the spectrum:
> >
> > Ok, good.
> >
> >> Scire will be able to provide the job view from both ends of the
> >> spectrum: modifying all of the jobs at once (assuming they have not
> >> yet deployed), or working with the individual per-jobs that are
> >> created from the multi-client job
> >> [http://agaffney.org/mediawiki/index.php/Notes_on_Multi-client_jobs_and_recurring_jobs]
> >>
> >> This is a lightweight approach to multijobs, so even if the job is
> >> customized for a given machine it would only add one more
> >> job_details row. With this approach the "Staging" feature is more
> >> feasible, as marking the job as "BETA" or "PRODUCTION READY" would be
> >> way easier. As we unify the multi and uni jobs we can deal with
> >> concurrency more easily.
> >
> > Could you explain how this would make concurrency easier? What exactly
> > do you mean by concurrency in this case?
>
> Ok, I have some problems with my English... sorry. I mean
> recurrence. Having only one type of job will simplify the recurrence
> management. But maybe there aren't any recurrence problems with the
> current approach... so this will bear no advantage.
>
> >
> >
> >> Advantages:
> >> * One job per multijob (lightweight)
> >>
> >> * Customization doesn't create too much overhead
> >>
> >> * With the clients/success/failure columns you can know how many
> >> clients have completed the job, how many succeeded or failed, and how
> >> many didn't report anything about the job. It's hard for us to
> >> know when to mark a multijob as completed or failed. Let's just
> >> present fuzzy results (60% success). With these numbers it is simple
> >> math.
> >
>
> Ok, maybe having only very general numbers doesn't help. But I still
> think that presenting the results in a fuzzy way (in the summary at
> least) is a good idea. For some jobs 60% success is good, but for
> others it is a failure. The whole idea centers on the logic that the
> success of a job is too subjective.
>
> > ^ That is the part I don't like. :-) More below.
> >
> >> * One way of dealing with all jobs, no need to make a distinction
> >>
> >> Disadvantages or open problems:
> >>
> >> * The main issue with this approach is that there is no way of
> >> getting specific information about a given participant of a
> >> multijob (whether it succeeded or failed, etc.); you only get global
> >> numbers. How to do this without creating a new table is something I
> >> haven't figured out yet. This may be done by asking everybody involved
> >> in the job to send their job status and their log.... or could be
> >> done by creating a big TEXT column in the history where the LOG
> >> of the job is saved; this should be an easily parseable string that
> >> will contain the specific information of everybody. This TEXT could be
> >> created on the way or at the end_period. But it is only an idea
> >
> > Right now, the job_history table is where we keep a record of the job
> > status from each client. I don't like the limitation of only having
> > global numbers because I believe Scire needs to provide a full picture
> > of exactly what's happening on the network. The job_history table
> > provides a flexible way to do that. The idea of a TEXT column
> > containing the status of all clients is not very scalable; we should
> > definitely not do it that way (although you are right, it is
> > technically an option that "solves" the problem).
> >
> > If there are no other objections to the "all jobs are multi-jobs"
> > approach, I vote for combining most elements of it with the way we
> > have been using the job_history table. That is, we continue to use the
> > job_history table to record job status and output. When the frontend
> > needs to know which clients have completed, there will be a query
> > (specifically, it will be a join) on the job_history table. And we
> > don't maintain client completion status at all in the
> > job_conditions/job_details table.
> >
>
> It's a good idea :)
>
> >
> >> * The other big issue is how to ensure that a single machine
> >> doesn't change the counter several times
> >
> > Yes, another reason I don't like the counter approach. :-)
> >
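
For anyone following the quoted discussion, the "join on job_history" idea could look roughly like the sketch below. It is only an illustration: the status column, its 'success'/'failed' values, and the one-row-per-job-per-client layout of job_history are assumptions, not the actual schema.

```python
import sqlite3
from typing import Dict

# Count per-client outcomes for one job by joining the mapping table
# against the history table; clients with no history row are "unreported".
SUMMARY_SQL = """
SELECT COUNT(*) AS total,
       SUM(CASE WHEN h.status = 'success' THEN 1 ELSE 0 END) AS succeeded,
       SUM(CASE WHEN h.status = 'failed' THEN 1 ELSE 0 END) AS failed,
       SUM(CASE WHEN h.status IS NULL THEN 1 ELSE 0 END) AS unreported
FROM jobs_clients AS jc
LEFT JOIN job_history AS h
       ON h.jobid = jc.jobid AND h.clientid = jc.clientid
WHERE jc.jobid = ?
"""


def job_summary(db: sqlite3.Connection, jobid: int) -> Dict[str, int]:
    """Derive completion numbers at query time instead of storing counters."""
    row = db.execute(SUMMARY_SQL, (jobid,)).fetchone()
    return dict(zip(("total", "succeeded", "failed", "unreported"), row))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs_clients (jobid INTEGER, clientid INTEGER)")
db.execute(
    "CREATE TABLE job_history ("
    "jobid INTEGER, clientid INTEGER, status TEXT, "
    "PRIMARY KEY (jobid, clientid))"
)
# Job 7 goes to five clients; four have reported back so far.
db.executemany("INSERT INTO jobs_clients VALUES (7, ?)",
               [(clientid,) for clientid in range(1, 6)])
db.executemany("INSERT INTO job_history VALUES (7, ?, ?)",
               [(1, "success"), (2, "success"), (3, "success"), (4, "failed")])

summary = job_summary(db, 7)
percent = 100 * summary["succeeded"] // summary["total"]
print(summary, percent)
```

Because the numbers are computed from the per-client rows, the frontend can still show a fuzzy summary without a stored counter, and a client that reports twice cannot inflate the totals under the one-row-per-client assumption.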
--
gentoo-scire@g.o mailing list