Gentoo Archives: gentoo-scire

From: Preston Cody <codeman@g.o>
To: Rodrigo Lazo <rlazo.paz@×××××.com>
Cc: Matt Disney <mdisney@×××××.com>, Andrew Gaffney <agaffney@g.o>, gentoo-scire@l.g.o
Subject: [gentoo-scire] Re: Some questions :)
Date: Wed, 16 May 2007 22:49:09
Message-Id: 5c18b7fe0705161548g34d438a1p1cfa475ce1b2a78c@mail.gmail.com
1 > >> a) Why does scire have daemon and normal modes? Why isn't daemon the
2 > >> normal mode?
3 > >
4 > > My own view of this is that you don't always want a program running in
5 > > daemon mode. Maybe you want to call the client from a crontab, as an
6 > > alternative.
7
8 The initial design goal was to allow both push and pull methods by the
9 clients.
10 So in daemon mode, you maintain the connection, and the scire server can
11 tell you what to do, whereas in fetch mode or non-daemon mode or
12 whatever you want to call it, the scire client is run, fetches
13 its jobs, runs them, and exits. The reasons for doing
14 this are security related, in terms of firewalls and open ports. Fewer
15 open ports == fewer ways to compromise a box.
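
A rough sketch of the difference between the two modes, for illustration only. The function names, the queue, and the stop sentinel are all made up here and are not the actual scire client code; a job is assumed to be a plain callable.

```python
import queue

def run_fetch_mode(fetch_jobs):
    """Pull: contact the server, fetch pending jobs, run them, then return
    so the process can exit (e.g. when started from cron)."""
    results = []
    for job in fetch_jobs():      # hypothetical callable that asks the server for jobs
        results.append(job())
    return results

def run_daemon_mode(pushed_jobs, stop):
    """Push: stay connected and run whatever the server sends until told to stop."""
    results = []
    while True:
        job = pushed_jobs.get()   # blocks until the server pushes something
        if job is stop:           # hypothetical sentinel used to end the loop
            return results
        results.append(job())
```

In fetch mode the client only makes outbound connections and is gone when it finishes, so nothing has to listen on the box between runs.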
16
17 > >> c) server/modules/job.py : gen_summary(client_digest, jobs) what is
18 > >> this function for? The comment says:
19 > Let's say the client downloads the jobs and starts to execute them;
20 > meanwhile it could happen that the client (by the loop interval) queries
21 > again for jobs... will this function avoid a duplication of jobs on
22 > the client so it doesn't do the same job again?
23
24 Exactly. The client can be in the middle of executing a bunch of long
25 jobs and, say, it's run from cron every 5 minutes. It'll run and fetch
26 the summary list of jobs to run, look in its own summary to see if it's
27 already got those jobs, and not download the ones it already has.
28 Just a simple efficiency thing here. Nothing deeper than that.
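
The client-side check could look something like the sketch below. This is not the real gen_summary code; the function name is hypothetical, and it assumes the summary boils down to a list of job ids.

```python
def jobs_to_download(server_summary, local_job_ids):
    """Return the job ids listed by the server that the client does not hold yet.

    server_summary: job ids the server says this client should run (assumed format).
    local_job_ids:  job ids the client has already downloaded or is running.
    """
    held = set(local_job_ids)
    # Keep the server's ordering; skip anything the client already has.
    return [job_id for job_id in server_summary if job_id not in held]
```

So if cron starts a second client run while job 2 is still executing, a summary of jobs 1, 2 and 3 only triggers downloads for 1 and 3.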
29
30 >
31 > >
32 > >> If you remember a few days ago I asked on IRC about the origjob column
33 > >> and codeman told me that it was discussed and decided to kick it
34 > >> out. The wiki says:
35 > >>
36 > >> Ideally the multi-client table in the database will be very
37 > >> lightweight and have a row per client with a multi-client jobid and
38 > >> then a clientid for each client receiving the job. The rest of the
39 > >> information needed can be gathered by looking at any of the individual
40 > >> jobs' information. We had discussed only allowing a multi-client jobs
41 > >> to be applied to a "group".
42 > Just for the record: are these groups we are talking about the EBMs |
43 > MTAs | image servers that appear on the screenshot tour made by codeman?
44
45 Yes. The jobs_clients table was designed to map jobs to clients. For
46 single-client jobs it's one row that says this job goes to this
47 client. For multi-client jobs, there is a row per client. The UI
48 will take the group and make a row per client. That's a simple
49 solution. I'm not sure it's the best, but I think it should work.
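
To make the row-per-client idea concrete, here is a small sketch using sqlite3. The two-column table layout and the assign_job helper are assumptions for illustration; the real scire.sql schema has more to it.

```python
import sqlite3

def assign_job(conn, jobid, clientids):
    """Insert one jobs_clients row per client that should receive the job."""
    conn.executemany(
        "INSERT INTO jobs_clients (jobid, clientid) VALUES (?, ?)",
        [(jobid, clientid) for clientid in clientids],
    )

conn = sqlite3.connect(":memory:")
# Assumed minimal layout: just the mapping columns.
conn.execute("CREATE TABLE jobs_clients (jobid INTEGER, clientid INTEGER)")

assign_job(conn, 1, [10])          # single-client job: one row
assign_job(conn, 2, [10, 11, 12])  # multi-client job: the UI expands the group to one row per client
```

The same helper works for any set of clients, whether or not they came from a group.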
50
51 > >> I have an idea on this ... since we only have multi-jobs per group,
52 > >> why not treat all jobs as multijobs?
53
54 I do NOT intend to make multi-client jobs exclusive to groups. Jobs
55 can be assigned to a set of clients, whether they be in a group or
56 not. Just like a permission can be assigned to any set of clients, so
57 can a job. I guess I need to fix that wiki up.
58
59
60 On the rest below, I don't really have a comment because it is a bit
61 confusing to me. The jobs_clients table is a mapping, but the
62 jobs_conditions table is designed to be extended fields to affect a
63 job, not the clients. I don't see how it is useful to combine these
64 tables when they serve different purposes. One is designed to scale
65 out to one row per job per client, the other is meant for one row per
66 job. I will quote the rest here so that people on gentoo-scire can
67 see the conversation we've had. maybe someone has a better idea.
68 -Preston
69
70
71
72 > >
73 > >> In scire.sql there is a jobs_clients table and a jobs_conditions table. I
74 > >> suggest merging both into a jobs_details table.
75 > >
76 > > If we consider all jobs to be multi-jobs, merging jobs_clients and
77 > > job_conditions makes sense. Right now the only column job_condition is
78 > > missing from jobs_clients is the groupid. So in general I like this
79 > > idea but I don't like some of the other new columns you suggest; I'll
80 > > explain below.
81 > >
82 > >
83 > >> Here we can have BOTH clientid and groupid set. This situation would
84 > >> mean that this is a multijob but has been modified for this particular
85 > >> client. That way we can handle both ends of the spectrum:
86 > >
87 > > Ok, good.
88 > >
89 > >> Scire will be able to provide the job view from both ends of the
90 > >> spectrum: modifying all of the jobs at once (assuming they have not
91 > >> yet deployed), or working with the individual per-jobs that are
92 > >> created from the multi-client job
93 > >> [http://agaffney.org/mediawiki/index.php/Notes_on_Multi-client_jobs_and_recurring_jobs]
94 > >>
95 > >> This is a lightweight approach to multijobs, so even if the job is
96 > >> customized for a given machine it would only add one more
97 > >> job_details row. This approach also makes the "Staging" feature
98 > >> more feasible, as marking the job as "BETA" or "PRODUCTION READY" would be
99 > >> way easier. As we unify the multi and uni jobs we can deal with
100 > >> concurrency more easily.
101 > >
102 > > Could you explain how this would make concurrency easier? What exactly
103 > > do you mean by concurrency in this case?
104 >
105 > Ok, I have some problems with my English... sorry. I mean
106 > recurrency. Having only one type of job will simplify the recurrency
107 > management. But maybe there aren't any recurrency problems with the
108 > current approach... so this will bear no advantage.
109 >
110 > >
111 > >
112 > >> Advantages:
113 > >> * One job per multijob (lightweight)
114 > >>
115 > >> * Customization doesn't create too much overhead
116 > >>
117 > >> * With the clients/success/failure columns you can know how many
118 > >> clients have completed the job, how many succeeded or failed, and how
119 > >> many didn't report anything about the job. It's hard for us to
120 > >> know when to mark a multijob as completed or failed. Let's just
121 > >> present fuzzy results (60% success). With these numbers it is simple
122 > >> math.
123 > >
124 >
125 > Ok, maybe having only very general numbers doesn't help. But I still
126 > think that presenting the results in a fuzzy way (in the summary at
127 > least) is a good idea. For some jobs a 60% success rate is good but for
128 > others it is a failure. The whole idea centers on the logic that the success of
129 > a job is too subjective.
130 >
131 > > ^ That is the part I don't like. :-) More below.
132 > >
133 > >> * One way of dealing with all jobs, no need to make a difference
134 > >>
135 > >> Disadvantages or open problems:
136 > >>
137 > >> * The main issue with this approach is that there is no way of
138 > >> getting specific information about a given participant of a
139 > >> multijob (if it succeeded or failed, etc.); you only get global
140 > >> numbers. How to do this without creating a new table is something I
141 > >> haven't figured out yet. This may be done by asking everybody involved
142 > >> in the job to send their job status and their log.... or could be
143 > >> done by creating a big TEXT in the history where the LOG of the
144 > >> job is saved; this should be an easily parseable string that will
145 > >> contain the specific information of everybody. This TEXT could be
146 > >> created on the way or at the end_period. But it is only an idea
147 > >
148 > > Right now, the job_history table is where we keep a record of the job
149 > > status from each client. I don't like the limitation of only having
150 > > global numbers because I believe Scire needs to provide a full picture
151 > > of exactly what's happening on the network. The job_history table
152 > > provides a flexible way to do that. The idea of a TEXT column
153 > > containing the status of all clients is not very scalable, we should
154 > > definitely not do it that way (although you are right, it is
155 > > technically an option that "solves" the problem).
156 > >
157 > > If there are not other objections to the "all jobs are multi-jobs
158 > > approach," I vote for combining most elements of it with the way we
159 > > have been using the job_history table. That is, we continue to use the
160 > > job_history table to record job status and output. When the frontend
161 > > needs to know about what clients have completed, there will be a query
162 > > (specifically, it will be a join) on the job_history table. And we
163 > > don't maintain client completion status at all in the
164 > > job_conditions/job_details table.
165 > >
166 >
167 > It's a good idea :)
168 >
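
For reference, the join on job_history described above might be sketched like this with sqlite3. The column names and status strings are assumptions, not the real schema, and it assumes at most one job_history row per client per job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal layouts for both tables, plus sample data.
conn.executescript("""
CREATE TABLE jobs_clients (jobid INTEGER, clientid INTEGER);
CREATE TABLE job_history (jobid INTEGER, clientid INTEGER, status TEXT);
INSERT INTO jobs_clients VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO job_history VALUES (1, 10, 'success'), (1, 11, 'failed');
""")

def job_status_counts(conn, jobid):
    """Count clients per status for one job; clients with no history row
    are reported as 'unreported'."""
    rows = conn.execute(
        """
        SELECT COALESCE(h.status, 'unreported'), COUNT(*)
        FROM jobs_clients AS c
        LEFT JOIN job_history AS h
          ON h.jobid = c.jobid AND h.clientid = c.clientid
        WHERE c.jobid = ?
        GROUP BY COALESCE(h.status, 'unreported')
        """,
        (jobid,),
    ).fetchall()
    return dict(rows)
```

The frontend can still derive a fuzzy summary percentage from these counts, while the per-client detail stays available and no counter has to be maintained.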
169 > >
170 > >> * The other big issue is how to make sure that a single machine
171 > >> doesn't change the counter several times
172 > >
173 > > Yes, another reason I don't like the counter approach. :-)
174 > >
175 --
176 gentoo-scire@g.o mailing list