Gentoo Archives: gentoo-scire

From: Preston Cody <codeman@g.o>
To: Rodrigo Lazo <rlazo.paz@×××××.com>
Cc: Matt Disney <mdisney@×××××.com>, Andrew Gaffney <agaffney@g.o>, gentoo-scire@l.g.o
Subject: [gentoo-scire] Re: Some questions :)
Date: Wed, 16 May 2007 22:49:09
> >> a) Why do scire have daemon and normal mode? Why isn't daemon the
> >> normal mode?
> >
> > My own view of this is that you don't always want a program running in
> > daemon mode. Maybe you want to call the client from a crontab, as an
> > alternative.

The initial design goal was to allow both push and pull methods by the clients. In daemon mode, the client maintains the connection and the scire server can tell it what to do, whereas in fetch mode (or non-daemon mode, or whatever you want to call it) the scire client is run, fetches its jobs, runs them, and exits. The reasons for doing this are security related, in terms of firewalls and open ports. Fewer open ports == fewer ways to compromise a box.

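A rough sketch of the two modes, just to make the push/pull distinction concrete. None of these names come from the scire code: run_fetch_mode, run_daemon_mode, and the queue standing in for the server connection are all hypothetical.

```python
import queue
from typing import Callable, Iterable, Optional

Job = str  # hypothetical: a job is just a label in this sketch


def run_fetch_mode(fetch_jobs: Callable[[], Iterable[Job]],
                   run_job: Callable[[Job], None]) -> int:
    """Pull: fetch the pending jobs once, run them, then return (exit)."""
    count = 0
    for job in fetch_jobs():
        run_job(job)
        count += 1
    return count


def run_daemon_mode(connection: "queue.Queue[Optional[Job]]",
                    run_job: Callable[[Job], None]) -> int:
    """Push: keep the connection open and run whatever the server sends."""
    count = 0
    while True:
        job = connection.get()  # blocks until the server pushes something
        if job is None:         # assumed sentinel: the connection was closed
            return count
        run_job(job)
        count += 1
```

Fetch mode is the shape you would call from a crontab: it does one round of work and exits. In this sketch both modes use a connection that the client opened itself; the listening ports of a real daemon are not modelled here.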
> >> c) server/modules/ : gen_summary(client_digest, jobs) what is
> >> this function for? On the comment says:
>
> Let's say the client downloads the jobs and start to execute them;
> meanwhile it could happen that the client (by the loop interval) query
> again for jobs... will this function avoid a duplication of jobs on
> the client so it doesn't do the same job again?

Exactly. The client can be in the middle of executing a bunch of long jobs, and say it's run from cron every 5 minutes. It'll run and fetch the summary list of jobs to run, look in its own summary to see if it's already got those jobs, and not download the ones it already has. Just a simple efficiency thing here, nothing deeper than that.

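As a minimal sketch of that check, assuming the summary is simply a list of job ids (the real gen_summary output and the helper name jobs_to_download are not taken from the scire code):

```python
from typing import Iterable, List, Set


def jobs_to_download(summary: Iterable[int], already_have: Set[int]) -> List[int]:
    """Return the job ids in the server summary that the client lacks."""
    return [jobid for jobid in summary if jobid not in already_have]


# Hypothetical state: two long jobs are still running from an earlier cron run.
local_jobs = {101, 102}
server_summary = [101, 102, 103]  # summary list fetched on this run

new_jobs = jobs_to_download(server_summary, local_jobs)
local_jobs.update(new_jobs)  # remember them so the next run skips them too
```

Only job 103 would be downloaded here; a second run with the same summary would download nothing.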
> >
> >> If you remember a few days ago I asked on IRC about the origjob column
> >> and codeman told me that it was discuss and was decided to kick it
> >> out. On the wiki says:
> >>
> >> Ideally the multi-client table in the database will be very
> >> lightweight and have a row per client with a multi-client jobid and
> >> then a clientid for each client receiving the job. The rest of the
> >> information needed can be gathered by looking at any of the individual
> >> jobs' information. We had discussed only allowing a multi-client jobs
> >> to be applied to a "group".
>
> Just for the record. Are this groups we are talking about the EBMs |
> MTAs | image servers that appear on the screenshot tour made by codeman?

Yes. The jobs_clients table was designed to map jobs to clients. For single-client jobs it's one row that says this job goes to this client. For multi-client jobs, there is a row per client. The UI will take the group and make a row per client. That's a simple solution; I'm not sure it's the best, but I think it should work.

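A small sketch of that expansion, using an in-memory SQLite database. The two-column layout of jobs_clients, the assign_job helper, and the group membership are assumptions made for illustration, not the actual scire.sql schema.

```python
import sqlite3
from typing import Iterable


def assign_job(conn: sqlite3.Connection, jobid: int,
               clientids: Iterable[int]) -> None:
    """Insert one jobs_clients row per client receiving the job."""
    conn.executemany(
        "INSERT INTO jobs_clients (jobid, clientid) VALUES (?, ?)",
        [(jobid, clientid) for clientid in clientids],
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs_clients ("
    " jobid INTEGER, clientid INTEGER, PRIMARY KEY (jobid, clientid))"
)

groups = {"MTAs": [1, 2, 3]}          # hypothetical group membership

assign_job(conn, 10, [7])             # single-client job: one row
assign_job(conn, 11, groups["MTAs"])  # group job: the UI expands it to a row per client

rows = conn.execute(
    "SELECT jobid, clientid FROM jobs_clients ORDER BY jobid, clientid"
).fetchall()
```

Because the helper takes any iterable of client ids, the same call works for an arbitrary set of clients that is not a group.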
> >> I have an idea on this ... since we only have multi-jobs per group,
> >> why don't treat all jobs as multijobs?

I do NOT intend to make multi-client jobs exclusive to groups. Jobs can be assigned to a set of clients, whether they be in a group or not. Just like a permission can be assigned to any set of clients, so can a job. I guess I need to fix that wiki up.

On the rest below, I don't really have a comment because it is a bit confusing to me. The jobs_clients table is a mapping, but the jobs_conditions table is designed to hold extended fields that affect a job, not the clients. I don't see how it is useful to combine these tables when they serve different purposes. One is designed to scale out to one row per job per client; the other is meant for one row per job.

I will quote the rest here so that people on gentoo-scire can see the conversation we've had. Maybe someone has a better idea.

-Preston

> >
> >> On the scire.sql there is a jobs_client table and jobs_conditions I
> >> suggest to merge both, on a jobs_details table.
> >
> > If we consider all jobs to be multi-jobs, merging jobs_clients and
> > job_conditions makes sense. Right now the only column job_condition is
> > missing from jobs_clients is the groupid. So in general I like this
> > idea but I don't like some of the other new columns you suggest; I'll
> > explain below.
> >
> >
> >> Here we can have BOTH clientid and gropuid set. This situation would
> >> mean that this is a multijob but has been modified for this particular
> >> client. That way we can handle both ends of the spectrum:
> >
> > Ok, good.
> >
> >> Scire will be able to provide the job view from both ends of the
> >> spectrum: modifying all of the jobs at once (assuming they have not
> >> yet deployed), or working with the individual per-jobs that are
> >> created from the multi-client job
> >> []
> >>
> >> This is a lightweight approach to multijobs, so even if the job is
> >> customized for a given machine it would only span as one more
> >> job_details row. With this approach is more feasible the "Staging"
> >> feature, as marking the job as "BETA" or "PRODUCTION READY" would be
> >> way easier. As we unify the multi and uni jobs we can deal with
> >> concurrency more easily.
> >
> > Could you explain how this would make concurrency easier? What exactly
> > do you mean by concurrency in this case?
>
> Ok, I have some problems with my English... sorry. I mean
> recurrency. Having only one type of job will simplify the recurrency
> management. But maybe there aren't any recurrency problems with the
> current approach... so this will bear no advantage.
>
> >
> >> Advantages:
> >> * One job per multijob (lightweight)
> >>
> >> * Customization doesn't create too much overhead
> >>
> >> * With the clients/success/failure columns you can know how many
> >>   clients have complete the job, how many succed or failed, and how
> >>   many didn't reported anything about the job. It's hard for us to
> >>   know when to mark a multijob as completed or failed. Let's just
> >>   present fuzzy results (60% success). With this numbers is simple
> >>   math.
> >
>
> Ok, maybe only having too general numbers doesn't help. But I still
> think that presenting the results on a fuzzy way (on the summary at
> least) is a good idea. For some jobs an 60% of success is good but for
> others is a failure. All the idea centers on the logic that success of
> a job is too subjective.
>
> > ^ That is the part I don't like. :-) More below.
> >
> >> * One way of dealing with all jobs, no need to make a difference
> >>
> >> Disadvantages or open problems:
> >>
> >> * The main issue with this approach is that there is no way of
> >>   getting specific information about a given participant of a
> >>   multijob (if it succed of failed, etc.) you only get global
> >>   numbers. How to do this without creating a new table is something I
> >>   haven't figure out jet. This may be done asking everybody involved
> >>   on the job to send their job status and their log.... or could be
> >>   done by creating a big TEXT on the history where is saved the LOG
> >>   of the job, this should be a string easily parseable that will
> >>   contain the specific information of everybody. This TEXT could be
> >>   created on the way or at the end_period. But is only an idea
> >
> > Right now, the job_history table is where we keep a record of the job
> > status from each client. I don't like the limitation of only having
> > global numbers because I believe Scire needs to provide a full picture
> > of exactly what's happening on the network. The job_history table
> > provides a flexible way to do that. The idea of a TEXT column
> > containing the status of all clients is not very scalable, we should
> > definitely not do it that way (although you are right, it is
> > technically an option that "solves" the problem).
> >
> > If there are not other objections to the "all jobs are multi-jobs
> > approach," I vote for combining most elements of it with the way we
> > have been using the job_history table. That is, we continue to use the
> > job_history table to record job status and output. When the frontend
> > needs to know about what clients have completed, there will be a query
> > (specifically, it will be a join) on the job_history table. And we
> > don't maintain client completion status at all in the
> > job_conditions/job_details table.
> >
>
> It's a good idea :)
>
> >
> >> * The other big issue is how to control that a single machines
> >>   doesn't change the counter several times
> >
> > Yes, another reason I don't like the counter approach. :-)
> >

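The join on job_history proposed in the quoted text could look roughly like the sketch below. The column names (jobid, clientid, status), the status values, and the job_status helper are assumptions for illustration; the real schema is not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs_clients (jobid INTEGER, clientid INTEGER);
CREATE TABLE job_history (jobid INTEGER, clientid INTEGER, status TEXT);
-- hypothetical data: job 11 went to three clients, two have reported back
INSERT INTO jobs_clients VALUES (11, 1), (11, 2), (11, 3);
INSERT INTO job_history VALUES (11, 1, 'success'), (11, 2, 'failed');
""")


def job_status(conn: sqlite3.Connection, jobid: int):
    """Per-client status of a job; clients with no history row show 'no report'."""
    return conn.execute(
        """
        SELECT jc.clientid, COALESCE(h.status, 'no report')
        FROM jobs_clients AS jc
        LEFT JOIN job_history AS h
          ON h.jobid = jc.jobid AND h.clientid = jc.clientid
        WHERE jc.jobid = ?
        ORDER BY jc.clientid
        """,
        (jobid,),
    ).fetchall()


statuses = job_status(conn, 11)
succeeded = sum(1 for _, status in statuses if status == "success")
```

A summary figure such as "1 of 3 succeeded" can be derived from the per-client rows at query time, so no counter column has to be kept up to date.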
-- gentoo-scire@g.o mailing list