[gentoo-scire] Re: Some questions :) - gentoo-scire

From:	Preston Cody <codeman@g.o>
To:	Rodrigo Lazo <rlazo.paz@×××××.com>
Cc:	Matt Disney <mdisney@×××××.com>, Andrew Gaffney <agaffney@g.o>, gentoo-scire@l.g.o
Subject:	[gentoo-scire] Re: Some questions :)
Date:	Wed, 16 May 2007 22:49:09
Message-Id:	`5c18b7fe0705161548g34d438a1p1cfa475ce1b2a78c@mail.gmail.com`

> >> a) Why does scire have daemon and normal modes? Why isn't daemon the
> >> normal mode?
> >
> > My own view of this is that you don't always want a program running in
> > daemon mode. Maybe you want to call the client from a crontab, as an
> > alternative.

The initial design goal was to allow both push and pull methods by the
clients.
So in daemon mode, you maintain the connection, and the scire server can
tell you what to do, whereas in fetch mode or non-daemon mode or
whatever you want to call it, the scire client is run, fetches
its jobs, runs them, and exits.  The reasons for doing
this are security related, in terms of firewalls and open ports.  Fewer
open ports == fewer ways to compromise a box.
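
To make the two modes concrete, here is a rough Python sketch.  It is
not the actual scire client code: run_once, run_daemon, fetch_jobs,
receive_job and run_job are made-up names, and jobs are plain strings
purely for illustration.

```python
from typing import Callable, Iterable, Optional

Job = str  # hypothetical stand-in for a real job object


def run_once(fetch_jobs: Callable[[], Iterable[Job]],
             run_job: Callable[[Job], None]) -> int:
    """Fetch (pull) mode: ask for pending jobs, run them, then return.

    Suitable for being called from cron; nothing stays running afterwards.
    """
    count = 0
    for job in fetch_jobs():
        run_job(job)
        count += 1
    return count


def run_daemon(receive_job: Callable[[], Optional[Job]],
               run_job: Callable[[Job], None]) -> int:
    """Daemon (push) mode: keep the connection and run whatever arrives.

    receive_job is assumed to block until the server sends a job and to
    return None once the connection is closed.
    """
    count = 0
    while True:
        job = receive_job()
        if job is None:
            break
        run_job(job)
        count += 1
    return count
```

The only real difference is who decides when work happens: in run_once
the client asks and leaves, in run_daemon it waits to be told.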

> >> c) server/modules/job.py : gen_summary(client_digest, jobs) what is
> >> this function for? The comment says:
> Let's say the client downloads the jobs and starts to execute them;
> meanwhile it could happen that the client (by the loop interval) queries
> again for jobs... will this function avoid a duplication of jobs on
> the client so it doesn't do the same job again?

Exactly.  The client can be in the middle of executing a bunch of long
jobs and, say, it's run from cron every 5 minutes.  It'll run and fetch
the summary list of jobs to run, look in its own summary to see if it's
already got those jobs, and not download the ones it already has.
Just a simple efficiency thing here.  Nothing deeper than that.
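
Roughly, the client-side check amounts to something like the sketch
below.  This is an illustration only, not the real gen_summary code:
jobs_to_download is a hypothetical helper, and it assumes a summary is
just a list of job ids.

```python
from typing import Iterable, List


def jobs_to_download(server_summary: Iterable[int],
                     local_job_ids: Iterable[int]) -> List[int]:
    """Return the job ids in the server's summary that the client lacks.

    Order follows the server's summary; ids the client already holds
    (for example jobs still running from an earlier cron run) are skipped.
    """
    have = set(local_job_ids)
    return [job_id for job_id in server_summary if job_id not in have]
```

So if the server lists jobs 1 through 4 and the client is still busy
with 2 and 4, only 1 and 3 would be downloaded.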

>
> >
> >> If you remember a few days ago I asked on IRC about the origjob column
> >> and codeman told me that it was discussed and it was decided to kick it
> >> out.  The wiki says:
> >>
> >> Ideally the multi-client table in the database will be very
> >> lightweight and have a row per client with a multi-client jobid and
> >> then a clientid for each client receiving the job. The rest of the
> >> information needed can be gathered by looking at any of the individual
> >> jobs' information. We had discussed only allowing multi-client jobs
> >> to be applied to a "group".
> Just for the record: are these groups we are talking about the EBMs |
> MTAs | image servers that appear in the screenshot tour made by codeman?

Yes.  The jobs_clients table was designed to map jobs to clients.  For
single-client jobs it's one row that says this job goes to this
client.  For multi-client jobs, there is a row per client.  The UI
will take the group and make a row per client.  That's a simple
solution; I'm not sure it's the best, but I think it should work.
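
As a sketch of what the UI would do when a job is assigned (the
function name and the plain tuples are assumptions for illustration,
not the actual scire schema or code):

```python
from typing import Iterable, List, Tuple


def jobs_clients_rows(job_id: int,
                      client_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Build one (jobid, clientid) row per distinct client.

    A single-client job yields one row; a multi-client job yields one
    row per client.  Duplicate client ids are ignored so that a client
    never gets the same job mapped twice.
    """
    seen = set()
    rows = []
    for client_id in client_ids:
        if client_id not in seen:
            seen.add(client_id)
            rows.append((job_id, client_id))
    return rows
```

For a group, the caller would first look up the group's members and
pass those client ids in; the table itself never needs to know about
groups.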

> >> I have an idea on this ... since we only have multi-jobs per group,
> >> why don't we treat all jobs as multijobs?

I do NOT intend to make multi-client jobs exclusive to groups.  Jobs
can be assigned to a set of clients, whether they be in a group or
not.  Just like a permission can be assigned to any set of clients, so
can a job.  I guess I need to fix that wiki up.

On the rest below, I don't really have a comment because it is a bit
confusing to me.  The jobs_clients table is a mapping, but the
jobs_conditions table is designed to hold extended fields that affect a
job, not the clients.  I don't see how it is useful to combine these
tables when they serve different purposes.  One is designed to scale
out to one row per job per client, the other is meant for one row per
job.  I will quote the rest here so that people on gentoo-scire can
see the conversation we've had.  Maybe someone has a better idea.

-Preston

> >
> >> In scire.sql there is a jobs_client table and a jobs_conditions table; I
> >> suggest merging both into a jobs_details table.
> >
> > If we consider all jobs to be multi-jobs, merging jobs_clients and
> > job_conditions makes sense. Right now the only column job_condition is
> > missing from jobs_clients is the groupid. So in general I like this
> > idea but I don't like some of the other new columns you suggest; I'll
> > explain below.
> >
> >
> >> Here we can have BOTH clientid and groupid set. This situation would
> >> mean that this is a multijob but has been modified for this particular
> >> client. That way we can handle both ends of the spectrum:
> >
> > Ok, good.
> >
> >> Scire will be able to provide the job view from both ends of the
> >> spectrum: modifying all of the jobs at once (assuming they have not
> >> yet deployed), or working with the individual per-jobs that are
> >> created from the multi-client job
> >> [http://agaffney.org/mediawiki/index.php/Notes_on_Multi-client_jobs_and_recurring_jobs]
> >>
> >> This is a lightweight approach to multijobs, so even if the job is
> >> customized for a given machine it would only span as one more
> >> job_details row. With this approach the "Staging" feature is more
> >> feasible, as marking the job as "BETA" or "PRODUCTION READY" would be
> >> way easier. As we unify the multi and uni jobs we can deal with
> >> concurrency more easily.
> >
> > Could you explain how this would make concurrency easier? What exactly
> > do you mean by concurrency in this case?
>
> Ok, I have some problems with my English... sorry. I mean
> recurrency. Having only one type of job will simplify the recurrency
> management. But maybe there aren't any recurrency problems with the
> current approach... so this will bear no advantage.
>
> >
> >
> >> Advantages:
> >>  * One job per multijob (lightweight)
> >>
> >>  * Customization doesn't create too much overhead
> >>
> >>  * With the clients/success/failure columns you can know how many
> >>    clients have completed the job, how many succeeded or failed, and how
> >>    many didn't report anything about the job. It's hard for us to
> >>    know when to mark a multijob as completed or failed. Let's just
> >>    present fuzzy results (60% success). With these numbers it is simple
> >>    math.
> >
>
> Ok, maybe having only very general numbers doesn't help. But I still
> think that presenting the results in a fuzzy way (in the summary at
> least) is a good idea. For some jobs 60% success is good but for
> others it is a failure. The whole idea centers on the logic that success of
> a job is too subjective.
>
> > ^ That is the part I don't like. :-) More below.
> >
> >>  * One way of dealing with all jobs, no need to make a difference
> >>
> >> Disadvantages or open problems:
> >>
> >>  * The main issue with this approach is that there is no way of
> >>    getting specific information about a given participant of a
> >>    multijob (if it succeeded or failed, etc.); you only get global
> >>    numbers. How to do this without creating a new table is something I
> >>    haven't figured out yet. This may be done by asking everybody involved
> >>    in the job to send their job status and their log.... or could be
> >>    done by creating a big TEXT in the history where the LOG of the
> >>    job is saved; this should be an easily parseable string that will
> >>    contain the specific information of everybody. This TEXT could be
> >>    created on the way or at the end_period. But it is only an idea.
> >
> > Right now, the job_history table is where we keep a record of the job
> > status from each client. I don't like the limitation of only having
> > global numbers because I believe Scire needs to provide a full picture
> > of exactly what's happening on the network. The job_history table
> > provides a flexible way to do that. The idea of a TEXT column
> > containing the status of all clients is not very scalable; we should
> > definitely not do it that way (although you are right, it is
> > technically an option that "solves" the problem).
> >
> > If there are no other objections to the "all jobs are multi-jobs"
> > approach, I vote for combining most elements of it with the way we
> > have been using the job_history table. That is, we continue to use the
> > job_history table to record job status and output. When the frontend
> > needs to know about what clients have completed, there will be a query
> > (specifically, it will be a join) on the job_history table. And we
> > don't maintain client completion status at all in the
> > job_conditions/job_details table.
> >
>
> It's a good idea :)
>
> >
> >>  * The other big issue is how to ensure that a single machine
> >>    doesn't change the counter several times
> >
> > Yes, another reason I don't like the counter approach. :-)
> >
--
gentoo-scire@g.o mailing list


Gentoo Archives: gentoo-scire