Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426) - gentoo-portage-dev

From:	Brian Dolbec <dolsen@g.o>
To:	gentoo-portage-dev@l.g.o
Subject:	Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426)
Date:	Thu, 13 Aug 2015 19:06:20
Message-Id:	`20150813120613.2e35fabd.dolsen@gentoo.org`
In Reply to:	Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426) by Zac Medico

1

Hell with it, top posting this one ;)

2

3

Thanks, questions answered :)

4

5

On Thu, 13 Aug 2015 12:00:46 -0700

6

Zac Medico <zmedico@g.o> wrote:

7

8

> On 08/13/2015 07:29 AM, Brian Dolbec wrote:

9

> > On Thu, 13 Aug 2015 01:34:36 -0700

10

> > Zac Medico <zmedico@g.o> wrote:

11

> >> +	def _task_exit(self, task):

12

> >> +		'''

13

> >> +		Remove the task from the graph, in order to expose

14

> >> +		more leaf nodes.

15

> >> +		'''

16

> >> +		self._running_tasks.discard(task)

17

> >> +		returncode = task.returncode

18

> >> +		if task.returncode == os.EX_OK:

19

> >> +			returncode, message, updatecache_flg =

20

> >> task.result

21

> >> +			if message:

22

> >> +				self.msgs.append(message)

23

> >> +		repo = task.kwargs['repo'].name

24

> >> +		self._running_repos.remove(repo)

25

> >> +		self.retvals.append((repo, returncode))

26

> >> +		self._sync_graph.remove(repo)

27

> >> +		self._update_leaf_nodes()

28

> >> +		super(SyncScheduler, self)._task_exit(self)

29

> >> +

30

> >

31

> >

32

> > returncode in this function seems to be assigned too man times for

33

> > no real reason.  Is there no other way to get message than to

34

> > reassign returncode?  and it task.returncode is not os.EX_OK is not

35

> > task.result still accessible?

36

>

37

> No, it's not accessible, because this means that an unexpected

38

> exception has occurred in the child process. In practice, this case

39

> should not happen, but it could if there's a bug in any one of the

40

> sync modules.

41

>

42

> This is why we have the odd looking returncode assignment, since we

43

> can only access the real sync module returncode if it didn't throw an

44

> unexpected exception.

45

>

46

> > maybe there is a a message still?

47

>

48

> There's no message available if the sync module raised an unexpected

49

> exception. The AsyncFunction class will print a traceback in this

50

> case, and exit with returncode 1.

51

>

52

> > It sounds like

53

> > message should be made available like returncode and forget about

54

> > splitting task.result...  maybe it will be clearer when I fifnish

55

> > reviewing the code. It is early in the morning ... still waking up

56

>

57

> I've left the returncode and message handling as they were before the

58

> patch. It's just that now they're passed back to the main process via

59

> a pipe (handled by AsyncFunction), and we have to handle the case

60

> where the child process might have raised an unexpected exception.

61

>

62

> >> +	def _can_add_job(self):

63

> >> +		'''

64

> >> +		Returns False if there are no leaf nodes

65

> >> available.

66

> >> +		'''

67

> >> +		if not AsyncScheduler._can_add_job(self):

68

> >> +			return False

69

> >> +		return bool(self._leaf_nodes) and not

70

> >> self._terminated.is_set() +

71

> >> +	def _keep_scheduling(self):

72

> >> +		'''

73

> >> +		Schedule as long as the graph is non-empty, and we

74

> >> haven't

75

> >> +		been terminated.

76

> >> +		'''

77

> >> +		return bool(self._sync_graph) and not

78

> >> self._terminated.is_set()

79

> >

80

> >

81

> >

82

> > These 2 functions look like barely different versions of the same

83

> > thing.  doesn't  bool(self._leaf_nodes) and bool(self._sync_graph)

84

> > really evaluate the same? if there are no leaf nodes, then sync

85

> > graph should be false????

86

>

87

> No, it's more complex than that, because this is a non-deterministic

88

> scheduler like the one that we use for parallel builds. The number of

89

> leaf nodes will be zero when a master repo such as 'gentoo' is

90

> syncing. After the master finishes syncing, the master node is

91

> removed from the graph, creating leaf nodes. Then the

92

> _update_leaf_nodes method is called to update self._leaf_nodes.

93

>

94

>

95

> >> +	def _sync_callback(self, proc):

96

> >> +		"""

97

> >> +		This is called in the parent process, serially,

98

> >> for each of the

99

> >> +		sync jobs when they complete. Some cache backends

100

> >> such as sqlite

101

> >> +		may require that cache access be performed

102

> >> serially in the

103

> >> +		parent process like this.

104

> >> +		"""

105

> >> +		repo = proc.kwargs['repo']

106

> >> +		exitcode = proc.returncode

107

> >> +		updatecache_flg = False

108

> >> +		if proc.returncode == os.EX_OK:

109

> >> +			exitcode, message, updatecache_flg =

110

> >> proc.result

111

> >> -	def _sync_callback(self, exitcode, updatecache_flg):

112

> >>  		if updatecache_flg and "metadata-transfer" not in

113

> >> self.settings.features: updatecache_flg = False

114

> >>

115

> >>  		if updatecache_flg and \

116

> >>  			os.path.exists(os.path.join(

117

> >> -			self.repo.location, 'metadata',

118

> >> 'md5-cache')):

119

> >> +			repo.location, 'metadata', 'md5-cache')):

120

> >>

121

> >>

122

> >

123

> >

124

> > Again, this looks unnecessarily messy dealing with

125

> > returncode/exitcode and updatecache_flag  assigning and

126

> > reassigning...

127

>

128

> The situation is analogous to the other one that you complained about.

129

> The exitcode, message, and updatecache_flag variables are simply not

130

> available if the child process has raised an unexpected exception. It

131

> should not happen in practice, but it could if one of the sync modules

132

> has a bug.

--

137

Brian Dolbec <dolsen>

1	Hell with it, top posting this one ;)
2
3	Thanks, questions answered :)
4
5	On Thu, 13 Aug 2015 12:00:46 -0700
6	Zac Medico <zmedico@g.o> wrote:
7
8	> On 08/13/2015 07:29 AM, Brian Dolbec wrote:
9	> > On Thu, 13 Aug 2015 01:34:36 -0700
10	> > Zac Medico <zmedico@g.o> wrote:
11	> >> + def _task_exit(self, task):
12	> >> + '''
13	> >> + Remove the task from the graph, in order to expose
14	> >> + more leaf nodes.
15	> >> + '''
16	> >> + self._running_tasks.discard(task)
17	> >> + returncode = task.returncode
18	> >> + if task.returncode == os.EX_OK:
19	> >> + returncode, message, updatecache_flg =
20	> >> task.result
21	> >> + if message:
22	> >> + self.msgs.append(message)
23	> >> + repo = task.kwargs['repo'].name
24	> >> + self._running_repos.remove(repo)
25	> >> + self.retvals.append((repo, returncode))
26	> >> + self._sync_graph.remove(repo)
27	> >> + self._update_leaf_nodes()
28	> >> + super(SyncScheduler, self)._task_exit(self)
29	> >> +
30	> >
31	> >
32	> > returncode in this function seems to be assigned too man times for
33	> > no real reason. Is there no other way to get message than to
34	> > reassign returncode? and it task.returncode is not os.EX_OK is not
35	> > task.result still accessible?
36	>
37	> No, it's not accessible, because this means that an unexpected
38	> exception has occurred in the child process. In practice, this case
39	> should not happen, but it could if there's a bug in any one of the
40	> sync modules.
41	>
42	> This is why we have the odd looking returncode assignment, since we
43	> can only access the real sync module returncode if it didn't throw an
44	> unexpected exception.
45	>
46	> > maybe there is a a message still?
47	>
48	> There's no message available if the sync module raised an unexpected
49	> exception. The AsyncFunction class will print a traceback in this
50	> case, and exit with returncode 1.
51	>
52	> > It sounds like
53	> > message should be made available like returncode and forget about
54	> > splitting task.result... maybe it will be clearer when I fifnish
55	> > reviewing the code. It is early in the morning ... still waking up
56	>
57	> I've left the returncode and message handling as they were before the
58	> patch. It's just that now they're passed back to the main process via
59	> a pipe (handled by AsyncFunction), and we have to handle the case
60	> where the child process might have raised an unexpected exception.
61	>
62	> >> + def _can_add_job(self):
63	> >> + '''
64	> >> + Returns False if there are no leaf nodes
65	> >> available.
66	> >> + '''
67	> >> + if not AsyncScheduler._can_add_job(self):
68	> >> + return False
69	> >> + return bool(self._leaf_nodes) and not
70	> >> self._terminated.is_set() +
71	> >> + def _keep_scheduling(self):
72	> >> + '''
73	> >> + Schedule as long as the graph is non-empty, and we
74	> >> haven't
75	> >> + been terminated.
76	> >> + '''
77	> >> + return bool(self._sync_graph) and not
78	> >> self._terminated.is_set()
79	> >
80	> >
81	> >
82	> > These 2 functions look like barely different versions of the same
83	> > thing. doesn't bool(self._leaf_nodes) and bool(self._sync_graph)
84	> > really evaluate the same? if there are no leaf nodes, then sync
85	> > graph should be false????
86	>
87	> No, it's more complex than that, because this is a non-deterministic
88	> scheduler like the one that we use for parallel builds. The number of
89	> leaf nodes will be zero when a master repo such as 'gentoo' is
90	> syncing. After the master finishes syncing, the master node is
91	> removed from the graph, creating leaf nodes. Then the
92	> _update_leaf_nodes method is called to update self._leaf_nodes.
93	>
94	>
95	> >> + def _sync_callback(self, proc):
96	> >> + """
97	> >> + This is called in the parent process, serially,
98	> >> for each of the
99	> >> + sync jobs when they complete. Some cache backends
100	> >> such as sqlite
101	> >> + may require that cache access be performed
102	> >> serially in the
103	> >> + parent process like this.
104	> >> + """
105	> >> + repo = proc.kwargs['repo']
106	> >> + exitcode = proc.returncode
107	> >> + updatecache_flg = False
108	> >> + if proc.returncode == os.EX_OK:
109	> >> + exitcode, message, updatecache_flg =
110	> >> proc.result
111	> >> - def _sync_callback(self, exitcode, updatecache_flg):
112	> >> if updatecache_flg and "metadata-transfer" not in
113	> >> self.settings.features: updatecache_flg = False
114	> >>
115	> >> if updatecache_flg and \
116	> >> os.path.exists(os.path.join(
117	> >> - self.repo.location, 'metadata',
118	> >> 'md5-cache')):
119	> >> + repo.location, 'metadata', 'md5-cache')):
120	> >>
121	> >>
122	> >
123	> >
124	> > Again, this looks unnecessarily messy dealing with
125	> > returncode/exitcode and updatecache_flag assigning and
126	> > reassigning...
127	>
128	> The situation is analogous to the other one that you complained about.
129	> The exitcode, message, and updatecache_flag variables are simply not
130	> available if the child process has raised an unexpected exception. It
131	> should not happen in practice, but it could if one of the sync modules
132	> has a bug.
133
134
135
136	--
137	Brian Dolbec <dolsen>

Gentoo Archives: gentoo-portage-dev