Gentoo Archives: gentoo-portage-dev

From: Brian Dolbec <dolsen@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426)
Date: Thu, 13 Aug 2015 19:06:20
Message-Id: 20150813120613.2e35fabd.dolsen@gentoo.org
In Reply to: Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426) by Zac Medico
1 Hell with it, top posting this one ;)
2
3 Thanks, questions answered :)
4
5 On Thu, 13 Aug 2015 12:00:46 -0700
6 Zac Medico <zmedico@g.o> wrote:
7
8 > On 08/13/2015 07:29 AM, Brian Dolbec wrote:
9 > > On Thu, 13 Aug 2015 01:34:36 -0700
10 > > Zac Medico <zmedico@g.o> wrote:
11 > >> + def _task_exit(self, task):
12 > >> + '''
13 > >> + Remove the task from the graph, in order to expose
14 > >> + more leaf nodes.
15 > >> + '''
16 > >> + self._running_tasks.discard(task)
17 > >> + returncode = task.returncode
18 > >> + if task.returncode == os.EX_OK:
19 > >> + returncode, message, updatecache_flg =
20 > >> task.result
21 > >> + if message:
22 > >> + self.msgs.append(message)
23 > >> + repo = task.kwargs['repo'].name
24 > >> + self._running_repos.remove(repo)
25 > >> + self.retvals.append((repo, returncode))
26 > >> + self._sync_graph.remove(repo)
27 > >> + self._update_leaf_nodes()
28 > >> + super(SyncScheduler, self)._task_exit(self)
29 > >> +
30 > >
31 > >
32 > > returncode in this function seems to be assigned too man times for
33 > > no real reason. Is there no other way to get message than to
34 > > reassign returncode? and it task.returncode is not os.EX_OK is not
35 > > task.result still accessible?
36 >
37 > No, it's not accessible, because this means that an unexpected
38 > exception has occurred in the child process. In practice, this case
39 > should not happen, but it could if there's a bug in any one of the
40 > sync modules.
41 >
42 > This is why we have the odd looking returncode assignment, since we
43 > can only access the real sync module returncode if it didn't throw an
44 > unexpected exception.
45 >
46 > > maybe there is a a message still?
47 >
48 > There's no message available if the sync module raised an unexpected
49 > exception. The AsyncFunction class will print a traceback in this
50 > case, and exit with returncode 1.
51 >
52 > > It sounds like
53 > > message should be made available like returncode and forget about
54 > > splitting task.result... maybe it will be clearer when I fifnish
55 > > reviewing the code. It is early in the morning ... still waking up
56 >
57 > I've left the returncode and message handling as they were before the
58 > patch. It's just that now they're passed back to the main process via
59 > a pipe (handled by AsyncFunction), and we have to handle the case
60 > where the child process might have raised an unexpected exception.
61 >
62 > >> + def _can_add_job(self):
63 > >> + '''
64 > >> + Returns False if there are no leaf nodes
65 > >> available.
66 > >> + '''
67 > >> + if not AsyncScheduler._can_add_job(self):
68 > >> + return False
69 > >> + return bool(self._leaf_nodes) and not
70 > >> self._terminated.is_set() +
71 > >> + def _keep_scheduling(self):
72 > >> + '''
73 > >> + Schedule as long as the graph is non-empty, and we
74 > >> haven't
75 > >> + been terminated.
76 > >> + '''
77 > >> + return bool(self._sync_graph) and not
78 > >> self._terminated.is_set()
79 > >
80 > >
81 > >
82 > > These 2 functions look like barely different versions of the same
83 > > thing. doesn't bool(self._leaf_nodes) and bool(self._sync_graph)
84 > > really evaluate the same? if there are no leaf nodes, then sync
85 > > graph should be false????
86 >
87 > No, it's more complex than that, because this is a non-deterministic
88 > scheduler like the one that we use for parallel builds. The number of
89 > leaf nodes will be zero when a master repo such as 'gentoo' is
90 > syncing. After the master finishes syncing, the master node is
91 > removed from the graph, creating leaf nodes. Then the
92 > _update_leaf_nodes method is called to update self._leaf_nodes.
93 >
94 >
95 > >> + def _sync_callback(self, proc):
96 > >> + """
97 > >> + This is called in the parent process, serially,
98 > >> for each of the
99 > >> + sync jobs when they complete. Some cache backends
100 > >> such as sqlite
101 > >> + may require that cache access be performed
102 > >> serially in the
103 > >> + parent process like this.
104 > >> + """
105 > >> + repo = proc.kwargs['repo']
106 > >> + exitcode = proc.returncode
107 > >> + updatecache_flg = False
108 > >> + if proc.returncode == os.EX_OK:
109 > >> + exitcode, message, updatecache_flg =
110 > >> proc.result
111 > >> - def _sync_callback(self, exitcode, updatecache_flg):
112 > >> if updatecache_flg and "metadata-transfer" not in
113 > >> self.settings.features: updatecache_flg = False
114 > >>
115 > >> if updatecache_flg and \
116 > >> os.path.exists(os.path.join(
117 > >> - self.repo.location, 'metadata',
118 > >> 'md5-cache')):
119 > >> + repo.location, 'metadata', 'md5-cache')):
120 > >>
121 > >>
122 > >
123 > >
124 > > Again, this looks unnecessarily messy dealing with
125 > > returncode/exitcode and updatecache_flag assigning and
126 > > reassigning...
127 >
128 > The situation is analogous to the other one that you complained about.
129 > The exitcode, message, and updatecache_flag variables are simply not
130 > available if the child process has raised an unexpected exception. It
131 > should not happen in practice, but it could if one of the sync modules
132 > has a bug.
133
134
135
136 --
137 Brian Dolbec <dolsen>