Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426)
Date: Thu, 13 Aug 2015 19:00:52
Message-Id: 55CCE95E.5060506@gentoo.org
In Reply to: Re: [gentoo-portage-dev] [PATCH] sync repositories in parallel (bug 557426) by Brian Dolbec
1 On 08/13/2015 07:29 AM, Brian Dolbec wrote:
2 > On Thu, 13 Aug 2015 01:34:36 -0700
3 > Zac Medico <zmedico@g.o> wrote:
4 >> + def _task_exit(self, task):
5 >> + '''
6 >> + Remove the task from the graph, in order to expose
7 >> + more leaf nodes.
8 >> + '''
9 >> + self._running_tasks.discard(task)
10 >> + returncode = task.returncode
11 >> + if task.returncode == os.EX_OK:
12 >> + returncode, message, updatecache_flg =
13 >> task.result
14 >> + if message:
15 >> + self.msgs.append(message)
16 >> + repo = task.kwargs['repo'].name
17 >> + self._running_repos.remove(repo)
18 >> + self.retvals.append((repo, returncode))
19 >> + self._sync_graph.remove(repo)
20 >> + self._update_leaf_nodes()
21 >> + super(SyncScheduler, self)._task_exit(self)
22 >> +
23 >
24 >
25 > returncode in this function seems to be assigned too man times for no
26 > real reason. Is there no other way to get message than to reassign
27 > returncode? and it task.returncode is not os.EX_OK is not task.result
28 > still accessible?
29
30 No, it's not accessible, because this means that an unexpected exception
31 has occurred in the child process. In practice, this case should not
32 happen, but it could if there's a bug in any one of the sync modules.
33
34 This is why we have the odd looking returncode assignment, since we can
35 only access the real sync module returncode if it didn't throw an
36 unexpected exception.
37
38 > maybe there is a a message still?
39
40 There's no message available if the sync module raised an unexpected
41 exception. The AsyncFunction class will print a traceback in this case,
42 and exit with returncode 1.
43
44 > It sounds like
45 > message should be made available like returncode and forget about
46 > splitting task.result... maybe it will be clearer when I fifnish
47 > reviewing the code. It is early in the morning ... still waking up
48
49 I've left the returncode and message handling as they were before the
50 patch. It's just that now they're passed back to the main process via a
51 pipe (handled by AsyncFunction), and we have to handle the case where
52 the child process might have raised an unexpected exception.
53
54 >> + def _can_add_job(self):
55 >> + '''
56 >> + Returns False if there are no leaf nodes available.
57 >> + '''
58 >> + if not AsyncScheduler._can_add_job(self):
59 >> + return False
60 >> + return bool(self._leaf_nodes) and not
61 >> self._terminated.is_set() +
62 >> + def _keep_scheduling(self):
63 >> + '''
64 >> + Schedule as long as the graph is non-empty, and we
65 >> haven't
66 >> + been terminated.
67 >> + '''
68 >> + return bool(self._sync_graph) and not
69 >> self._terminated.is_set()
70 >
71 >
72 >
73 > These 2 functions look like barely different versions of the same
74 > thing. doesn't bool(self._leaf_nodes) and bool(self._sync_graph)
75 > really evaluate the same? if there are no leaf nodes, then sync graph
76 > should be false????
77
78 No, it's more complex than that, because this is a non-deterministic
79 scheduler like the one that we use for parallel builds. The number of
80 leaf nodes will be zero when a master repo such as 'gentoo' is syncing.
81 After the master finishes syncing, the master node is removed from the
82 graph, creating leaf nodes. Then the _update_leaf_nodes method is called
83 to update self._leaf_nodes.
84
85
86 >> + def _sync_callback(self, proc):
87 >> + """
88 >> + This is called in the parent process, serially, for
89 >> each of the
90 >> + sync jobs when they complete. Some cache backends
91 >> such as sqlite
92 >> + may require that cache access be performed serially
93 >> in the
94 >> + parent process like this.
95 >> + """
96 >> + repo = proc.kwargs['repo']
97 >> + exitcode = proc.returncode
98 >> + updatecache_flg = False
99 >> + if proc.returncode == os.EX_OK:
100 >> + exitcode, message, updatecache_flg =
101 >> proc.result
102 >> - def _sync_callback(self, exitcode, updatecache_flg):
103 >> if updatecache_flg and "metadata-transfer" not in
104 >> self.settings.features: updatecache_flg = False
105 >>
106 >> if updatecache_flg and \
107 >> os.path.exists(os.path.join(
108 >> - self.repo.location, 'metadata',
109 >> 'md5-cache')):
110 >> + repo.location, 'metadata', 'md5-cache')):
111 >>
112 >>
113 >
114 >
115 > Again, this looks unnecessarily messy dealing with returncode/exitcode
116 > and updatecache_flag assigning and reassigning...
117
118 The situation is analogous to the other one that you complained about.
119 The exitcode, message, and updatecache_flag variables are simply not
120 available if the child process has raised an unexpected exception. It
121 should not happen in practice, but it could if one of the sync modules
122 has a bug.
123 --
124 Thanks,
125 Zac

Replies