1 |
On Sun, 16 Jun 2013 19:21:38 +0200 |
2 |
Pacho Ramos <pacho@g.o> wrote: |
3 |
|
4 |
> El dom, 16-06-2013 a las 10:09 -0700, Brian Dolbec escribió: |
5 |
> [...] |
6 |
> > Thank you for considering helping. I have stayed away form the |
7 |
> > intricate details of package management in the past, but I also do |
8 |
> > not like how long portage is taking now for dep calculations. |
9 |
> |
10 |
> And, cannot that efforts be put in enhancing portage instead? |
11 |
|
12 |
To make you see the problems and decisions, I'm going to elaborate a |
13 |
little and would like you to ask yourself some questions. |
14 |
|
15 |
Is it possible to reasonable enhance the Portage code to improve dep |
16 |
calculations in a reasonable amount of time? |
17 |
|
18 |
Let's take a call graph, demonstrating the amount of calls on the |
19 |
arrows and the amount of ticks spend in the call in the boxes: |
20 |
|
21 |
http://i.imgur.com/A93CdNR.png |
22 |
|
23 |
Which part do you think is problematic? What can we do to get an |
24 |
improvement in time that you can actually benefit from? Which |
25 |
improvements are reasonable to implement? ...? |
26 |
|
27 |
Ignoring that call graph, you could look at what has recently been |
28 |
introduced to increase the amount of time needed to calculate the |
29 |
dependency graph; you don't have to look far. |
30 |
|
31 |
http://blogs.gentoo.org/mgorny/2013/05/27/the-pointless-art-of-subslots/ |
32 |
|
33 |
While I don't want point out the contents of that blog post, the title |
34 |
is relevant; implementing features like subslots on an algorithm that |
35 |
was not written with subslots in mind introduces a lot of inefficiency. |
36 |
|
37 |
And it's not just subslots, newer features keep getting added to the |
38 |
dependency graph calculation and it gets slower and slower over time. |
39 |
It feels like revdep-rebuild moved into the dependency calculation! |
40 |
|
41 |
A combination of these two above arguments ("where do we start?" and |
42 |
"why try to fix something broken by design?") makes me feel that there |
43 |
is need for a huge refactoring; ask yourself another question, what if |
44 |
these systems were written from scratch with subslots in mind? |
45 |
|
46 |
Exactly, if you write a system with the features in mind you can write |
47 |
it much more efficient and don't have to deal with historical decisions |
48 |
that you have made in the past; you can continue writing without having |
49 |
to change half of your code, because you though about it in advance. |
50 |
But well, this is true in the start; some EAPIs later, history repeats! |
51 |
|
52 |
So, when we acknowledge that it is useless to waste effort on fixes |
53 |
that are unlikely to have a huge benefit or rewriting something that |
54 |
might as well get replaced some years later; we should instead look for |
55 |
better opportunities to do, there are multiple options: |
56 |
|
57 |
1) Spend an awful lot of time refactoring our well known Portage; |
58 |
this doesn't only involve moving code around, but also nuking |
59 |
legacy code you can't deal with (see my earlier response) as well |
60 |
as writing new code where it is needed. It may sound easy, it isn't. |
61 |
|
62 |
2) Write a system from scratch that is certain to be future proof; |
63 |
it is designed with the current and future specifications in mind, |
64 |
not based on specifications that were awesome some time in the past. |
65 |
|
66 |
3) Spend time on learning how pkgcore is implemented, then improve it; |
67 |
pkgcore can be somewhat seen as a a mix from (1) and (2). |
68 |
|
69 |
Then, which option would you pick? |
70 |
|
71 |
I'm not saying Portage or the team behind it is bad; it is just a bit |
72 |
at the end of its time, I'm afraid of what the future will do to it. |
73 |
|
74 |
For option (1), I've thinked slightly further than to just generate the |
75 |
dependency graph, there are two things that came to mind that might be |
76 |
interesting _if_ we can get it to somehow work: |
77 |
|
78 |
A) Multiple threads |
79 |
|
80 |
I think the way trees work (branches), threads are a huge benefit. |
81 |
|
82 |
Maybe zmedico can elaborate on this again, but there were some |
83 |
problems that make it not easy to introduce these threads; there |
84 |
was something regarding a global interpreter lock, but I can't |
85 |
recall the details and am not that acquainted with Python. |
86 |
|
87 |
Besides that, the threads will have to be organized; so properly |
88 |
designing the threads, locks and inter-thread interactions might be |
89 |
an interesting task that could require some effort. |
90 |
|
91 |
B) Additional caching |
92 |
|
93 |
Some of the parts of the dependency graph calculation could benefit |
94 |
from a bit of caching; implementing caching right might be a tricky |
95 |
thing, too small cache is useless and too large cache is overhead. |
96 |
|
97 |
Let me take one example part; resolving the USE flags to consider |
98 |
which packages are dependencies, this happens again and again. |
99 |
|
100 |
For example, let's say you have |
101 |
|
102 |
>=dev-libs/glib-2.33:2 |
103 |
gnome-keyring? ( >=app-crypt/libsecret-0.5 ) |
104 |
introspection? ( >=dev-libs/gobject-introspection-1 ) |
105 |
|
106 |
then Portage has to go and turn that into maybe something like |
107 |
|
108 |
>=dev-libs/glib-2.33:2 |
109 |
|
110 |
because the user has neither USE flag set; and it is not only the |
111 |
USE flags the user has set, but also the USE flag in profiles, the |
112 |
default USE flags, the REQUIRED_USE and sometimes even other USE |
113 |
flags like "use1? ( use2? ( ATOM ) )". Heh, complexity! |
114 |
|
115 |
So, let's say we want to cache this operation, then we could store |
116 |
a pair of the following details in the cache: |
117 |
|
118 |
- Checksum of the ebuild. |
119 |
- USE flags that the user relevant to the ebuild. |
120 |
- Resulting dependency variables. |
121 |
|
122 |
So, instead of having to compute the whole thing, it simply can |
123 |
look up the checksum and the USE flags the user has set and |
124 |
immediately get back the right dependency string. That sounds |
125 |
awesome, but how well does the cache function? |
126 |
|
127 |
To know that, we would have to look at cache invalidation. |
128 |
|
129 |
- How often does the ebuild checksum change? |
130 |
--> Not so much, especially not for stable ebuilds. |
131 |
|
132 |
- How often do the users USE flags change for a specific ebuild? |
133 |
--> Not so much, only when the user explicitly changes it or some |
134 |
masking happens in the profile which both don't happen too much. |
135 |
|
136 |
That's really great, but now three sad questions: |
137 |
|
138 |
- But how big does this cache grow? |
139 |
No idea, it requires another study that implements half of this. |
140 |
|
141 |
- But how much does this really speed up? |
142 |
Hard to tell without trying it. |
143 |
|
144 |
- Erm, what about the USE flags the reverse dependencies force? |
145 |
Oops, no idea is perfect; can we resolve that?! Heh, no idea... |
146 |
|
147 |
You can see that it is not hard to come up with ideas like (A) and (B); |
148 |
but, it is much harder to come up with ideas that actually work, which |
149 |
is why I think we will not see any beneficial improvement to Portage |
150 |
tree soon, unless we are going to drop features and write alternatives. |
151 |
|
152 |
Back to the options... |
153 |
|
154 |
For option (2) I made a very small start and might continue with this |
155 |
over the vacation; but before I do that, I'm likely going to evaluate |
156 |
option (3) if other people are going to jump in as well, perhaps |
157 |
helping along pkgcore can help me gain knowledge to better write (2) |
158 |
further in the future when pkgcore is found to be past its time. |
159 |
|
160 |
Whatever we do, I hope a good educated choice is made... |
161 |
|
162 |
Until then, I hope I can continue to reasonably use Portage. |
163 |
|
164 |
-- |
165 |
With kind regards, |
166 |
|
167 |
Tom Wijsman (TomWij) |
168 |
Gentoo Developer |
169 |
|
170 |
E-mail address : TomWij@g.o |
171 |
GPG Public Key : 6D34E57D |
172 |
GPG Fingerprint : C165 AF18 AB4C 400B C3D2 ABF0 95B2 1FCD 6D34 E57D |