Gentoo Archives: gentoo-dev

From: Tom Wijsman <TomWij@g.o>
To: gentoo-dev@l.g.o
Cc: pacho@g.o
Subject: Re: [gentoo-dev] Packages up for grabs
Date: Sun, 16 Jun 2013 18:26:18
Message-Id: 20130616202324.45cb3262@TOMWIJ-GENTOO
In Reply to: Re: [gentoo-dev] Packages up for grabs by Pacho Ramos
1 On Sun, 16 Jun 2013 19:21:38 +0200
2 Pacho Ramos <pacho@g.o> wrote:
3
4 > El dom, 16-06-2013 a las 10:09 -0700, Brian Dolbec escribió:
5 > [...]
6 > > Thank you for considering helping. I have stayed away form the
7 > > intricate details of package management in the past, but I also do
8 > > not like how long portage is taking now for dep calculations.
9 >
10 > And, cannot that efforts be put in enhancing portage instead?
11
12 To make you see the problems and decisions, I'm going to elaborate a
13 little and would like you to ask yourself some questions.
14
15 Is it possible to reasonable enhance the Portage code to improve dep
16 calculations in a reasonable amount of time?
17
18 Let's take a call graph, demonstrating the amount of calls on the
19 arrows and the amount of ticks spend in the call in the boxes:
20
21 http://i.imgur.com/A93CdNR.png
22
23 Which part do you think is problematic? What can we do to get an
24 improvement in time that you can actually benefit from? Which
25 improvements are reasonable to implement? ...?
26
27 Ignoring that call graph, you could look at what has recently been
28 introduced to increase the amount of time needed to calculate the
29 dependency graph; you don't have to look far.
30
31 http://blogs.gentoo.org/mgorny/2013/05/27/the-pointless-art-of-subslots/
32
33 While I don't want point out the contents of that blog post, the title
34 is relevant; implementing features like subslots on an algorithm that
35 was not written with subslots in mind introduces a lot of inefficiency.
36
37 And it's not just subslots, newer features keep getting added to the
38 dependency graph calculation and it gets slower and slower over time.
39 It feels like revdep-rebuild moved into the dependency calculation!
40
41 A combination of these two above arguments ("where do we start?" and
42 "why try to fix something broken by design?") makes me feel that there
43 is need for a huge refactoring; ask yourself another question, what if
44 these systems were written from scratch with subslots in mind?
45
46 Exactly, if you write a system with the features in mind you can write
47 it much more efficient and don't have to deal with historical decisions
48 that you have made in the past; you can continue writing without having
49 to change half of your code, because you though about it in advance.
50 But well, this is true in the start; some EAPIs later, history repeats!
51
52 So, when we acknowledge that it is useless to waste effort on fixes
53 that are unlikely to have a huge benefit or rewriting something that
54 might as well get replaced some years later; we should instead look for
55 better opportunities to do, there are multiple options:
56
57 1) Spend an awful lot of time refactoring our well known Portage;
58 this doesn't only involve moving code around, but also nuking
59 legacy code you can't deal with (see my earlier response) as well
60 as writing new code where it is needed. It may sound easy, it isn't.
61
62 2) Write a system from scratch that is certain to be future proof;
63 it is designed with the current and future specifications in mind,
64 not based on specifications that were awesome some time in the past.
65
66 3) Spend time on learning how pkgcore is implemented, then improve it;
67 pkgcore can be somewhat seen as a a mix from (1) and (2).
68
69 Then, which option would you pick?
70
71 I'm not saying Portage or the team behind it is bad; it is just a bit
72 at the end of its time, I'm afraid of what the future will do to it.
73
74 For option (1), I've thinked slightly further than to just generate the
75 dependency graph, there are two things that came to mind that might be
76 interesting _if_ we can get it to somehow work:
77
78 A) Multiple threads
79
80 I think the way trees work (branches), threads are a huge benefit.
81
82 Maybe zmedico can elaborate on this again, but there were some
83 problems that make it not easy to introduce these threads; there
84 was something regarding a global interpreter lock, but I can't
85 recall the details and am not that acquainted with Python.
86
87 Besides that, the threads will have to be organized; so properly
88 designing the threads, locks and inter-thread interactions might be
89 an interesting task that could require some effort.
90
91 B) Additional caching
92
93 Some of the parts of the dependency graph calculation could benefit
94 from a bit of caching; implementing caching right might be a tricky
95 thing, too small cache is useless and too large cache is overhead.
96
97 Let me take one example part; resolving the USE flags to consider
98 which packages are dependencies, this happens again and again.
99
100 For example, let's say you have
101
102 >=dev-libs/glib-2.33:2
103 gnome-keyring? ( >=app-crypt/libsecret-0.5 )
104 introspection? ( >=dev-libs/gobject-introspection-1 )
105
106 then Portage has to go and turn that into maybe something like
107
108 >=dev-libs/glib-2.33:2
109
110 because the user has neither USE flag set; and it is not only the
111 USE flags the user has set, but also the USE flag in profiles, the
112 default USE flags, the REQUIRED_USE and sometimes even other USE
113 flags like "use1? ( use2? ( ATOM ) )". Heh, complexity!
114
115 So, let's say we want to cache this operation, then we could store
116 a pair of the following details in the cache:
117
118 - Checksum of the ebuild.
119 - USE flags that the user relevant to the ebuild.
120 - Resulting dependency variables.
121
122 So, instead of having to compute the whole thing, it simply can
123 look up the checksum and the USE flags the user has set and
124 immediately get back the right dependency string. That sounds
125 awesome, but how well does the cache function?
126
127 To know that, we would have to look at cache invalidation.
128
129 - How often does the ebuild checksum change?
130 --> Not so much, especially not for stable ebuilds.
131
132 - How often do the users USE flags change for a specific ebuild?
133 --> Not so much, only when the user explicitly changes it or some
134 masking happens in the profile which both don't happen too much.
135
136 That's really great, but now three sad questions:
137
138 - But how big does this cache grow?
139 No idea, it requires another study that implements half of this.
140
141 - But how much does this really speed up?
142 Hard to tell without trying it.
143
144 - Erm, what about the USE flags the reverse dependencies force?
145 Oops, no idea is perfect; can we resolve that?! Heh, no idea...
146
147 You can see that it is not hard to come up with ideas like (A) and (B);
148 but, it is much harder to come up with ideas that actually work, which
149 is why I think we will not see any beneficial improvement to Portage
150 tree soon, unless we are going to drop features and write alternatives.
151
152 Back to the options...
153
154 For option (2) I made a very small start and might continue with this
155 over the vacation; but before I do that, I'm likely going to evaluate
156 option (3) if other people are going to jump in as well, perhaps
157 helping along pkgcore can help me gain knowledge to better write (2)
158 further in the future when pkgcore is found to be past its time.
159
160 Whatever we do, I hope a good educated choice is made...
161
162 Until then, I hope I can continue to reasonably use Portage.
163
164 --
165 With kind regards,
166
167 Tom Wijsman (TomWij)
168 Gentoo Developer
169
170 E-mail address : TomWij@g.o
171 GPG Public Key : 6D34E57D
172 GPG Fingerprint : C165 AF18 AB4C 400B C3D2 ABF0 95B2 1FCD 6D34 E57D

Attachments

File name MIME type
signature.asc application/pgp-signature

Replies

Subject Author
[gentoo-dev] Re: Packages up for grabs Duncan <1i5t5.duncan@×××.net>
Re: [gentoo-dev] Packages up for grabs Ciaran McCreesh <ciaran.mccreesh@××××××××××.com>
Re: [gentoo-dev] Packages up for grabs Zac Medico <zmedico@g.o>