Gentoo Archives: gentoo-soc

From: Brian Dolbec <brian.dolbec@×××××.com>
To: gentoo-soc@l.g.o
Subject: Re: [gentoo-soc] Re: New idea: network eclean
Date: Sun, 21 Mar 2010 23:59:25
Message-Id: 1269215944.2559.335.camel@big_daddy.dol-sen.ca
In Reply to: Re: [gentoo-soc] Re: New idea: network eclean by Dmitry Bashkatov
1 On Sun, 2010-03-21 at 22:10 +0300, Dmitry Bashkatov wrote:
2 > 2010/3/20 Brian Dolbec <brian.dolbec@×××××.com>:
3 > > Dmitry, eclean is written in python same as portage. For it to
4 > > integrate into eclean it too would also need to be written in python.
5 > > Unfortunately your script would not fit.
6 > >
7 > > There are 2 pieces of information that would be required from the client
8 > > pc's the installed pkg list and the exclude file. The exclude file
9 > > being a debatable one. Since portage now needs a min. of python-2.6 it
10 > > should be possible for the network eclean module to import portage from
11 > > the client machine and obtain the installed pkg list. From there it
12 > > would be added to a global installed pkgs list that would then remove
13 > > any source files they claim to own. I think all other info and checks
14 > > would be handled by the eclean app running on the server.
15 > >
16 > > As for the exclude file it may need to be transferred and parsed to
17 > > accumulate the results (trickier, due to possible conflicts).
18 > > alternatively that control might be better left controlled only on the
19 > > server.
20 > >
21 > > Eclean would not need to installed on the client machines at all.
22 > >
23 > > P.S. eclean assumes all files dirty and in need of cleaning unless
24 > > proven otherwise (due to the dynamic nature of the tree).
25 > > --
26 > > Brian Dolbec <brian.dolbec@×××××.com>
27 > >
28 >
29 > Thanks for explanation, Brian. I was misleaded that you interested in
30 > any working solution including bash script.
31 >
32 Dmitry: I was not misleading you (I think you meant "misunderstood"), and
33 yes, I was interested in your script. You had not stated it was a bash
34 script, so it could have been a Python script. I needed to look at it to
35 see what/how/if it might be usable. I do not consider myself to be an
36 expert, so I welcome other ideas and methods.
37
38 I am interested in working with you if you would like to work on it, in
39 a way that would be integrated with the current Python code base. But,
40 as was already stated, this is too small a job to be used as a SoC
41 project. I think it would take about 2 days tops, including a few hours
42 of initial coding, lots of testing and debugging, creating unit tests,
43 updating man pages, ...
44
45 I found that your script, along with Nirbheek's idea of running eclean on
46 each machine and then finding the common files, is a poor, although
47 simple, way of doing it. The reason I think it is poor is that, since the
48 distfiles are NFS shared and each instance of eclean accesses
49 those files, it's an unnecessary use of resources for files that are
50 largely common to all/nearly all systems. The other thing is that a
51 large part of the search for files to clean means accessing the portage
52 tree to obtain the source file names for installed packages by using
53 portage function calls. The tree is also most likely being shared,
54 which again unnecessarily uses resources for those pkgs and versions in
55 common. Now imagine an install with 100 clients using the same
56 distfiles and portage tree server all doing that for 1,000 installed
57 ebuilds. It would be a tremendous waste of resources, not to mention a
58 huge increase in run time.
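
To put rough numbers on that, here is a back-of-the-envelope sketch. The 100 clients and 1,000 installed ebuilds are the figures above; the 90% overlap between clients is purely a hypothetical value chosen for illustration, and lookup_counts() is a made-up helper, not part of eclean.

```python
def lookup_counts(clients, pkgs_per_client, shared_percent):
    """Return (naive, deduplicated) numbers of portage tree look-ups.

    naive:        every client looks up every pkg it has installed.
    deduplicated: pkgs common to all clients are looked up only once.
    shared_percent is an assumed figure, not a measured one.
    """
    naive = clients * pkgs_per_client
    shared = pkgs_per_client * shared_percent // 100
    unique_per_client = pkgs_per_client - shared
    deduplicated = shared + clients * unique_per_client
    return naive, deduplicated


# 100 clients, 1,000 installed ebuilds each, 90% assumed to be in common.
naive, deduplicated = lookup_counts(100, 1000, 90)
print(naive, deduplicated)
```

Under that assumed overlap the number of look-ups drops by roughly a factor of nine; the real saving depends entirely on how similar the clients are.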
59
60 I have been reviewing the newly re-written code again, and my original
61 place/method of adding network support is not the best way to do it. It
62 is still easy to modify the correct location in the code with a slightly
63 different approach. First, the currently available eclean versions were
64 flawed in that if an ebuild version or complete pkg was deleted from the
65 tree, it did not check the installed pkg db for the "SRC_URI" in order
66 to match up the source filename(s) to an installed pkg. It would
67 therefore delete installed pkg sources if that ebuild was no longer in
68 the tree. That has been fixed in my re-write version. So to continue
69 that methodology, it must be assumed that in the worst case the NFS
70 distfiles and portage tree server is a minimal server system and most
71 of the installed pkgs' sources are used on the clients (not on the
72 server). So there are 2 key pieces of data needed from the imported
73 portage instances from each client, making the tasks:
74
75 1) get and accumulate the installed pkg list via the
76 vardb.dbapi.cpv_all() for each client
77
78 2) after accumulating a complete list, it then determines which pkgs are
79 unique to which clients and tasks them to retrieve the "SRC_URI" and
80 optionally the "RESTRICT" info from the clients' installed dbs.
81
82 3) depending on the number of clients decide how to split up the pkgs in
83 common and task each with a portion of cpv's for the
84 "SRC_URI","RESTRICT" info and accumulate them.
85
86 4) pass that info into the DistfilesSearch class and run the
87 findDistfiles() which will then determine the files to be cleaned and
88 continue with normal operation.
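
Steps 1 to 3 above could be sketched roughly as follows. This is only an illustration using plain Python data: the per-client lists would in practice come from vardb.dbapi.cpv_all() on each client, plan_tasks() and the client names are hypothetical, and the round-robin split of the common pkgs is just one possible way of dividing the work.

```python
from collections import defaultdict


def plan_tasks(installed):
    """Decide which client looks up SRC_URI/RESTRICT for which cpv.

    installed: dict mapping a client name to its list of installed cpvs
               (stand-in for what cpv_all() would return per client).
    Returns a dict mapping each client name to the cpvs it is tasked with.
    """
    # Step 1: accumulate the global list, remembering who has what.
    owners = defaultdict(list)
    for client in sorted(installed):
        for cpv in set(installed[client]):
            owners[cpv].append(client)

    tasks = {client: [] for client in installed}
    common = []
    for cpv in sorted(owners):
        if len(owners[cpv]) == 1:
            # Step 2: a pkg unique to one client can only be asked there.
            tasks[owners[cpv][0]].append(cpv)
        else:
            common.append(cpv)

    # Step 3: spread the common pkgs round-robin over the clients that
    # actually have them installed, so each cpv is looked up only once.
    for i, cpv in enumerate(common):
        holders = owners[cpv]
        tasks[holders[i % len(holders)]].append(cpv)
    return tasks


installed = {
    "client-a": ["app-editors/vim-7.2", "dev-lang/python-2.6.4",
                 "sys-apps/portage-2.1.8"],
    "client-b": ["dev-lang/python-2.6.4", "sys-apps/portage-2.1.8",
                 "www-servers/apache-2.2.14"],
}
for client, cpvs in sorted(plan_tasks(installed).items()):
    print(client, cpvs)
```

Each cpv ends up assigned to exactly one client that has it installed, which is the point of the exercise: no look-up is repeated on a second machine.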
89
90 This would offload some or most of the portage system calls to the
91 clients and prevent the sources of any installed but deprecated pkgs or
92 versions from being deleted. It should also eliminate repeatedly doing the
93 same identical information look-ups on each machine. The client machines
94 would not require eclean to be installed, and it could quite possibly even
95 be blocked from being installed there.
96
97
98 As I previously stated, I think it would be best to only consider the
99 server's distfiles.exclude file. If any files are to be protected in a
100 networked environment such as this, it should be done by an
101 administrator who is authorized to be running eclean. If a client
102 system needed to protect some sources from being deleted then that
103 should be done on the server by an administrator. I do not know this,
104 but I believe permissions would probably be set to prevent a client from
105 deleting files on the server supplying the shared distfiles.
106
107 I am open to thoughts and suggestions, so if anyone sees any flaws in
108 my logic, please speak up :) Also, I am not very experienced with NFS
109 shares and larger networked installations, nor do I have such a system
110 for thorough testing. If someone has a small, yet large enough, system
111 that could be used for proper testing (-p, --pretend mode of course),
112 please also speak up.
113 --
114 Brian Dolbec <brian.dolbec@×××××.com>