On Sun, 2010-03-21 at 22:10 +0300, Dmitry Bashkatov wrote:
> 2010/3/20 Brian Dolbec <brian.dolbec@×××××.com>:
> > Dmitry, eclean is written in python same as portage. For it to
> > integrate into eclean it too would also need to be written in python.
> > Unfortunately your script would not fit.
> >
> > There are 2 pieces of information that would be required from the client
> > pc's the installed pkg list and the exclude file. The exclude file
> > being a debatable one. Since portage now needs a min. of python-2.6 it
> > should be possible for the network eclean module to import portage from
> > the client machine and obtain the installed pkg list. From there it
> > would be added to a global installed pkgs list that would then remove
> > any source files they claim to own. I think all other info and checks
> > would be handled by the eclean app running on the server.
> >
> > As for the exclude file it may need to be transferred and parsed to
> > accumulate the results (trickier, due to possible conflicts).
> > alternatively that control might be better left controlled only on the
> > server.
> >
> > Eclean would not need to installed on the client machines at all.
> >
> > P.S. eclean assumes all files dirty and in need of cleaning unless
> > proven otherwise (due to the dynamic nature of the tree).
> > --
> > Brian Dolbec <brian.dolbec@×××××.com>
> >
>
> Thanks for explanation, Brian. I was misleaded that you interested in
> any working solution including bash script.
>
Dmitry: I was not misleading you (I think you meant "misunderstood"), and
yes, I was interested in your script. You had not stated that it was a
bash script, so it could have been a python script. I needed to look at
it to see what/how/if it might be usable. I do not consider myself to
be an expert, so I welcome other ideas and methods.

I am interested in working with you, if you would like to work on it, in
a way that integrates with the current python code base. But, as was
also stated, this is too small a job to be used as a SoC project. I
think it would take about 2 days tops, including a few hours of initial
coding, lots of testing and debugging, creating unit tests, updating
man pages, ...

I found that your script, along with Nirbheek's idea of running eclean on
each machine and then finding the common files, is a poor, although
simple, way of doing it. The reason I think it is poor is that the
distfiles are NFS shared and each instance of eclean accesses those
files, so it is an unnecessary use of resources for files that are
largely common to all or nearly all systems. The other thing is that a
large part of the search for files to clean means accessing the portage
tree, through portage function calls, to obtain the source file names
for installed packages. The tree is most likely shared as well, which
again unnecessarily uses resources for the pkgs and versions in common.
Now imagine an install with 100 clients using the same distfiles and
portage tree server, all doing that for 1,000 installed ebuilds. It
would be a tremendous waste of resources, not to mention a huge
increase in run time.

I have been reviewing the newly re-written code again, and my original
place/method of adding network support is not the best way to do it. It
is still easy to modify the correct location in the code with a slightly
different approach. First, the currently available eclean versions are
flawed in that, if an ebuild version or a complete pkg has been deleted
from the tree, they do not check the installed pkg db for the "SRC_URI"
in order to match up the source filename(s) to an installed pkg. They
will therefore delete the sources of an installed pkg if its ebuild is
no longer in the tree. That has been fixed in my re-write. To continue
with that methodology, it must be assumed that, in the worst case, the
NFS distfiles and portage tree server is a minimal server system and
most of the installed pkgs' sources are used by the clients (not by the
server). So there are 2 key pieces of data needed from the imported
portage instance of each client, making the tasks:

1) get and accumulate the installed pkg list via the
vardb.dbapi.cpv_all() for each client

2) after accumulating a complete list, determine which pkgs are
unique to which clients and task those clients with retrieving the
"SRC_URI" and optionally the "RESTRICT" info from their installed
db's.

3) depending on the number of clients, decide how to split up the pkgs in
common and task each client with a portion of the cpv's for the
"SRC_URI","RESTRICT" info, and accumulate the results.

4) pass that info into the DistfilesSearch class and run
findDistfiles(), which will then determine the files to be cleaned and
continue with normal operation.
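
Steps 2 and 3 could be sketched roughly as follows. This is only an
illustration on plain data: plan_lookups() and the client names are
hypothetical, the per-client lists stand in for what step 1 would
collect, and the "least loaded owner" rule is just one assumed way of
splitting up the pkgs in common. Nothing here imports portage or
touches eclean.

```python
from collections import defaultdict


def plan_lookups(installed_by_client):
    """Map each client to the cpv's it should report SRC_URI/RESTRICT for.

    installed_by_client: {client name: iterable of cpv strings}, i.e. the
    accumulated result of step 1.
    """
    # Invert the per-client lists: cpv -> set of clients that have it installed.
    owners = defaultdict(set)
    for client, cpvs in installed_by_client.items():
        for cpv in cpvs:
            owners[cpv].add(client)

    tasks = {client: [] for client in installed_by_client}
    shared = []
    for cpv in sorted(owners):
        if len(owners[cpv]) == 1:
            # Step 2: only one client has this cpv, so only it can answer.
            (client,) = owners[cpv]
            tasks[client].append(cpv)
        else:
            shared.append(cpv)

    # Step 3: give each shared cpv to whichever owner has the fewest tasks
    # so far (ties broken by name), so each cpv is looked up exactly once.
    for cpv in shared:
        client = min(owners[cpv], key=lambda c: (len(tasks[c]), c))
        tasks[client].append(cpv)
    return tasks


installed = {
    "client-a": ["app-editors/vim-7.2", "dev-lang/python-2.6.4",
                 "sys-apps/portage-2.1.8"],
    "client-b": ["dev-lang/python-2.6.4", "sys-apps/portage-2.1.8",
                 "www-servers/nginx-0.7.65"],
}
tasks = plan_lookups(installed)
for client, cpvs in sorted(tasks.items()):
    print(client, cpvs)
```

With these two made-up clients, each ends up with its one unique pkg
plus one of the two shared ones, so no cpv is looked up twice.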

This would offload some or most of the portage system calls to the
clients and prevent the sources of any installed deprecated pkgs or
versions from being deleted. It should also eliminate repeating
identical information look-ups on each machine. The client machines
would not require eclean to be installed, and it could quite possibly
even be blocked from being installed there.
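
To illustrate why the accumulated "SRC_URI" data protects those
sources, here is a simplified stand-in for the final comparison. It
does not use DistfilesSearch or findDistfiles(); distfile_names() and
cleanable() are hypothetical helpers, the URIs and file names are made
up, and the SRC_URI handling is deliberately conservative: files
behind USE conditionals are always treated as claimed.

```python
def distfile_names(src_uri):
    """Return the set of distfile names a raw SRC_URI string can refer to."""
    names = []
    tokens = src_uri.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("(", ")") or tok.endswith("?"):
            # Grouping and USE conditionals: keep every branch as claimed.
            i += 1
        elif tok == "->":
            # "uri -> name": the file is saved under the name after the arrow.
            if names and i + 1 < len(tokens):
                names[-1] = tokens[i + 1]
            i += 2
        else:
            names.append(tok.rsplit("/", 1)[-1])
            i += 1
    return set(names)


def cleanable(on_disk, src_uris):
    """Files on disk that no installed pkg on any client claims."""
    protected = set()
    for src_uri in src_uris:
        protected |= distfile_names(src_uri)
    return sorted(set(on_disk) - protected)


src_uris = [
    "mirror://gentoo/foo-1.0.tar.bz2",
    "doc? ( http://example.org/dl/bar-2.1-docs.tar.gz ) "
    "http://example.org/dl/123 -> bar-2.1.tar.gz",
]
on_disk = ["foo-1.0.tar.bz2", "bar-2.1.tar.gz", "bar-2.1-docs.tar.gz",
           "old-0.9.tar.gz"]
print(cleanable(on_disk, src_uris))  # ['old-0.9.tar.gz']
```

Only the file that no client's installed db claims is left as a
candidate for cleaning, whether or not its ebuild is still in the tree.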

As I previously stated, I think it would be best to only consider the
server's distfiles.exclude file. If any files are to be protected in a
networked environment such as this, it should be done by an
administrator who is authorized to run eclean. If a client system
needs to protect some sources from being deleted, then that should be
done on the server by an administrator. I do not know this for
certain, but I believe permissions would probably be set to prevent a
client from deleting files on the server supplying the shared
distfiles.

I am open to thoughts and suggestions, so if anyone sees any flaws in
my logic, please speak up :) Also, I am not very experienced with NFS
shares and larger networked installations, nor do I have such a system
for thorough testing. If someone has a small, yet large enough, system
that could be used for proper testing (-p, --pretend mode of course),
please speak up as well.
--
Brian Dolbec <brian.dolbec@×××××.com>