On Sun, Apr 25, 2010 at 01:18:25PM +0200, Angelo Arrifano wrote:
> Hello developers developers and developers,
>
> Ever wondered how much crap is left in your X-years old Gentoo box?
>
> I just developed a python utility to efficiently find orphaned files in
> the system. By orphaned files I mean the files that are present on
> system directories and don't belong to any installed package.
>
> The package builds a virtual filesystem (cache) on the RAM using python
> hash tables. Then it uses the cache to find the ownership of files
> inside user-specified dirs.
>
> Building the cache takes less than 10 seconds here in a system with 1366
> installed packages.
>
> This is not intended to be a finished program yet, I'm looking forward
> for your constructive commentaries.

You're going to want to realpath the paths here; you'll also need to
handle symlinks, and remember that spaces are allowed in paths. I'd
personally suggest using one of the PM APIs for this.
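
To make those pitfalls concrete, here is a minimal sketch of what hand
parsing would have to deal with. It assumes the usual vdb CONTENTS layout
(`dir <path>`, `obj <path> <md5> <mtime>`, `sym <path> -> <target> <mtime>`);
the names parse_contents_line and normalize are made up for illustration
and are not part of any PM API.

```python
import os


def parse_contents_line(line):
    """Return (kind, path) for one CONTENTS line; hypothetical helper.

    Assumes the layout 'dir <path>', 'obj <path> <md5> <mtime>' and
    'sym <path> -> <target> <mtime>'. Trailing fields are split off from
    the right so that spaces inside <path> survive.
    """
    kind, rest = line.rstrip("\n").split(" ", 1)
    if kind == "obj":
        path = rest.rsplit(" ", 2)[0]      # drop md5 and mtime
    elif kind == "sym":
        path = rest.rsplit(" ", 1)[0]      # drop mtime
        path = path.split(" -> ", 1)[0]    # drop the link target
    else:                                  # dir and other kinds: path only
        path = rest
    return kind, path


def normalize(path):
    """Resolve symlinks in the parent directories but not in the final
    component, so a symlink entry is compared as itself, not its target."""
    parent, name = os.path.split(path)
    return os.path.join(os.path.realpath(parent), name)
```

Normalizing only the parent matters when, for example, /usr/lib is a
symlink to lib64: the recorded path and the path found by walking the
disk would otherwise never compare equal. A path that itself contains
" -> " is still ambiguous in this sketch, which is one more reason to
lean on a PM API.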

Part of the reason I advise poking at the PM APIs is that they cover up
some of the nastier details with contents handling and others with
parsing. A simple example:

python -c "
import sys
from pkgcore.config import load_config
from pkgcore.fs import contents, livefs
# Collect every path owned by an installed package (the vdb repo).
owned = contents.contentsSet()
for pkg in load_config().get_default('domain').named_repos['vdb']:
    owned.update(pkg.contents)
# Anything found on disk that no package owns is an orphan.
stream = (x for x in livefs.iter_scan(sys.argv[1]) if x not in owned)
print('\n'.join(map(str, sorted(stream))))
" desired-path

Note also that this was written *very* quickly. I'd personally look at
serializing the sorted lists to disk for both streams (what contents
says is on disk vs. what is actually on disk), and then lockstep walking
the lists; that way you can keep the memory usage down.
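
The lockstep walk could look roughly like the following. This is only a
sketch: it assumes both streams are already sorted with the same ordering
Python uses for comparison, and the function name orphans is made up for
illustration.

```python
def orphans(on_disk, owned):
    """Yield items of sorted iterable `on_disk` missing from sorted `owned`.

    Both inputs are consumed one item at a time, so only the current item
    of each stream is held in memory.
    """
    owned = iter(owned)
    sentinel = object()                  # marks the end of the owned stream
    current = next(owned, sentinel)
    for path in on_disk:
        # Advance the owned stream until it catches up with `path`.
        while current is not sentinel and current < path:
            current = next(owned, sentinel)
        if current is sentinel or current != path:
            yield path
```

Since open file objects iterate line by line, two sorted files with one
path per line can be passed in directly, and neither list ever has to be
loaded whole.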

~harring