On Thursday 17 April 2003 07:41 am, leon j. breedt wrote:
> i use the attached script to scan for unpackaged files on my filesystem,

There was a thread about such scripts here in early March, if you didn't know.
You could find it in the archives; might give you some ideas. Subject was
"Cruft detecting script".

> if you have a system with lots of packages, its going to take some time,
> as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
> for quick lookups, then runs /usr/bin/find on /, and compares results.

Hmm... Have you timed your script yet? What sort of run times are you
getting with that implementation?

Back in the aforementioned March thread, I posted a pretty naive script
(included at bottom for reference) to do the same thing; on my system it ran
in about 46 seconds. Is that faster or slower than your script? Granted, yours
almost certainly does more than mine, but I would think the run time would be
dominated by the generation and comparison of the two file manifests, so the
numbers should be comparable.

The reason I even mention it is that I'm wondering if the hash table is a good
data structure in this situation. My gut tells me it isn't, but I can't argue
with timing numbers that say otherwise.

I'm thinking:
1) creating the hash table and creating two sorted lists (find output and the
   CONTENTS) are tasks of roughly equivalent complexity (the hash probably wins
   by a modest margin)
2) comparing two sorted lists is a single O(n) pass; n lookups in the hash
   table are nominally O(n) too, but more like C*n in practice, where C is
   some large constant dependent on the hash bucket size or something
3) the sorted list method probably uses less memory, which is probably
   important given the dataset size; also, it probably has considerably better
   locality of reference

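To make points 1-3 concrete, here is a minimal sketch of both approaches on
toy data. It is not either of the scripts in this thread: the paths are made
up, and an awk associative array stands in for the Berkeley hash DB, which is
an assumption made only to keep the example self-contained.

```shell
#!/bin/sh
# Toy manifests standing in for the find output and the merged CONTENTS
# entries. All paths below are invented for illustration.
workdir=$(mktemp -d)
printf '%s\n' /etc/fstab /etc/motd /usr/bin/foo /usr/local/bin/mytool \
    > "$workdir/allfiles"
printf '%s\n' /usr/bin/foo /etc/fstab /etc/fstab \
    > "$workdir/portagefiles"

# Sorted-list method: sort each side once, then one linear merge pass.
# LC_ALL=C keeps sort and comm in agreement on the ordering.
LC_ALL=C sort -u "$workdir/allfiles"     > "$workdir/all.sorted"
LC_ALL=C sort -u "$workdir/portagefiles" > "$workdir/pkg.sorted"
sorted_cruft=$(LC_ALL=C comm -23 "$workdir/all.sorted" "$workdir/pkg.sorted")

# Hash method: load the package-owned paths into an in-memory table,
# then do one lookup per file found on disk.
hash_cruft=$(
    awk 'NR == FNR { owned[$0] = 1; next } !($0 in owned)' \
        "$workdir/portagefiles" "$workdir/allfiles"
)

printf 'sorted lists:\n%s\n' "$sorted_cruft"
printf 'hash lookups:\n%s\n' "$hash_cruft"
rm -rf "$workdir"
```

On input this small the two are indistinguishable; the sketch only shows that
both methods produce the same list. Timing them on real manifests is what
would settle the question.
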
Like I said, I can't argue with numbers though. Have any thoughts on this
analysis?

Evan

---script-cruft.sh---
#!/bin/sh

# Manifest 1: everything on disk, skipping the trees listed below.
find / '(' -path /proc \
        -or -path /dev \
        -or -path /boot \
        -or -path /mnt \
        -or -path /tmp \
        -or -path /var/tmp \
        -or -path /root \
        -or -path /home \
        -or -path /lib/dev-state \
        -or -path /lib/modules \
        -or -path /usr/portage \
        -or -path /var/cache/edb \
        -or -path /var/db/pkg \
    ')' -prune -or -print \
    | sort >/tmp/allfiles

# Manifest 2: everything qpkg lists as package-owned. The sed pass strips
# symlink targets (" -> ..."), drops the first two lines, and skips each
# blank line plus the two lines after it.
qpkg -nc -l \
    | sed -n -e 's/ -> .*//' -e '1,2 d' -e '/^$/,+2! p' \
    | sort \
    | uniq >/tmp/portagefiles

# Cruft: lines that appear only in the first manifest.
comm -2 -3 /tmp/allfiles /tmp/portagefiles
---script-cruft.sh---

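For anyone unfamiliar with the -prune idiom the script relies on, here is a
small sketch on a throwaway tree. The directory names are invented for the
demo, and it uses the portable -o spelling instead of -or. It shows that the
pruned directories, and everything under them, drop out of the listing.

```shell
#!/bin/sh
# Throwaway tree; the directory and file names are made up for the demo.
tree=$(mktemp -d)
mkdir -p "$tree/etc" "$tree/proc/1" "$tree/home/user"
touch "$tree/etc/fstab" "$tree/proc/1/status" "$tree/home/user/notes"

# Same shape as the find command in the script: when a path matches the
# parenthesised list, -prune succeeds and the -o branch (-print) is never
# evaluated, so neither the directory nor its contents are printed.
listing=$(
    cd "$tree" &&
    find . '(' -path ./proc -o -path ./home ')' -prune -o -print \
        | LC_ALL=C sort
)

printf '%s\n' "$listing"
rm -rf "$tree"
```

Only the tree root, ./etc and ./etc/fstab survive in the listing.
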
--
gentoo-dev@g.o mailing list