On Thursday 17 April 2003 07:41 am, leon j. breedt wrote:
> I use the attached script to scan for unpackaged files on my filesystem,
In case you missed it, there was a thread about such scripts here in early
March; it might give you some ideas. You can find it in the archives under
the subject "Cruft detecting script".
> if you have a system with lots of packages, it's going to take some time,
> as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
> for quick lookups, then runs /usr/bin/find on /, and compares results.
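The hash-table approach described in the quote (load every path from the
CONTENTS files into a hash, then check each line of the find output against
it) can be sketched in a few lines of shell. This is only an illustration,
not leon's attached script: an in-memory awk associative array stands in for
the Berkeley DB file, the function name `unpackaged` is made up, and it
assumes the usual CONTENTS layout (`dir PATH`, `obj PATH MD5 MTIME`,
`sym PATH -> TARGET MTIME`) with no whitespace in paths.

```shell
#!/bin/sh
# Hypothetical helper: print the paths in a file list (one per line) that do
# not appear in any of the given CONTENTS files.
# Usage: unpackaged FILELIST CONTENTS...
# Assumes CONTENTS lines look like "dir PATH", "obj PATH MD5 MTIME" or
# "sym PATH -> TARGET MTIME", and that no path contains whitespace.
unpackaged() {
    filelist=$1
    shift
    # An awk associative array plays the role of the hash table.
    awk -v list="$filelist" '
        # Every file except the list itself is a CONTENTS file; field 2 is
        # the recorded path, so store it as a hash key.
        FILENAME != list { if (NF >= 2) owned[$2] = 1; next }
        # The list comes last: print each path that no package claims.
        !($0 in owned)
    ' "$@" "$filelist"
}
```

Something like `find / -xdev >/tmp/allfiles; unpackaged /tmp/allfiles
/var/db/pkg/*/*/CONTENTS` would run it against a real system, though it does
none of the pruning that the script at the bottom of this message does.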
Hmm... Have you timed your script yet? What sort of run times are you
getting with that implementation?
Back in that March thread, I posted a fairly naive script (included at the
bottom for reference) that does the same thing; on my system it ran in about
46 seconds. Is that faster or slower than your script? Granted, yours almost
certainly does more than mine, but I would expect both run times to be
dominated by generating and comparing the two file manifests, so the numbers
should be comparable.
The reason I mention it at all is that I'm wondering whether a hash table is
a good data structure for this job. My gut says it isn't, but I can't argue
with timing numbers that say otherwise.
I'm thinking:
1) creating the hash table and creating two sorted lists (the find output and
the CONTENTS entries) are tasks of roughly equivalent complexity (the hash
probably wins by a modest margin)
2) comparing two sorted lists takes O(n) time, while the hash table approach
is probably O(C*n), where C is a potentially large constant that depends on
the hash bucket size, the cost of hashing each path, and so on; both are
linear, but the hash lookups may carry a much bigger constant factor
3) the sorted-list method probably uses less memory, which is likely
important given the dataset size; it also probably has considerably better
locality of reference
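The two strategies compared above can be put side by side on the same pair
of manifests: `comm` makes a single merge pass over two sorted files, while
the awk version builds a hash of one file and probes it once per line of the
other. This is a minimal sketch assuming plain one-path-per-line manifests;
the function names `diff_sorted` and `diff_hashed` are made up for
illustration. `LC_ALL=C` is set because `comm` needs both inputs sorted under
the same collation.

```shell
#!/bin/sh
# Two ways to print the lines of $1 that do not occur in $2.
# Function names are hypothetical, for illustration only.

# Sorted-list method: sort both manifests identically, then let comm walk
# them in one merge pass. -2 -3 keep only the lines unique to $1.
diff_sorted() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$1" >"$tmp/a"
    LC_ALL=C sort -u "$2" >"$tmp/b"
    LC_ALL=C comm -2 -3 "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}

# Hash method: load $2 into an awk associative array, then probe it once per
# line of $1. Output keeps the original order (and any duplicates) of $1.
# Matching on FILENAME rather than NR == FNR keeps an empty $2 working.
diff_hashed() {
    awk -v b="$2" 'FILENAME == b { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}
```

In bash, running `time diff_sorted /tmp/allfiles /tmp/portagefiles >/dev/null`
and the same with `diff_hashed` would give the kind of numbers asked about
above on real data.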
Like I said, I can't argue with numbers, though. Any thoughts on this
analysis?
Evan
---script-cruft.sh---
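#!/bin/sh
# List every path on the filesystem, pruning virtual, volatile and user-data
# trees as well as Portage's own tree and databases.
find / '(' -path /proc \
-or -path /dev \
-or -path /boot \
-or -path /mnt \
-or -path /tmp \
-or -path /var/tmp \
-or -path /root \
-or -path /home \
-or -path /lib/dev-state \
-or -path /lib/modules \
-or -path /usr/portage \
-or -path /var/cache/edb \
-or -path /var/db/pkg \
')' -prune -or -print \
| sort >/tmp/allfiles
# List every path recorded by installed packages: strip symlink targets
# (" -> ..."), drop the first two lines, and skip each blank line together
# with the two lines that follow it.
qpkg -nc -l \
| sed -n -e 's/ -> .*//' -e '1,2 d' -e '/^$/,+2! p' \
| sort \
| uniq >/tmp/portagefiles
# Print only the paths that are on disk but not in any package
# (-2 -3 suppress the lines unique to the second file and the common lines).
comm -2 -3 /tmp/allfiles /tmp/portagefiles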
---script-cruft.sh---
--
gentoo-dev@g.o mailing list