Gentoo Archives: gentoo-dev

From: Evan Powers <powers.161@×××.edu>
To: gentoo-dev@g.o
Subject: Re: [gentoo-dev] orphaned files on system?
Date: Mon, 21 Apr 2003 07:44:23
Message-Id: 200304210344.21811.powers.161@osu.edu
In Reply to: [gentoo-dev] orphaned files on system? by "leon j. breedt"
On Thursday 17 April 2003 07:41 am, leon j. breedt wrote:
> i use the attached script to scan for unpackaged files on my filesystem,

There was a thread about such scripts here in early March, if you didn't know.
You could find it in the archives; might give you some ideas. Subject was
"Cruft detecting script".

> if you have a system with lots of packages, its going to take some time,
> as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
> for quick lookups, then runs /usr/bin/find on /, and compares results.

Hmm.... Have you timed your script yet? What sorts of run times are you
getting with that implementation?

Back in the aforementioned March thread, I posted a pretty naive script
(included at bottom for reference) to do the same thing; on my system it ran
in about 46 seconds. Is that faster or slower than your script? Granted yours
almost certainly does more than mine, but I would think the run time would be
dominated by the generation and comparison of the two file manifests, so the
numbers should be comparable.

The reason I even mention it is that I'm wondering if the hash table is a good
data structure in this situation. My gut tells me it isn't, but I can't argue
with timing numbers that say otherwise.

I'm thinking:
1) creating the hash table and creating two sorted lists (find output and the
   CONTENTS) are tasks of roughly equivalent complexity (the hash probably wins
   by a modest margin, since inserts are O(1) on average while sorting is
   O(n log n))
2) comparing two sorted lists is a single O(n) merge pass; probing the hash
   table once per file is also O(n) on paper, but with a constant factor C per
   lookup that may be large, depending on the hash bucket size or something
3) the sorted list method probably uses less memory, which is probably
   important given the dataset size; also, it probably has considerably better
   locality of reference
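
For illustration, the two approaches compared above can be sketched side by
side on a toy data set. This is only a sketch under stated assumptions: the
sample paths, the file names allfiles and pkgfiles, and the function names
sorted_cruft and hashed_cruft are all made up here, and the hash table is an
in-memory awk array rather than the on-disk Berkeley hashdb mentioned in the
quoted text, so it says nothing about real timings.

```shell
#!/bin/sh
# Sketch only: sample paths, file names and function names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the find output (every file on the system).
printf '%s\n' /etc/passwd /usr/bin/foo /usr/local/bin/mytool /opt/stray.txt \
    > "$workdir/allfiles"
# Stand-in for the paths recorded in the CONTENTS files (packaged files).
printf '%s\n' /usr/bin/foo /etc/passwd /usr/bin/bar \
    > "$workdir/pkgfiles"

# Approach A: sort both lists, then one linear merge pass with comm.
# LC_ALL=C makes sort and comm agree on plain byte order.
sorted_cruft() {
    LC_ALL=C sort -u "$1" > "$workdir/a.sorted"
    LC_ALL=C sort -u "$2" > "$workdir/b.sorted"
    # -23 suppresses columns 2 and 3, leaving lines found only in the first file.
    LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted"
}

# Approach B: load the packaged paths into a hash table (an awk associative
# array), then probe it once for every file found. No sorting is needed.
hashed_cruft() {
    awk 'FILENAME == ARGV[1] { packaged[$0] = 1; next }
         !($0 in packaged)' "$2" "$1"
}

sorted_cruft "$workdir/allfiles" "$workdir/pkgfiles"
# Sorted here only so both approaches print in the same order.
hashed_cruft "$workdir/allfiles" "$workdir/pkgfiles" | LC_ALL=C sort
```

Both calls print the same two unpackaged paths, /opt/stray.txt and
/usr/local/bin/mytool. To compare the approaches meaningfully, each function
would have to be timed on real manifests of full size.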

Like I said, I can't argue with numbers though. Have any thoughts on this
analysis?

Evan

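---script-cruft.sh---
#!/bin/sh

# List every file on the system, pruning trees that are not expected to be
# owned by any package.
find / '(' -path /proc \
        -or -path /dev \
        -or -path /boot \
        -or -path /mnt \
        -or -path /tmp \
        -or -path /var/tmp \
        -or -path /root \
        -or -path /home \
        -or -path /lib/dev-state \
        -or -path /lib/modules \
        -or -path /usr/portage \
        -or -path /var/cache/edb \
        -or -path /var/db/pkg \
    ')' -prune -or -print \
    | sort >/tmp/allfiles

# List every file Portage knows about. The sed script strips symlink targets
# (" -> ..."), deletes the first two lines, and skips each blank line plus the
# two lines after it (presumably qpkg's per-package headers).
qpkg -nc -l \
    | sed -n -e 's/ -> .*//' -e '1,2 d' -e '/^$/,+2! p' \
    | sort -u >/tmp/portagefiles

# Print only the lines unique to the first list: files no package owns.
comm -2 -3 /tmp/allfiles /tmp/portagefiles
---script-cruft.sh---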

--
gentoo-dev@g.o mailing list

Replies

Subject Author
Re: [gentoo-dev] orphaned files on system? "leon j. breedt" <ljb@×××××××××.org>