Gentoo Archives: gentoo-dev

From:	Evan Powers <powers.161@×××.edu>
To:	gentoo-dev@g.o
Subject:	Re: [gentoo-dev] orphaned files on system?
Date:	Mon, 21 Apr 2003 07:44:23
Message-Id:	`200304210344.21811.powers.161@osu.edu`
In Reply to:	[gentoo-dev] orphaned files on system? by "leon j. breedt"

On Thursday 17 April 2003 07:41 am, leon j. breedt wrote:
> i use the attached script to scan for unpackaged files on my filesystem,

There was a thread about such scripts here in early March, in case you missed
it. You can find it in the archives; it might give you some ideas. The subject
was "Cruft detecting script".

> if you have a system with lots of packages, its going to take some time,
> as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
> for quick lookups, then runs /usr/bin/find on /, and compares results.

Hmm.... Have you timed your script yet? What sorts of run times are you
getting with that implementation?

Back in the aforementioned March thread, I posted a pretty naive script
(included at bottom for reference) to do the same thing; on my system it ran
in about 46 seconds. Is that faster or slower than your script? Granted, yours
almost certainly does more than mine, but I would think the run time would be
dominated by the generation and comparison of the two file manifests, so the
numbers should be comparable.

The reason I even mention it is that I'm wondering if the hash table is a good
data structure in this situation. My gut tells me it isn't, but I can't argue
with timing numbers that say otherwise.

I'm thinking:
1) creating the hash table and creating two sorted lists (find output and the
CONTENTS) are tasks of roughly equivalent complexity (the hash probably wins
by a modest margin)
2) comparing two sorted lists is a single linear pass, O(n); looking every
file up in the hash table is also linear in theory, but in practice more like
C*n, where C is some large constant dependent on the hash bucket size or
something
3) the sorted list method probably uses less memory, which is probably
important given the dataset size; also, it probably has considerably better
locality of reference
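
The two approaches in points 1-3 can be tried side by side on a toy manifest. This is only a sketch under stated assumptions: the file names and sample paths are invented, `comm -23` stands in for the sorted-list comparison, and an awk associative array stands in for the hash table (it is not the Berkeley DB cache from the quoted script).

```shell
#!/bin/sh
# Toy comparison of the two strategies; all paths below are invented sample data.
export LC_ALL=C   # comm needs both inputs sorted in the same collation order
tmp=$(mktemp -d)

# "Everything on disk" and "everything owned by a package", deliberately unsorted.
printf '%s\n' /usr/bin/foo /etc/stray.conf /usr/lib/libbar.so /opt/leftover >"$tmp/all"
printf '%s\n' /usr/lib/libbar.so /usr/bin/foo >"$tmp/owned"

# Sorted-list method: sort both manifests, then one linear pass with comm.
sort "$tmp/all" >"$tmp/all.sorted"
sort -u "$tmp/owned" >"$tmp/owned.sorted"
sorted_result=$(comm -23 "$tmp/all.sorted" "$tmp/owned.sorted")

# Hash method: load the owned paths into an awk array, then probe it per file.
hash_result=$(awk 'NR == FNR { owned[$0] = 1; next } !($0 in owned)' \
    "$tmp/owned" "$tmp/all" | sort)

printf 'sorted lists:\n%s\n' "$sorted_result"
printf 'hash lookup:\n%s\n' "$hash_result"
rm -rf "$tmp"
```

Both methods should report the same two unowned paths, /etc/stray.conf and /opt/leftover. On real manifests, wrapping each half in `time` would give the kind of numbers the analysis asks for.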

Like I said, I can't argue with numbers though. Have any thoughts on this
analysis?

Evan

---script-cruft.sh---
#!/bin/sh

# Manifest 1: every path on the system, skipping volatile, user-owned and
# Portage-internal trees.
find / '(' -path /proc \
    -or -path /dev \
    -or -path /boot \
    -or -path /mnt \
    -or -path /tmp \
    -or -path /var/tmp \
    -or -path /root \
    -or -path /home \
    -or -path /lib/dev-state \
    -or -path /lib/modules \
    -or -path /usr/portage \
    -or -path /var/cache/edb \
    -or -path /var/db/pkg \
    ')' -prune -or -print \
    | sort >/tmp/allfiles

# Manifest 2: the file lists of installed packages. The sed call drops symlink
# targets (" -> ..."), the first two lines, and each blank line plus the two
# lines that follow it.
qpkg -nc -l \
    | sed -n -e 's/ -> .*//' -e '1,2 d' -e '/^$/,+2! p' \
    | sort \
    | uniq >/tmp/portagefiles

# Print only the lines unique to the first manifest: files no package owns.
comm -2 -3 /tmp/allfiles /tmp/portagefiles
---script-cruft.sh---
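
The `'(' ... ')' -prune -or -print` idiom in the script can be hard to read at first, so here is a minimal sketch of it on a throwaway directory tree. The directory and file names are invented for the demonstration, and it uses the POSIX spelling `-o` where the script uses `-or`.

```shell
#!/bin/sh
# Toy directory tree; the names under $root are invented for this demonstration.
export LC_ALL=C
root=$(mktemp -d)
mkdir -p "$root/etc" "$root/tmp/scratch"
touch "$root/etc/keep.conf" "$root/tmp/junk" "$root/tmp/scratch/more-junk"

# A path matching one of the parenthesised tests is pruned: it is not printed
# and find does not descend into it. Everything else falls through to -print.
listing=$(cd "$root" && find . '(' -path ./tmp -o -path ./proc ')' -prune -o -print | sort)

printf '%s\n' "$listing"
rm -rf "$root"
```

This should print ".", "./etc" and "./etc/keep.conf"; nothing under ./tmp appears, and the test for the nonexistent ./proc is simply never matched.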

--
gentoo-dev@g.o mailing list

