List Archive: gentoo-dev
Headers:
To: gentoo-dev@g.o
From: Evan Powers <powers.161@...>
Subject: Re: orphaned files on system?
Date: Mon, 21 Apr 2003 03:44:21 -0400
On Thursday 17 April 2003 07:41 am, leon j. breedt wrote:
> i use the attached script to scan for unpackaged files on my filesystem,

There was a thread about such scripts here in early March, if you didn't know. 
You could find it in the archives; might give you some ideas. Subject was 
"Cruft detecting script".

> if you have a system with lots of packages, its going to take some time,
> as it caches all the /var/db/pkg/**/CONTENTS entries in a Berkeley hashdb
> for quick lookups, then runs /usr/bin/find on /, and compares results.

Hmm.... Have you timed your script yet? What sorts of run times are you 
getting with that implementation?

Back in the aforementioned March thread, I posted a pretty naive script 
(included at bottom for reference) to do the same thing; on my system it ran 
in about 46 seconds. Is that faster or slower than your script? Granted yours 
almost certainly does more than mine, but I would think the run time would be 
dominated by the generation and comparison of the two file manifests, so the 
numbers should be comparable.

The reason I even mention it is that I'm wondering if the hash table is a good 
data structure in this situation. My gut tells me it isn't, but I can't argue 
with timing numbers that say otherwise.

I'm thinking:
1) creating the hash table and creating two sorted lists (find output and the 
CONTENTS) are tasks of roughly equivalent complexity (the hash probably wins 
by a modest margin)
2) comparing two sorted lists is a single O(n) merge pass; the hash table
lookups are also O(n) in total, but I'd expect a much larger constant factor
per lookup (hashing each path, walking the bucket, and so on)
3) the sorted list method probably uses less memory, which matters given the
dataset size; it should also have considerably better locality of reference
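
To make the two approaches concrete, here is a minimal sketch on a made-up
three-line manifest. The paths and temporary file names are hypothetical, and
the awk associative array is only a stand-in for the Berkeley hashdb (it lives
in memory, so it says nothing about on-disk lookup cost). It only shows that
both methods produce the same answer, not which one is faster.

```shell
#!/bin/sh
# Use one collation order everywhere; comm needs its inputs sorted
# the same way sort produced them.
LC_ALL=C
export LC_ALL

tmpdir=$(mktemp -d)

# Hypothetical manifests: "allfiles" stands in for the find output,
# "pkgfiles" for the paths listed in the CONTENTS files.
printf '%s\n' /usr/local/cruft /etc/passwd /usr/bin/foo \
| sort >"$tmpdir/allfiles"
printf '%s\n' /usr/bin/foo /etc/passwd /usr/bin/foo \
| sort -u >"$tmpdir/pkgfiles"

# Sorted-list method: one merge pass over both files,
# keeping the lines that appear only in the first.
orphans_sorted=$(comm -2 -3 "$tmpdir/allfiles" "$tmpdir/pkgfiles")

# Hash method: load the package paths into an associative array,
# then test every filesystem path for membership.
orphans_hash=$(awk 'NR == FNR { owned[$0]; next } !($0 in owned)' \
        "$tmpdir/pkgfiles" "$tmpdir/allfiles")

echo "sorted merge: $orphans_sorted"
echo "hash lookup:  $orphans_hash"

rm -rf "$tmpdir"
```

Wrapping each of the two comparison commands in `time` on real manifests
would give the numbers needed to settle the question.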

Like I said, I can't argue with numbers though. Have any thoughts on this 
analysis?

Evan

---script-cruft.sh---
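#!/bin/sh

# Manifest 1: every path on the filesystem, pruning trees that are
# expected to hold unpackaged files anyway.
find / '(' -path /proc \
        -or -path /dev \
        -or -path /boot \
        -or -path /mnt \
        -or -path /tmp \
        -or -path /var/tmp \
        -or -path /root \
        -or -path /home \
        -or -path /lib/dev-state \
        -or -path /lib/modules \
        -or -path /usr/portage \
        -or -path /var/cache/edb \
        -or -path /var/db/pkg \
        ')' -prune -or -print \
| sort >/tmp/allfiles

# Manifest 2: every path owned by an installed package, per qpkg.
# The sed expressions strip symlink targets (" -> ...") and drop the
# first two lines plus each blank line and the two lines after it
# (presumably qpkg's per-package headers). This is GNU sed syntax.
qpkg -nc -l \
| sed -n -e 's/ -> .*//' -e '1,2 d' -e '/^$/,+2! p' \
| sort \
| uniq >/tmp/portagefiles

# Print the paths that appear only in the first manifest: the cruft.
comm -2 -3 /tmp/allfiles /tmp/portagefiles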
---script-cruft.sh---

--
gentoo-dev@g.o mailing list

Updated Jun 17, 2009

Summary: Archive of the gentoo-dev mailing list.
