Gentoo Archives: gentoo-dev

From:	Ed Grimm <paranoid@××××××××××××××××××××××.org>
To:	gentoo-dev@l.g.o
Subject:	[gentoo-dev] Package sub-categories, directory performance, and benchmarking
Date:	Mon, 08 Nov 2004 04:26:50
Message-Id:	`Pine.LNX.4.60.0411080401050.5623@mbeq.rq.iarg`
In Reply to:	Re: [gentoo-dev] [LARGE MESSAGE] Media-sound reorganization! by Carsten Lohrke

1	On Sun, 7 Nov 2004, Carsten Lohrke wrote:
2
3	> On Sunday 07 November 2004 22:39, Stuart Herbert wrote:
4	>> If Portage supporting arbitrary-depth category trees, then we could
5	>> organise things a lot easier. But until that happens, devs are
6	>> going to have to accept the need for more directories in
7	>> /usr/portage.
8	>
9	> I don't think an arbitrary depths would be so helpful. Most likely
10	> it'd slowdown portage. How about flatten the whole beast!? The
11	> categorization hasn't to be done via directories.
12
13	Why ever would a flat tree be better than arbitrary depth?
14
15	When I started being more diligent about reading the Gentoo mailing
16	lists, I saw a number of threads on the topic of adding sub-categories,
17	and the only consistent reason that was given for not moving forward
18	was, "we need to benchmark that."
19
20	Initially, I saw this as a good thing, because while I already knew the
21	answer, performing tests to verify what one suspects tends to be a good
22	thing. However, as time went on, I still did not see any benchmarks.
23	After a while, I became disheartened. But there were other things I
24	wanted to focus on, so I didn't get involved.
25
26
27	However, knowing the answer to the directory performance question, I
28	could not let this comment alone.
29
30	I've attached a benchmark script, written in Perl, which finds all of
31	the files in the specified directory tree(s), randomly selects [count]
32	of them (where count is given by the --count option and defaults to
33	10,000), and reads the first line of each selected file. The script
34	can be used to benchmark any directory layout that people wish to
35	consider for Gentoo.
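
As a rough illustration of what such a benchmark does, here is a minimal sketch in Python. It is not the attached Perl script, which is not reproduced here. The names collect_files and bench_access, the seed parameter, and the choice to sample with replacement are all assumptions made for illustration.

```python
import os
import random
import time


def collect_files(roots):
    """Walk each root directory and return the paths of all regular files."""
    paths = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):
                    paths.append(path)
    return paths


def bench_access(roots, count=10000, seed=None):
    """Read the first line of `count` randomly chosen files.

    Returns (number_of_reads, elapsed_seconds). Only the reads are timed,
    not the initial tree walk. Sampling is with replacement (an assumption).
    """
    paths = collect_files(roots)
    if not paths:
        return 0, 0.0
    rng = random.Random(seed)
    sample = [rng.choice(paths) for _ in range(count)]
    start = time.perf_counter()
    for path in sample:
        with open(path, "rb") as handle:
            handle.readline()
    return len(sample), time.perf_counter() - start
```

Calling bench_access(["."], count=10000) from inside the tree under test would roughly mirror the invocation described in this message. Repeated runs are likely to be affected by the file cache, so timings from a warm cache and a cold cache should not be compared directly.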
36
37	What they will find is this: for ext2 and ext3 systems, there is an
38	optimal number of files per directory, and performance falls linearly
39	beyond that point; for reiserfs systems, it doesn't matter.
40
41	I performed my own tests with this script. Splitting at the most
42	obvious point (the '-' in the category names) gave marginally
43	improved performance: Gentoo is already slightly over the optimal
44	number of files in /usr/portage. (Don't get me started on dev-perl.)
45
46	More specifically, my average time for 10,000 random file reads in
47	/usr/portage (by changing to /usr/portage and using '.' as the argument
48	to benchaccess) was a little over 60 seconds, although as more tests
49	were performed, the Linux file cache started improving that result.
50	With the split categories, on the other hand, the average was 55
51	seconds.
52
53
54	Some people may wonder why this is - after all, to access multiple
55	directory trees is clearly a lot more work. This may be true for a
56	human, but the computer doesn't see it that way - under ext2 and ext3,
57	it has to read all of the filenames in the directory, until it finds the
58	one you're looking for. Having fewer files, but more directories, means
59	that it gets to the file at each level much quicker. Each of the
60	directory changes adds some time, but that's negligible compared to the
61	time it takes to read through a large directory. Note that this, of
62	course, assumes that sanity is maintained, and we don't have many
63	categories or subcategories with fewer than a dozen packages and/or
64	subcategories.
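
The reasoning above can be put into a rough cost model. The sketch below assumes a purely linear directory scan, so that finding one entry among n costs (n + 1) / 2 entry comparisons on average. It ignores caching and disk layout entirely. The function name, the per_directory_overhead parameter, and all of the entry counts are hypothetical and chosen only for illustration.

```python
def expected_lookup_cost(entries_per_level, per_directory_overhead=0.0):
    """Average cost of resolving one path, in units of entry comparisons.

    entries_per_level lists how many entries each directory on the path
    holds. Each level is assumed to be scanned linearly, with the target
    equally likely to sit at any position: (n + 1) / 2 comparisons.
    per_directory_overhead is a made-up fixed cost for entering a directory.
    """
    return sum((n + 1) / 2 + per_directory_overhead for n in entries_per_level)


# The same 10,000 leaf entries arranged three ways.
flat = expected_lookup_cost([10000])                       # one huge directory
split = expected_lookup_cost([100, 100])                   # 100 dirs of 100 entries
deep = expected_lookup_cost([10, 10, 10, 10], per_directory_overhead=50)
split_with_overhead = expected_lookup_cost([100, 100], per_directory_overhead=50)
```

Under these assumptions the flat layout costs 5000.5 comparisons and the two-level split costs 101.0. With a nonzero per-directory overhead, the very deep layout (222.0) ends up slightly worse than the two-level one (201.0), which is the "sanity" caveat about tiny categories.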
65
66	Another way to think of it: it's similar to the difference between
67	searches on an unsorted array and searches on a sorted array.
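
To make the array analogy concrete, the following sketch counts how many elements each kind of search examines. It is only an illustration of the analogy, not a description of how any filesystem is implemented, and both function names are hypothetical.

```python
def linear_search_steps(items, target):
    """Count elements examined by a front-to-back scan of an unsorted list.

    Returns len(items) if the target is absent, since every element is checked.
    """
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return len(items)


def binary_search_steps(sorted_items, target):
    """Count probes made by a binary search over a sorted list."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_items[mid] < target:
            lo = mid + 1
        elif sorted_items[mid] > target:
            hi = mid
        else:
            return steps
    return steps
```

For a list of 1,024 names, the linear scan may examine all 1,024 entries in the worst case, while the binary search needs at most 11 probes.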
68
69	Anyone who wonders why reiserfs does not have an issue with either
70	layout does not know what reiserfs is - it was designed specifically to
71	avoid this problem.
72
73	Ed

Attachments

File name	MIME type
benchaccess	text/plain

Replies

Subject	Author
Re: [gentoo-dev] Package sub-categories, directory performance, and benchmarking	Jason Rhinelander <jason@××××××××××××××××.com>
Re: [gentoo-dev] Package sub-categories, directory performance, and benchmarking	Carsten Lohrke <carlo@g.o>
Re: [gentoo-dev] Package sub-categories, directory performance, and benchmarking	Paul de Vrieze <pauldv@g.o>
Re: [gentoo-dev] Package sub-categories, directory performance, and benchmarking	Karl Trygve Kalleberg <karltk@g.o>
