Gentoo Archives: gentoo-user

From: Florian Philipp <lists@×××××××××××.net>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Fast file system for cache directory with lot's of files
Date: Tue, 14 Aug 2012 14:04:17
Message-Id: 502A59FD.8000107@binarywings.net
In Reply to: Re: [gentoo-user] Fast file system for cache directory with lot's of files by Michael Hampicke
On 13.08.2012 20:18, Michael Hampicke wrote:
> On 13.08.2012 19:14, Florian Philipp wrote:
>> On 13.08.2012 16:52, Michael Mol wrote:
>>> On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
>>> <mgehampicke@×××××.com> wrote:
>>>
>>> Have you indexed your ext4 partition?
>>>
>>> # tune2fs -O dir_index /dev/your_partition
>>> # e2fsck -fD /dev/your_partition
>>>
>>> Hi, dir_index is active. I guess that's why delete operations
>>> take as long as they do (the index has to be updated every time).
>>>
>>>
>>> 1) Scan for files to remove
>>> 2) Disable the index
>>> 3) Remove the files
>>> 4) Re-enable the index
>>>
>>> ?
>>>
>>> --
>>> :wq
>>
>> Other things to think about:
>>
>> 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
>> used to actually improve performance for some workloads, because it
>> defers random IO in favor of sequential IO (when updating the journal).
>>
>> 2. Increase the journal size.
>>
>> 3. Take a look at `man 1 chattr`, especially the 'T' attribute. Of
>> course, this only helps after re-allocating everything.
>>
>> 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
>> 2.6.39 IIRC). For example:
>> find "$TOP_DIR" -mindepth 1 -maxdepth 1 -print0 | \
>> xargs -0 -r -P 4 -I '{}' find '{}' -type f
>>
>> 5. Use a separate device for the journal.
>>
>> 6. Temporarily deactivate the journal with tune2fs, similar to MM's idea.
>>
>> Regards,
>> Florian Philipp
>>
>
> Trying out different journal options was already on my list, but the
> man page on chattr regarding the T attribute is an interesting read.
> Definitely worth trying.
>
> Parallelizing multiple finds was something I already did, but the only
> thing that increased was the IO wait :) But now that I have read all the
> suggestions in this thread, I might try it again.
>
> A separate device for the journal is a good idea, but not possible at the
> moment (the machine is abroad in a data center).
>
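
Item 4 can be turned from a listing into the actual cleanup. A minimal sketch, assuming GNU find and xargs, and assuming (hypothetically) that "files to remove" means regular files not modified for a given number of minutes; prune_cache and JOBS are illustrative names, not something from this thread:

```bash
#!/bin/bash
# JOBS is a hypothetical knob: number of find processes run in parallel.
JOBS=${JOBS:-4}

# prune_cache DIR MINUTES
# Deletes regular files under DIR that were not modified during the last
# MINUTES minutes. One find runs per direct child of DIR (subdirectory or
# file), at most $JOBS of them at a time.
prune_cache() {
    find "$1" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -P "$JOBS" -I '{}' \
            find '{}' -type f -mmin +"$2" -delete
}
```

Usage would be something like `prune_cache /path/to/cache 60`. Since -delete removes each file as soon as find reaches it, there is no separate scan-then-remove pass. Given Michael's observation that parallel finds mainly raised the IO wait, JOBS is worth tuning rather than maximizing.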
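Items 2 and 6 both come down to tune2fs. A sketch, assuming the file system can be unmounted (tune2fs only clears has_journal on a file system that is unmounted or mounted read-only); run, resize_journal and DRY_RUN are hypothetical helpers, and by default the commands are only printed, never executed:

```bash
#!/bin/bash
# DRY_RUN is a hypothetical switch: 1 (default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"   # show the command only
    else
        "$@"                 # execute it for real (needs root)
    fi
}

# resize_journal DEVICE SIZE_MB
# Drops the existing journal, then creates a new one of SIZE_MB megabytes.
# DEVICE must be unmounted (or mounted read-only) for the first step.
resize_journal() {
    run tune2fs -O ^has_journal "$1" || return
    run tune2fs -J size="$2" "$1"
}
```

For example, `resize_journal /dev/your_partition 1024` prints the two tune2fs commands; set DRY_RUN=0 only after checking them. Item 6 corresponds to running just the first command, doing the bulk deletion, and then re-adding the journal with the second.
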
Something else I just remembered. I guess it doesn't help you with your
current problem, but it might come in handy when working with such large
cache dirs: I once wrote a script that sorts files by their starting
physical block. This sped up reading them considerably (2 minutes
instead of 11 minutes for copying the portage tree).

It's a terrible kludge, will probably fail when crossing file system
boundaries or on a thousand other oddities, and requires root for some
very scary programs. I never had the time to finish an improved C
version. Anyway, maybe it helps you:
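
#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory and
# /dev/mapper/vg-portage with the ext2/3/4 device it resides on.
# Replace `cpio` with whatever does the actual work. Its input is a
# \0-delimited file list.
#
WORKDIR=$(mktemp -d) || exit 1
trap 'rm -rf "$WORKDIR"' EXIT   # remove the FIFO even on failure
FIFO="$WORKDIR/bmap.fifo"
mkfifo "$FIFO" || exit 1

# find writes one debugfs command "bmap <inode> 0" per file into the
# FIFO and prints the file's path, \0-terminated, on stdout.
find /usr/portage -type f -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
    # Swap \n and \0 so that every path occupies exactly one line.
    tr '\n\0' '\0\n' |
    # Prefix each path with its first physical block, as printed by debugfs.
    paste <(
        debugfs -f "$FIFO" /dev/mapper/vg-portage |
            grep -E '^[[:digit:]]+'
    ) - |
    # Sort by physical block, then drop the block column again.
    sort -k 1,1n |
    cut -f 2- |
    # Swap back to a \0-delimited list.
    tr '\n\0' '\0\n' |
    cpio -p0 --make-directories /tmp/portage/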
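
Where running debugfs as root is not an option, a much cruder variant sorts by inode number instead of by physical block. This is a swapped-in heuristic, not the method of the script above: it tends to make inode-table access more sequential, but says nothing definite about where the data blocks lie. A sketch assuming GNU find, sort and cut (for -z); list_by_inode is a hypothetical name:

```bash
#!/bin/bash
# list_by_inode DIR
# Prints the regular files under DIR as a \0-delimited list, ordered by
# inode number (an approximation, not physical block order).
list_by_inode() {
    # Emit "<inode>\t<path>\0", sort numerically on the inode field,
    # then strip that field again. -z keeps \0 as the record separator.
    find "$1" -type f -printf '%i\t%p\0' |
        sort -z -k 1,1n |
        cut -z -f 2-
}

# Example consumer, mirroring the original script:
# list_by_inode /usr/portage | cpio -p0 --make-directories /tmp/portage/
```

Because sort and cut work on \0-terminated records here, names containing newlines survive without the tr swapping the debugfs version needs.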


Replies

Subject Author
Re: [gentoo-user] Fast file system for cache directory with lot's of files Michael Hampicke <gentoo-user@××××.biz>