On 13.08.2012 20:18, Michael Hampicke wrote:
> On 13.08.2012 19:14, Florian Philipp wrote:
>> On 13.08.2012 16:52, Michael Mol wrote:
>>> On Mon, Aug 13, 2012 at 10:42 AM, Michael Hampicke
>>> <mgehampicke@×××××.com <mailto:mgehampicke@×××××.com>> wrote:
>>>
>>> Have you indexed your ext4 partition?
>>>
>>> # tune2fs -O dir_index /dev/your_partition
>>> # e2fsck -D /dev/your_partition
>>>
>>> Hi, the dir_index is active. I guess that's why delete operations
>>> take as long as they take (the index has to be updated on every
>>> delete).
>>>
>>> 1) Scan for files to remove
>>> 2) Disable the index
>>> 3) Remove the files
>>> 4) Re-enable the index
>>>
>>> ?
>>>
>>> --
>>> :wq
>>
>> Other things to think about:
>>
>> 1. Play around with data=journal/writeback/ordered. IIRC, data=journal
>> actually used to improve performance depending on the workload as it
>> delays random IO in favor of sequential IO (when updating the journal).
>>
>> 2. Increase the journal size.
>>
>> 3. Take a look at `man 1 chattr`. Especially the 'T' attribute. Of
>> course this only helps after re-allocating everything.
>>
>> 4. Try parallelizing. Ext4 requires relatively few locks nowadays (since
>> 2.6.39 IIRC). For example:
>> find $TOP_DIR -mindepth 1 -maxdepth 1 -print0 | \
>> xargs -0 -n 1 -r -P 4 -I '{}' find '{}' -type f
>>
>> 5. Use a separate device for the journal.
>>
>> 6. Temporarily deactivate the journal with tune2fs similar to MM's idea.
>>
>> Regards,
>> Florian Philipp
>>
>
> Trying out different journal modes/options was already on my list, but
> the manpage on chattr regarding the T attribute is an interesting read.
> Definitely worth trying.
>
> Parallelizing multiple finds was something I already tried, but the
> only thing that increased was the IO wait :) But now, having read all
> the suggestions in this thread, I might try it again.
>
> A separate device for the journal is a good idea, but not possible atm
> (the machine is abroad in a data center).
>
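
Regarding the disable-index idea quoted above: spelled out, the whole
dance would look roughly like this. Untested sketch — /dev/your_partition,
the /mnt/cache mount point, and the find filter are all placeholders, and
tune2fs/e2fsck need the filesystem unmounted:

```shell
# Sketch of the quoted four-step idea. Placeholders throughout:
# /dev/your_partition, /mnt/cache, and the -mtime filter.

# 1) Collect the list of files to remove while the fs is still mounted
find /mnt/cache -type f -mtime +30 -print0 > /tmp/victims.list

# 2) Drop the directory index (requires the fs to be unmounted)
umount /mnt/cache
tune2fs -O ^dir_index /dev/your_partition
mount /dev/your_partition /mnt/cache

# 3) Remove the files without per-delete index updates
xargs -0 rm -f < /tmp/victims.list

# 4) Re-enable the index and let e2fsck rebuild it
umount /mnt/cache
tune2fs -O dir_index /dev/your_partition
e2fsck -D /dev/your_partition
mount /dev/your_partition /mnt/cache
```

No idea whether the rebuild in step 4 eats the time you saved in step 3,
so benchmark before relying on it.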

Something else I just remembered. I guess it doesn't help you with your
current problem, but it might come in handy when working with such large
cache dirs: I once wrote a script that sorts files by their starting
physical block. This improved reading them quite a bit (2 minutes
instead of 11 minutes for copying the portage tree).

It's a terrible kludge, will probably fail when crossing FS boundaries
or hitting a thousand other oddities, and requires root for some very
scary programs. I never had the time to finish an improved C version.
Anyway, maybe it helps you:

#!/bin/bash
#
# Example below copies /usr/portage/* to /tmp/portage.
# Replace /usr/portage with the input directory.
# Replace `cpio` with whatever does the actual work. Input is a
# \0-delimited file list.
#
# How it works: find writes one "bmap <inode> 0" command per file
# into a FIFO for debugfs (mapping each file's logical block 0 to a
# physical block) and the \0-delimited path list to stdout. The tr
# calls swap \n and \0 so paths with embedded newlines survive the
# line-oriented sort. paste glues each block number to its path,
# sort orders by block, and cut drops the block column again.
#
FIFO=/tmp/$(uuidgen).fifo
mkfifo "$FIFO"
find /usr/portage -type f -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
tr '\n\0' '\0\n' |
paste <(
    debugfs -f "$FIFO" /dev/mapper/vg-portage |
    grep -E '^[[:digit:]]+'
) - |
sort -k 1,1n |
cut -f 2- |
tr '\n\0' '\0\n' |
cpio -p0 --make-directories /tmp/portage/
unlink "$FIFO"
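
If debugfs is too scary, a coarser approximation is to sort by inode
number instead of physical block — on ext* the two often correlate,
though nothing guarantees it. Same tr trick for newline-safe sorting;
the paths are placeholders again:

```shell
# Rough stand-in for the block-sort above: order files by inode
# number (often correlates with on-disk layout on ext*, but not
# guaranteed). /usr/portage and /tmp/portage are placeholders.
find /usr/portage -type f -printf '%i\t%p\0' |
tr '\n\0' '\0\n' |    # \0-terminated records -> sortable lines
sort -k 1,1n |        # numeric sort by inode number
cut -f 2- |           # drop the inode column
tr '\n\0' '\0\n' |    # back to \0-delimited paths
cpio -p0 --make-directories /tmp/portage/
```

Needs no root and no debugfs, at the price of a weaker ordering; tabs in
filenames would confuse cut, so it is strictly a quick hack.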