Gentoo Archives: gentoo-user

From: Florian Philipp <lists@×××××××××××.net>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: RFC : fast copying of a whole directory tree
Date: Tue, 14 Feb 2012 17:46:48
Message-Id: 4F3A9DAC.1040103@binarywings.net
In Reply to: Re: [gentoo-user] Re: RFC : fast copying of a whole directory tree by Joerg.Schilling@fokus.fraunhofer.de (Joerg Schilling)
On 14.02.2012 10:57, Joerg Schilling wrote:
> Florian Philipp <lists@×××××××××××.net> wrote:
>
>>> Even if the i-nodes are sequential on-disk, there's no reason to think
>>> that the data blocks associated with the inodes are in any particular
>>> order with respect to the i-nodes themselves.
>>
>> You could probably find the intended order by using debugfs (at least
>> for ext*). The following command should output the first physical block
>> of every file:
>> find /var/db/portage/ -type f -printf 'bmap <%i> 0\n' | sudo debugfs
>> /dev/mapper/vg-portage
>
> This kind of order is not important for copy speed.
>
> Copy speed is dominated by write speed and write speed is dominated by seeks
> that are a result of keeping meta data up to date.
>
> Jörg
>

I cannot verify that hypothesis.

Test setup:
1x 7200rpm 2.5" HDD
/var/db/portage is my portage tree, ext4
/dev/mapper/vg-portage is its block device
/tmp is ext4

First test --- copy the whole tree with just `cpio` (I measured its
performance to be similar to `cp -a`):
$ echo 1 >/proc/sys/vm/drop_caches
$ time find /var/db/portage/ -type f -print0 |
    cpio -p0 --make-directories /tmp/portage/

real 11m52.657s
user 0m1.848s
sys 0m19.802s

Second test --- sort by starting physical block number:
$ echo 1 >/proc/sys/vm/drop_caches
$ FIFO=/tmp/$(uuidgen).fifo
$ mkfifo "$FIFO"
$ time find /var/db/portage/ -type f \
    -fprintf "$FIFO" 'bmap <%i> 0\n' -print0 |
    tr '\n\0' '\0\n' | paste <(
      debugfs -f "$FIFO" /dev/mapper/vg-portage |
      grep -E '^[[:digit:]]+') - |
    sort -k 1,1n | cut -f 2- | tr '\n\0' '\0\n' |
    cpio -p0 --make-directories /tmp/portage/
$ unlink "$FIFO"

real 2m8.400s
user 0m1.888s
sys 0m15.417s

Using `xargs -0 cat >/dev/null` instead of `cpio` yields 9m27.745s and
1m11.087s, respectively.

Some comments on the sorting script:
- Using a FIFO instead of a pipe for issuing commands to debugfs is faster.
- In case it is not obvious: the two `tr` commands are there because `paste`
and `cut` cannot handle zero-terminated lines, but file names might
contain line breaks.
- `grep` is there because `debugfs` echoes all commands. Filtering out every
odd-numbered line should also work.
- A production-ready script should probably use `join` instead of
`paste` to deal with read errors of `debugfs` (for example, if files are
removed between `find` and `debugfs`). Currently, such errors lead to
misaligned output.

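
To illustrate the `tr` swap in isolation, here is a minimal sketch that needs neither root nor debugfs. The function name `sort_names_by_key` and the key values are made up for the example; the keys merely stand in for the block numbers that debugfs would report. It assumes GNU coreutils, whose `tr`, `paste`, `sort` and `cut` pass NUL bytes through unchanged.

```shell
#!/bin/bash
# Sketch: sort NUL-terminated file names by a numeric key using
# line-oriented tools. sort_names_by_key is a hypothetical helper name.

sort_names_by_key() {
    # $1:     file with one numeric key per line
    # stdin:  NUL-terminated names, in the same order as the keys
    # stdout: NUL-terminated names, sorted by key
    tr '\n\0' '\0\n' |      # swap: records now end in \n, embedded \n become \0
        paste "$1" - |      # prepend "key<TAB>" to every record
        sort -k 1,1n |      # numeric sort on the key column only
        cut -f 2- |         # drop the key column again
        tr '\n\0' '\0\n'    # swap back to NUL-terminated names
}

keys=$(mktemp)
# Made-up keys standing in for starting block numbers.
printf '%s\n' 300 100 200 > "$keys"

# Three names; the second one contains a line break ("b\nc").
# For display, NUL is shown as '|' and the line break as '_'.
result=$(printf 'a\0b\nc\0d\0' | sort_names_by_key "$keys" | tr '\0\n' '|_')
rm -f "$keys"
echo "$result"
```

The name with the embedded line break stays in one piece because, between the two `tr` calls, its line break travels as a NUL byte inside a single line.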
BTW: I wanted to test it with `star -copy`, but this resulted in buffer
overflows similar to these:
http://permalink.gmane.org/gmane.comp.archivers.star.user/752

Regards,
Florian Philipp
