On Sun, Apr 3, 2022 at 4:59 AM Wols Lists <antlists@××××××××××××.uk> wrote:
>
> On 03/04/2022 02:15, Bill Kenworthy wrote:
> > Rsync has a --bwlimit argument which helps here. Note that rsync copies
> > the whole file on what it considers local storage (which can be mounted
> > network shares) ... this can cause a real slowdown.
>
> It won't help on the initial copy, but look at the --inplace option.
>
> It won't help with the "read and compare", but it only writes what has
> changed, so if a big file has changed slightly, it'll stop it re-copying
> the whole file.

You might also try ionice, though I find it hit and miss for
effectiveness once you start adding layers like lvm/mdadm/etc, as I
don't know that the kernel actually sees all the downstream queues
when it is throttling processes. I haven't used it on LVM in a while
though.

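A sketch of what that looks like (paths here are hypothetical, and note that the priority classes only take effect under an I/O scheduler that honors them, such as BFQ):

```shell
# Run the copy in the "idle" I/O class so it only gets disk time
# when nothing else wants it (paths are hypothetical).
ionice -c3 rsync -a /data/src/ /mnt/backup/dst/

# Or class 2 (best-effort) at the lowest priority, 7:
ionice -c2 -n7 rsync -a /data/src/ /mnt/backup/dst/
```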
Replication performance (especially if you want to do a second pass
with rsync) is the sort of thing that using pvmove/etc helps with,
since it will ensure nothing gets moved. Snapshot-supporting
filesystems like zfs/btrfs are also better if you want to sync things
up, because they can rapidly identify all the changes between two
snapshots without having to read anything but metadata, assuming you
manage things correctly and maintain a common baseline between them.

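As a sketch of that snapshot-diff workflow (the pool, dataset, and host names below are made up): with a common baseline snapshot on both sides, an incremental send ships only what changed between the two snapshots:

```shell
# zfs: take a new snapshot and send only the delta since the common
# baseline @base to the backup host (all names are hypothetical).
zfs snapshot tank/data@today
zfs send -i tank/data@base tank/data@today | ssh backuphost zfs receive tank/data

# btrfs equivalent: a read-only snapshot plus an incremental send
# against a parent snapshot already present on the other side.
btrfs subvolume snapshot -r /data /data/.snap-today
btrfs send -p /data/.snap-base /data/.snap-today | ssh backuphost btrfs receive /backup
```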
Of course, all of those options require that they be set up in advance.
If you just have two generic filesystems and want to sync them, then
rsync is your main option.

Oh, one thing I would suggest: if they're on different hosts, actually
run rsyncd or do the sync over ssh, so that rsync recognizes the
situation and runs a second rsync process on the remote host, keeping
all the hashing/etc local to the drives. This greatly reduces your
network traffic, which is likely to be the bottleneck. All the same,
if you want to actually use hashes to find differences, and not just
rely on size/mtime, there is no getting around having to read all the
data off the disk.

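Concretely (host and paths here are hypothetical), that means addressing the remote side as host:path rather than going through a mounted share; adding -c forces full checksum comparison, which is exactly the read-everything case described above:

```shell
# Over ssh: rsync spawns its counterpart on backuphost, so file-list
# and delta computation happen on each end, local to the disks.
rsync -a /data/src/ backuphost:/data/dst/

# With -c, files are compared by checksum rather than size/mtime --
# both sides must read every byte, but only differences cross the wire.
rsync -ac /data/src/ backuphost:/data/dst/
```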
--
Rich