On 09/17/2017 04:17 AM, Kai Krakow wrote:
> On Sun, 17 Sep 2017 01:20:45 -0500,
> Dan Douglas <ormaaj@×××××.com> wrote:
>
>> On 09/16/2017 07:06 AM, Kai Krakow wrote:
>>> On Fri, 15 Sep 2017 14:28:49 -0400,
>>> Rich Freeman <rich0@g.o> wrote:
>>>
>>>> On Fri, Sep 8, 2017 at 3:16 PM, Kai Krakow <hurikhan77@×××××.com>
>>>> wrote:
>> [...]
>>>>
>>>> True, but keep in mind that this applies in general in btrfs to any
>>>> kind of modification to a file. If you modify 1MB in the middle
>>>> of a 10GB file on ext4 you end up with it taking up 10GB of space. If
>>>> you do the same thing in btrfs you'll probably end up with the
>>>> file taking up 10.001GB. Since btrfs doesn't overwrite files
>>>> in place, it will typically allocate a new extent for the
>>>> additional 1MB, and the original content at that position within
>>>> the file is still on disk in the original extent. It works a bit
>>>> like a log-based filesystem in this regard (which is also
>>>> effectively copy-on-write).
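To make that space accounting concrete, here is a toy model in Python. It is not btrfs's real on-disk format; `ToyCowFile`, `FileExtentRef` and their fields are made-up names for illustration. A file is a list of references into extents; an overwrite allocates a fresh extent and trims the old references instead of writing over them, and an extent is only freed once nothing references any part of it. Real btrfs caps the size of individual data extents, so a 10 GB file would consist of many extents and only the one overlapping the write would carry the extra overhead, but the arithmetic comes out the same.

```python
from dataclasses import dataclass

# Sizes are in MiB to keep the numbers small.
MiB = 1
GiB = 1024 * MiB


@dataclass
class FileExtentRef:
    """Maps file bytes [file_off, file_off + length) onto an on-disk
    extent, starting ext_off bytes into that extent (toy model)."""
    file_off: int
    length: int
    extent_id: int
    ext_off: int


class ToyCowFile:
    """Toy copy-on-write file: overwrites never modify existing extents."""

    def __init__(self, size):
        self.extent_sizes = {0: size}  # extent_id -> allocated size on disk
        self.refs = [FileExtentRef(0, size, 0, 0)]
        self.next_id = 1

    def overwrite(self, off, length):
        end = off + length
        new_id = self.next_id
        self.next_id += 1
        self.extent_sizes[new_id] = length  # new data lands in a fresh extent

        kept = []
        for r in self.refs:
            r_end = r.file_off + r.length
            if r_end <= off or r.file_off >= end:
                kept.append(r)  # does not overlap the write
                continue
            if r.file_off < off:  # keep the part before the write
                kept.append(FileExtentRef(r.file_off, off - r.file_off,
                                          r.extent_id, r.ext_off))
            if r_end > end:  # keep the part after the write
                kept.append(FileExtentRef(end, r_end - end, r.extent_id,
                                          r.ext_off + (end - r.file_off)))
        kept.append(FileExtentRef(off, length, new_id, 0))
        self.refs = sorted(kept, key=lambda ref: ref.file_off)

        # An extent is freed only when nothing references any part of it;
        # a partially referenced extent stays allocated in full.
        live = {ref.extent_id for ref in self.refs}
        self.extent_sizes = {i: s for i, s in self.extent_sizes.items()
                             if i in live}

    def disk_usage(self):
        return sum(self.extent_sizes.values())


f = ToyCowFile(10 * GiB)
f.overwrite(5 * GiB, 1 * MiB)
print(f.disk_usage())  # 10241 (MiB), i.e. 10 GiB + 1 MiB
print(len(f.refs))     # 3: head of old extent, new 1 MiB extent, tail of old extent
```

On ext4 the same overwrite would leave usage at 10240 MiB, because the 1 MiB is written over the old blocks.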
>>>
>>> Good point, this makes sense. I never thought about that.
>>>
>>> But I guess that btrfs doesn't use 10G-sized extents? And I also
>>> guess this is where autodefrag jumps in.
>>
>> According to btrfs-filesystem(8), defragmentation breaks reflinks in
>> all but a few old kernel versions, where I guess they tried to fix the
>> problem and apparently failed.
>
> It was splitting and splicing all the reflinks, which is actually a tree
> walk with more and more extents coming into the equation, and it ended up
> doing a lot of small IO and needing a lot of memory. I think you really
> cannot fix this when working with extents.
|
I figured by "break up" they meant it eliminates the reflink by making
a full copy... so the increased space they're talking about isn't really
double that of the original data in other words.
|
>
>> This really makes much of what btrfs
>> does altogether pointless if you ever defragment manually or have
>> autodefrag enabled. Deduplication is broken for the same reason.
>
> It's much easier to fix this for deduplication: Just write your common
> denominator of an extent to a tmp file, then walk all the reflinks and
> share them with parts of this extent.
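The dedupe approach Kai outlines can be sketched with a toy block-level model. This is an assumption-laden illustration, not how btrfs or any dedupe tool is implemented; `ToyVolume` and `dedupe_common` are hypothetical names. The common data is written once into a new extent (the "tmp file"), every matching range is re-pointed at it, and the old copies are freed once nothing references them. On Linux the actual sharing step would go through the FIDEDUPERANGE ioctl, which verifies that the ranges match before sharing them; the content check in the sketch stands in for that.

```python
class ToyVolume:
    """Toy model: each file is a list of (extent_id, index) block references."""

    def __init__(self):
        self.extents = {}  # extent_id -> tuple of block payloads
        self.files = {}    # file name -> list of (extent_id, index), one per block
        self._next_id = 0

    def _alloc(self, blocks):
        eid = self._next_id
        self._next_id += 1
        self.extents[eid] = tuple(blocks)
        return eid

    def write_file(self, name, extents):
        """Create a file from a list of extents, each a list of blocks."""
        refs = []
        for blocks in extents:
            eid = self._alloc(blocks)
            refs.extend((eid, i) for i in range(len(blocks)))
        self.files[name] = refs

    def read(self, name):
        return [self.extents[eid][i] for eid, i in self.files[name]]

    def used_blocks(self):
        return sum(len(b) for b in self.extents.values())

    def dedupe_common(self, common, targets):
        """Write `common` once, then point every matching range in
        `targets` (file name -> block offset) at that single extent."""
        common = list(common)
        shared_id = self._alloc(common)
        for name, off in targets.items():
            # Only share ranges whose contents really match.
            if self.read(name)[off:off + len(common)] != common:
                continue
            refs = self.files[name]
            for i in range(len(common)):
                refs[off + i] = (shared_id, i)
        # Free extents that no file references any more.
        live = {eid for refs in self.files.values() for eid, _ in refs}
        self.extents = {e: b for e, b in self.extents.items() if e in live}
        return shared_id


vol = ToyVolume()
vol.write_file("a", [["x", "y", "z"], ["q"]])
vol.write_file("b", [["p"], ["x", "y", "z"]])
print(vol.used_blocks())  # 8
vol.dedupe_common(["x", "y", "z"], {"a": 0, "b": 1})
print(vol.used_blocks())  # 5
```

In this example each duplicate run happens to fill its own extent, so the old copies are freed entirely. If a run were only part of a larger extent, that extent would stay allocated in full, just as Rich describes above for overwrites.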
>
> If you carefully select what to defragment, there should be no problem.
> A defrag tool could simply skip all the shared extents. A few fragments
> do not hurt performance at all, but what's important is spatial
> locality. A lot of small fragments may hurt performance a lot, so one
> could give the defragger a hint when to ignore the rule and still
> defragment the extent. Also, when your deduplication window is 1M, you
> could probably safely defrag all extents smaller than 1M.
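Those selection rules can be written down as a small filter. This is only a sketch of the heuristic as stated in the message: the `Extent` type, the function name, the `force` flag standing in for the "hint", and the 32 MiB target size (loosely modelled on the target extent size, `-t`, of `btrfs filesystem defragment`) are all assumptions; the 1 MiB dedupe window is the example value from the message.

```python
from dataclasses import dataclass

MiB = 1 << 20


@dataclass
class Extent:
    offset: int   # position within the file, in bytes
    length: int   # extent length, in bytes
    shared: bool  # also referenced elsewhere (reflink, snapshot, dedupe)


def pick_extents_to_defrag(extents, target_size=32 * MiB,
                           dedupe_window=1 * MiB, force=False):
    """Hypothetical selection policy for a reflink-aware defragmenter.

    - Extents of at least target_size are not treated as fragments.
    - Unshared fragments are always rewritten.
    - Shared fragments are skipped, since rewriting breaks the sharing,
      unless they are smaller than the dedupe window or `force` is set
      (the "hint" for badly fragmented files).
    """
    picked = []
    for e in extents:
        if e.length >= target_size:
            continue
        if not e.shared or e.length < dedupe_window or force:
            picked.append(e)
    return picked


extents = [
    Extent(0, 64 * MiB, shared=False),           # already large: leave it
    Extent(64 * MiB, 4 * MiB, shared=True),      # shared, >= window: skip
    Extent(68 * MiB, 256 * 1024, shared=True),   # shared but < window: rewrite
    Extent(69 * MiB, 512 * 1024, shared=False),  # unshared fragment: rewrite
]
print([e.offset // MiB for e in pick_extents_to_defrag(extents)])
print([e.offset // MiB for e in pick_extents_to_defrag(extents, force=True)])
```

Whether a shared extent below the dedupe window is really safe to rewrite depends on where the sharing came from: sharing created by snapshots or `cp --reflink` would still be broken, which is presumably why the message says "probably".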
|
Yeah, this sort of hurts with the way I deal with KVM image snapshots. I
have raw base images as backing files with lots of shared and null
data, so I run `fallocate --dig-holes' followed by
`duperemove --dedupe-options=same' on the CoW-enabled base images and
hope that btrfs defrag can clean up the resulting fragmented mess, but
it's a slow process and doesn't seem to do a good job.
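For reference, the base-image workflow described above can be scripted roughly as follows. This is a sketch, not the exact commands used: the function names and the image path are hypothetical, it defaults to a dry run that only prints the commands, and it adds duperemove's `-d` flag, without which duperemove only reports duplicate extents rather than deduplicating them (the message doesn't say whether the original invocation included it).

```python
import shlex
import subprocess


def base_image_cleanup_cmds(image):
    """The three steps of the workflow, in order (hypothetical helper)."""
    return [
        # Turn runs of zero bytes into holes (sparse regions).
        ["fallocate", "--dig-holes", image],
        # Dedupe duplicate extents; "same" also allows dedupe within one file,
        # and -d makes duperemove submit the dedupe requests.
        ["duperemove", "-d", "--dedupe-options=same", image],
        # Try to coalesce the fragments left behind.
        ["btrfs", "filesystem", "defragment", image],
    ]


def run_cleanup(image, dry_run=True):
    for cmd in base_image_cleanup_cmds(image):
        if dry_run:
            print(shlex.join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure


run_cleanup("/var/lib/libvirt/images/base.img")  # hypothetical path
```

Per the btrfs-filesystem(8) caveat mentioned earlier in the thread, the final defragment step may undo some of the sharing that duperemove has just created.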