On Mon, Apr 27, 2020 at 12:20 PM <tuxic@××××××.de> wrote:
>
> The kernel keeps track of what has already been fstrimmed and avoids
> retrimming the same data.
> This knowledge gets lost when the PC is power-cycled or rebooted.
>

I imagine this is filesystem-specific. When I checked the ext4 source
I didn't think to actually check whether those flags are stored on
disk or just kept in some kind of in-memory cache.

I wouldn't be surprised if this data is also lost by simply unmounting
the filesystem.

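For what it's worth, the mechanism looks roughly like the sketch
below: a per-block-group "was trimmed" bit kept only in RAM (in the
real ext4 code I believe it's EXT4_GROUP_INFO_WAS_TRIMMED_BIT in the
group info, though don't quote me on that). Toy C, not actual kernel
code:

/* Rough sketch of the idea, not actual kernel code: a per-group
 * "already trimmed" flag kept only in RAM, so it vanishes on
 * unmount/reboot and the next fstrim re-trims everything. */
#include <stdbool.h>
#include <stdio.h>

#define NGROUPS 4

struct group_info {
    bool was_trimmed;           /* in-memory only, never hits disk */
};

static struct group_info groups[NGROUPS];   /* rebuilt at mount time */

static void trim_group(int g)
{
    if (groups[g].was_trimmed)
        return;                 /* skip: trimmed since last mount */
    printf("issuing discard for group %d\n", g);
    groups[g].was_trimmed = true;
}

int main(void)
{
    for (int g = 0; g < NGROUPS; g++)
        trim_group(g);          /* first fstrim: discards everything */
    for (int g = 0; g < NGROUPS; g++)
        trim_group(g);          /* second fstrim: nothing to do */
    return 0;
}

The second loop is a no-op, and that's exactly the behavior that
disappears across a reboot, since the flags live in a structure that
is rebuilt at mount time.
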
> I think the value of the amount of fstrimmed data does not reflect
> the amount of data which gets physically fstrimmed by the SSD
> controller.

Yup. Though I'd take issue with the term "physically fstrimmed" - I
don't think that a concept like this really exists. The only physical
operations are reading, writing, and erasing. TRIM is really a
logical operation at its heart.

It wouldn't make sense for a TRIM to automatically trigger some kind
of erase operation all the time. Suppose blocks 1-32 are in a single
erase group. You send a TRIM command for block 1 only. It makes no
sense to have the device read blocks 2-32, erase blocks 1-32, and then
write blocks 2-32 back. That does erase block 1, but it costs a bunch
of IO, and it only replicates the worst-case scenario of what would
happen if you overwrote block 1 in place without trimming it first.
You might argue that now block 1 can be written later without having
to do another erase, but this is only true if the drive can remember
that it was already erased - otherwise all writes would have to be
preceded by reads just to see if the block is already empty.

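To put rough numbers on that, here's a toy model in C comparing the
two strategies. Everything in it is made up for illustration; real
firmware is of course far more involved:

/* Toy model of one 32-block erase group. The structures and costs
 * are illustrative, not how any real controller works. */
#include <stdbool.h>
#include <stdio.h>

#define GROUP_BLOCKS 32

struct erase_group {
    bool valid[GROUP_BLOCKS];   /* FTL bookkeeping: LBA holds data? */
    int reads, writes, erases;  /* physical operation counters */
};

/* Naive TRIM: erase the whole group immediately. */
static void trim_eager(struct erase_group *g, int blk)
{
    g->reads += GROUP_BLOCKS - 1;   /* read back blocks 2-32 */
    g->erases += 1;                 /* erase the whole group */
    g->writes += GROUP_BLOCKS - 1;  /* rewrite blocks 2-32 */
    g->valid[blk] = false;
}

/* Realistic TRIM: just update the map; the erase can happen later,
 * during garbage collection, when the group is mostly invalid. */
static void trim_lazy(struct erase_group *g, int blk)
{
    g->valid[blk] = false;          /* no physical I/O at all */
}

int main(void)
{
    struct erase_group a = {0}, b = {0};
    trim_eager(&a, 0);
    trim_lazy(&b, 0);
    printf("eager: %d reads, %d writes, %d erases\n",
           a.reads, a.writes, a.erases);
    printf("lazy:  %d reads, %d writes, %d erases\n",
           b.reads, b.writes, b.erases);
    return 0;
}

The eager version pays 31 reads, 31 writes, and an erase just to free
one block; the lazy version pays nothing up front.
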
Maybe that is how they actually do it, but it seems like it would make
more sense for a drive to look for opportunities to erase entire
blocks that don't require a read first, or to try to keep these
unused areas in some kind of extents that are less expensive to track.
The drive already has to do a lot of mapping for the sake of wear
leveling.

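Something like this, maybe - a hypothetical table of trimmed ranges
that merges adjacent TRIMs into a single extent (just a sketch, and
it only handles trims that arrive in ascending order):

/* Sketch: track trimmed LBA ranges as merged extents instead of
 * per-block flags. Hypothetical structure, not from any real FTL. */
#include <stdio.h>

struct extent { long start, len; };

/* Fold a newly trimmed range into the previous extent when they
 * touch, so a long run of TRIMs costs one table entry rather than
 * one entry per block. */
static int add_trimmed(struct extent *tab, int n, long start, long len)
{
    if (n > 0 && tab[n - 1].start + tab[n - 1].len == start) {
        tab[n - 1].len += len;      /* extends the last extent */
        return n;
    }
    tab[n].start = start;           /* otherwise open a new one */
    tab[n].len = len;
    return n + 1;
}

int main(void)
{
    struct extent tab[16];
    int n = 0;
    n = add_trimmed(tab, n, 0, 32);     /* TRIM blocks 0-31 */
    n = add_trimmed(tab, n, 32, 32);    /* adjacent: merges in */
    n = add_trimmed(tab, n, 128, 8);    /* gap: new extent */
    for (int i = 0; i < n; i++)
        printf("extent %d: start=%ld len=%ld\n",
               i, tab[i].start, tab[i].len);
    return 0;
}

Two table entries end up covering 72 trimmed blocks, which is the
kind of bookkeeping saving I have in mind.
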
Really, though, a better solution than any of this is for the
filesystem to be more SSD-aware and only perform writes on entire
erase regions at one time. If the drive is told to write blocks 1-32
then it can just blindly erase their contents first, because it knows
everything there is getting overwritten anyway. Likewise, a
filesystem could also do its own wear-leveling, especially on
something like flash where the cost of fragmentation is not high.
I'm not sure how well either zfs or ext4 performs in this role.
Obviously a solution like f2fs, designed for flash storage, is going
to excel here.

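A toy version of that idea: buffer small writes and flush them only
as full, aligned, erase-region-sized chunks, so the drive never sees
a partial overwrite. The 32-block region size and all the names are
invented for the example:

/* Toy log-structured writer: stage writes in RAM and emit only
 * whole, aligned erase regions, so the drive can erase blindly -
 * every block in the region is being overwritten anyway. */
#include <stdio.h>
#include <string.h>

#define ERASE_BLOCKS 32
#define BLOCK_SIZE   4096

static char segment[ERASE_BLOCKS][BLOCK_SIZE];  /* staging buffer */
static int fill;        /* blocks buffered so far */
static long next_lba;   /* always a multiple of ERASE_BLOCKS */

static void flush_segment(void)
{
    /* One sequential write covering a whole erase region. */
    printf("write LBA %ld..%ld\n",
           next_lba, next_lba + ERASE_BLOCKS - 1);
    next_lba += ERASE_BLOCKS;
    fill = 0;
}

static void fs_write_block(const char *data)
{
    memcpy(segment[fill++], data, BLOCK_SIZE);
    if (fill == ERASE_BLOCKS)
        flush_segment();    /* only ever emit full regions */
}

int main(void)
{
    char buf[BLOCK_SIZE] = {0};
    for (int i = 0; i < 64; i++)
        fs_write_block(buf);    /* 64 small writes -> 2 big flushes */
    return 0;
}

This is more or less what a log-structured filesystem like f2fs does
anyway, which is a big part of why it maps so well onto flash.
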
--
Rich