On Tue, Feb 24, 2015 at 8:11 AM, Todd Goodman <tsg@×××××××××.net> wrote:
>
> Can you explain why a log-based filesystem like f2fs would have any
> impact on wear leveling?
>
> As I understand it, wear leveling (and bad block replacement) occurs on
> the SSD itself (in the Flash Translation Layer probably.)
>

Well, if the device has a really dumb firmware there is nothing you
can do to prevent it from wearing itself out. However, log-based
filesystems, and f2fs in particular, are designed to make this very
unlikely in practice.

Log-based filesystems never overwrite data in place. Instead, all
changes are appended into empty space until a large region of the
disk is full. Then the filesystem:
1. Allocates a new, unused, contiguous region of the disk (one that
was already trimmed), aligned to the erase block size of the
underlying SSD.
2. Copies all data that is still in use from the oldest allocated
region of the disk to the new region.
3. Trims the entire old region, which was aligned to the erase block
size when it was originally allocated.
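
That cycle can be sketched as a toy model. Everything here is
illustrative (names, segment sizes, the always-clean-the-oldest
policy); real f2fs is far more involved and picks victim segments
by how much garbage they hold, not by strict age:

```python
import collections

SEGMENT_SIZE = 4  # blocks per segment; a segment stands in for an erase-aligned region


class LogDisk:
    """Toy log-structured store: every write appends at the head of the log."""

    def __init__(self, num_segments=4):
        self.segments = [[] for _ in range(num_segments)]
        self.free = collections.deque(range(num_segments))  # trimmed, ready segments
        self.order = collections.deque()  # allocated segments, oldest first
        self.head = None                  # segment currently being filled
        self.live = {}                    # key -> (segment, slot) of the newest copy

    def write(self, key, value):
        if self.head is None or len(self.segments[self.head]) == SEGMENT_SIZE:
            self._new_segment()
        seg = self.segments[self.head]
        seg.append((key, value))
        self.live[key] = (self.head, len(seg) - 1)  # any older copy becomes garbage

    def read(self, key):
        seg, slot = self.live[key]
        return self.segments[seg][slot][1]

    def _new_segment(self):
        # Step 1: take a fresh, already-trimmed, erase-block-aligned region.
        while not self.free:
            self._clean_oldest()
        self.head = self.free.popleft()
        self.order.append(self.head)

    def _clean_oldest(self):
        old = self.order.popleft()
        # Step 2: find the data in the oldest region that is still in use.
        survivors = [(k, v) for slot, (k, v) in enumerate(self.segments[old])
                     if self.live.get(k) == (old, slot)]
        # Step 3: trim the entire old region in one aligned operation.
        self.segments[old] = []
        self.free.append(old)
        # Finish step 2: re-append the surviving data at the head of the log.
        for k, v in survivors:
            self.write(k, v)
```

Overwriting a key just appends a newer copy and leaves the old slot as
garbage; space only comes back when the cleaner reclaims a whole
segment, which is exactly why the filesystem can always trim large,
erase-aligned regions instead of poking holes in them.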

So the entire space of the disk is written to sequentially, and the
head basically eats the tail. Every block on the drive gets written
to once before the first block on the drive gets written to twice.
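
That even-wear property is easy to check with a toy rotation (an
illustrative segment count, not a model of any real device):

```python
NUM_SEGMENTS = 8  # hypothetical number of segments, for illustration only


def erase_counts(total_segment_writes):
    """Write segments strictly in rotation and count erases per segment."""
    counts = [0] * NUM_SEGMENTS
    head = 0
    for _ in range(total_segment_writes):
        counts[head] += 1                 # rewriting a segment erases it once
        head = (head + 1) % NUM_SEGMENTS  # the head wraps around to the tail
    return counts


counts = erase_counts(100)
# Ideal wear leveling: no segment is erased twice before every segment
# has been erased once, so the counts never differ by more than one.
assert max(counts) - min(counts) <= 1
```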

The design of the filesystem is basically ideal for flash, and all the
firmware has to do is not mess up the perfect order it is handed on a
silver platter. You never end up overwriting only part of an erase
block in place, and you're trimming very large contiguous regions of
the disk at once. Since flash chips don't care about sequential
access, if part of the flash starts to fail the firmware just needs to
map out those blocks, and as long as it maps out an entire erase block
at a time you'll get the same performance. Of course, if part of the
flash starts to fail, the erase count of the remainder of the drive
will be identical, so you're going to need a new drive soon.

I'd love to see a next-gen filesystem for flash that also takes into
account COW snapshotting/reflinks, protection from silent corruption,
and some of the RAID-like optimizations possible with btrfs/zfs.
Since log-based filesystems are COW by nature, I'd think that this
would be achievable. The other side of this would be using SSDs as
caches for something like btrfs/zfs on disk - something largely
possible with zfs today, and perhaps planned for btrfs.

--
Rich