On Sun, Aug 26, 2012 at 11:14 AM, Alex Schuster <wonko@×××××××××.org> wrote:
> Whatever. Then align to 8K instead. But what does this have to do with the
> erasable page size?

Short answer: Before a page can be written into a block that already
contains data, the whole block must be erased. This is the "erase block
size" people talk about. The block size is always a multiple of the page
size, so if you align to the erase block size, you will always be okay.
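
To make the alignment rule concrete, here is a minimal Python sketch. The
4 KiB page and 512 KiB erase block are assumed example values (check your
drive's specs), and the function name is made up for illustration.

```python
# Assumed example geometry; real values depend on the drive model.
PAGE_SIZE = 4 * 1024
ERASE_BLOCK_SIZE = 512 * 1024


def is_aligned(offset_bytes, unit=ERASE_BLOCK_SIZE):
    """Return True if a byte offset is an exact multiple of `unit`."""
    return offset_bytes % unit == 0


# The erase block is a whole number of pages, so anything aligned to the
# erase block is automatically aligned to the page size as well.
assert ERASE_BLOCK_SIZE % PAGE_SIZE == 0

# A partition starting at sector 2048 (512-byte sectors) begins at 1 MiB.
start = 2048 * 512
print(is_aligned(start))                  # True
print(is_aligned(start, PAGE_SIZE))       # True

# A partition starting at sector 63 is aligned to neither.
print(is_aligned(63 * 512, PAGE_SIZE))    # False
```

An 8K offset passes the page check but not the erase block check, which
is why aligning to the larger unit is the safe choice.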

Long answer: NAND flash cells do not operate like normal HDD storage;
they can only be written to when they are empty. Empty means erased,
devoid of data, unallocated, not just "filled with zeroes" or "ignored
by the filesystem". So any time you want to write to a block that
already contains data, it must be erased and rewritten by the drive
controller.

On most current-generation SSDs the erase block size is 512k, and each
block contains 128 pages of 4k each. On older/slower SSDs, or other
kinds of flash devices like CompactFlash or SD cards, the erase block is
often larger, usually 4M or sometimes even up to 16M. (Definitely check
the specs for your specific model of SSD to find the correct values.)

An SSD can write data in page-size chunks, which is very fast, but only
into an empty block. So if the block has data, that data must be
relocated, or erased and rewritten. The TRIM feature tells the SSD which
pages are not used anymore, which allows it to do better garbage
collection, combining pages and deallocating those unused blocks. The
next time you write to one of those blocks, it will be very fast because
the erase already happened in the background after the TRIM and the
unused blocks are available for writing.

This is why SSDs without the TRIM feature become slower once they have
filled up. The drive controller has no knowledge of your filesystem, so
erase overhead is added to every write once the internal NAND free space
is used up. Instead of writing a 4k page, it's now potentially erasing
512k of data and then writing 512k of data: 256 times more data touched
for the same 4k write! (In a case where you have no TRIM support, the
only possible way to improve performance once a full drive's worth of
data has been written is to back up, perform an ATA Secure Erase, which
will clear the SSD allocation metadata, then restore your backup.)
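
The 256x figure can be checked with a few lines of Python. This is only
a worst-case sketch using the example sizes from this mail (4k page,
512k erase block); the function name is hypothetical and real
controllers are smarter than this.

```python
PAGE_SIZE = 4 * 1024            # assumed 4 KiB page
ERASE_BLOCK_SIZE = 512 * 1024   # assumed 512 KiB erase block


def worst_case_touched(blocks_hit=1):
    """Bytes erased plus bytes rewritten if every block hit is recycled."""
    erased = blocks_hit * ERASE_BLOCK_SIZE
    rewritten = blocks_hit * ERASE_BLOCK_SIZE
    return erased + rewritten


touched = worst_case_touched()
print(touched // 1024)        # 1024 (KiB erased plus rewritten)
print(touched // PAGE_SIZE)   # 256 (times the size of the 4 KiB write)
```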

Now imagine the alignment is not correct for both page size and erase
block size. Then when you write data it can straddle a boundary, causing
two blocks to be erased and written instead of only one. In the example
from the previous paragraph, you can see how the performance degrades
even further, and the extra erases and writes will potentially reduce
the lifetime of your drive.
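
Here is a small Python sketch of the overlap effect. It counts how many
erase blocks a write touches, assuming a 512k erase block; the helper
name and the offsets are made up for illustration.

```python
ERASE_BLOCK_SIZE = 512 * 1024   # assumed 512 KiB erase block


def erase_blocks_touched(offset, length, block=ERASE_BLOCK_SIZE):
    """Count the erase blocks overlapped by bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1


# An aligned, block-sized write stays inside a single erase block.
print(erase_blocks_touched(0, ERASE_BLOCK_SIZE))          # 1

# The same write shifted by 63 sectors of 512 bytes spills into a second.
print(erase_blocks_touched(63 * 512, ERASE_BLOCK_SIZE))   # 2
```

With the misaligned start, every block-sized write costs two
erase/rewrite cycles instead of one.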

Additional complexity is added by any further layers: filesystem block
size, filesystem alignment (I'm looking at you, FAT32), LVM, RAID stripe
size, etc.

A good article giving more information on the subject is in the English
version of Wikipedia:
https://en.wikipedia.org/wiki/Write_amplification

(Disclaimer: all of the above is AFAIK; please correct me if I got any
facts or advice wrong.)