Gentoo Archives: gentoo-user

From: Paul Hartman <paul.hartman+gentoo@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] SSD performance tweaking
Date: Mon, 27 Aug 2012 16:00:50
Message-Id: CAEH5T2MWYtFq4T3TcL1swJ347tAkZ5Cmp51jo37t74iDt+xRhQ@mail.gmail.com
In Reply to: Re: [gentoo-user] SSD performance tweaking by Alex Schuster
On Sun, Aug 26, 2012 at 11:14 AM, Alex Schuster <wonko@×××××××××.org> wrote:
> Whatever. Then align to 8K instead. But what does this have to do with the
> erasable page size?

Short answer: When a page is written into a block that already
contains data, the whole block may have to be erased first. That block
is the "erase block size" people talk about. The block size is always
a multiple of the page size, so if you align to the erase block size,
you will always be okay.
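
As a rough sketch of what "align to the erase block size" means for a
partition start, the check below tests whether a start sector falls on
an erase block boundary. The 512-byte sector and the 512 KiB erase
block are assumptions used only for illustration, and the function
names are made up for this example; use your own drive's values.

```python
SECTOR_SIZE = 512          # bytes per logical sector (assumption)
ERASE_BLOCK = 512 * 1024   # bytes; example value only, check your drive's specs


def is_aligned(start_sector, erase_block=ERASE_BLOCK, sector_size=SECTOR_SIZE):
    """True if a partition starting at start_sector begins on an erase block boundary."""
    return (start_sector * sector_size) % erase_block == 0


def next_aligned_sector(start_sector, erase_block=ERASE_BLOCK, sector_size=SECTOR_SIZE):
    """Round start_sector up to the next erase block boundary.

    Assumes erase_block is a whole multiple of sector_size.
    """
    sectors_per_block = erase_block // sector_size
    # Ceiling division without floats.
    return -(-start_sector // sectors_per_block) * sectors_per_block


print(is_aligned(63))           # False: the traditional DOS start sector
print(is_aligned(2048))         # True: 1 MiB is a multiple of 512 KiB
print(next_aligned_sector(63))  # 1024
```

Because the block size is a multiple of the page size, a start that
passes this check is page-aligned as well.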

Long answer: NAND flash cells do not operate like normal HDD storage;
they can only be written to when they are empty. Empty means null,
devoid of data, unallocated, not just "filled with zeroes" or "ignored
by the filesystem". So any time you want to write to a block that
already contains data, the drive controller must erase it and
re-write it.

On most current-generation SSDs the block size is 512k, holding 128
pages of 4k each. On older/slower SSDs, or other kinds of flash
devices such as CompactFlash or SD cards, the erase block is often
larger, usually 4M and sometimes even up to 16M. (Definitely check the
specs for your specific model of SSD to find the correct values.)

An SSD can write data in page-size chunks, which is very fast, but
only into an empty block. So if the block has data, that data must be
relocated, or erased and rewritten. The TRIM feature tells the SSD
which pages are no longer in use, which allows it to do better garbage
collection, combining pages and deallocating the unused blocks. The
next time you write to one of those blocks, it will be very fast,
because the erase already happened at TRIM time and the unused blocks
are available for writing.
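
The toy model below illustrates that sequence for a single erase
block: pages can only be written while empty, TRIM marks pages as no
longer needed, and a block with nothing valid left can be erased ahead
of the next write. Everything here (the class, the method names, the
"stale" label) is hypothetical and far simpler than a real controller.

```python
PAGES_PER_BLOCK = 128  # 512k block / 4k pages, the figures quoted in this message


class Block:
    """One erase block; each page is "empty", "valid" or "stale"."""

    def __init__(self):
        self.pages = ["empty"] * PAGES_PER_BLOCK
        self.erase_count = 0

    def write(self, index):
        """Program one page; only allowed while the page is empty."""
        if self.pages[index] != "empty":
            raise ValueError("page must be erased before it can be written")
        self.pages[index] = "valid"

    def trim(self, index):
        """TRIM: the filesystem no longer needs this page's data."""
        if self.pages[index] == "valid":
            self.pages[index] = "stale"

    def garbage_collect(self):
        """Erase the whole block if it holds stale pages and no valid ones."""
        if "valid" not in self.pages and "stale" in self.pages:
            self.pages = ["empty"] * PAGES_PER_BLOCK
            self.erase_count += 1
            return True
        return False


block = Block()
for i in range(PAGES_PER_BLOCK):
    block.write(i)          # fill the block
for i in range(PAGES_PER_BLOCK):
    block.trim(i)           # filesystem deletes everything in it
block.garbage_collect()     # erase happens before the next write arrives
block.write(0)              # plain page write, no erase needed now
```

Without the trim() calls, the model's garbage_collect() never sees a
reclaimable block, which is the situation the next paragraph
describes.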

This is why SSDs without the TRIM feature become slower once they have
filled up. The drive controller has no knowledge of your filesystem,
so once the internal NAND free space is used up, erase overhead is
added to every write. Instead of writing a 4k page, it is now
potentially erasing 512k of data and then writing 512k of data: 256
times more data touched for the same 4k write! (If you have no TRIM
support, the only possible way to improve performance once a full
drive's worth of data has been written is to back up, perform an ATA
Secure Erase, which will clear the SSD allocation metadata, and then
restore your backup.)
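
The "256 times" figure can be checked with a few lines of arithmetic.
This is only a sketch using the 4k page and 512k block sizes from this
message; the function name is made up for the example, and it counts
bytes erased plus bytes written, as the paragraph above does.

```python
PAGE = 4 * 1024      # 4k page
BLOCK = 512 * 1024   # 512k erase block


def bytes_touched(write_size, block_is_empty, page=PAGE, block=BLOCK):
    """Bytes erased plus bytes written for one small write inside a single block."""
    if block_is_empty:
        # Only the pages being written are touched (rounded up to whole pages).
        pages = -(-write_size // page)
        return pages * page
    # Worst case: erase the whole block, then write the whole block back.
    return block + block


fast = bytes_touched(4 * 1024, block_is_empty=True)    # 4096
slow = bytes_touched(4 * 1024, block_is_empty=False)   # 1048576
print(slow // fast)                                    # 256
```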

Now imagine the alignment is not correct for both the page size and
the erase block size. Then the data you write can straddle a block
boundary, causing two blocks to be erased and written instead of only
one. In the example from the previous paragraph, you can see how
performance degrades even further, and the extra erases and writes
will potentially reduce the lifetime of your drive.
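
To make the overlap concrete, the sketch below counts how many erase
blocks a write touches for a given byte offset and length. The 512k
block size is the example figure from this message, the 63-sector
offset is just a typical misaligned start, and the function name is
made up for illustration.

```python
BLOCK = 512 * 1024   # 512k erase block (example value)


def blocks_touched(offset, length, block=BLOCK):
    """Number of erase blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1


print(blocks_touched(0, BLOCK))             # 1: aligned, exactly one block
print(blocks_touched(63 * 512, BLOCK))      # 2: same size, misaligned start
print(blocks_touched(BLOCK - 2048, 4096))   # 2: a 4k write straddling a boundary
```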

Additional complexity is added by any further layers: filesystem block
size, filesystem alignment (I'm looking at you, FAT32), LVM, RAID
stripe size, etc.

A good article giving more information about the subject is in the
English version of Wikipedia:
https://en.wikipedia.org/wiki/Write_amplification

(Disclaimer: all of the above is AFAIK; please correct me if I got any
facts or advice wrong.)