Kai Krakow <hurikhan77@×××××.com> writes:

> On Sat, 20 Feb 2016 11:24:56 +0100,
> lee <lee@××××××××.de> wrote:
>
>> > It uses some very clever ideas to place files into groups and into
>> > proper order - rather than using file mod and access times like
>> > other defrag tools do (which even makes the problem worse, because
>> > it destroys locality of data even more).
>>
>> I've never heard of MyDefrag, I might try it out. Does it make
>> updating any faster?
>
> Ah well, difficult question... Short answer: It takes countermeasures
> against performance degrading too quickly after updates. It does this
> by using a "gapped" on-disk file layout - leaving some gaps for
> Windows to put temporary files into. This way, files don't get spread
> as far apart as they usually would during updates. But yes, it
> improves installation time.

What difference would that make with an SSD?

> Apparently it's unmaintained for a few years now, but it still does a
> good job. It was built upon a student's theory about how to properly
> reorganize the file layout on a spinning disk so that performance
> stays as high as possible.

For spinning disks, I can see how it can be beneficial.

>> > But even SSDs can use _proper_ defragmentation from time to time
>> > for increased lifetime and performance (this is due to how the FTL
>> > works and because erase blocks are huge; I won't go into detail
>> > unless someone asks). This is why MyDefrag also supports flash
>> > optimization. It works by moving as few files as possible while
>> > coalescing free space into big chunks, which in turn relieves
>> > pressure on the FTL and allows for more free and contiguous erase
>> > blocks, which reduces early flash chip wear. A filled SSD with a
>> > long usage history can certainly gain back some performance from
>> > this.
>>
>> How does it improve performance? It seems to me that, for practical
>> use, almost all of the better performance with SSDs is due to
>> reduced latency. And IIUC, it doesn't matter for the latency where
>> data is stored on an SSD. If its performance degrades over time when
>> data is written to it, the SSD sucks, and the manufacturer should
>> have done a better job. Why else would I buy an SSD? If it needs to
>> reorganise the data stored on it, the firmware should do that.
>
> There are different factors which have an impact on performance, not
> just seek times (which, as you write, are the worst performance
> killer):
>
> * management overhead: the OS has to do more housekeeping, which
>   (a) introduces more IOPS (which is the only relevant limiting
>   factor for an SSD) and (b) introduces more CPU cycles and data
>   structure locking within the OS routines while performing IO, which
>   comes down to more CPU cycles spent during IO
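
To make the IOPS point concrete: a file fragmented into N extents
costs roughly one read request per extent, so reading the same amount
of data generates more IO requests. A minimal sketch (the layouts and
numbers are made up for illustration):

def read_requests(extents):
    # Each physically contiguous extent needs its own read request.
    return len(extents)

# (offset, length) pairs; both layouts hold the same 64 MiB of data
contiguous = [(0, 64 * 2**20)]
fragmented = [(i * 2**21, 2**20) for i in range(64)]  # 64 x 1 MiB

print(read_requests(contiguous))  # 1 request
print(read_requests(fragmented))  # 64 requests for the same data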

How would that be reduced by defragmenting an SSD?

> * erasing a block is where SSDs really suck, performance-wise; plus,
>   blocks are essentially read-only once written - that's how flash
>   works, a flash data block needs to be erased prior to being
>   rewritten - and that is (compared to the rest of its performance) a
>   really REALLY HUGE time factor
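
To put rough numbers on that (order-of-magnitude figures I'm assuming
from typical NAND datasheets, not from this thread):

PAGE_READ_US = 50      # reading one flash page
PAGE_PROGRAM_US = 500  # programming (writing) one erased page
BLOCK_ERASE_US = 3000  # erasing a whole erase block

# Write that lands in a pre-erased block: just program it.
fast_path = PAGE_PROGRAM_US

# Write that forces an in-line read/modify/erase/write cycle:
slow_path = PAGE_READ_US + BLOCK_ERASE_US + PAGE_PROGRAM_US

print(slow_path / fast_path)  # ~7x slower for a single small write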

So let the SSD do it when it's idle. For applications in which it
isn't idle enough, an SSD won't be the best solution.

> * erase blocks are huge compared to common filesystem block sizes
>   (erase block = 1 or 2 MB vs. a file system block of usually
>   4-64k), which happens to result in this effect:
>
>   - the OS replaces a file by writing a new one and deleting the old
>     one (common during updates), or the user deletes files
>   - the OS marks some blocks as free in its FS structures; whether
>     this gives you a contiguous area of free blocks or many small
>     blocks scattered across the disk depends on the file size and
>     its fragmentation: it results in free space fragmentation
>   - free space fragments tend to become small over time, much
>     smaller than the erase block size
>   - if your system has TRIM/discard support, it will tell the SSD
>     firmware: here, I no longer use those 4k blocks
>   - as you already figured out: those small blocks marked as free do
>     not properly align with the erase block size - so you may
>     actually end up with a lot of free space, but essentially no
>     complete erase block is marked as free
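
A minimal sketch of that alignment effect, assuming 4 KiB filesystem
blocks and 2 MiB erase blocks (so 512 filesystem blocks per erase
block): even with half the disk free, hardly any erase block ends up
completely free.

import random

FS_BLOCK = 4 * 1024
ERASE_BLOCK = 2 * 1024 * 1024
PER_ERASE = ERASE_BLOCK // FS_BLOCK     # 512 fs blocks per erase block

random.seed(0)
total = 512 * PER_ERASE                 # model 1 GiB worth of fs blocks
free = set(random.sample(range(total), total // 2))  # 50% free, scattered

fully_free = sum(
    1 for eb in range(total // PER_ERASE)
    if all(b in free for b in range(eb * PER_ERASE, (eb + 1) * PER_ERASE))
)
print(len(free) * FS_BLOCK // 2**20, "MiB free")  # 512 MiB free
print(fully_free, "fully free erase blocks")      # almost surely 0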

Use smaller erase blocks.

>   - this situation means: the SSD firmware cannot reclaim this free
>     space to do "free block erasure" in advance, so if you write
>     another small block of data, you may end up with the SSD going
>     into a direct "read/modify/erase/write" cycle instead of just
>     "read/modify/write" with the erasing deferred until later - ah
>     yes, that's when it probably becomes slow
>   - what do we learn: (a) defragment free space from time to time,
>     (b) enable TRIM/discard to reclaim blocks in advance, (c) you
>     may want to over-provision your SSD: just don't ever use 10-15%
>     of your SSD, trim that space, and leave it there for the
>     firmware to shuffle erase blocks around
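
Back-of-the-envelope numbers for point (c), with the drive size and
erase block size assumed purely for illustration:

DRIVE_BYTES = 256 * 10**9       # nominal 256 GB SSD
ERASE_BLOCK = 2 * 1024 * 1024   # 2 MiB erase blocks
RESERVE = 0.15                  # keep 15% unused and trimmed

spare = int(DRIVE_BYTES * RESERVE)
print(spare // 2**30, "GiB left for the firmware")  # ~35 GiB
print(spare // ERASE_BLOCK, "spare erase blocks")   # ~18000 blocks it
# can keep pre-erased and rotate through for wear-levelling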

Use better firmware for SSDs.

>   - the latter point also increases lifetime, for obvious reasons,
>     as SSDs only support a limited number of write cycles per block
>   - this "shuffling around" of blocks is called wear-levelling: the
>     firmware chooses a candidate block with the fewest write cycles
>     for doing "read/modify/write"
>
> So, SSDs actually do this "reorganization", as you call it - but they
> are limited to working within the bounds of erase blocks, and the
> firmware knows nothing about the on-disk format and its smaller
> blocks, so it cannot go down to a finer-grained reorganization.
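
A minimal sketch of that wear-levelling choice (the data structure is
an assumption for illustration, not real firmware):

erase_counts = {0: 1200, 1: 980, 2: 1500, 3: 975}  # block id -> cycles

def pick_target(counts):
    # Choose the least-worn erased block so wear spreads evenly.
    return min(counts, key=counts.get)

block = pick_target(erase_counts)
print(block)               # 3 - the block with only 975 cycles
erase_counts[block] += 1   # each erase wears the block a little more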

Well, I can't help it. I'm going to need to use 2 SSDs on a hardware
RAID controller in a RAID-1. I expect the SSDs to just work fine. If
they don't, then there isn't much point in spending the extra money on
them.

The system needs to boot from them. So what choices do I have to keep
these SSDs happy?

> These facts are apparently unknown to most people, which is why they
> deny that an SSD could become slow or might need some specialized
> form of "defragmentation". The usual recommendation is to do a
> "secure erase" of the disk if it becomes slow - which I consider
> pretty harmful, as it rewrites ALL blocks (using up write cycles and
> thus lifetime), plus it's time-consuming and could be avoided.

That isn't an option because it would be way too much hassle.

> BTW: OS makers (and FS designers) actually optimize their systems for
> that kind of reorganization by the SSD firmware. NTFS may use
> different allocation strategies on SSDs (just a guess), and on Linux
> there is F2FS, which actually exploits this reorganization for
> increased performance and lifetime. Ext4 and Btrfs use different
> allocation strategies and prefer spreading file data instead of free
> space (which is just the opposite of what's done for HDDs). So, with
> a modern OS you are much less prone to the effects described above.

Does F2FS come with some sort of redundancy? Reliability and booting
from these SSDs are requirements, so I can't really use btrfs because
it's troublesome to boot from, and its reliability is questionable.
Ext4 doesn't have RAID. Using ext4 on mdadm probably won't be any
better than using the hardware RAID, so there's no point in doing
that, and I'd rather spare myself the overhead.

After your explanation, I have to wonder even more than before what
the point of using SSDs is, considering that current hardware and
software don't use them properly. OTOH, so far they do seem to
provide better performance than hard disks, even when not used with
all the special precautions I don't want to have to think about.

BTW, why would anyone use SSDs for ZFS's ZIL or L2ARC? Does ZFS treat
SSDs properly in this application?