Kai Krakow <hurikhan77@×××××.com> writes:

> On Sat, 20 Feb 2016 11:24:56 +0100,
> lee <lee@××××××××.de> wrote:
>
>> > It uses some very clever ideas to place files into groups and into
>> > proper order - rather than using file mod and access times like
>> > other defrag tools do (which even makes the problem worse, because
>> > it destroys locality of data even more).
>>
>> I've never heard of MyDefrag, I might try it out. Does it make
>> updating any faster?
>
> Ah well, difficult question... Short answer: It takes countermeasures
> against performance degrading too quickly after updates. It does this
> by using a "gapped" on-disk file layout - leaving some gaps for
> Windows to put temporary files into. This way, files don't get spread
> as far apart as they usually would during updates. But yes, it
> improves installation time.

What difference would that make with an SSD?

> Apparently it's unmaintained for a few years now, but it still does a
> good job. It was built upon a student's theory about how to properly
> reorganize the file layout on a spinning disk so that performance
> stays as high as possible.

For spinning disks, I can see how it can be beneficial.

>> > But even SSDs can use _proper_ defragmentation from time to time
>> > for increased lifetime and performance (this is due to how the FTL
>> > works and because erase blocks are huge; I won't go into detail
>> > unless someone asks). This is why MyDefrag also supports flash
>> > optimization. It works by moving as few files as possible while
>> > coalescing free space into big chunks, which in turn relieves
>> > pressure on the FTL and allows for more free and contiguous erase
>> > blocks, which reduces early flash chip wear. A filled SSD with a
>> > long usage history can certainly gain back some performance from
>> > this.
>>
>> How does it improve performance? It seems to me that, for practical
>> use, almost all of the better performance with SSDs is due to
>> reduced latency. And IIUC, it doesn't matter for the latency where
>> data is stored on an SSD. If its performance degrades over time when
>> data is written to it, the SSD sucks, and the manufacturer should
>> have done a better job. Why else would I buy an SSD? If it needs to
>> reorganise the data stored on it, the firmware should do that.
>
> There are different factors which have an impact on performance, not
> just seek times (which, as you write, are the worst performance
> killer):
>
> * management overhead: the OS has to do more housekeeping, which
>   (a) introduces more IOPS (which is the only relevant limiting
>   factor for an SSD) and (b) introduces more CPU cycles and data
>   structure locking within the OS routines while performing IO, which
>   comes down to more CPU cycles spent during IO
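
To make the IOPS point concrete: a file fragmented into N extents
costs roughly one read request per extent, so reading the same amount
of data generates more IO requests. A minimal sketch (the layouts and
numbers are made up for illustration):

def read_requests(extents):
    # Each physically contiguous extent needs its own read request.
    return len(extents)

# (offset, length) pairs; both layouts hold the same 64 MiB of data
contiguous = [(0, 64 * 2**20)]
fragmented = [(i * 2**21, 2**20) for i in range(64)]  # 64 x 1 MiB

print(read_requests(contiguous))  # 1 request
print(read_requests(fragmented))  # 64 requests for the same data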

How would that be reduced by defragmenting an SSD?

> * erasing a block is where SSDs really suck, performance-wise; plus,
>   blocks are essentially read-only once written - that's how flash
>   works, a flash data block needs to be erased prior to being
>   rewritten - and that is (compared to the rest of its performance) a
>   really REALLY HUGE time factor
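
To put rough numbers on that (order-of-magnitude figures I'm assuming
from typical NAND datasheets, not from this thread):

PAGE_READ_US = 50      # reading one flash page
PAGE_PROGRAM_US = 500  # programming (writing) one erased page
BLOCK_ERASE_US = 3000  # erasing a whole erase block

# Write that lands in a pre-erased block: just program it.
fast_path = PAGE_PROGRAM_US

# Write that forces an in-line read/modify/erase/write cycle:
slow_path = PAGE_READ_US + BLOCK_ERASE_US + PAGE_PROGRAM_US

print(slow_path / fast_path)  # ~7x slower for a single small write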

So let the SSD do it when it's idle. For applications in which it
isn't idle enough, an SSD won't be the best solution.

> * erase blocks are huge compared to common filesystem block sizes
>   (erase block = 1 or 2 MB vs. a file system block of usually
>   4-64k), which happens to result in this effect:
>
>   - the OS replaces a file by writing a new one and deleting the old
>     one (common during updates), or the user deletes files
>   - the OS marks some blocks as free in its FS structures; whether
>     this gives you a contiguous area of free blocks or many small
>     blocks scattered across the disk depends on the file size and
>     its fragmentation: it results in free space fragmentation
>   - free space fragments tend to become small over time, much
>     smaller than the erase block size
>   - if your system has TRIM/discard support, it will tell the SSD
>     firmware: here, I no longer use those 4k blocks
>   - as you already figured out: those small blocks marked as free do
>     not properly align with the erase block size - so you may
>     actually end up with a lot of free space, but essentially no
>     complete erase block is marked as free
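
A minimal sketch of that alignment effect, assuming 4 KiB filesystem
blocks and 2 MiB erase blocks (so 512 filesystem blocks per erase
block): even with half the disk free, hardly any erase block ends up
completely free.

import random

FS_BLOCK = 4 * 1024
ERASE_BLOCK = 2 * 1024 * 1024
PER_ERASE = ERASE_BLOCK // FS_BLOCK     # 512 fs blocks per erase block

random.seed(0)
total = 512 * PER_ERASE                 # model 1 GiB worth of fs blocks
free = set(random.sample(range(total), total // 2))  # 50% free, scattered

fully_free = sum(
    1 for eb in range(total // PER_ERASE)
    if all(b in free for b in range(eb * PER_ERASE, (eb + 1) * PER_ERASE))
)
print(len(free) * FS_BLOCK // 2**20, "MiB free")  # 512 MiB free
print(fully_free, "fully free erase blocks")      # almost surely 0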

Use smaller erase blocks.

>   - this situation means: the SSD firmware cannot reclaim this free
>     space to do "free block erasure" in advance, so if you write
>     another small block of data, you may end up with the SSD going
>     into a direct "read/modify/erase/write" cycle instead of just
>     "read/modify/write" with the erasing deferred until later - ah
>     yes, that's when it probably becomes slow
>   - what do we learn: (a) defragment free space from time to time,
>     (b) enable TRIM/discard to reclaim blocks in advance, (c) you
>     may want to over-provision your SSD: just don't ever use 10-15%
>     of your SSD, trim that space, and leave it there for the
>     firmware to shuffle erase blocks around
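
Back-of-the-envelope numbers for point (c), with the drive size and
erase block size assumed purely for illustration:

DRIVE_BYTES = 256 * 10**9       # nominal 256 GB SSD
ERASE_BLOCK = 2 * 1024 * 1024   # 2 MiB erase blocks
RESERVE = 0.15                  # keep 15% unused and trimmed

spare = int(DRIVE_BYTES * RESERVE)
print(spare // 2**30, "GiB left for the firmware")  # ~35 GiB
print(spare // ERASE_BLOCK, "spare erase blocks")   # ~18000 blocks it
# can keep pre-erased and rotate through for wear-levelling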

Use better firmware for SSDs.

>   - the latter point also increases lifetime, for obvious reasons,
>     as SSDs only support a limited number of write cycles per block
>   - this "shuffling around" of blocks is called wear-levelling: the
>     firmware chooses a candidate block with the fewest write cycles
>     for doing "read/modify/write"
>
> So, SSDs actually do this "reorganization", as you call it - but they
> are limited to working within the bounds of erase blocks, and the
> firmware knows nothing about the on-disk format and its smaller
> blocks, so it cannot go down to a finer-grained reorganization.
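
A minimal sketch of that wear-levelling choice (the data structure is
an assumption for illustration, not real firmware):

erase_counts = {0: 1200, 1: 980, 2: 1500, 3: 975}  # block id -> cycles

def pick_target(counts):
    # Choose the least-worn erased block so wear spreads evenly.
    return min(counts, key=counts.get)

block = pick_target(erase_counts)
print(block)               # 3 - the block with only 975 cycles
erase_counts[block] += 1   # each erase wears the block a little more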

Well, I can't help it. I'm going to need to use 2 SSDs on a hardware
RAID controller in a RAID-1. I expect the SSDs to just work fine. If
they don't, then there isn't much point in spending the extra money on
them.

The system needs to boot from them. So what choices do I have to keep
these SSDs happy?

> These facts are apparently unknown to most people, which is why they
> deny that an SSD could become slow or might need some specialized
> form of "defragmentation". The usual recommendation is to do a
> "secure erase" of the disk if it becomes slow - which I consider
> pretty harmful, as it rewrites ALL blocks (using up write cycles and
> thus lifetime), plus it's time-consuming and could be avoided.

That isn't an option because it would be way too much hassle.

> BTW: OS makers (and FS designers) actually optimize their systems for
> that kind of reorganization by the SSD firmware. NTFS may use
> different allocation strategies on SSDs (just a guess), and on Linux
> there is F2FS, which actually exploits this reorganization for
> increased performance and lifetime. Ext4 and Btrfs use different
> allocation strategies and prefer spreading file data instead of free
> space (which is just the opposite of what's done for HDDs). So, with
> a modern OS you are much less prone to the effects described above.

Does F2FS come with some sort of redundancy? Reliability and booting
from these SSDs are requirements, so I can't really use btrfs because
it's troublesome to boot from, and its reliability is questionable.
Ext4 doesn't have RAID. Using ext4 on mdadm probably won't be any
better than using the hardware RAID, so there's no point in doing
that, and I'd rather spare myself the overhead.

After your explanation, I have to wonder even more than before what
the point of using SSDs is, considering that current hardware and
software don't use them properly. OTOH, so far they do seem to
provide better performance than hard disks, even when not used with
all the special precautions I don't want to have to think about.

BTW, why would anyone use SSDs for ZFS's ZIL or L2ARC? Does ZFS treat
SSDs properly in this application?