On Sat, 05 Mar 2016 00:52:09 +0100,
lee <lee@××××××××.de> wrote:

> >> > It uses some very clever ideas to place files into groups and
> >> > into proper order - other than using file mod and access times
> >> > like other defrag tools do (which even make the problem worse by
> >> > doing so because this destroys locality of data even more).
> >>
> >> I've never heard of MyDefrag, I might try it out. Does it make
> >> updating any faster?
> >
> > Ah well, difficult question... Short answer: It uses countermeasures
> > against performance after updates decreasing too fast. It does this
> > by using a "gapped" on-disk file layout - leaving some gaps for
> > Windows to put temporary files. By this, files don't become a far
> > spread as usually during updates. But yes, it improves installation
> > time.
>
> What difference would that make with an SSD?

Well, those gaps are, with good chance, trimmed erase blocks, so they
can be served fast by the SSD firmware. Of course, the same applies if
your OS is using discard commands to mark free blocks and you still
have enough free space in the FS. So, actually, for SSDs it probably
makes no difference.
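Whether the OS can actually issue those discards depends on the device
(and any layers in between) advertising support for them. A quick way
to check on Linux - just a sketch, assuming the usual sysfs layout and
"sda" as an example device name:

#!/usr/bin/env python3
# Rough check whether a block device advertises discard (TRIM) support.
# Assumes Linux sysfs; "sda" is only an example device name.
from pathlib import Path

def discard_info(dev="sda"):
    q = Path("/sys/block") / dev / "queue"
    info = {}
    for attr in ("discard_granularity", "discard_max_bytes"):
        try:
            info[attr] = int((q / attr).read_text().strip())
        except (FileNotFoundError, ValueError):
            info[attr] = None
    return info

if __name__ == "__main__":
    info = discard_info("sda")
    print(info)
    if not info.get("discard_granularity"):
        print("No discard support reported (or the sysfs layout differs).")

If discard_granularity reads as 0, the kernel can't tell the firmware
about freed blocks and you're back to relying on the free-space layout.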

> > Apparently it's unmaintained since a few years but it still does a
> > good job. It was built upon a theory by a student about how to
> > properly reorganize file layout on a spinning disk to stay at high
> > performance as best as possible.
>
> For spinning disks, I can see how it can be beneficial.

My comment was targeted at this.

> >> > But even SSDs can use _proper_ defragmentation from time to time
> >> > for increased lifetime and performance (this is due to how the
> >> > FTL works and because erase blocks are huge, I won't get into
> >> > detail unless someone asks). This is why mydefrag also supports
> >> > flash optimization. It works by moving as few files as possible
> >> > while coalescing free space into big chunks which in turn relaxes
> >> > pressure on the FTL and allows to have more free and continuous
> >> > erase blocks which reduces early flash chip wear. A filled SSD
> >> > with long usage history can certainly gain back some performance
> >> > from this.
> >>
> >> How does it improve performance? It seems to me that, for
> >> practical use, almost all of the better performance with SSDs is
> >> due to reduced latency. And IIUC, it doesn't matter for the
> >> latency where data is stored on an SSD. If its performance
> >> degrades over time when data is written to it, the SSD sucks, and
> >> the manufacturer should have done a better job. Why else would I
> >> buy an SSD. If it needs to reorganise the data stored on it, the
> >> firmware should do that.
> >
> > There are different factors which have impact on performance, not
> > just seek times (which, as you write, is the worst performance
> > breaker):
> >
> > * management overhead: the OS has to do more house keeping, which
> > (a) introduces more IOPS (which is the only relevant limiting
> > factor for SSD) and (b) introduces more CPU cycles and data
> > structure locking within the OS routines during performing IO
> > which comes down to more CPU cycles spend during IO
>
> How would that be reduced by defragmenting an SSD?

FS structures are coalesced back into simpler structures by
defragmenting, e.g. btrfs creates a huge overhead by splitting extents
due to its COW nature. Doing a defrag here combines this back into
fewer extents. It's reported on the btrfs list that this CAN make a big
difference even for SSD, though usually you only see the performance
loss with heavily fragmented files like VM images - so the
recommendation here is to set those files nocow.
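If you want to see whether a file is actually affected, something along
these lines works for a quick look - purely illustrative, it just
parses the summary line of filefrag (from e2fsprogs), so that tool
needs to be installed:

#!/usr/bin/env python3
# Count how many extents a file is split into by parsing the summary
# line of "filefrag" (e2fsprogs). Illustrative sketch only.
import re
import subprocess
import sys

def extent_count(path):
    out = subprocess.run(["filefrag", path], capture_output=True,
                         text=True, check=True).stdout
    # filefrag prints e.g.: "disk.img: 1234 extents found"
    m = re.search(r":\s+(\d+) extents? found", out)
    if not m:
        raise RuntimeError("unexpected filefrag output: " + out)
    return int(m.group(1))

if __name__ == "__main__":
    for f in sys.argv[1:]:
        print(f, extent_count(f))

A CoW'ed VM image can easily show tens of thousands of extents there,
which is exactly the case where nocow or an occasional defrag pays off.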

> > * erasing a block is where SSDs really suck at performance wise,
> > plus blocks are essentially read-only once written - that's how
> > flash works, a flash data block needs to be erased prior to being
> > rewritten - and that is (compared to the rest of its
> > performance) a really REALLY HUGE time factor
>
> So let the SSD do it when it's idle. For applications in which it
> isn't idle enough, an SSD won't be the best solution.

That's probably true - I hadn't thought of that.

> > * erase blocks are huge compared to common filesystem block sizes
> > (erase block = 1 or 2 MB vs. file system block being 4-64k
> > usually) which happens to result in this effect:
> >
> > - OS replaces a file by writing a new, deleting the old
> > (common during updates), or the user deletes files
> > - OS marks some blocks as free in its FS structures, it depends
> > on the file size and its fragmentation if this gives you a
> > continuous area of free blocks or many small blocks scattered
> > across the disk: it results in free space fragmentation
> > - free space fragments happen to become small over time, much
> > smaller then the erase block size
> > - if your system has TRIM/discard support it will tell the SSD
> > firmware: here, I no longer use those 4k blocks
> > - as you already figured out: those small blocks marked as free
> > do not properly align with the erase block size - so actually, you
> > may end up with a lot of free space but essentially no
> > complete erase block is marked as free
>
> Use smaller erase blocks.

It's a hardware limitation - and it's probably not going to change. I
think erase blocks will become even bigger when capacities increase.
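Just to illustrate the effect from the list above with a toy model
(numbers made up, nothing measured): mark some percentage of 4k
filesystem blocks free at random positions and count how many 2 MB
erase blocks end up completely free:

#!/usr/bin/env python3
# Toy model of free-space fragmentation vs. erase blocks: mark some
# percentage of 4 KiB blocks free at random and count how many 2 MiB
# erase blocks are entirely free (i.e. reclaimable in advance).
import random

FS_BLOCK = 4 * 1024
ERASE_BLOCK = 2 * 1024 * 1024
PER_ERASE = ERASE_BLOCK // FS_BLOCK        # 512 FS blocks per erase block
ERASE_BLOCKS = 1000                        # ~2 GiB of flash, toy size
FS_BLOCKS = ERASE_BLOCKS * PER_ERASE

random.seed(0)
for free_pct in (10, 25, 50):
    free = set(random.sample(range(FS_BLOCKS), FS_BLOCKS * free_pct // 100))
    fully_free = sum(
        1 for e in range(ERASE_BLOCKS)
        if all(e * PER_ERASE + i in free for i in range(PER_ERASE))
    )
    print(f"{free_pct}% of FS blocks free -> "
          f"{fully_free}/{ERASE_BLOCKS} erase blocks completely free")

With randomly scattered frees, even a half-empty drive ends up with
zero completely free erase blocks in this model - which is exactly
what free-space defragmentation (or contiguous frees) is supposed to
avoid.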

> > - this situation means: the SSD firmware cannot reclaim this
> > free space to do "free block erasure" in advance so if you write
> > another block of small data you may end up with the SSD going
> > into a direct "read/modify/erase/write" cycle instead of just
> > "read/modify/write" and deferring the erasing until later - ah
> > yes, that's probably becoming slow then
> > - what do we learn: (a) defragment free space from time to time,
> > (b) enable TRIM/discard to reclaim blocks in advance, (c) you
> > may want to over-provision your SSD: just don't ever use 10-15% of
> > your SSD, trim that space, and leave it there for the
> > firmware to shuffle erase blocks around
>
> Use better firmware for SSDs.

This is a technical limitation. I don't think there's anything the
firmware could improve here - except using internal overprovisioning
and bigger caches to defer this work to idle time - but see your
comment above regarding idle time.

A problem that goes hand in hand with this: If your SSD firmware falls
back to a "read/modify/erase/write" cycle, this wears the flash cells
much faster. Thus, I'd recommend using bigger overprovisioning,
depending on application and usage pattern.
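To put rough numbers on the wear aspect (all figures made up for
illustration, plug in your drive's specs and workload): lifetime is
roughly capacity times rated P/E cycles divided by what actually hits
the flash, so write amplification from constant read/modify/erase/write
cycles eats into it directly:

#!/usr/bin/env python3
# Back-of-the-envelope SSD lifetime estimate. All numbers are
# illustrative assumptions, not data for any particular drive.
capacity_gb = 256          # usable capacity
pe_cycles = 3000           # rated program/erase cycles per cell
host_writes_gb_day = 50    # what the OS writes per day

for write_amp in (1.1, 3.0, 10.0):
    nand_writes_gb_day = host_writes_gb_day * write_amp
    lifetime_years = capacity_gb * pe_cycles / nand_writes_gb_day / 365
    print(f"write amplification {write_amp:4.1f} -> ~{lifetime_years:.1f} years")

More overprovisioning mainly helps by keeping the write amplification
near the low end of that range.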

> > - the latter point also increases life-time for obvious reasons
> > as SSDs only support a limited count of write-cycles per block
> > - this "shuffling around" blocks is called wear-levelling: the
> > firmware chooses a block candidate with the least write cycles
> > for doing "read/modify/write"
> >
> > So, SSDs actually do this "reorganization" as you call it - but they
> > are limited to it within the bounds of erase block sizes - and the
> > firmware knows nothing about the on-disk format and its smaller
> > blocks, so it can do nothing to go down to a finer grained
> > reorganization.
>
> Well, I can't help it. I'm going to need to use 2 SSDs on a hardware
> RAID controller in a RAID-1. I expect the SSDs to just work fine. If
> they don't, then there isn't much point in spending the extra money on
> them.
>
> The system needs to boot from them. So what choice do I have to make
> these SSDs happy?

Well, from the OS point of view they should just work the same with
hardware and software RAID. Your RAID controller should support passing
discard commands down to the SSDs - or you can use bigger
overprovisioning by not assigning all the space to the array
configuration.

But by all means: it is worth spending the money. We are using mirrored
SSDs for an LSI CacheCade configuration - the result is lightning-fast
systems. The SSD mirror just acts as a huge write-back and random
access cache for the bigger spinning RAID sets - like l2arc does for
ZFS, just at the RAID controller level. This way, you can have your
cake and eat it, too: the best of both worlds - big storage + high
IOPS.

> > These facts are apparently unknown to most people, that's why they
> > are denying a SSD could become slow or needs some specialized form
> > of "defragmentation". The usual recommendation is to do a "secure
> > erase" of the disk if it becomes slow - which I consider pretty
> > harmful as it rewrites ALL blocks (reducing their write-cycle
> > counter/lifetime), plus it's time consuming and could be avoided.
>
> That isn't an option because it would be way too much hassle.

You mean secure erase: yes, not an option - for different reasons.

> > BTW: OS makers (and FS designers) actually optimize their systems
> > for that kind of reorganization of the SSD firmware. NTFS may use
> > different allocation strategies on SSD (just a guess) and in Linux
> > there is F2FS which actually exploits this reorganization for
> > increased performance and lifetime, Ext4 and Btrfs use different
> > allocation strategies and prefer spreading file data instead of
> > free space (which is just the opposite of what's done for HDD). So,
> > with a modern OS you are much less prone to the effects described
> > above.
>
> Does F2FS come with some sort of redundancy? Reliability and booting
> from these SSDs are requirements, so I can't really use btrfs because
> it's troublesome to boot from, and the reliability is questionable.
> Ext4 doesn't have raid. Using ext4 on mdadm probably won't be any
> better than using the hardware RAID, so there's no point in doing
> that, and I rather spare me the overhead.

Well, you can use F2FS with mdadm. Btrfs boots just fine if you are not
using multi-device btrfs - so you would have to fall back to hardware
RAID or mdadm instead of btrfs's native RAID pooling.

> After your explanation, I have to wonder even more than before what
> the point in using SSDs is, considering current hard- and software
> which doesn't properly use them. OTOH, so far they do seem to
> provide better performance than hard disks even when not used with
> all the special precautions I don't want to have to think about.

Yes, they do. But I think there's still a lot that can be done.
Developing file systems is a multi-year, if not multi-decade, process.
Historically, everything was designed around spinning disk
characteristics. Of course, much has already been done to make these FS
work better with SSDs: Ext4 has optimizations, btrfs was designed with
SSDs in mind, F2FS is a completely new filesystem specifically targeted
at simple flash storage (devices without an FTL, read: embedded
devices) but also works great for SSDs (which use an FTL), and most
other systems have added some sort of caching to make use of SSDs while
still providing big storage, that is:

> BTW, why would anyone use SSDs for ZFS's zil or l2arc? Does ZFS treat
> SSDs properly in this application?

ZFS's caches are properly designed around this, I think. Linux adds its
own l2arc/zil-like caches (usable for every FS), namely bcache,
flashcache, mdcache, maybe more... I'm very happy with bcache in
writeback mode on my home system. [1]
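If you want to see what such a cache actually does for you, bcache
exports counters in sysfs; a rough sketch (it assumes a device
registered as bcache0, and the exact sysfs layout may differ between
kernel versions):

#!/usr/bin/env python3
# Print bcache hit/miss counters from sysfs. Assumes a device
# registered as bcache0; paths may vary with kernel version.
from pathlib import Path

STATS = Path("/sys/block/bcache0/bcache/stats_total")

def read_stat(name):
    p = STATS / name
    return p.read_text().strip() if p.exists() else "n/a"

for name in ("cache_hits", "cache_misses", "cache_hit_ratio", "bypassed"):
    print(name + ":", read_stat(name))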

Hardware solutions like LSI CacheCade also work very well. So, if
you're using a RAID controller anyway, consider that.

But I think all of those caches just work around the design patterns of
today's common filesystems - those can still use improvements and
optimizations. But in itself, I already see it as a huge improvement.

[1]: Though, I must say that you can wear out your SSD with bcache in
around 2 years, at least the cheaper ones. But my Win7 VM can boot in 7
seconds at best with it (btrfs-raid/bcache), though usually it's around
15-20 seconds - and its image is bigger than my SSD. And working with
it feels no different from using Win7 natively on an SSD (read: no VM,
drive C and everything on SSD). But actually, I feel it's simpler to
replace the caching SSD when it wears out than to reinstall the system
on a new SSD because, when used natively, its space just becomes too
small.

--
Regards,
Kai

Replies to list-only preferred.