May I put my oar into your optimisation discussion?

It's funny, Duncan. On the one hand you are saving every byte of CPU
cache; on the other, you are happy to have forked bashes in your main
memory. But how do you control that? I mean, how do you get the code of
your forked bashes out of your CPU cache so it is free for kernel code?

A long time ago... I was testing some CFLAGS on my own programs. I wrote
a fast Fourier transform myself, only to see the "impressive" difference
between -Os, -O3 and some other optimisation flags. I fed my FFT with a
large amount of input, but no matter how hard I tried to make it faster
by changing the flags, it didn't work. The difference was marginal, and
not every flag brings an improvement for every program. The only thing
that changed a lot was the time gcc needed to perform those
optimisations.

Bernhard

On Thursday 09 February 2006 01:17, Duncan wrote:
> Simon Stelling posted <43EA568D.6020307@g.o>, excerpted below, on
> Wed, 08 Feb 2006 21:37:33 +0100:
>
> > Duncan wrote:
> >> I should really create a page listing all the little Gentoo admin
> >> scripts I've come up with and how I use them. I'm sure a few folks
> >> would likely find them useful.
> >>
> >> The idea behind most of them is to create shortcuts for long emerge
> >> lines with all sorts of arbitrary command line parameters. The
> >> majority of these fall into two categories, ea* and ep*, short for
> >> emerge --ask <additional parameters> and emerge --pretend ... . Thus,
> >> I have epworld and eaworld, the pretend and ask versions of emerge
> >> -NuDv world; epsys and easys, the same for system; eplog <package>,
> >> emerge --pretend --log --verbose (package name to be added to the
> >> command line, so eplog gcc, for instance, to see the changes between
> >> my current and the new version of gcc); eptree <package>, to use the
> >> tree output; etc.
> >
> > Interesting. But why do you use scripts and not simple aliases? Every
> > time you launch your script the HD performs a seek (which is very
> > expensive in time), copies the script into memory and then forks a
> > whole bash process to execute a one-liner. Using an alias, which is a
> > bash built-in, wouldn't fork a process and would therefore be much
> > faster.
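For illustration, the two approaches being compared might look like this
(a minimal sketch; the emerge flags are the -NuDv world set described
above, while the ~/bin path is an assumption):

```shell
# Script approach: a tiny one-liner file somewhere on $PATH.
# The ~/bin location is an assumption for illustration; the emerge
# flags are the -NuDv world set described above.
mkdir -p "$HOME/bin"
cat > "$HOME/bin/epworld" <<'EOF'
#!/bin/bash
exec emerge --pretend --newuse --update --deep --verbose world
EOF
chmod +x "$HOME/bin/epworld"

# Alias approach: one line in ~/.bashrc instead; no fork and no file
# read once the shell has started:
alias epworld='emerge --pretend --newuse --update --deep --verbose world'
```

The trade-off debated below is exactly this: the script is a file that
lives in the page cache, the alias is state inside the running bash.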
>
> My thinking, which is possibly incorrect (your input appreciated), is
> that file-based scripts get pulled into cache the first time they are
> executed, and will remain there (with a gig of memory) pretty much
> until I'm done doing my upgrades. At the same time, they are simply in
> cache, not something in bash's memory, so if the memory is needed, it
> will be reclaimed. As well, after I'm done and on to other tasks, the
> cached commands will eventually be replaced by other data, if need be.
>
> Aliases (and bash functions) are held in memory. That's not as flexible
> as cache in terms of being knocked out of memory if the memory is
> needed by other things. Sure, that memory may be flushed to disk-based
> swap, but that's disk-based the same as the actual script files I'm
> using, so reading it back into main memory if it's faulted out will
> take something comparable to the time it'd take to read in the script
> file again anyway. That's little gain, with the additional overhead,
> and therefore loss, of having to manage the temp copy in swapped
> memory, if it comes to that.
>
> Actually, there are some details here that may affect things. I don't
> know enough about the following factors to be able to evaluate how they
> balance out, but the real reason I chose individual scripts is below.
>
> One, here anyway, tho not on most systems, I'm running four SATA disks
> in RAID. The swap is actually not on the RAID, as the kernel manages it
> like RAID on its own, provided all four swap areas are set to the same
> priority (they are), which means swap is running on the equivalent of
> four-way-striped RAID-0. Meanwhile, the scripts, as part of my main
> system, are on RAID-6 for redundancy, so with the same four disks
> backing the RAID-6 as the swap, I've only effectively two-way-striped
> storage there, the other two disk stripes being parity. Thus, retrieval
> from the four-way-striped swap should in theory be more efficient than
> from the two-way-striped regular storage. OTOH, the granularity of the
> stripe in either case, against the size of a one- or two-line script,
> likely means that it'll be pulled from a single stripe (at the speed of
> reading from a single disk, tho there are parallelizing opportunities
> not available on a single disk). It's also likely that the swap will be
> more optimally managed for fast retrieval than the location on the
> regular filesystem is. Balanced against that, we have the overhead of
> maintaining the swap tracking.
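The equal-priority swap setup described here is a standard kernel
mechanism: swap areas sharing a priority are used round-robin, spreading
pages across all of them. A hypothetical /etc/fstab fragment (the device
names are assumptions, not taken from the post):

```
# /etc/fstab (illustrative; device names assumed)
/dev/sda2   none   swap   sw,pri=1   0 0
/dev/sdb2   none   swap   sw,pri=1   0 0
/dev/sdc2   none   swap   sw,pri=1   0 0
/dev/sdd2   none   swap   sw,pri=1   0 0
```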
>
> That's assuming it would swap that out to the dedicated swap in the
> first place. I'm not familiar with Linux's VM, but given that the
> aliases and functions would be file-based in either case, it's possible
> it would simply drop the data from main memory, relying on the fact
> that the data is clean file-backed data and could be read in directly
> from the files again, if necessary, rather than bothering with actually
> creating a temporary copy of the /same/ data in swap, taking time to do
> so when it could just read it back in from the file.
>
> Another aspect is the effect of data vs. metadata caching. Again, I'm
> not familiar with how Linux manages this, and indeed, it may differ
> between filesystems, but the idea is that if the file metadata is still
> cached, even if the file itself isn't, it's a single disk seek and read
> to read the data back in, as opposed to multiple seeks and reads,
> following the logical directory structure to fetch each directory table
> in the hierarchy until it reaches the entry that actually has the file
> location, before it can read the file itself, as happens when the file
> is read initially, or if the location metadata has been flushed as
> well. (Back several years ago on MSWormOS, one of the first things I
> always did after a reinstall was set the system to server profile,
> which kept a far larger metadata cache, on the theory that the metadata
> was usually smaller than the data and, for dirs, sharable among many
> data files, so I'd rather spend cache memory on metadata than data. The
> other choices were the default desktop profile, and laptop, with a much
> smaller metadata cache. I originally learned about these as a result of
> reading about a bug in the original 95 as shipped, that swapped some
> entries in the registry and therefore cached FAR less metadata than it
> should have. I don't know where these tweaks are located on Linux, or
> how to go about adjusting them safely.)
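For what it's worth, the nearest Linux knob for that data-vs-metadata
balance appears to be the vm.vfs_cache_pressure sysctl, which biases
reclaim of the dentry/inode (metadata) caches; a quick look at it:

```shell
# vm.vfs_cache_pressure biases reclaim of the dentry/inode (metadata)
# caches: 100 is the default; lower values favour keeping metadata
# cached at the expense of data cache.
cat /proc/sys/vm/vfs_cache_pressure
# Changing it needs root; the value below is only an example:
#   sysctl -w vm.vfs_cache_pressure=50
```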
>
> Basically, therefore, I don't believe aliases to be a big positive, and
> possibly somewhat of a negative, as opposed to scripts, because the
> scripts will be cached in most cases after initial use anyway, yet they
> have the advantage of not having to be maintained or tracked in memory
> when I'm doing other tasks and the system needs that cache.
>
> Given that I don't believe it's a big positive, I prefer the
> administrative convenience and maintainability of separate scripts.
>
> There /is/ a third alternative, which I came across recently, that I
> think is a good idea. If you'd comment, perhaps it would help me sort
> out the implications.
>
> The idea, simply put, is "bash command theming": single scripts that
> can be invoked to "theme" a command prompt for the tasks at hand. I
> didn't read the entire article I saw covering this, but skimmed it
> enough to get the gist. A single invokable script for each set of
> tasks, say perl programming, bash programming, working with portage,
> etc., that would set up a specific set of aliases and functions for
> that task. Invoking the script with the "off" parameter would erase
> that set of aliases and bash functions, thereby recovering the memory,
> and do any related cleanup, like resetting the path if necessary to
> exclude any task-specific commands. Taking this a step further, a
> variable could be set up that would list the theme or themes that were
> active, which the theme-setup script could read to automatically
> deactivate the previous theme while switching to the new one. One could
> even share functionality between themes, sourcing common files which
> would check the active theme and adjust their behavior accordingly.
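A minimal sketch of such a theme script (the file name, the ACTIVE_THEME
variable, and the function names are all invented to make the idea
concrete; it would be sourced rather than executed, so the definitions
land in the current shell):

```shell
# Hypothetical theme script, sourced as:  . portage-theme [off]
# All names here (functions, ACTIVE_THEME) are illustrative.
portage_theme_on() {
    ACTIVE_THEME="portage"
    alias epworld='emerge --pretend --newuse --update --deep --verbose world'
    alias eaworld='emerge --ask --newuse --update --deep --verbose world'
}
portage_theme_off() {
    unalias epworld eaworld 2>/dev/null
    unset ACTIVE_THEME
}
if [ "${1:-}" = "off" ]; then
    portage_theme_off
else
    # deactivate whatever theme was active before switching themes
    if [ -n "${ACTIVE_THEME:-}" ]; then
        "${ACTIVE_THEME}_theme_off"
    fi
    portage_theme_on
fi
```

A shared dispatcher could read ACTIVE_THEME and call the matching
*_theme_off function, which is the automatic-deactivation step described
above.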
>
> This alias and function theming wouldn't be quite as modular (tho with
> sourcing it could be) as the individual scripts, but would maintain the
> performance advantages (if any) of the alias/function idea, while at
> the same time allowing the memory reclamation of the cached-script
> option. It sounds really good, but I'm not yet convinced the benefits
> would be worth the additional effort of setting up those themes, since
> the solution I have works.
>
> One VERY NICE benefit of the themes idea is that it would directly
> address any namespace pollution concerns. It has a direct appeal to
> programmers, and anyone else who's ever had to deal with such issues,
> for that reason alone. One single command on the path to invoke the
> theme, possibly even an eselect-like command shared among themes, with
> everything else off-path and out of the namespace unless that theme is
> invoked! /VERY/ appealing indeed. OTOH, there are those who'll never
> remember which theme they have active at the moment, and be constantly
> confused. For those folks, it'd be a nightmare!
>
> > man emerge:
> > --oneshot (-1)
> >
> > IIRC --oneshot has had a short form since 2.0.52 was released.
>
> Learn new things every day. Thanks! I remember how pleased I was to
> have --newuse, and even more so when I discovered -N, so very nice!
>
> >> ... Deep breath... <g>
> >>
> >> All that as a preliminary explanation to this: Along with the above,
> >> I have a set of efetch functions, which invoke the -f form, so they
> >> just do the fetch, not the actual compile and merge, and esyn
> >> (there's already an esync function in something or other I have
> >> merged, so I just call it esyn), which does emerge sync, then updates
> >> the esearch db, then automatically fetches all the packages that an
> >> eaworld would want to update, so they are ready for me to merge at my
> >> leisure.
> >
> > I'm a bit confused now. You use *functions* to do that? Or do you mean
> > scripts? By the way: with alias you could name your custom "script"
> > esync, because it doesn't place a file on the hard disk.
>
> Scripts. I was using "functions" in the generic sense here. I did
> realize before I sent it that the word had a dual meaning, but figured
> it wasn't an important enough distinction to go back and correct, or
> explain. Unfortunately, every time I decide to skip something like
> that, I get called on it, which doesn't help my posts get any shorter!
> =8^)
>
> >> I choose -Os, optimize for size, because a modern CPU and the
> >> various cache levels are FAR faster than main memory.
> >
> > Given that two CPUs differing only in L2 cache size have nearly the
> > same performance, I doubt that the performance increase is very big.
> > Some interesting figures:
> >
> > An Athlon64 (forgot which model, but it shouldn't matter anyway) with
> > 1 MB L2 cache is 4% faster than an Athlon64 of the same frequency
> > with only 512 kB L2 cache. The bigger the cache sizes you compare,
> > the smaller the performance increase. Since you run a dual Opteron
> > system with 1 MB L2 cache per CPU, I tend to say that the actual
> > performance increase you experience is about 3%. But then I didn't
> > take into account that -Os leaves out a few optimizations which would
> > be included by -O2, the default optimization level, which actually
> > makes the code a bit slower compared to -O2. So the performance
> > increase you really experience shrinks to about 0-2%. I'd tend to
> > proclaim that -O2 is even faster for most code, but that's only my
> > feeling.
>
> Interesting, indeed. I'd counter that it likely has to do with how many
> tasks are being juggled as well, plus the number of kernel/user context
> switches, of course. I wonder under what load, and with what task type,
> the above 4% difference was measured.
>
> Of course, the definitive way to end the argument would be to do some
> profiling and get some hard numbers, but I don't think either you or I
> consider it an important enough factor in our lives to go to /that/
> sort of trouble. <g>
>
> > Besides that, I should mention that -Os sometimes still has problems
> > with huge packages like glibc.
>
> Interestingly enough, while Gentoo's glibc ebuilds stripflags to -O2, I
> did try it with all that stripflags logic disabled. For glibc, -Os
> /does/ seem to slow things down, or did back with gcc-3.3 (IIRC)
> anyway. I tried the same glibc both ways. I would have tried tinkering
> further, but decided it wasn't worth complicating debugging and the
> like, since glibc is loaded by virtually everything, and I'd never be
> able to tell whether it was my funny tweaks to glibc or some actual
> issue with whatever package. Besides, that's an awful costly package,
> in terms of recompile time, not to mention system stability, to be
> experimenting with. I /can/ say, however, that it didn't crash or cause
> any other issues I could see or attribute to it.
>
> OTOH, I haven't tried it with xorg-modular yet, but the monolithic xorg
> builds seemed to perform better with -Os. I tried one of them (6.8??)
> both ways too. I ended up routinely killing the stripflags logic, but I
> was modifying other portions of the ebuild as well (so it compiled only
> the ATI video driver, and only installed the 100-dpi fonts, not 75-dpi,
> among other things), so that was just one of several modifications I
> was making, tho the only real performance-affecting one. Performance in
> X was better, but it DID take longer to switch to a VT when I tried
> that. In fact, at one point the switch-to-VT functionality broke, but
> someone mentioned it was broken in general at that point for certain
> drivers anyway, so I'm not sure my optimizations had anything to do
> with it.
>
> >> Of course, this is theory, and the practical case can and will
> >> differ depending on the instructions actually being compiled. In
> >> particular, streaming media apps and media encoding/decoding are
> >> likely to still benefit from the traditional loop-elimination style
> >> optimizations, because they run thru so much data already that cache
> >> is routinely trashed anyway, regardless of the size of your
> >> instructions. As well, that type of application tends to have a LOT
> >> of looping instructions to optimize!
> >>
> >> By contrast, something like the kernel will benefit more than usual
> >> from size optimization. First, it's always memory-locked and as such
> >> can't be swapped, and even "slow" main memory is still **MANY**
> >> **MANY** times faster than swap, so a smaller kernel means more
> >> other stuff fits into main memory with it, and isn't swapped as
> >> much. Second, parts of the
> >
> > Funny to hear this from somebody with 4 GB RAM in his system. I don't
> > know how bloated your kernel is, but even if -Os reduced the size of
> > my kernel to **half**, which is totally impossible, the savings
> > wouldn't be enough to load the mail I am just answering into RAM. So,
> > basically, this reasoning is just ridiculous.
>
> I won't argue with that. BTW, I'm still at a gig, much to my
> frustration! I put off upgrading memory when I decided my disk was in
> danger of going bad and I ended up deciding to go 4-disk SATA-based
> RAID. Then I upgraded my stereo near Christmas... Now the CC is almost
> paid off again, so I'm looking at that memory upgrade again.
>
> Much to my frustration, memory prices don't seem to be dropping much
> lately!
>
> > You are referring a lot to the gcc manpage, but obviously you missed
> > this part:
> >
> > -fomit-frame-pointer
> >     Don't keep the frame pointer in a register for functions that
> >     don't need one. This avoids the instructions to save, set up
> >     and restore frame pointers; it also makes an extra register
> >     available in many functions. It also makes debugging
> >     impossible on some machines.
> >
> >     On some machines, such as the VAX, this flag has no effect,
> >     because the standard calling sequence automatically handles
> >     the frame pointer and nothing is saved by pretending it
> >     doesn't exist. The machine-description macro
> >     "FRAME_POINTER_REQUIRED" controls whether a target machine
> >     supports this flag.
> >
> >     Enabled at levels -O, -O2, -O3, -Os.
> >
> > I have to say that I am a bit disappointed now. You seemed to be one
> > of those people who actually inform themselves before sticking new
> > flags into their CFLAGS.
>
> ??
>
> I'm not sure which way you mean this. It was in my CFLAGS list, but I
> didn't discuss it, as it's fairly common (from my observation, nearly
> as common as -pipe) and seems fairly non-controversial on Gentoo. Did
> you miss it in my CFLAGS and are saying I should be using it, or did
> you see it and are saying it's unnecessary and redundant because it's
> enabled by -Os?
>
> If the latter, yes, but as mentioned above in the context of glibc,
> -Os is sometimes stripped. In that case, the redundancy of having the
> basic -fomit-frame-pointer is useful, unless it's also stripped, but
> as I said, it seems much less controversial than some flags and is
> often specifically allowed where most are stripped.
>
> Or are you saying I should avoid it due to the debugging implications?
> I don't quite get it.
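For context, the kind of make.conf CFLAGS line under discussion would
look roughly like this (the values are illustrative assumptions, not
anyone's actual settings):

```
# /etc/make.conf (illustrative values only)
CFLAGS="-Os -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j4"
```

With -Os in place, the explicit -fomit-frame-pointer is redundant per
the manpage excerpt quoted above, but it survives if an ebuild strips
the -Os.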
>
> >> !!! Relying on the shell to locate gcc, this may break
> >> !!! DISTCC, installing gcc-config and setting your current gcc
> >> !!! profile will fix this
> >>
> >> Another warning, likewise to stderr and thus not in the eis output.
> >> This one is due to the fact that eselect, the eventual systemwide
> >> replacement for gcc-config and a number of other commands, uses a
> >> different method to set the compiler than gcc-config did, and
> >> portage hasn't been adjusted to full compatibility just yet. Portage
> >> finds the proper gcc just fine for itself, but there'd be problems
> >> if distcc were involved, thus the warning.
> >
> > Didn't know about this. Have you filed a bug on the topic yet? Or is
> > there already one?
>
> There is one. I don't recall whether I filed it or it was already
> there, but both JH and the portage folks know about the issue. IIRC,
> the portage folks decided it was their side that needed changing, but
> that required changes to the distcc package, and I don't know how that
> has gone, since I don't use distcc, except that I was slightly
> surprised to still see the warning in portage 2.1.
>
> >> MAKEOPTS="-j4"
> >>
> >> The four jobs is nice for a dual-CPU system -- when it works.
> >> Unfortunately, the unpack and configure steps are serialized, so the
> >> jobs option does little good there. To make the most efficient use
> >> of the available cycles when I have a lot to merge, therefore, I'll
> >> run as many as five merges in parallel. I do this quite regularly
> >> with KDE upgrades like the one to 3.5.1, where I use the split KDE
> >> ebuilds and have something north of 100 packages to merge before KDE
> >> is fully upgraded.
> >
> > I really wonder how you would parallelize unpacking and configuring a
> > package.
>
> That's what was nice about configcache, which was supposed to be in the
> next portage, but I haven't seen or heard anything about it for a
> while, and the next portage, 2.1, is what I'm using. configcache
> seriously shortened that stage of the build, leaving more of it
> parallelized, but...
>
> I was using it for a while, patching successive versions of portage,
> but it broke about the time sandbox split off; the dev said he wasn't
> maintaining the old version since it was going into the new portage,
> and I tried updating the patch but eventually ran into what I think
> were unrelated issues, dropped it in one of my troubleshooting steps,
> and never picked it up again.
>
> I'd certainly like to have it back, tho. If it's working in 2.1, I've
> not seen it documented or seen any hints in the emerge output, as there
> were before. Have you seen or heard anything?
>
> BTW, what is your opinion on -ftracer? Several devs I've noticed use
> it, but the manpage says it's not that useful without active profiling,
> which means compiling, profiling, and recompiling, AFAIK. It's possible
> the devs running it do that, but I doubt it, and otherwise, I don't see
> how it should be that useful. I don't know if you run it, but since
> I've got your attention, I thought I'd ask what you think about it. Is
> there something of significance I'm missing, or are they, or are they
> actually doing that compile/profile/recompile thing? It just doesn't
> make sense to me. I've seen it in several user-posted CFLAGS as well,
> but I'll bet a good portion of them are there simply because they saw
> it in a dev's CFLAGS and decided it looked useful, not because they
> understand any implications stated in the manpage. (Not that I always
> do either, but... <g>)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman in
> http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
--
gentoo-amd64@g.o mailing list