Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite
Date: Fri, 03 Feb 2006 16:32:48
Message-Id: pan.2006.02.03.16.28.28.536378@cox.net
In Reply to: Re: [gentoo-amd64] Re: Wow! KDE 3.5.1 & Xorg 7.0 w/ Composite by Mike Owen
1 Mike Owen posted
2 <8f5ca2210602021712s53d33de5w6794fa384bbf93a5@××××××××××.com>, excerpted
3 below, on Thu, 02 Feb 2006 17:12:04 -0800:
4
5 > On 2/2/06, Duncan <1i5t5.duncan@×××.net> wrote:
6 >>
7 >> http://members.cox.net/pu61ic.1inux.dunc4n/
8 >
9 > Nice. Now let us know your CFLAGS, and what toolchain versions you're
10 > running :D
11
12 You probably didn't notice, as I had it commented out on the main index
13 page since I haven't yet created the page that actually lists them, but if
14 you viewed the source, you'd have seen a commented-out link to a techspecs
15 page that'll carry that sort of info, when/if I actually get it created.
16
17 However, since you asked, your answer, and a bit more, by way of
18 explanation...
19
20 I should really create a page listing all the little Gentoo admin scripts
21 I've come up with and how I use them. I'm sure at least a few folks would
22 find them useful.
23
24 The idea behind most of them is to create shortcuts so I don't have to type
25 in long emerge lines, with all sorts of arbitrary command line parameters.
26 The majority of these fall into two categories, ea* and ep*, short for
27 emerge --ask <additional parameters> and emerge --pretend ... . Thus, I
28 have epworld and eaworld, the pretend and ask versions of emerge -NuDv
29 world, epsys and easys, the same for system, eplog <package>, emerge
30 --pretend --log --verbose (package name to be added to the command line so
31 eplog gcc, for instance, to see the changes between my current and the new
32 version of gcc), eptree <package>, to use the tree output, etc.
33
34 One thing I've found is that I'll often epworld or eptreeworld, then
35 emerge the individual packages, rather than use eaworld to do it. That
36 way, I can do them in the order I want or do several at a time if I want
37 to make use of both CPUs. Because I always use --deep, as I want to keep
38 my dependencies updated as well, I'm very often merging specific
39 dependencies. There's a small problem with that, however: --oneshot, which
40 I'll always want to use with dependencies to help keep my world file
41 uncluttered, has no short form, yet I use it as my default! OTOH, the
42 normal portage mode of adding stuff listed on the command line to the
43 world file, I don't want very often, as most of the time I'm simply
44 updating what I have, so it's all in the world file if it needs to be
45 there already anyway. Not a problem! All my regular ea* scriptlets use
46 --oneshot, so it /is/ my default. If I *AM* merging something new that I
47 want added to my world file, I have another family of ea* scriptlets that
48 do that -- all ending in "2", as in, "NOT --oneshot". Thus, I have a
49 family of ea*2 scriptlets.
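
To make that concrete, the core of these scriptlets amounts to little more
than the following -- an illustrative sketch of the idea, not necessarily the
exact files; whether they live as tiny scripts in ~/bin or as shell functions,
the effect is the same:

# ea* / ep* scriptlets -- illustrative sketch only
epworld()     { emerge --pretend -NuDv world "$@"; }
eaworld()     { emerge --ask --oneshot -NuDv world "$@"; }
epsys()       { emerge --pretend -NuDv system "$@"; }
easys()       { emerge --ask --oneshot -NuDv system "$@"; }
eptree()      { emerge --pretend --tree --verbose "$@"; }
eptreeworld() { eptree -NuD world; }
ea()          { emerge --ask --oneshot "$@"; }  # my default: does NOT touch world
ea2()         { emerge --ask "$@"; }            # the "2" family: adds to world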
50
51 The regulars here already know one of my favorite portage features is
52 FEATURES=buildpkg, which I have set in make.conf. That of course gives me
53 a collection of binary versions of packages I've already emerged, so I
54 can quickly revert to an old version for testing something, if I want,
55 then remerge the new version once I've tested the old version to see if it
56 has the same bug I'm working on or not. To aid in this, I have a
57 collection of eppak and eapak scriptlets. Again, the portage default of
58 --usepkg (-k) doesn't fit my default needs, as if I'm using a binpkg,
59 I usually want to ONLY use a binpkg, NOT merge from source if the package
60 isn't available. That happens to be -K in short-form. However, it's my
61 default, so eapak invokes the -K version. I therefore have eapaK to
62 invoke the -k version if I don't really care whether it goes from binpkg
63 or source.
64
65 Of course, there are various permutations of the above as well, so I have
66 eapak2 and eapaK2, as well as eapak and eapaK. For the ep* versions, of
67 course the --oneshot doesn't make a difference, so I only have eppak and
68 eppaK, no eppa?2 scriptlets.
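
Again by way of a sketch (same caveat as above -- illustrative, not verbatim):

# binpkg-oriented scriptlets, built on the FEATURES=buildpkg collection
eapak()  { emerge --ask --oneshot --usepkgonly "$@"; }  # -K: binpkg ONLY, my default
eapaK()  { emerge --ask --oneshot --usepkg "$@"; }      # -k: binpkg if present, else source
eapak2() { emerge --ask --usepkgonly "$@"; }            # ditto, but added to world
eapaK2() { emerge --ask --usepkg "$@"; }
eppak()  { emerge --pretend --usepkgonly "$@"; }
eppaK()  { emerge --pretend --usepkg "$@"; }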
69
70 ... Deep breath... <g>
71
72 All that as a preliminary explanation to this: Along with the above, I
73 have a set of efetch functions, that invoke the -f form, so just do the
74 fetch, not the actual compile and merge, and esyn (there's already an
75 esync function in something or other I have merged so I just call it
76 esyn), which does emerge sync, then updates the esearch db, then
77 automatically fetches all the packages that an eaworld would want to
78 update, so they are ready for me to merge at my leisure.
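
A sketch of that one too (eupdatedb being the index updater that ships with
app-portage/esearch; the rest is plain emerge):

# esyn: sync, refresh the esearch index, pre-fetch the pending world update
esyn() {
    emerge --sync && \
    eupdatedb && \
    emerge --fetchonly -NuDv world
}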
79
80 Likewise, and the real reason for this whole explanation, I /had/ an
81 "einfo" scriptlet that simply ran "emerge info". This can be very handy
82 to run, if like me, you have several slotted versions of gcc merged, and
83 you sometimes forget which one you have eselected or gcc-configed as the
84 one portage will use. Likewise, it's useful for checking on CFLAGS (or
85 CXXFLAGS OR LDFLAGS or...), if you modified them from the normal ones
86 because a particular package wasn't cooperating, and you want to see if
87 you remembered to switch them back or not.
88
89 However, I ran into a problem. The output of einfo was too long to
90 quickly find the most useful info -- the stuff I most often change and
91 therefore most often am looking for.
92
93 No sweat! I shortened my original "einfo" to simply "ei", and added a
94 second script, "eis" (for einfo short), that simply piped the output of
95 the usual emerge info into a grep that only returned the lines I most
96 often need -- the big title one with gcc and similar info, CFLAGS,
97 CXXFLAGS, LDFLAGS, and FEATURES. USE would also be useful, but it's too
98 long even by itself to be searched at a glance, so if I want it, I simply
99 run ei and look for what I want in the longer output.
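
Both are trivial; roughly (same sort of sketch as the scriptlets above --
tweak the grep pattern to whatever lines you care about):

ei()  { emerge --info; }
eis() { emerge --info 2>/dev/null | \
        grep -E '^(Portage |CFLAGS|CXXFLAGS|LDFLAGS|FEATURES|MAKEOPTS)'; }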
100
101 ... Another deep breath... <g>
102
103 OK, with that as a preliminary, you should be able to understand the
104 following:
105
106 $ eis
107
108 Portage 2.1_pre4-r1 (default-linux/amd64/2006.0, gcc-4.1.0-beta20060127,
109 glibc-2.3.6-r2, 2.6.15 x86_64)
110
111 CFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers
112 -funit-at-a-time -fweb -freorder-blocks-and-partition
113 -fmerge-all-constants"
114
115 CXXFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers
116 -funit-at-a-time -fweb -freorder-blocks-and-partition
117 -fmerge-all-constants"
118
119 FEATURES="autoconfig buildpkg candy ccache confcache distlocks
120 multilib-strict parallel-fetch sandbox sfperms strict userfetch"
121
122 LDFLAGS="-Wl,-z,now"
123
124 MAKEOPTS="-j4"
125
126 To make sense of that...
127
128 * The portage and glibc versions are ~amd64, as set in make.conf for the
129 system in general.
130
131 * CFLAGS:
132
133 I choose -Os, optimize for size, because a modern CPU and the various
134 cache levels are FAR faster than main memory. This difference is
135 frequently severe enough that it's actually more efficient to optimize for
136 size than for CPU performance, because the result is smaller code that
137 maintains cache locality (stays in fast cache) far better, and the CPU
138 saves more time than it would otherwise spend idle, waiting for data
139 to come in from slower, more distant memory, than it loses to the
140 reduced cycle efficiency that's often the tradeoff for small code.
141
142 -O3, and to a lesser extent -O2, do things like turn a loop that executes
143 a fixed number of times, say 3, into "faster" code: they avoid the jump at
144 the end of each iteration back to the top of the loop by writing it out as
145 inline code, copying the loop instructions three times. In our example of
146 a 3-iteration fixed loop, this unrolling saves the expensive
147 jump back to the top of the loop two times -- but at the SAME time it
148 expands that section of code to three times its looped size.
149
150 Back when memory operated at or near the speed of the CPU, avoiding the
151 loop, even at the expense of three times the code, was often faster.
152 Today, where CPUs do several calculations in the time it takes to fetch
153 data from main memory, it's generally faster to go for the smaller code,
154 as it will be far more likely to still be in fast cache, avoiding that
155 long wait for main memory, even if it /does/ mean wasting a couple
156 additional cycles doing the expensive jump back to the top of the loop.
157
158 Of course, this is theory, and the practical case can and will differ
159 depending on the instructions actually being compiled. In particular,
160 streaming media apps and media encoding/decoding are likely to still
161 benefit from the traditional loop elimination style optimizations, because
162 they run thru so much data already, that cache is routinely trashed
163 anyway, regardless of the size of your instructions. As well, that type
164 of application tends to have a LOT of looping instructions to optimize!
165
166 By contrast, something like the kernel will benefit more than usual from
167 size optimization. First, it's always memory locked and as such
168 can't be swapped, and even "slow" main memory is still **MANY** **MANY**
169 times faster than swap, so a smaller kernel means more other stuff fits
170 into main memory with it, and isn't swapped as much. Second, parts of the
171 kernel such as task scheduling are executed VERY often, either because
172 they are frequently executed by most processes, or because they /control/
173 those processes. The smaller these are, the more likely they are to still
174 be in cache when next used. Likewise, the smaller they are, the less
175 potentially still useful other data gets flushed out of cache to make room
176 for the kernel code executing at the moment. Third, while there's a lot
177 of kernel code that will loop, and a lot that's essentially streaming, the
178 kernel as a whole is a pretty good mix of code and thus won't benefit as
179 much from loop optimizations and the like, as compared to special purpose
180 code like the media codec and streaming applications above.
181
182 The differences are marked enough and now demonstrated enough that a
183 kernel config option to optimize for size was added I believe about a year
184 ago. Evidently, that led to even MORE demonstration, as the option was
185 originally in the obscure embedded optimizations corner of the config,
186 where few would notice or use it, and they upgraded it into a main option.
187 In fact, where a year or two ago, the option didn't even exist, now I
188 believe it defaults to yes/on/do-optimize-for-size (altho it's possible
189 I'm incorrect on the last and it's not yet the default).
190
191 According to the gcc manpage, -frename-registers causes gcc to attempt to
192 make use of registers left over after normal register allocation. This is
193 particularly beneficial on archs that have many registers (keeping in
194 mind that "registers" are what amounts to L0 cache, the fastest possible
195 memory because the CPU accesses registers directly and they operate at
196 full CPU speed; unfortunately, registers are also very limited, making
197 them an EXCEEDINGLY valuable resource). Note that while x86-32 is noted
198 for its relative /lack/ of registers, AMD basically doubled the number of
199 registers available to 64-bit code in its x86-64 aka AMD64 spec. Thus,
200 while this option wouldn't be of particular benefit on x86, on amd64, it
201 can, depending on the code of course, provide some rather serious
202 optimization!
203
204 -fweb is a register use optimizer function as well. It tells gcc to
205 create a /web/ of dependencies and assign each individual dependency web
206 to its own pseudo-register. Thus, when it comes time for gcc to allocate
207 registers, it already has a list of the best candidates lined up and ready
208 to go. Combined with -frename-registers, which tells gcc to efficiently make
209 use of any registers left over after the first pass, and due to the
210 number of registers available in 64-bit mode on our arch, this can allow
211 some seriously powerful optimizations. Still, a couple of things to note
212 about it. One, -fweb (and -frename-registers as well) can cause data to
213 move out of its "home" register, which seriously complicates debugging, if
214 you are a programmer or power-user enough to worry about such things.
215 Two, the rewrite for gcc 4.0 significantly modified the functionality of
216 -fweb, and it wasn't recommended for 4.0 as it didn't yet work as well as
217 expected or as it did with gcc 3.x. For gcc 4.1, -fweb is apparently back
218 to its traditional strength. Those Gentoo users having gcc 3.4, 4.0, and
219 4.1, all three in separate slots, will want to note this as they change
220 gcc configurations, and modify it accordingly. Yes, this *IS* one of the
221 reasons my CFLAGS change so frequently!
222
223 -funit-at-a-time tells gcc to consider a full logical unit, perhaps
224 consisting of several source files rather than just one, as a whole, when
225 it does its compiling. Of course, this allows gcc to make
226 optimizations it couldn't see if it wasn't looking at the larger picture
227 as a whole, but it requires rather more memory, to hold the entire unit
228 so it can consider it at once. This is a fairly new flag, introduced with
229 gcc 3.3 IIRC. While the idea is simple enough and shouldn't lead to any
230 bugs on its own, there WERE a number of previously unencountered bugs
231 in various code that this flag exposed, when GCC made optimizations on the
232 entire unit that it wouldn't otherwise make, thereby triggering bugs that
233 had never been triggered before. I /believe/ this was the root reason why
234 the Gentoo amd64 technotes originally discouraged use of -Os, back with
235 the first introduction of this flag in gcc 3.2 hammer (amd64) edition, as
236 -funit-at-a-time was activated by -Os at that time, and -Os was known to
237 produce bad code at the time, on amd64, with packages like portions of
238 KDE. The gcc 4.1.0 manpage now says it's enabled by default at -O2 and
239 -O3, but doesn't mention -Os. Whether that's an omission, or whether they
240 decided it shouldn't be enabled by -Os for some reason, I'm not sure, but
241 I use them both to be sure and haven't had any issues I can trace to this
242 (not even back when the technotes recommended against -Os, and said KDE
243 was supposed to have trouble with it -- maybe it was parts of KDE I never
244 merged, or maybe I was just lucky, but I've simply never had an issue with
245 it).
246
247 -freorder-blocks-and-partition is new for gcc 4.0, I believe, altho I
248 didn't discover it until I was reading the 4.1-beta manpage. I KNOW gcc
249 3.4.4 fails out with it, saying unrecognized flag or some such, so it's
250 another of those flags that cause my CFLAGS to be constantly changing, as
251 I switch between gcc versions. This flag won't work under all conditions,
252 according to the manpage, so is automatically disabled in the presence of
253 exception handling, and a few other situations named in the manpage. It
254 causes a lot of warnings too, to the effect that it's being disabled due
255 to X reason. There's a similar -freorder-blocks flag, which optimizes by
256 reordering blocks in a function to "reduce number of taken branches and
257 improve code locality." In English, what that means is that it breaks
258 caching less often. Again, caching is *EXTREMELY* performance critical,
259 so anything that breaks it less often is CERTAINLY welcome! The
260 -and-partition increases the effect, by separating the code into
261 frequently used and less frequently used partitions. This keeps the most
262 frequently used code all together, therefore keeping it in cache far more
263 efficiently, since the less used code won't be constantly pulled in,
264 forcing out frequently used code in the process.
265
266 Hmm... As I'm writing and thinking about this, it occurs to me that
267 sticking the regular -freorder-blocks option in CFLAGS as well would
268 probably be a wise thing. The non-partition version isn't as efficient as
269 the partition version, and would be redundant if the partitioned version
270 is in effect. However, the non-partitioned version doesn't have the same
271 sorts of no-exceptions-handler and similar restrictions, so having it in
272 the list, first, so the partitioned version overrides it where it can be
273 used, should be a good idea. That way, where the partitioned version can
274 be used, it will be, but where it can't, gcc will still use the
275 non-partitioned version of the option, so I'll still get /some/ of the
276 optimizations! I (re)compiled major portions of xorg (modular), qt, and
277 the new kde 3.5.1 with the partitioned option, however, and it works. I
278 haven't tested having both options in there yet, tho, so I'm not sure it'll
279 work as the theory suggests it should; some caution might be advised.
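
For reference, the ordering idea would look something like this in make.conf
(a sketch I haven't tested yet, as just noted):

# Plain -freorder-blocks first; the partitioned variant takes over wherever it
# can be used, while plain block reordering still applies where the
# partitioned version gets auto-disabled (exception handling, etc.).
CFLAGS="-march=k8 -Os -pipe -fomit-frame-pointer -frename-registers \
        -funit-at-a-time -fweb -freorder-blocks \
        -freorder-blocks-and-partition -fmerge-all-constants"
CXXFLAGS="${CFLAGS}"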
280
281 -fmerge-all-constants COULD be dangerous with SOME code, as it breaks part
282 of the C/C++ specification. However, it should be fine for most code
283 written to be compiled with gcc, and I've seen no problems /yet/ tho both
284 this and the reorder-and-partition flag above are fairly new to my CFLAGS,
285 so haven't been as extensively personally tested as the others have been.
286 If something seems to be breaking when this is in your CFLAGS, certainly
287 it's the first thing I'd try pulling out. What it actually does is merge
288 all constants with the same value into the same one. gcc has a weaker
289 -fmerge-constants version that's enabled with any -O option at all (thus
290 at -O, -O2, -O3, AND -Os), that merges all declared constants of the same
291 value, which is safe and doesn't conflict with the C/C++ spec. What the
292 /all/ specifier in there does, however, is cause gcc to merge declared
293 variables where the value actually never changes, so they are in effect
294 constants, altho they are declared as variables, with other constants of
295 the same value. This /should/ be safe, /provided/ gcc isn't failing to
296 detect a variable change somewhere, but it conflicts with the C/C++ spec,
297 according to the gcc manpage, and thus /could/ cause issues, if the
298 developer pulls certain tricks that gcc wouldn't detect, or possibly more
299 likely, if used with code compiled by a different compiler (say
300 binary-only applications you may run, which may not have been compiled
301 with gcc). There are two reasons why I choose to use it despite the
302 possible risks. One, I want /small/ code, again, because small code fits
303 in that all-important cache better and therefore runs faster, and
304 obviously, two or more merged constants aren't going to take the space
305 they would if gcc stored them separately. Two, the risks aren't as bad if
306 you aren't running non-gcc compiled code anyway, and since I'm a strong
307 believer in Software Libre, if it's binary-only, there's very little
308 chance I'll want or risk it on my box, and everything I do run is gcc
309 compiled anyway, so should be generally safe. Still, I know there may be
310 instances where I'll have to recompile with the flag turned off, and am
311 prepared to deal with them when they happen, or I'd not have the flag in
312 my CFLAGS.
313
314
315 And, here's some selected output from ei, interspersed with explanations,
316 since I'm editing the output anyway:
317
318 $ ei
319 !!! Failed to change nice value to '-2'
320 !!! [Errno 13] Permission denied
321
322 This is stderr output. It's not in the eis output above because I
323 redirect stderr to /dev/null for it, as I know the reason for the error
324 and am trying to be brief.
325
326 The warning is because I'm using PORTAGE_NICENESS=-2 in make.conf. It has
327 a negative nice set there to encourage portage to make fuller use of the
328 dual CPUs under-X/from-a-konsole-session, as X and the kernel do some
329 dynamic scheduling magic to keep X more responsive without having to up
330 /its/ priority. The practical effect of that "magic" is to lower the
331 priorities of everything besides X slightly, when X is running. This
332 /does/ have the intended effect of keeping X more responsive, but the cost
333 as observed here is that emerges take longer than they should when X is
334 running, because the scheduler is leaving a bit of extra idle CPU time to
335 keep X responsive. In many cases, I'd rather be using maximum CPU and get
336 the merges done faster, even if X drags a bit in the meantime, and the
337 slightly negative niceness for portage accomplishes exactly that.
338
339 It's reporting a warning (to stderr) here, as I ran the command as a
340 regular non-root user, and non-root can't set negative priorities for
341 obvious system security reasons. I get the same warning with my ep*
342 commands, which I normally run as a regular user, as well. The ea*
343 commands which actually do the merging get run as root, naturally, so the
344 niceness /can/ be set negative when it counts, during a real emerge.
345
346 So... nothing of any real matter, then.
347
348
349 !!! Relying on the shell to locate gcc, this may break
350 !!! DISTCC, installing gcc-config and setting your current gcc
351 !!! profile will fix this
352
353 Another warning, likewise to stderr and thus not in the eis output. This
354 one is due to the fact that eselect, the eventual systemwide replacement
355 for gcc-config and a number of other commands, uses a different method to
356 set the compiler than gcc-config did, and portage hasn't been adjusted to
357 full compatibility just yet. Portage finds the proper gcc just fine for
358 itself, but there'd be problems if distcc was involved, thus the warning.
359
360 Again, I'm aware of the situation and the cause, but don't use distcc, so
361 it's nothing I have to worry about, and I can safely ignore the warning.
362
363 I kept the warnings here, as I find them and the explanation behind them
364 interesting elements of my Gentoo environment, thus worth posting for
365 others who seem interested as well. If nothing
366 else, the explanations should help some in my audience understand that bit
367 more about how their system operates, even if they don't get these
368 warnings.
369
370
371 Portage 2.1_pre4-r1 (default-linux/amd64/2006.0, gcc-4.1.0-beta20060127,
372 glibc-2.3.6-r2, 2.6.15 x86_64)
373 =================================================================
374 System uname: 2.6.15 x86_64 AMD Opteron(tm) Processor 242
375 Gentoo Base System version 1.12.0_pre15
376
377 Those of you running stable amd64, but wondering where baselayout is for
378 unstable, there you have it!
379
380 ccache version 2.4 [enabled]
381 dev-lang/python: 2.4.2
382 sys-apps/sandbox: 1.2.17
383 sys-devel/autoconf: 2.13, 2.59-r7
384 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
385 sys-devel/binutils: 2.16.91.0.1
386 sys-devel/libtool: 1.5.22
387 virtual/os-headers: 2.6.11-r3
388
389 ACCEPT_KEYWORDS="amd64 ~amd64"
390
391 Same for the above portions of my toolchain. AFAIR, it's all ~amd64,
392 altho I was running a still-masked binutils for a while shortly after
393 gcc-4.0 was released (still-masked on Gentoo as well), as it required the
394 newer binutils.
395
396 LANG="en_US"
397 LDFLAGS="-Wl,-z,now"
398
399 Some of you may have noticed the occasional Portage warning about SETUID
400 executables using lazy bindings, and the potential security issue that
401 causes. This setting for LDFLAGS forces early bindings with all
402 dynamically linked libraries. Normally it'd only be necessary or
403 recommended for SETUID executables, and set in the ebuild where it's safe
404 to do so, but I use it by default, for several reasons. The effect is
405 that a program takes a bit longer to load initially, but won't have to
406 pause to resolve late bindings as they are needed. You're trading waiting
407 at executable initialization for waiting at some other point. With a gig
408 of memory, I find most stuff I run more than once is at least partially
409 still in cache on the second and later launches, and with my system, I
410 don't normally find the initial wait irritating, and sometimes find a
411 pause after I'm working with a program especially so, so I prefer to have
412 everything resolved and loaded at executable launch. Additionally, with
413 lazy bindings, I've had programs start just fine, then fail later when
414 they need to resolve some function that for some reason won't resolve in
415 whatever library it's supposed to be coming from. I don't like having the
416 thing fail and interrupt me in the middle of a task, and find it far less
417 frustrating, if it's going to fail when it tries to load something, to
418 have it do so at launch. Because early binding forces resolution of
419 functions at launch, if it's going to fail loading one, it'll fail at
420 launch, rather than after I've started working with the program. That's
421 /exactly/ how I want it, so that's why I run the above LDFLAGS setting.
422 It's nice not to have to worry about the security issue, but SETUID type
423 security isn't as critical on my single-human-user system, where that
424 single user is me and I already have root when I want it anyway, as it'd
425 be in a multi-user system, particularly a public server, so the other
426 reasons are more important than security, for me, on this. They just
427 happen to coincide, so I'm a happy camper. =8^)
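
If you want to check whether a given binary actually got the early-binding
treatment, binutils' readelf will tell you. A quick sketch, with
/usr/bin/someprogram standing in for whatever you're checking:

readelf -d /usr/bin/someprogram | grep BIND_NOW

A binary linked with -Wl,-z,now should show a BIND_NOW entry in its dynamic
section; a lazily-bound one won't.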
428
429 The caveat with these LDFLAGS, however, is the rare case where there's a
430 circular functional dependency that's normally self-resolving. Modular
431 xorg triggers one such case, where the monolithic xorg didn't. There are
432 three individual ebuilds related to modular xorg that I have to remove
433 these LDFLAGS for, or they won't work. xorg-server is one.
434 xf86-video-ati, my video driver, is another. libdri was the third, IIRC.
435 There's a specific order they have to be compiled in, as well. If they are
436 compiled with this enabled, they, and consequently X, refuse to load (tho
437 X will load without DRI, if that's the only one, it'll just protest in the
438 log and DRI and glx aren't available). Evidently there's a non-critical
439 fourth module somewhere, that still won't load properly due to an
440 unresolved symbol, that I need to track down and remerge without these
441 LDFLAGS, and that's what's keeping GLX from loading on my current system,
442 as mentioned in an earlier post.
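
One simple way to drop the flags for just those merges is overriding the
variable on the emerge command line -- a sketch, relying on the usual behavior
that environment settings override make.conf for that one invocation:

LDFLAGS="" emerge --ask --oneshot xorg-server
LDFLAGS="" emerge --ask --oneshot xf86-video-ati
LDFLAGS="" emerge --ask --oneshot libdri    # the third one, IIRC, as noted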
443
444 LINGUAS="en"
445 MAKEOPTS="-j4"
446
447 The four jobs are nice for a dual-CPU system -- when they work.
448 Unfortunately, the unpack and configure steps are serialized, so the jobs
449 option does little good there. To make the most efficient use of the
450 available cycles when I have a lot to merge, therefore, I'll run as many
451 as five merges in parallel. I do this quite regularly with KDE upgrades
452 like the one to 3.5.1, where I use the split KDE ebuilds and have
453 something north of 100 packages to merge before KDE is fully upgraded.
454
455 I mentioned above that I often run eptree, then ea individual packages
456 from the list. This is how I accomplish the five merges in parallel.
457 I'll take a look at the tree output to check the dependencies, and merge
458 the packages first that have several dependencies, but only where those
459 dependencies aren't stepping on each other, thus keeping the parallel
460 emerges from interfering with each other, because each one is doing its
461 own dependencies, that aren't dependencies of any of the others. After I
462 get as many of those going as I can, I'll start listing 3-5 individual
463 packages without deps on the same ea command line. By the time I've
464 gotten the fifth one started, one of the other sessions has usually
465 finished or is close to it, so I can start it merging the next set of
466 packages. With five merge sessions in parallel, I'm normally running an
467 average load of 5 to 9, meaning that many applications are ready for CPU
468 scheduling time at any instant, on average. If the load drops below four,
469 there are probably idle CPU cycles being wasted that could otherwise be
470 compiling stuff, as each CPU needs at least one load-point to stay busy,
471 plus usually can schedule a second one for some cycles as well, while the
472 first is waiting for the hard drive or whatever.
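
In shell terms the workflow is roughly this, spread across several konsole
sessions (the package names are just placeholders for whatever the tree
output suggests):

eptreeworld                           # survey the dependency tree first
# session 1: a branch whose deps don't overlap the others
ea kde-base/kdelibs
# session 2: another non-overlapping branch
ea kde-base/kdebase
# later, batch up a few leaf packages with no remaining deps:
ea kde-base/kwin kde-base/konsole kde-base/kate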
473
474 (Note that I'm running a four-drive RAID: RAID-6, so two-way striped, for
475 my main system, and RAID-0, so 4-way striped, for $PORTAGE_TMPDIR, so hard
476 drive latency isn't /nearly/ as high as it would be on a single-hard-drive
477 system. Of course, running five merges in parallel /does/ increase disk
478 latency some as well, but it /does/ seem to keep my load-average in the
479 target zone and my idle cycles to a minimum, during the merge period.
480 Also note that I've only recently added the PORTAGE_NICENESS value above,
481 and haven't gotten it fully tweaked to the best balance between
482 interactivity and emerge speed just yet, but from observations so far,
483 with the niceness value set, I'll be able to keep the system busy with
484 "only" 3-4 parallel merges, rather than the 5 I had been having to run to
485 keep the system most efficiently occupied when I had a lot to merge.)
486
487 PKGDIR="/pkg"
488 PORTAGE_TMPDIR="/tmp"
489 PORTDIR="/p"
490 PORTDIR_OVERLAY="/l/p"
491
492 Here you can see some of my path customization.
493
494 USE="amd64 7zip X a52
495 aac acpi alsa apm arts asf audiofile avi bash-completion berkdb
496 bitmap-fonts bzip2 caps cdparanoia cdr crypt css cups curl dga divx4linux
497 dlloader dri dts dv dvd dvdr dvdread eds emboss encode extrafilters fam
498 fame ffmpeg flac font-server foomaticdb gdbm gif glibc-omitfp gpm
499 gstreamer gtk2 idn imagemagick imlib ithreads jp2 jpeg jpeg2k kde
500 kdeenablefinal lcms libwww linuxthreads-tls lm_sensors logitech-mouse
501 logrotate lzo lzw lzw-tiff mad maildir mikmod mjpeg mng motif mozilla mp3
502 mpeg ncurses network no-old-linux nolvm1 nomirrors nptl nptlonly offensive
503 ogg opengl oss pam pcre pdflib perl pic png ppds python qt quicktime
504 radeon readline scanner slang speex spell ssl tcltk theora threads tiff
505 truetype truetype-fonts type1 type1-fonts usb userlocales vcd vorbis
506 xcomposite xine xinerama xml2 xmms xosd xpm xrandr xv xvid yv12 zlib
507 elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux
508 linguas_en userland_GNU video_cards_ati"
509
510 My USE flags, FWTAR (for what they are worth). Of particular interest are
511 the input_devices_mouse and keyboard, and video_cards_ati. These come
512 from variables (INPUT_DEVICES and VIDEO_CARDS) set in make.conf, and used
513 in the new xorg-modular ebuilds. These and the others listed after zlib
514 are referred to by Gentoo devs as USE_EXPAND. Effectively, they are USE
515 flags in the form of variables, setup that way because there are rather
516 many possible values for those variables, too many to work as USE flags.
517 The LINGUAS and LANG USE_EXPAND variables are prime examples. Consider
518 how many different languages there are: if each were used and documented
519 as a regular USE flag, it would have to go in use.local.desc, because few
520 supporting packages would offer the same choices, so each would have to be
521 listed separately for each package. Talk about the number of USE flags
522 quickly getting out of control!
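
In make.conf, those expanded variables are simply set directly; matching the
values visible in the USE line above, the relevant lines look like this:

VIDEO_CARDS="ati"
INPUT_DEVICES="keyboard mouse"
LINGUAS="en"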
523
524 Unset: ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, LC_ALL
525
526 OK, some loose ends to wrapup, and I'm done.
527
528 re: gcc versions: The plan is for gcc-4.0 to go ~arch fairly soon now.
529 The devs are actively asking for bug reports involving it, so as many
530 as possible can be resolved before it goes ~arch. (Formerly, they were
531 recommending that bugs be filed upstream, and not with Gentoo unless there
532 was a patch attached, as it was considered entirely unsupported, just
533 there for those that wanted it anyway.) At this point, nearly everything
534 should compile just fine with 4.0.
535
536 That said, Gentoo has slotted gcc for a reason. It's possible to have
537 multiple minor versions (3.3, 3.4, 4.0, 4.1) merged at the same time.
538 With USE=multislot, that's actually microversion (4.0.0, 4.0.1, 4.0.2...).
539 Using either gcc-config or eselect compiler, and discounting any CFLAGS
540 switching you may have to do, it's a simple matter to switch between
541 merged versions. This made it easy to experiment with gcc-4.0 even tho
542 Gentoo wasn't supporting it and certain packages wouldn't compile with
543 4.x, because it was always possible to switch to a 3.x version if
544 necessary, and compile the package there. I did this quite regularly,
545 using gcc-4.0 as my normal version, but reverting for individual packages
546 as necessary, when they wouldn't compile with 4.0.
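
The switch itself is only a couple of commands; roughly (the profile name is
whatever gcc-config -l reports on your own box, so the one below is only an
example, and the eselect module's exact syntax may differ):

gcc-config -l                           # list installed compiler profiles
gcc-config x86_64-pc-linux-gnu-3.4.4    # pick one (example name only)
source /etc/profile                     # pick up the new environment
# or, via the newer eselect interface:
eselect compiler list
eselect compiler set <profile>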
547
548 The same now applies to the 4.1.0-beta-snapshot series. Other than the
549 compile time necessary to compile a new gcc when the snapshot comes out
550 each week, it's easy to run the 4.1-beta as the main system compiler for
551 as wide testing as possible, while reverting to 4.0 or 3.4 (I don't have a
552 3.3 slot merged) if needed.
553
554 re: the performance improvements I saw that started this whole thing:
555 These trace to several things, I believe. #1, with gcc-4.0, there's now
556 support for -fvisibility -- setting certain functions as exported and
557 visible externally, others not. That can easily cut exported symbols by a
558 factor of 10. Exported symbols of course affect dynamic load-time, which
559 of course gets magnified dramatically by my LDFLAGS early binding
560 settings. When I first compiled KDE with that (there were several
561 missteps early on in terms of KDE and Gentoo's support, but that aside),
562 KDE appload times went down VERY NOTICEABLY! Again, due to my LDFLAGS,
563 the effect was multiplied dramatically, but the effect is VERY real!
564
565 Of course, that's mainly load-time performance. The run-time performance
566 that we are actually talking about here has other explanations. A big one is
567 that gcc-4 was a HUGE rewrite, with a BIG potential to DRAMATICALLY
568 improve gcc's performance. With 4.0, the theory is there, but in
569 practice, it wasn't all that optimized just yet. In some ways it reverted
570 behavior below that of the fairly mature 3.x series, altho the rewrite
571 made things much simpler and less prone to error given its maturity. 4.1,
572 however, is the first 4.x release to REALLY be hitting the potential of
573 the 4.x series, and it appears the difference is very noticeable. Of
574 course, there's a reason 4.1.0 is still in beta upstream and not supported
575 by Gentoo either, as there are still known regressions. However, where it
576 works, which it seems to do /most/ of the time, it **REALLY** works, or at
577 least that's been my observation. 3.3 was a MAJOR improvement in gcc for
578 amd64 users, because it was the first version where amd64 wasn't simply an
579 add-on hack, as it had been with 3.2. The 3.4 upgrade was minor in
580 comparison, and 4.0, while it's going ~arch shortly and sets the stage for
581 a lot of future improvement, will be pretty minor in terms of actual
582 performance improvement as well. 4.1, however, when it is finally fully
583 released, has the potential to be as big an improvement as 3.3 was -- that
584 is, a HUGE one. I'm certainly looking forward to it, and meanwhile,
585 running the snapshots, because Gentoo makes it easy to do so while
586 maintaining the ability to switch very simply between multiple versions
587 on the system.
588
589 Both -freorder-blocks-and-partition and -fmerge-all-constants are new to
590 me within a few days, now, and new to me with kde 3.5.1. Normally,
591 individual flags won't make /that/ much of a difference, but it's possible
592 I hit it lucky, with these. Actually, because they both match very well
593 with and reinforce my strategy of targeting size, it's possible I'm only
594 now unlocking the real potential behind size optimization. -- I **KNOW**
595 there's a **HUGE** difference in sizes between resulting file-sizes. I
596 compared 4.0.2 and 4.1.0-beta-snapshot file sizes for several modular-X
597 files in the course of researching the missing symbols problem, and the
598 difference was often a shrinkage of near 33 percent with 4.1 and my
599 current CFLAGS as opposed to 4.0.2 without the new ones. Going the other
600 way, that's a 50% larger file with 4.0.2 as compared to 4.1, 100KB vs
601 150KB, by way of example. That's a *HUGE* difference, one big enough to
602 make me initially think I'd found the reason for the missing symbols right
603 there, as the new files were simply too much smaller to look workable!
604 Still, I traced the problem to LDFLAGS, so that wasn't it, and the files
605 DO work, confirming things. I'm guessing -fmerge-all-constants plays a significant
606 part in that. In any case, with that difference in size, and knowing how
607 /much/ cache hit vs. miss affects performance, it's quite possible the
608 size is the big performance factor. Of course, even if that's so, I'm not
609 sure whether it is the CFLAGS or the 4.0 vs 4.1 that should get the credit.
610
611 In any case, I'm a happy camper right now! =8^)
612
613
614 --
615 Duncan - List replies preferred. No HTML msgs.
616 "Every nonfree program has a lord, a master --
617 and if you use the program, he is your master." Richard Stallman in
618 http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
619
620
621 --
622 gentoo-amd64@g.o mailing list
