Re: [gentoo-user] PostgreSQL Vs MySQL @Uber - gentoo-user

From:	"J. Roeleveld" <joost@××××××××.org>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
Date:	Thu, 04 Aug 2016 10:10:00
Message-Id:	`4843400.bSB1tyOA95@andromeda`
In Reply to:	Re: [gentoo-user] PostgreSQL Vs MySQL @Uber by james

On Tuesday, August 02, 2016 12:16:32 AM james wrote:
> On 08/01/2016 11:49 AM, J. Roeleveld wrote:
> > On Monday, August 01, 2016 08:43:49 AM james wrote:

<snipped>

> >> Way back, when the earth was cooling and we all had dinosaurs for pets,
> >> some of us hacked on AT&T "3B2" unix systems. They were known for their
> >> 'roll back and recovery', triplicated (or more) transaction processes
> >> and 'voters' system to ferret out if a transaction was complete and
> >> correct. There was no ACID, the current 'gold standard' if you believe
> >> what Douglas and others write about concerning databases.
> >>
> >> In essence, (from crusted up memories) a basic (SS7) transaction related
> >> to the local telephone switch, was run on 3 machines. The results were
> >> compared. If they matched, the transaction went forward as valid. If 2/3
> >> matched,
> >
> > And what in the likely case when only 1 was correct?
>
> 1/3 was a failure, in fact X<1 could be defined (parameter setting) as a
> failure depending on the need.

I actually meant:
system A says true
system B and C say false
And "true" was correct.
(Being devil's advocate here)
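
To make the scenario above concrete, here is a minimal sketch of a 2-out-of-3
voter. The function name and the replica outputs are hypothetical and not
taken from any real telecom system; it only shows that a majority vote picks
the most common answer, which is not necessarily the correct one.

```python
from collections import Counter

def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# Hypothetical replica outputs: A is right, B and C share the same fault.
truth = True
replica_results = {"A": True, "B": False, "C": False}

decision = majority_vote(list(replica_results.values()))
print("voter decided:", decision)
print("decision matches the truth:", decision == truth)
```

The voter has no way of knowing that A was the correct one; it can only tell
that A was outvoted.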

> > Have you seen the movie "minority report"?
> > If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and
> > how often this actually occurred.
>
> Apples to Oranges. The (3) "pre-cogs" were  not equal, albeit they voted,
> most of the time all three in agreement, but the dominant pre-cog was
> always on the correct side of the issue. But that is make-believe.

Of course, but it was the first example that I could come up with.

> Comparing results of codes run on 3 different processors or separate
> machines for agreement within tolerances, is quite different.  The very
> essence of using voting where there is a result less than 1.0 (that is
> (n-1)/n or (n-x)/n) was requisite on identical (replicated) processes all
> returning the same result (expecting either a 0 or 1). Results
> being logical or within rounding error of acceptance. Surely we need not
> split hairs. I was merely pointing out that the basic telecom systems
> formed the early end of widespread transaction processing industries and
> are the granddaddy of the ACID model/norms/constructs of modern
> transaction processing.

Hmm... I am having difficulty following how ACID and ensuring results are
correct by double or triple checking are related.

> And Douglas is

Which Douglas are you referring to? The one in this thread didn't actually
write the article he linked to. (Unless he has 2 different identities)

> dead wrong that those sorts of (ACID) transactions cannot be made to fly
> on clusters versus a single machine.

It depends on how you define a cluster. I tend to view a cluster as a single
system that just happens to be spread over multiple physical boxes.

> For massively parallel needs,
> distributed processing rules, but it is not trivial

Agreed.

> and hence Uber, with
> mostly a bunch of kids, seems to be struggling and have made bad
> decisions.

Let's ignore whether the decisions are good or bad. The only thing we can be
certain of, without seeing their code and environment, is that it doesn't
scale the way they need it to.

> Prolly, their mid managers and software architects are the
> weak link, or they did get expert guidance that was not in-house, or poor
> decisions to get some code running quickly etc etc. I do not really care
> about UBER.

Neither do I. And decisions are usually made by a single architect or
developer who starts the project. His/her manager usually just accepts his/her
word on this and all future decisions. Up until the moment the manager gets
replaced. Then it depends on how much the new manager trusts the original
developer.
Other developers (internal or external) usually have a hard time pointing out
potential issues if the first developer doesn't agree and/or understand.

> My singular issue is Douglas was completely dead wrong
> (which nicely promoted himself as a Postgres expert and his business
> credentials), and just barely saved his credibility by stating what UBER
> is now doing that is superior to a grade ACID, dB solution.

I didn't see that in the article. Must have missed that part.

> Another point, there are single big GPUs that can be run as thousands of
> different processors on either FPGA or GPU, granted using SIMD/MIMD
> style processors and things like 'systolic algorithms' but that sort of
> thing is out of scope here. (Vulkan might change that, in an open source
> kind of way, maybe). Furthermore, GPU resources combined with DDR-5 can
> blur the line and may actually be more cost effective for many forms of
> transaction processing, but clusters, in their current forms are very
> much general purpose machines.

I don't really agree here. For most software, having a really fast CPU helps.
Having a lot of mediocre CPUs means the vast majority isn't doing anything
useful.
Software running on clusters needs to be written with massively parallel
processing in mind. Most developers don't understand this part.

> My point:: Douglas is dead wrong about
> ACID being dominated by Databases, for technical reasons, particularly
> for advanced teams of experts.

Wikipedia actually disagrees with you:
https://en.wikipedia.org/wiki/ACID
"In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is
a set of properties of database transactions."

In other words, it's related to databases.
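
As a small illustration of what "properties of database transactions" means in
practice, here is a sketch of atomicity using Python's bundled sqlite3 module.
The table, the account names and the transfer() helper are made up for this
example; the point is only that a failed transaction leaves no partial update
behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; roll back on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# The credit to bob succeeds first, then the debit violates the CHECK
# constraint, so the whole transaction (including the credit) is undone.
ok = transfer(conn, "alice", "bob", 150)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

Any relational database with transaction support gives the same guarantee;
SQLite is used here only because it needs no setup.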

> Surely most MBA, HR and Finance types of
> idiots running these new startups would not know a coder from an
> architect, and that is very sad, because a good consultant could have
> probably designed several robust systems in a week or two. Granted, few
> consultants have that sort of unbiased integrity, because we all have
> bills to pay and much is getting outsourced... Integrity has always been
> the rarest of qualities, particularly with humanoids......

The software Uber uses for their business had to be developed in-house as
there was, at least at the time, nothing available they could use ready-made.
This usually means they start with something simple they can get running
quickly. If they wanted to fully design the whole system first, they would
never get anything done.

Where these projects usually go wrong is that they wait too long with a good
robust design, leading to a near impossibility of actually fixing all the, in
hindsight obvious, design mistakes.
(NOTE: In hindsight, as most of the actual requirements would not be clear on
day 1)

> >> and the switch was configured, then the code would
> >> essentially 'vote' and majority ruled. This is what led to phone calls
> >> (switched phone calls) having variable delays, often in the order of
> >> seconds, mis-connections and other problems we all encountered during
> >> periods of excessive demand.
> >
> > Not sure if that was the cause in the past, but these days it can also
> > still take a few seconds before the other end rings. This is due to the
> > phone-system (all PBXs in the path) needing to setup the routing between
> > both end-points prior to the ring-tone actually starting.
> > When the system is busy, these lookups will take time and can even
> > time-out. (Try wishing everyone you know a happy new year using a wired
> > phone and you'll see what I mean. Mobile phones have a separate problem
> > at that time)
> I did not intend to argue about the minutia of how a particular Baby
> Bell implemented their SS7 switching systems on unix systems. My point
> was the 'transaction processing' grew out of the early telephone network,
> the way I remember it:: ymmv. Banks did dual entry accounting by hand
> and had clerks manually load data sets, then double entry accounting
> became automated and ACID style transaction processing added later. So
> what sql folks refer to as ACID properties, comes from the North
> American  switching heritage and eventually the world's telecom networks,
> eons ago.

There is a similarity, but where ACID is a way of guaranteeing data integrity,
a phone-switch does not need this. It simply needs to do the routing
correctly.
Finance departments still do double-entry accounting and there still is a lot
of manual writing/typing going on.

> >> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could
> >> perform so well and therefore established the gold standard for RT
> >> transaction processing, aka the "five  9s" 99.999% of up-time (about 5
> >> minutes per year of downtime).
> >
> > "Unscheduled" downtime. Regular maintenance will require more than 5
> > minutes per year.
>
> Yes but the redundancy of 3b2 and other computers (Sequent, Sequoia and
> Tandem to name a few) meant that the "phone switching" fabric, at any
> given Central Office (the local building where the copper, Rf and fiber
> lines are muxed) was, on average, up and available 99.999% of the time.
> Ironically gentoo now has a 'sys-fabric' group ::
> /usr/portage/sys-fabric, thanks to some forward thinking cluster folk.
>
> >> Sure this part is only related to
> >> transaction processing as there was much more to the "five 9s" legacy,
> >> but imho, that is the heart of what was the precursor to ACID properties
> >> now so greatly espoused in SQL codes that Douglas refers to.
> >>
> >> Do folks concur or disagree at this point?
> >
> > ACID is about data integrity. The "best 2 out of 3" voting was, in my
> > opinion, a work-around for unreliable hardware.
>
> Absolutely true. But the fact that a High Reliability in computer
> processing (including the billing) could be replicated, performed
> elsewhere and then 'recombined', proves that the need of any ACID
> function can be split up and run on clusters and achieve ACID standards
> or even better. So my point, is that the cluster, if used wisely,
> will beat the 'dog shit' out of any Oracle fancy-pants database
> maneuvers. Evidence:: Snoracle is now snapping up billion dollar
> companies in the cluster space, cause their days of extortion are
> winding down rather rapidly, imho.

I disagree here. For some workloads, clusters are really great. But SQL
databases will remain.

> Also, just because the kids are writing the codes, have not figured all
> of this out, does not mean that SQL and any abstraction is better than
> parallel processing. No way in hell. Cheaper and quicker to set up,
> surely true, but never superior to a well-designed, properly coded
> distributed solution. That's my point.

For workloads where you can split the whole processing into small chunks,
where the same steps can be performed over an arbitrarily sized chunk and
merging the partial results at a later stage still gives correct results:
then yes.
However, I deal with processes and reports where the number of possible chunks
is definitely limited and any theoretical benefit of splitting it over multiple
nodes will be lost when having to build a very fancy and complex algorithm to
merge all the separate results back together.
This algorithm then also needs to be extensively tested, analysed and
understood by future developers. The additional cost involved will be
prohibitive.
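
A toy sketch of the difference, with made-up numbers: a sum can be computed
per chunk and merged trivially, while a median cannot be recovered by simply
merging the per-chunk medians. The chunks() helper and the data are
hypothetical and only serve to illustrate the point.

```python
from statistics import median

def chunks(data, size):
    """Split data into consecutive pieces of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]

values = [1, 2, 3, 4, 100, 200, 300, 400, 500]
parts = chunks(values, 4)

# Mergeable: the sum of the per-chunk sums equals the overall sum.
total_merged = sum(sum(p) for p in parts)
print(total_merged == sum(values))

# Not mergeable this way: the median of the per-chunk medians differs
# from the median of the full data set.
median_of_medians = median(median(p) for p in parts)
print(median_of_medians, median(values))
```

Getting a correct median out of distributed chunks is possible, but it needs
a more elaborate merge step, which is exactly the extra algorithm that has to
be built, tested and maintained.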

> Hence, Douglas is full of
> stuffing, except he alludes to the fact that UBER is doing something
> much better, beyond what Oracle has an interest in doing, at the last
> possible moment in his critique. This is backed up by Oracle's lethargic
> reaction to the data processing market just leaving Oracle to become the
> next IBM.... (ymmv).

I disagree, UBER is still using a relational database as the storage layer
with something custom put over it to make it simpler for the developers.
Any abstraction layer will have a negative performance impact.

> > It is based on a clever idea, but when
> > 2 computers having the same data and logic come up with 2 different
> > answers, I wouldn't trust either of them.
>
> Yep, That the QA of  Transactions is rejected and must be resubmitted,
> modified or any number of remedies, is quite common in many forms of
> software. Voting does not correct errors, except maybe a fractional
> rounding up to 1(pass) or down to zero (failure). It does help to
> achieve the ACI of ACID

It's one way of doing it. But it can also cause extra delays due to having to
wait for separate nodes to finish and then to check if they all agree.
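
A back-of-the-envelope sketch of that delay, using invented latencies: even
when all replicas run in parallel, a voter that needs every answer is as slow
as the slowest replica plus the comparison step. The function names and the
numbers are assumptions for illustration only.

```python
def wait_all_ms(latencies_ms, compare_ms):
    """Time until every replica has answered and the results are compared."""
    return max(latencies_ms) + compare_ms

def wait_quorum_ms(latencies_ms, quorum, compare_ms):
    """Time until the fastest `quorum` replicas have answered and been
    compared (assumes those replicas agree with each other)."""
    return sorted(latencies_ms)[quorum - 1] + compare_ms

# Hypothetical per-replica latencies in milliseconds; one node is slow.
latencies = [12, 15, 90]

print("single node:     ", min(latencies))
print("wait for all 3:  ", wait_all_ms(latencies, compare_ms=1))
print("wait for 2 of 3: ", wait_quorum_ms(latencies, quorum=2, compare_ms=1))
```

Deciding as soon as a quorum agrees hides a single slow node, but only for as
long as the fast nodes actually agree.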

> Since billions and billions of these (complex) transactions are
> occurring, it is usually just repeated. If it keeps failing then
> engineers/coders take a deeper look. Rare statistical anomalies are
> auto-scrutinized (that would be replications and voting) and then pushed
> to a logical zero or logical one.

The complexity comes from having to mould the algorithm into that structure.
And additional complexity also makes it more fault-prone.

> >> The reason this is important to me (and others?), is that, if this idea
> >> (granted there is much more detail to it) is still valid, then it can
> >> form  the basis for building up superior-ACID processes, that meet or
> >> exceed, the properties of an expensive (think Oracle) transaction
> >> process on distributed (parallel) or clustered systems, to a degree of
> >> accuracy only limited by the limit of the number of odd numbered voter
> >> codes involved in the distributed and replicated parts of the
> >> transaction. I even added some code where replicated routines were
> >> written in different languages, and the results compared to add an
> >> additional layer of verification before the voter step. (gotta love
> >> assembler?).
> >
> > You have seen how "democracies" work, right? :)
>
> Yes I need to shed some light on  telecom processing. I never intended to
> suggest that voting corrected errors; although error correction codes
> are usually part of the overall stack. I  tried to suggest that all
> transactions on phone switches are already Atomic (pass or fail-redo);
> Consistent (replications pass on different hardware pathways to
> satisfaction metrics); Isolated via multiple hardware pathways; Durable
> (passing a voter check scheme), and five nines still is the gold standard
> for a system (even mil-spec).
>
> So the old telecom systems are indeed and in fact  the heritage for
> modern ACID transactions.

A lot can be described using 'modern' designs. However, the fact remains that
ACID was worked out for databases and not for phone systems. Any sane system
will have some form of consistency checks, but the extent to which this is
done for a data storage layer, like a database, will be different from the
extent to which this is done for a switching layer, like a router or phone
switch.

Modern phone switches will not implement a redo.

> > The more voters involved, the longer it takes for all the votes to be
> > counted.
> Wrong! Voters are all run in parallel. For this level of redundancy (to
> achieve a QA result of 99.999% system pristine), it is more expensive,
> analogous to encryption versus clear text. Nobody, but a business major
> would use an excessive number of voters in their switching fabric.
> Telecom incompetences, in my experiences, has been the domain of mid
> managers too weak to educate upper management on poor ideas many of them
> have had and continue to have (Verizon comes to mind, too often).

Those incompetencies are usually in the domain of finances and services
provided. The basic service of a telecoms company is pretty simple: "Pass
data/voice between A and B".
There are plenty of proven systems available that can do this. The mistakes
are usually of the kind: The system that we bought does not handle the load
the salesperson promised.

> > With a small number, it might actually still scale, but when you pass a
> > magic number (no clue what this would be), the counting time starts to
> > exceed any time you might have gained by adding more voters.
>
> Nope the larger the number, the more expensive. The number of voters
> rarely goes above 5, but it could for some sorts of physics problems
> (think quantum mechanics and logic not bound to [0 1] whole numbers).
> Often logic circuits (constructs for programmers) have "don't care"
> states that can be handled in a variety of ways (filters, transforms,
> counters etc etc).

"don't care" values should always be ignored. Never actually used. (Except for
randomizer functionality)

> > Also, this, to me, seems to counteract the whole reason for using
> > clusters:
> > Have different nodes handle a different part of the problem.
>
> That also occurs. But my point is properly designed code for the cluster
> can replace ACID functions, offered by Oracle and other overpriced
> solutions, on standard cluster hardware.

All commonly used relational databases have ACID functionality as long as they
support transactions. There is no need to only choose a commercial version for
that.

> The problem with today's
> clusters is the vendors that employ the kid-coders, are making things
> far more complicated than necessary, so the average linux hacker just
> outsources via the cloud. DUMB, insecure and not a wise choice for many
> industries.

Moving your entire business into the cloud often is.

> And sooner or later folks are going to get wise and build
> their own clusters that just solve the problems they have. Surely hybrid
> clusters will dominate where the owner of the codes does outsource peak
> loads and mundane collection of ordinary (non-critical) data.

I.e. hybrid solutions...

> Vendors
> know this and have started another 'smoke and mirrors' campaign called
> (brace yourself) 'Unikernels'.....

"unikernels" is something a small group came up with... I see no practical
benefit for that approach.

> Problem with that approach is they
> should just be using minimized (focused) gentoo on stripped and optimized
> linux kernels; but that is another lost art from the linux collection

I see "unikernels" as basically running the applications directly on top of a
hypervisor. I fail to see how this makes more sense than starting an
application directly on top of an OS. The whole reason we have an OS is to
avoid having to reinvent the wheel (networking, storage, memory handling,....)
for every single program.

> > Clusters of multiple compute-nodes are a quick and "simple" way of
> > increasing the amount of computational cores to throw at problems that
> > can be broken down in a lot of individual steps with minimal
> > inter-dependencies.
>
> And surpass the ACID features of either postgresql or Oracle, and spend
> less money (maybe not with you and postgresql on their team)!

Large clusters are useful when doing Hadoop ("big data") style things (I
mostly work with financial systems and the corresponding data).
Storing the entire data warehouse inside a cluster doesn't work with all the
additional requirements. Reports still need to be displayed quickly and a
decently configured database is usually more beneficial. Where systems like
Exadata really help here is by integrating the underlying storage (SAN) with
the actual database servers and doing most of the processing in-memory.
I.e. it works like a dedicated, custom-built cluster environment specifically
for a relational database.

> > I say "simple" because I think designing a 1,000 core chip is more
> > difficult than building a 1,000-node cluster using single-core, single
> > cpu boxes.
> Today, you are correct. Tomorrow you will be wrong.

In that case, clusters will be obsolete tomorrow.

> [1]. Besides once
> that chip or VHDL code or whatever is designed, it can be replicated and
> reused endlessly. Think ASIC designers, folks who take an FPGA project to
> completion. An EE can code on large arrays of DSPs, or a GPU
> (think Khronos group) using Vulkan.
>
> > I would still consider the cluster to be a single "machine".
>
> That's the goal.

In my opinion, that goal has already been achieved. Unless you want ALL
machines to be part of the same cluster and all machines being able to push
work to the entire cluster...
In that case, good luck in achieving this as you then also need to handle
"randomly disappearing nodes"

> >> I guess my point is 'Douglas' is full of stuffing, OR that is what folks
> >> are doing when they 'roll their own solution specifically customized to
> >> their specific needs' as he alludes to near the end of his commentary?
> >
> > The response Douglas linked to is closer to what seems to work when
> > dealing with large amounts of data.
> >
> >> (I'd like your opinion of this and maybe some links to current schemes
> >> how to have ACID/99.999% accurate transactions on clusters of various
> >> architectures.)  Douglas, like yourself, writes of these things in a
> >> very lucid fashion, so that is why I'm asking you for your thoughts.
> >
> > The way Uber created the cluster is useful when having 1 node handle all
> > the updates and multiple nodes providing read-only access while also
> > providing failover functionality.
>
> SIMD solution, mimic on a cluster? Cool.

Hmm.... no.
This is load balancing on the data-retrieval side.
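
Roughly, the idea looks like the sketch below: every write goes to the single
updating node and reads are spread round-robin over the read-only nodes. The
class, the node names and the crude "starts with SELECT" test are all
hypothetical; this is not Uber's implementation, just the general pattern.

```python
from itertools import cycle

class ReplicatedRouter:
    """Send writes to the primary and spread reads over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas)  # simple round-robin iterator

    def route(self, statement):
        # Crude classification for the sketch: anything that is not a
        # SELECT is treated as a write and goes to the primary.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary

router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
statements = [
    "SELECT * FROM trips",
    "UPDATE trips SET status = 'done' WHERE id = 1",
    "SELECT * FROM drivers",
    "SELECT 1",
]
targets = [router.route(s) for s in statements]
print(targets)
```

No node works on a part of someone else's query here; each query is still
handled by exactly one machine, which is why this is load balancing rather
than SIMD-style parallelism.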

> >> Robustness of transactions, in a distributed (clustered) environment is
> >> fundamental to the usefulness of most codes that are trying to migrate
> >> to a cluster based processes in (VM/container/HPC) environments.
> >
> > Whereas I do consider clusters to be very useful, not all work-loads can
> > be redesigned to scale properly.
>
> Today, correct. Tomorrow, I think you are going to be wrong. It's like
> the single core, multicore.

And 90+% of developers still don't understand how to properly code for
multi-threading. Just look at how most applications work on your desktop.
They all tend to max out a single core and the other x-1 cores tend to idle...

> Granted many old decrepit codes had to be
> redesigned and coded anew with threads and other modern constructs to
> take advantage of newer processing platforms.

Intel introduced Hyper-Threading back in 2002. We are now in 2016 and the
majority of code is still single-threaded.
The problem is, the algorithms that are being used need to be converted to
parallel methods.
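
As a minimal sketch of what such a conversion involves (the prime-counting
task and all function names are just an example): the work first has to be
restructured into independent ranges, and only then can those ranges be handed
to a pool of workers. The map_fn parameter is an assumption of this sketch so
that the same code runs serially or in parallel.

```python
def is_prime(n):
    """Trial division; deliberately simple and CPU-bound."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes_in(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))

def count_primes(limit, workers=4, map_fn=map):
    """Count primes below `limit` by splitting the range into independent
    pieces; map_fn decides whether the pieces run serially or in parallel."""
    step = max(1, -(-limit // workers))  # ceiling division, at least 1
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    return sum(map_fn(count_primes_in, ranges))

print(count_primes(100))  # serial run using the built-in map
```

To spread the pieces over several cores, the same function can be called with
the map method of a concurrent.futures.ProcessPoolExecutor as map_fn (inside
an `if __name__ == "__main__":` guard). The restructuring into independent
ranges is the hard part; swapping the map function is the easy part.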

> Sure the same is true with
> distributed, but it's far closer than ever. The largest problem with
> cluster, is Vendors with agendas, are making things more complicated
> than necessary and completely ignoring many fundamental issues, like
> kernel stripping and optimizations under the bloated OS they are using.

I still want a graphical desktop with full multimedia support. I still want
to easily plug in a USB device or SD-card and use it immediately,.....
That requirement is incompatible with stripping the OS.

> >> I do
> >> not have the old articles handy but, I'm sure that many/most of those
> >> types of inherent processes can be formulated in the algebraic domain,
> >> normalized and used to solve decisions often where other forms of
> >> advanced logic failed (not that I'm taking a cheap shot at modern
> >> programming languages) (wink wink nudge nudge); or at least that's how
> >> we did it.... as young whipper_snappers back in the day...
> >
> > If you know what you are doing, the language is just a tool. Sometimes a
> > hammer is sufficient, other times one might need to use a screwdriver.
> >
> >> --an_old_farts_logic
> >
> > Thinking back on how long I've been playing with computers, I wonder how
> > long it will be until I am in the "old fart" category?
>
> Stay young! I run full court hoops all the time with young college
> punks; it's one of my greatest joys in life, run with the young
> stallions, hacking, pushing, shoving, slicing and taunting other
> athletes. Old farts clubs is not something to be proud of, I just like
> to share too much......

Hehe.... One is only as old as he/she feels.

--
Joost
