On 08/01/2016 11:49 AM, J. Roeleveld wrote:
> On Monday, August 01, 2016 08:43:49 AM james wrote:
>> On 08/01/2016 02:16 AM, J. Roeleveld wrote:
>>> On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote:
>>>> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@×××××.com> wrote:
>>>>> On 29/07/2016 22:58, Mick wrote:
>>>>>> Interesting article explaining why Uber are moving away from
>>>>>> PostgreSQL. I am running both DBs on different desktop PCs for
>>>>>> akonadi and I'm also running MySQL on a number of websites. Let's
>>>>>> see which one goes sideways first. :p
>>>>>>
>>>>>> https://eng.uber.com/mysql-migration/
>>>>>
>>>>> I don't think your akonadi and some web sites compares in any way to
>>>>> Uber and what they do.
>>>>>
>>>>> FWIW, my Dev colleagues support an entire large corporate ISP's
>>>>> operational and customer data on PostgreSQL-9.3. With clustering.
>>>>> With no db-related issues :-)
>>>>
>>>> Agree, you'd need to be fairly large-scale to have their issues,
>>>
>>> And also have to design your database by people who think MySQL actually
>>> follows common SQL standards.
>>>
>>>> but I think the article was something anybody interested in databases
>>>> should read. If nothing else it is a really easy to follow explanation
>>>> of the underlying architectures.
>>>
>>> Check the link posted by Douglas.
>>> Uber's article has some misunderstandings about the architecture with
>>> conclusions drawn that are, at least also, caused by their database design
>>> and usage.
>>>
>>>> I'll probably post this to my LUG mailing list. I think one of the
>>>> Postgres devs lurks there so I'm curious about his impressions.
>>>>
>>>> I was a bit surprised to hear about the data corruption bug. I've
>>>> always considered Postgres to have a better reputation for data
>>>> integrity.
>>>
>>> They do.
>>>
>>>> And of course almost any FOSS project could have a bug. I
>>>> don't know if either project does the kind of regression testing to
>>>> reliably detect this sort of issue.
>>>
>>> Not sure either, I do think PostgreSQL does a lot with regression tests.
>>>
>>>> I'd think that it is more likely
>>>> that the likes of Oracle would (for their flagship DB (not for MySQL),
>>>
>>> Never worked with Oracle (or other big software vendors), have you? :)
>>>
>>>> and they'd probably be more likely to send out an engineer to beg
>>>> forgiveness while they fix your database).
>>>
>>> Only if you're a big (as in, spend a lot of money with them) customer.
>>>
>>>> Of course, if you're Uber
>>>> the hit you'd take from downtime/etc isn't made up for entirely by
>>>> having somebody take a few days to get everything fixed.
>>>
>>> --
>>> Joost
>>
>> I certainly respect your skills and posts on Databases, Joost, as
>> everything you have posted in the past is 'spot on'.
>
> Comes with a keen interest and long-term (think decades) of working with
> different databases.
>
>> Granted, I'm no database expert, far from it.
>
> Not many people are, nor do they need to be.
>
>> But I want to share a few things with you,
>> and hope you (and others) will 'chime in' on these comments.
>>
>> Way back, when the earth was cooling and we all had dinosaurs for pets,
>> some of us hacked on AT&T "3B2" unix systems. They were known for their
>> 'roll back and recovery', triplicated (or more) transaction processes
>> and 'voters' system to ferret out if a transaction was complete and
>> correct. There was no ACID, the current 'gold standard' if you believe
>> what Douglas and others write about concerning databases.
>>
>> In essence, (from crusted up memories) a basic (SS7) transaction related
>> to the local telephone switch was run on 3 machines. The results were
>> compared. If they matched, the transaction went forward as valid. If 2/3
>> matched,
>
> And what in the likely case when only 1 was correct?

1/3 was a failure; in fact, any result X < 1 could be defined (by a
parameter setting) as a failure, depending on the need.
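
To make that pass/fail rule concrete, here is a minimal sketch in Python.
It is only an illustration from memory of the idea, not the actual 3B2
code; the function name `vote` and the `threshold` parameter are my own
hypothetical names for the "parameter setting" mentioned above.

```python
from collections import Counter
from fractions import Fraction


def vote(results, threshold=Fraction(2, 3)):
    """Accept the most common replica result only if its share of the
    replicas meets the configured threshold (hypothetical parameter).

    Returns (accepted, value); value is None when the vote fails.
    """
    if not results:
        return False, None
    value, count = Counter(results).most_common(1)[0]
    share = Fraction(count, len(results))
    if share >= threshold:
        return True, value
    return False, None


print(vote([1, 1, 1]))                         # all three replicas agree
print(vote([1, 1, 0]))                         # 2 of 3 agree
print(vote([1, 0, 2]))                         # only 1 of 3: a failure
print(vote([1, 1, 0], threshold=Fraction(1)))  # strict setting: X < 1 fails
```

With the default threshold a 2/3 result passes; setting the threshold to 1
turns anything short of full agreement into a failure.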

> Have you seen the movie "minority report"?
> If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and how
> often this actually occurred.

Apples to oranges. The three "pre-cogs" were not equal; albeit they
voted, and most of the time all three were in agreement, the dominant
pre-cog was always on the correct side of the issue. But that is
make-believe. Comparing the results of codes run on 3 different
processors or separate machines for agreement within tolerances is quite
different. The very essence of accepting a vote of less than 1.0 (that
is, (n-1)/n or (n-x)/n) is that identical (replicated) processes are all
expected to return the same result (either a 0 or a 1), with results
being logical or within an accepted rounding error. Surely we need not
split hairs. I was merely pointing out that the basic telecom systems
formed the early days of the widespread transaction processing
industries and are the granddaddy of the ACID model/norms/constructs of
modern transaction processing. And Douglas is dead wrong that those
sorts of (ACID) transactions cannot be made to fly on clusters versus a
single machine. For massively parallel needs, distributed processing
rules, but it is not trivial, and hence Uber, with mostly a bunch of
kids, seems to be struggling and to have made bad decisions. Prolly
their mid managers and software architects are the weak link, or they
did get expert guidance that was not in-house, or they made poor
decisions to get some code running quickly, etc. etc. I do not really
care about Uber. My singular issue is that Douglas was completely dead
wrong (while nicely promoting himself as a postgres expert, along with
his business credentials), and he just barely saved his credibility by
stating that what Uber is now doing is superior to an A-grade ACID DB
solution.

Another point: there are single big chips that can be run as thousands
of different processors, on either FPGA or GPU, granted using SIMD/MIMD
style processors and things like 'systolic algorithms', but that sort of
thing is out of scope here. (Vulkan might change that, in an open source
kind of way, maybe.) Furthermore, GPU resources combined with DDR-5 can
blur the line and may actually be more cost effective for many forms of
transaction processing, but clusters, in their current forms, are very
much general purpose machines. My point: Douglas is dead wrong about
ACID being dominated by databases for technical reasons, particularly
for advanced teams of experts. Surely most of the MBA, HR and Finance
types of idiots running these new startups would not know a coder from
an architect, and that is very sad, because a good consultant could
probably have designed several robust systems in a week or two. Granted,
few consultants have that sort of unbiased integrity, because we all
have bills to pay and much is getting outsourced... Integrity has always
been the rarest of qualities, particularly with humanoids......


>
>> and the switch was configured, then the code would
>> essentially 'vote' and majority ruled. This is what led to phone calls
>> (switched phone calls) having variable delays, often in the order of
>> seconds, mis-connections and other problems we all encountered during
>> periods of excessive demand.
>
> Not sure if that was the cause in the past, but these days it can also still
> take a few seconds before the other end rings. This is due to the phone-system
> (all PBXs in the path) needing to setup the routing between both end-points
> prior to the ring-tone actually starting.
> When the system is busy, these lookups will take time and can even time-out.
> (Try wishing everyone you know a happy new year using a wired phone and you'll
> see what I mean. Mobile phones have a separate problem at that time)

I did not intend to argue about the minutiae of how a particular Baby
Bell implemented their SS7 switching systems on unix systems. My point
was that 'transaction processing' grew out of the early telephone
network, the way I remember it; ymmv. Banks did double-entry accounting
by hand and had clerks manually load data sets; then double-entry
accounting became automated, and ACID-style transaction processing was
added later. So what SQL folks refer to as ACID properties comes from
the North American switching heritage, and eventually the world's
telecom networks, eons ago.

>> That scenario was at the heart of how old, crappy AT&T unix (SVR?) could
>> perform so well and therefore established the gold standard for RT
>> transaction processing, aka the "five 9s" 99.999% of up-time (about 5
>> minutes per year of downtime).
>
> "Unscheduled" downtime. Regular maintenance will require more than 5 minutes
> per year.

Yes, but the redundancy of the 3B2 and other computers (Sequent, Sequoia
and Tandem, to name a few) meant that the "phone switching" fabric at
any given Central Office (the local building where the copper, RF and
fiber lines are muxed) was, on average, up and available 99.999% of the
time. Ironically, gentoo now has a 'sys-fabric' group,
/usr/portage/sys-fabric, thanks to some forward thinking cluster folk.
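
For anyone who wants to check the "about 5 minutes per year" figure that
goes with 99.999%, the arithmetic is simple enough to sketch in Python.
The helper name is mine, and I am assuming an average year of 365.25
days; this counts all downtime, with no allowance for scheduled
maintenance.

```python
# Average year of 365.25 days, expressed in minutes (an assumption).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability):
    """Minutes per year a system may be down at a given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for availability in (0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability:.5f} available -> {minutes:8.2f} minutes down per year")
```

Five nines works out to a little over 5 minutes (roughly 5.26) per year,
which matches the usual rule of thumb.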

>
>> Sure this part is only related to
>> transaction processing as there was much more to the "five 9s" legacy,
>> but imho, that is the heart of what was the precursor to ACID properties
>> now so greatly espoused in SQL codes that Douglas refers to.
>>
>> Do folks concur or disagree at this point?
>
> ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion,
> a work-around for unreliable hardware.

Absolutely true. But the fact that high-reliability computer processing
(including the billing) could be replicated, performed elsewhere, and
then 'recombined' proves that the work of any ACID function can be split
up and run on clusters and achieve ACID standards or even better. So my
point is that the cluster, if used wisely, will beat the 'dog shit' out
of any Oracle fancy-pants database maneuvers. Evidence: Snoracle is now
snapping up billion dollar companies in the cluster space, 'cause their
days of extortion are winding down rather rapidly, imho.

Also, just because the kids who are writing the codes have not figured
all of this out does not mean that SQL or any abstraction is better than
parallel processing. No way in hell. Cheaper and quicker to set up,
surely true, but never superior to a well-designed, properly coded
distributed solution. That's my point. Hence, Douglas is full of
stuffing, except that he alludes, at the last possible moment in his
critique, to the fact that Uber is doing something much better, beyond
what Oracle has an interest in doing. This is backed up by Oracle's
lethargic reaction to the data processing market just leaving Oracle to
become the next IBM.... (ymmv).

> It is based on a clever idea, but when
> 2 computers having the same data and logic come up with 2 different answers, I
> wouldn't trust either of them.

Yep. A transaction that fails QA being rejected and then resubmitted,
modified, or handled by any number of other remedies is quite common in
many forms of software. Voting does not correct errors, except maybe a
fractional rounding up to 1 (pass) or down to 0 (failure). It does help
to achieve the ACI of ACID.

Since billions and billions of these (complex) transactions are
occurring, a failed one is usually just repeated. If it keeps failing,
then engineers/coders take a deeper look. Rare statistical anomalies are
auto-scrutinized (that would be the replications and voting) and then
pushed to a logical zero or logical one.
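
The reject-and-repeat behaviour can be sketched roughly as follows, in
Python. This is my own toy illustration, not any real switch code: the
names `majority`, `run_with_retry`, `make_flaky` and the `max_attempts`
limit are all hypothetical, and a "replica" here is just a function.

```python
from collections import Counter


def majority(results):
    """Return the value held by a strict majority of replicas, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if 2 * count > len(results) else None


def run_with_retry(replicas, txn, max_attempts=3):
    """Run txn on every replica, vote, and simply repeat on a failed vote.

    Voting never repairs a result; a failed vote just rejects the attempt.
    """
    for attempt in range(1, max_attempts + 1):
        results = [replica(txn) for replica in replicas]
        winner = majority(results)
        if winner is not None:
            return winner, attempt
    # Keeps failing: time for engineers/coders to take a deeper look.
    raise RuntimeError(f"no majority after {max_attempts} attempts")


def steady(txn):
    return txn * 2


def make_flaky(bad_calls, wrong_value):
    """Build a replica that returns wrong_value for its first bad_calls calls."""
    state = {"calls": 0}

    def replica(txn):
        state["calls"] += 1
        return wrong_value if state["calls"] <= bad_calls else txn * 2

    return replica


result, attempts = run_with_retry(
    [steady, make_flaky(1, -1), make_flaky(1, -2)], 21
)
print(result, attempts)
```

The first attempt gets three different answers and is rejected; the
repeat succeeds once the two flaky replicas recover.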

>
>> The reason this is important to me (and others?), is that, if this idea
>> (granted there is much more detail to it) is still valid, then it can
>> form the basis for building up superior-ACID processes, that meet or
>> exceed, the properties of an expensive (think Oracle) transaction
>> process on distributed (parallel) or clustered systems, to a degree of
>> accuracy only limited by the limit of the number of odd numbered voter
>> codes involved in the distributed and replicated parts of the
>> transaction. I even added some code where replicated routines were
>> written in different languages, and the results compared to add an
>> additional layer of verification before the voter step. (gotta love
>> assembler?).
>
> You have seen how "democracies" work, right? :)

Yes, I need to shed some light on telecom processing. I never intended
to suggest that voting corrected errors, although error correction codes
are usually part of the overall stack. I tried to suggest that all
transactions on phone switches are already Atomic (pass, or fail and
redo); Consistent (replications pass on different hardware pathways to
satisfaction metrics); Isolated (via multiple hardware pathways); and
Durable (passing a voter check scheme). And five nines is still the gold
standard for a system (even mil-spec).

So the old telecom systems are, indeed and in fact, the heritage for
modern ACID transactions.



> The more voters involved, the longer it takes for all the votes to be counted.

Wrong! Voters are all run in parallel. This level of redundancy (to
achieve a QA result of a 99.999% pristine system) is more expensive,
analogous to encryption versus clear text. Nobody but a business major
would use an excessive number of voters in their switching fabric.
Telecom incompetence, in my experience, has been the domain of mid
managers too weak to educate upper management about the poor ideas many
of them have had and continue to have (Verizon comes to mind, too often).
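
A rough sketch of what "run in parallel" buys you, in Python. This is a
toy under my own assumptions: the replicas are threads that just sleep
for 0.1 s as a stand-in for real work, and `replica` and `parallel_vote`
are hypothetical names. Real voters would sit on separate processors or
machines.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def replica(txn):
    """Stand-in for one replicated process; the sleep simulates the work."""
    time.sleep(0.1)
    return txn + 1


def parallel_vote(replicas, txn):
    """Run all replicas at once, then take a strict-majority vote."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        results = list(pool.map(lambda r: r(txn), replicas))
    value, count = Counter(results).most_common(1)[0]
    return value if 2 * count > len(results) else None


start = time.perf_counter()
answer = parallel_vote([replica] * 5, 41)
elapsed = time.perf_counter() - start
print(f"answer={answer}, wall time={elapsed:.2f}s for 5 voters")
```

The wall time stays close to that of a single replica, since the voters
overlap; what grows with the number of voters is the total hardware
doing the work, which is the "more expensive" part.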

> With a small number, it might actually still scale, but when you pass a magic
> number (no clue what this would be), the counting time starts to exceed any
> time you might have gained by adding more voters.

Nope, the larger the number, the more expensive. The number of voters
rarely goes above 5, but it could for some sorts of physics problems
(think quantum mechanics and logic not bound to the whole numbers
[0 1]). Often logic circuits (constructs, for programmers) have "don't
care" states that can be handled in a variety of ways (filters,
transforms, counters, etc. etc.).
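
One way to see why going much beyond 5 voters seldom pays off is a
back-of-the-envelope binomial calculation, sketched here in Python. The
big assumption, which is mine and which real hardware does not fully
satisfy, is that each replica is correct independently with the same
probability `p`; the function name is hypothetical too.

```python
from math import comb


def majority_correct(n, p):
    """Probability that a strict majority of n replicas is correct,
    assuming each is correct independently with probability p."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5, 7):
    print(f"{n} voters: P(majority correct) = {majority_correct(n, 0.99):.10f}")
```

With p = 0.99, three voters already give roughly 0.9997, and each extra
pair of voters adds less than the pair before while the hardware cost
keeps climbing linearly.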

> Also, this, to me, seems to counteract the whole reason for using clusters:
> Have different nodes handle a different part of the problem.

That also occurs. But my point is that properly designed code for the
cluster can replace the ACID functions offered by Oracle and other
overpriced solutions, on standard cluster hardware. The problem with
today's clusters is that the vendors who employ the kid-coders are
making things far more complicated than necessary, so the average linux
hacker just outsources via the cloud. DUMB, insecure and not a wise
choice for many industries. And sooner or later folks are going to get
wise and build their own clusters that just solve the problems they
have. Surely hybrid clusters will dominate, where the owner of the codes
does outsource peak loads and the mundane collection of ordinary
(non-critical) data. Vendors know this and have started another 'smoke
and mirrors' campaign called (brace yourself) 'Unikernels'..... The
problem with that approach is that they should just be using minimized
(focused) gentoo on stripped and optimized linux kernels; but that is
another lost art from the linux collection.

>
> Clusters of multiple compute-nodes is a quick and "simple" way of increasing
> the amount of computational cores to throw at problems that can be broken down
> in a lot of individual steps with minimal inter-dependencies.

And surpass the ACID features of either postgresql or Oracle, and spend
less money (maybe not with you and postgresql on their team)!


> I say "simple" because I think designing a 1,000 core chip is more difficult
> than building a 1,000-node cluster using single-core, single cpu boxes.

Today, you are correct. Tomorrow you will be wrong. [1]. Besides, once
that chip or VHDL code or whatever is designed, it can be replicated and
reused endlessly. Think of ASIC designers, folks who take an FPGA
project to completion, an EE who can code on large arrays of DSPs, or on
a GPU (think Khronos Group) using Vulkan.


>
> I would still consider the cluster to be a single "machine".

That's the goal.

>
>> I guess my point is 'Douglas' is full of stuffing, OR that is what folks
>> are doing when they 'roll their own solution specifically customized to
>> their specific needs' as he alludes to near the end of his commentary?
>
> The response Douglas linked to is closer to what seems to work when dealing
> with large amounts of data.
>
>> (I'd like your opinion of this and maybe some links to current schemes
>> how to have ACID/99.999% accurate transactions on clusters of various
>> architectures.) Douglas, like yourself, writes of these things in a
>> very lucid fashion, so that is why I'm asking you for your thoughts.
>
> The way Uber created the cluster is useful when having 1 node handle all the
> updates and multiple nodes providing read-only access while also providing
> failover functionality.

A SIMD solution, mimicked on a cluster? Cool.
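
As I read that description (one node takes all the updates, the others
serve reads, and one of them can take over), the shape is roughly the
following Python toy. To be clear, this is my own sketch and not Uber's
or PostgreSQL's actual mechanism: the `Cluster` class, the dict-based
"nodes" and the copy-on-every-write replication are all simplifying
assumptions.

```python
class Cluster:
    """Toy single-writer cluster: one primary, several read-only replicas."""

    def __init__(self, replica_count):
        self.primary = {}                                   # the only writable node
        self.replicas = [{} for _ in range(replica_count)]  # read-only copies
        self._turn = 0

    def write(self, key, value):
        """All updates go to the primary and are copied to every replica."""
        self.primary[key] = value
        for node in self.replicas:
            node[key] = value

    def read(self, key):
        """Reads are spread round-robin over the replicas."""
        if not self.replicas:
            return self.primary.get(key)
        self._turn = (self._turn + 1) % len(self.replicas)
        return self.replicas[self._turn].get(key)

    def failover(self):
        """Primary died: promote the first replica to be the new primary."""
        self.primary = self.replicas.pop(0)
        self._turn = 0


cluster = Cluster(replica_count=2)
cluster.write("rider", "james")
cluster.failover()
cluster.write("driver", "joost")
print(cluster.read("rider"), cluster.read("driver"))
```

Real replication is asynchronous and far messier, which is where much of
the pain described in the Uber article comes from.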
>
>> Robustness of transactions, in a distributed (clustered) environment is
>> fundamental to the usefulness of most codes that are trying to migrate
>> to a cluster based processes in (VM/container/HPC) environments.
>
> Whereas I do consider clusters to be very useful, not all work-loads can be
> redesigned to scale properly.

Today, correct. Tomorrow, I think you are going to be wrong. It's like
the move from single core to multicore. Granted, many old decrepit codes
had to be redesigned and coded anew with threads and other modern
constructs to take advantage of newer processing platforms. Sure, the
same is true with distributed, but it's far closer than ever. The
largest problem with clusters is that vendors with agendas are making
things more complicated than necessary and completely ignoring many
fundamental issues, like kernel stripping and optimizations under the
bloated OS they are using.

>
>> I do
>> not have the old articles handy but, I'm sure that many/most of those
>> types of inherent processes can be formulated in the algebraic domain,
>> normalized and used to solve decisions often where other forms of
>> advanced logic failed (not that I'm taking a cheap shot at modern
>> programming languages) (wink wink nudge nudge); or at least that's how
>> we did it.... as young whipper_snappers back in the day...
>
> If you know what you are doing, the language is just a tool. Sometimes a
> hammer is sufficient, other times one might need to use a screwdriver.
>
>> --an_old_farts_logic
>
> Thinking back on how long I've been playing with computers, I wonder how long
> it will be until I am in the "old fart" category?

Stay young! I run full court hoops all the time with young college
punks; it's one of my greatest joys in life, running with the young
stallions, hacking, pushing, shoving, slicing and taunting other
athletes. An old farts club is not something to be proud of; I just like
to share too much......

> Joost

Thanks!

James