Gentoo Archives: gentoo-user

From: "J. Roeleveld" <joost@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
Date: Mon, 01 Aug 2016 16:49:59
Message-Id: 1954543.QCiV5ja7vZ@andromeda
In Reply to: Re: [gentoo-user] PostgreSQL Vs MySQL @Uber by james
1 On Monday, August 01, 2016 08:43:49 AM james wrote:
2 > On 08/01/2016 02:16 AM, J. Roeleveld wrote:
3 > > On Saturday, July 30, 2016 06:38:01 AM Rich Freeman wrote:
4 > >> On Sat, Jul 30, 2016 at 6:24 AM, Alan McKinnon <alan.mckinnon@×××××.com>
5 > >
6 > > wrote:
7 > >>> On 29/07/2016 22:58, Mick wrote:
8 > >>>> Interesting article explaining why Uber are moving away from
9 > >>>> PostgreSQL.
10 > >>>> I am
11 > >>>> running both DBs on different desktop PCs for akonadi and I'm also
12 > >>>> running
13 > >>>> MySQL on a number of websites. Let's see which one goes sideways first.
14 > >>>> :p
15 > >>>>
16 > >>>> https://eng.uber.com/mysql-migration/
17 > >>>
18 > >>> I don't think your akonadi and some web sites compares in any way to
19 > >>> Uber
20 > >>> and what they do.
21 > >>>
22 > >>> FWIW, my Dev colleagues support an entire large corporate ISP's
23 > >>> operational and customer data on PostgreSQL-9.3. With clustering. With
24 > >>> no
25 > >>> db-related issues :-)
26 > >>
27 > >> Agree, you'd need to be fairly large-scale to have their issues,
28 > >
29 > > And also have to design your database by people who think MySQL actually
30 > > follows common SQL standards.
31 > >
32 > >> but I
33 > >> think the article was something anybody interested in databases should
34 > >> read. If nothing else it is a really easy to follow explanation of
35 > >> the underlying architectures.
36 > >
37 > > Check the link posted by Douglas.
38 > > Uber's article has some misunderstandings about the architecture, with
39 > > conclusions drawn that are, at least in part, caused by their database design
40 > > and usage.
41 > >
42 > >> I'll probably post this to my LUG mailing list. I think one of the
43 > >> Postgres devs lurks there so I'm curious to his impressions.
44 > >>
45 > >> I was a bit surprised to hear about the data corruption bug. I've
46 > >> always considered Postgres to have a better reputation for data
47 > >> integrity.
48 > >
49 > > They do.
50 > >
51 > >> And of course almost any FOSS project could have a bug. I
52 > >> don't know if either project does the kind of regression testing to
53 > >> reliably detect this sort of issue.
54 > >
55 > > Not sure either, I do think PostgreSQL does a lot with regression tests.
56 > >
57 > >> I'd think that it is more likely
58 > >> that the likes of Oracle would (for their flagship DB (not for MySQL),
59 > >
60 > > Never worked with Oracle (or other big software vendors), have you? :)
61 > >
62 > >> and they'd probably be more likely to send out an engineer to beg
63 > >> forgiveness while they fix your database).
64 > >
65 > > Only if you're a big (as in, spend a lot of money with them) customer.
66 > >
67 > >> Of course, if you're Uber
68 > >> the hit you'd take from downtime/etc isn't made up for entirely by
69 > >> having somebody take a few days to get everything fixed.
70 > >
71 > > --
72 > > Joost
73 >
74 > I certainly respect your skills and posts on Databases, Joost, as
75 > everything you have posted, in the past is 'spot on'.
76
77 Comes with a keen interest and long-term (think decades) experience working
78 with different databases.
79
80 > Granted, I'm no database expert, far from it.
81
82 Not many people are, nor do they need to be.
83
84 > But I want to share a few things with you,
85 > and hope you (and others) will 'chime in' on these comments.
86 >
87 > Way back, when the earth was cooling and we all had dinosaurs for pets,
88 > some of us hacked on AT&T "3B2" unix systems. They were known for their
89 > 'roll back and recovery', triplicated (or more) transaction processes
90 > and 'voters' system to ferret out if a transaction was complete and
91 > correct. There was no ACID, the current 'gold standard' if you believe
92 > what Douglas and others write about concerning databases.
93 >
94 > In essence, (from crusted up memories) a basic (SS7) transaction related
95 > to the local telephone switch, was run on 3 machines. The results were
96 > compared. If they matched, the transaction went forward as valid. If 2/3
97 > matched,
98
99 And what about the likely case where only 1 was correct?
100 Have you seen the movie "Minority Report"?
101 If yes, think back to why Tom Cruise was found 'guilty' when he wasn't and how
102 often this actually occurred.
103
104 > and the switch was configured, then the code would
105 > essentially 'vote' and majority ruled. This is what led to phone calls
106 > (switched phone calls) having variable delays, often in the order of
107 > seconds, mis-connections and other problems we all encountered during
108 > periods of excessive demand.
109
110 Not sure if that was the cause in the past, but these days it can also still
111 take a few seconds before the other end rings. This is due to the phone-system
112 (all PBXs in the path) needing to set up the routing between both end-points
113 prior to the ring-tone actually starting.
114 When the system is busy, these lookups will take time and can even time-out.
115 (Try wishing everyone you know a happy new year using a wired phone and you'll
116 see what I mean. Mobile phones have a separate problem at that time.)
117
118 > That scenario was at the heart of how old, crappy AT&T unix (SVR?) could
119 > perform so well and therefore established the gold standard for RT
120 > transaction processing, aka the "five 9s" 99.999% of up-time (about 5
121 > minutes per year of downtime).
122
123 "Unscheduled" downtime. Regular maintenance will require more than 5 minutes
124 per year.
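
For reference, the arithmetic behind the "about 5 minutes" figure can be checked with a few lines of Python. This is only a sketch: it assumes a 365-day year, and the function name is made up for illustration.

```python
# Minutes in a non-leap year (assumption: 365 days).
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime in minutes per year for an availability given as a fraction (0..1)."""
    return MINUTES_PER_YEAR * (1.0 - availability)

# Compare two, three, four and five nines.
for availability in (0.99, 0.999, 0.9999, 0.99999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability:.5%} uptime allows {minutes:.2f} minutes of downtime per year")
```

Five nines works out to roughly 5.26 minutes per year, which leaves no room for planned maintenance windows unless those are counted separately.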
125
126 > Sure this part is only related to
127 > transaction processing as there was much more to the "five 9s" legacy,
128 > but imho, that is the heart of what was the precursor to ACID properties
129 > now so greatly espoused in SQL codes that Douglas refers to.
130 >
131 > Do folks concur or disagree at this point?
132
133 ACID is about data integrity. The "best 2 out of 3" voting was, in my opinion,
134 a work-around for unreliable hardware. It is based on a clever idea, but when
135 2 computers having the same data and logic come up with 2 different answers, I
136 wouldn't trust either of them.
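
To make the "best 2 out of 3" idea concrete, here is a minimal sketch of a majority voter in Python. It is not what the 3B2 systems actually ran; the function name and the sample results ("connect", "drop", "busy") are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, or None if there is none."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

print(vote(["connect", "connect", "connect"]))  # all three replicas agree
print(vote(["connect", "connect", "drop"]))     # 2 out of 3: the majority wins
print(vote(["connect", "drop", "busy"]))        # no majority: nothing can be decided
print(vote(["drop", "drop", "connect"]))        # two replicas agreeing on a wrong answer still win
```

The last call is the problem case: the voter only knows which answer is most common, not which one is correct.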
137
138 > The reason this is important to me (and others?), is that, if this idea
139 > (granted there is much more detail to it) is still valid, then it can
140 > form the basis for building up superior-ACID processes, that meet or
141 > exceed, the properties of an expensive (think Oracle) transaction
142 > process on distributed (parallel) or clustered systems, to a degree of
143 > accuracy only limited by the limit of the number of odd numbered voter
144 > codes involved in the distributed and replicated parts of the
145 > transaction. I even added some code where replicated routines were
146 > written in different languages, and the results compared to add an
147 > additional layer of verification before the voter step. (gotta love
148 > assembler?).
149
150 You have seen how "democracies" work, right? :)
151 The more voters involved, the longer it takes for all the votes to be counted.
152 With a small number, it might actually still scale, but when you pass a magic
153 number (no clue what this would be), the counting time starts to exceed any
154 time you might have gained by adding more voters.
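
The accuracy side of the trade-off can be sketched with a simple binomial model. The big assumption, labeled here loudly, is that every voter fails independently with the same probability p; replicas running the same code on the same data do not really behave that way. The function name and the value p = 0.01 are made up for illustration.

```python
from math import comb

def majority_wrong_probability(voters: int, p: float) -> float:
    """Probability that more than half of `voters` replicas are wrong.

    Assumes each replica is wrong independently with probability p.
    """
    needed = voters // 2 + 1  # smallest number of wrong replicas that forms a majority
    return sum(
        comb(voters, k) * p**k * (1 - p) ** (voters - k)
        for k in range(needed, voters + 1)
    )

for voters in (1, 3, 5, 7, 9):
    print(voters, "voters:", majority_wrong_probability(voters, 0.01))
```

Under that assumption the chance of a wrong majority drops quickly as voters are added, but every extra voter is also one more result that has to be collected and compared before the transaction can proceed.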
155
156 Also, this, to me, seems to counteract the whole reason for using clusters:
157 Have different nodes handle a different part of the problem.
158
159 Clusters of multiple compute-nodes are a quick and "simple" way of increasing
160 the number of computational cores to throw at problems that can be broken down
161 into a lot of individual steps with minimal inter-dependencies.
162 I say "simple" because I think designing a 1,000 core chip is more difficult
163 than building a 1,000-node cluster using single-core, single cpu boxes.
164
165 I would still consider the cluster to be a single "machine".
166
167 > I guess my point is 'Douglas' is full of stuffing, OR that is what folks
168 > are doing when they 'roll their own solution specifically customized to
169 > their specific needs' as he alludes to near the end of his commentary?
170
171 The response Douglas linked to is closer to what seems to work when dealing
172 with large amounts of data.
173
174 > (I'd like your opinion of this and maybe some links to current schemes
175 > how to have ACID/99.999% accurate transactions on clusters of various
176 > architectures.) Douglas, like yourself, writes of these things in a
177 > very lucid fashion, so that is why I'm asking you for your thoughts.
178
179 The way Uber created the cluster is useful when having 1 node handle all the
180 updates and multiple nodes providing read-only access while also providing
181 failover functionality.
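
As a rough sketch of that layout (one node taking all the updates, several read-only nodes, and failover), something like the following Python captures the routing logic. It is purely illustrative: the class, its methods and the node names are hypothetical and are not how Uber or PostgreSQL implement it.

```python
from typing import List

class Cluster:
    """Toy model: one primary for writes, replicas for reads, manual failover."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin position for read queries

    def node_for(self, is_write: bool) -> str:
        """Send writes to the primary; spread reads over the replicas."""
        if is_write or not self.replicas:
            return self.primary
        node = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return node

    def fail_over(self) -> str:
        """Promote the first replica when the primary is lost."""
        if not self.replicas:
            raise RuntimeError("no replica left to promote")
        self.primary = self.replicas.pop(0)
        self._next = 0
        return self.primary

cluster = Cluster("db1", ["db2", "db3"])
print(cluster.node_for(is_write=True))   # writes always go to the primary
print(cluster.node_for(is_write=False))  # reads rotate over the replicas
print(cluster.fail_over())               # a replica takes over as primary
```

Note that this only scales reads: every update still goes through a single node, which is why not every workload benefits from it.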
182
183 > Robustness of transactions, in a distributed (clustered) environment is
184 > fundamental to the usefulness of most codes that are trying to migrate
185 > to cluster-based processes in (VM/container/HPC) environments.
186
187 Whereas I do consider clusters to be very useful, not all work-loads can be
188 redesigned to scale properly.
189
190 > I do
191 > not have the old articles handy but, I'm sure that many/most of those
192 > types of inherent processes can be formulated in the algebraic domain,
193 > normalized and used to solve decisions often where other forms of
194 > advanced logic failed (not that I'm taking a cheap shot at modern
195 > programming languages) (wink wink nudge nudge); or at least that's how
196 > we did it.... as young whipper_snappers back in the day...
197
198 If you know what you are doing, the language is just a tool. Sometimes a
199 hammer is sufficient, other times one might need to use a screwdriver.
200
201 > --an_old_farts_logic
202
203 Thinking back on how long I've been playing with computers, I wonder how long
204 it will be until I am in the "old fart" category?
205
206 --
207 Joost
