Gentoo Archives: gentoo-user

From: "J. Roeleveld" <joost@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] PostgreSQL Vs MySQL @Uber
Date: Mon, 01 Aug 2016 17:31:39
Message-Id: 2536867.zjvxr0Rkkr@andromeda
In Reply to: Re: [gentoo-user] PostgreSQL Vs MySQL @Uber by Rich Freeman
On Monday, August 01, 2016 11:01:28 AM Rich Freeman wrote:
> On Mon, Aug 1, 2016 at 3:16 AM, J. Roeleveld <joost@××××××××.org> wrote:
> > Check the link posted by Douglas.
> > Uber's article has some misunderstandings about the architecture, with
> > conclusions drawn that are, at least in part, caused by their database
> > design and usage.
>
> I've read it. I don't think it actually alleges any misunderstandings
> about the Postgres architecture, but rather that it doesn't perform as
> well in Uber's design. I don't think it actually alleges that Uber's
> design is a bad one in any way.

It was written quite diplomatically. Seeing the CREATE TABLE statements for
the sample tables already makes me wonder how they designed their database
schema, especially from a performance point of view. But that is a separate
discussion :)

> But, I'm certainly interested in anything else that develops here...

Same here, and I am hoping some others will also come up with some
interesting bits.

> >> And of course almost any FOSS project could have a bug. I
> >> don't know if either project does the kind of regression testing to
> >> reliably detect this sort of issue.
> >
> > Not sure either, but I do think PostgreSQL does a lot with regression tests.
>
> Obviously they missed that bug. Of course, so did Uber in their
> internal testing. I've seen a DB bug in production (granted, only one
> so far) and they aren't pretty. A big issue for Uber is that their
> transaction rate and DB size is such that they really don't have a
> practical option of restoring backups.

From the slides on their migration from MySQL to PostgreSQL in 2013, I see it
took them 45 minutes to migrate 50GB of data.
To me, that seems like a very poor transfer rate for what I would consider a
dev environment: it's only about 20MB/s.
I've seen "badly performing" ETL processes read 300GB of XML files and load
them into 3 DB tables within 1.5 hours. That's about 57MB/s, with the XML
engine using up nearly 98% of the total CPU load.

If the data had been supplied as CSV files, it would have been roughly 100GB
and could easily have been loaded within 20 minutes. That equals about 85MB/s
(saturating the network bandwidth).
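
For anyone who wants to check my arithmetic, a quick Python back-of-the-envelope
(assuming 1GB = 1024MB; the slides don't say which convention they use):

# rough throughput for the three scenarios mentioned above
def mb_per_s(gigabytes, minutes):
    return gigabytes * 1024 / (minutes * 60)

print(round(mb_per_s(50, 45)))    # Uber's 2013 migration: ~19 MB/s (the "about 20MB/s" above)
print(round(mb_per_s(300, 90)))   # 300GB of XML in 1.5 hours: ~57 MB/s
print(round(mb_per_s(100, 20)))   # 100GB of CSV in 20 minutes: ~85 MB/s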

I think their database design and infrastructure aren't optimized for their
specific workload. Which is, unfortunately, quite common.

> Obviously they'd do that in a
> complete disaster, but short of that they can't really afford to do
> so. By the time a backup is recorded it would be incredibly out of
> date. They have the same issue with the lack of online upgrades
> (which the responding article doesn't really talk about). They really
> need it to just work all the time.

When I migrate a PostgreSQL server to a new major version, I migrate one
database at a time to minimize downtime. This is done by piping the output of
the backup process straight into a restore process connected to the new
server.
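
In its simplest form that is just pg_dump on the old server piped into psql
on the new one. A rough Python equivalent of that pipe (host and database
names are placeholders):

# pipe a dump of the old server straight into the new one,
# equivalent to: pg_dump -h oldserver mydb | psql -h newserver mydb
import subprocess

dump = subprocess.Popen(
    ["pg_dump", "-h", "oldserver", "mydb"],
    stdout=subprocess.PIPE)
restore = subprocess.Popen(
    ["psql", "-h", "newserver", "-d", "mydb"],
    stdin=dump.stdout)
dump.stdout.close()   # so pg_dump sees a broken pipe if psql exits early
restore.communicate()
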
59
60 If it were even more time-critical, I would develop a migration proces that
61 would:
62 1) copy all the current (as in, needed today) to the new database
63 2) disable the application
64 3) copy all the latest changes for today to the new database
65 4) reenable the application (pointing to new database)
66 5) copy all the historical data I might need
67
68 I would add a note on the website and send out an email first informing the
69 customers that the data is being migrated and historical data might be
70 incomplete during this proces.
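
As a very rough sketch only: the "updated_at" change-tracking column, the
table names and the connection strings below are made up for illustration,
and a real version would depend entirely on the schema:

# rough sketch of steps 1-5 above, using psycopg2; a real version would
# upsert on the primary key so the catch-up pass doesn't duplicate rows
import psycopg2
from datetime import date

OLD = "host=oldserver dbname=app"
NEW = "host=newserver dbname=app"
TABLES = ["bookings", "invoices"]          # hypothetical table names

def copy_rows(table, where, params):
    """Copy matching rows from the old server to the new one."""
    with psycopg2.connect(OLD) as src, psycopg2.connect(NEW) as dst:
        with src.cursor() as rd, dst.cursor() as wr:
            rd.execute(f"SELECT * FROM {table} WHERE {where}", params)
            for row in rd:
                ph = ", ".join(["%s"] * len(row))
                wr.execute(
                    f"INSERT INTO {table} VALUES ({ph}) "
                    "ON CONFLICT DO NOTHING", row)

cutoff = date.today()
for t in TABLES:                            # 1) today's working set
    copy_rows(t, "updated_at >= %s", (cutoff,))
# 2) disable the application here (site-specific)
for t in TABLES:                            # 3) catch up on the last changes
    copy_rows(t, "updated_at >= %s", (cutoff,))
# 4) re-enable the application, pointed at the new server
for t in TABLES:                            # 5) historical data last
    copy_rows(t, "updated_at < %s", (cutoff,))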

> >> I'd think that it is more likely
> >> that the likes of Oracle would (for their flagship DB (not for MySQL),
> >
> > Never worked with Oracle (or other big software vendors), have you? :)
>
> Actually, I almost exclusively work with them. Some are better than
> others. I don't work directly with Oracle, but I can say that the two
> times I've worked with an Oracle consultant they've been worth their
> weight in gold, and cost about as much.

They do have some good ones...

> The one was fixing some kind
> of RDB data corruption on a VAX that was easily a decade out of date
> at the time; I was shocked that they could find somebody who knew how
> to fix it. Interestingly, it looks like they only abandoned RDB
> recently.

Probably one of the few people in the world. And he/she might have been
brought in by Oracle specifically for this issue.

> They do tend to be a solution that involves throwing money at
> problems. My employer was having issues with a database from another
> big software vendor which I'm sure was the result of bad application
> design, but throwing Exadata at it did solve the problem, at an
> astonishing price.

I was at Collaborate last year and spoke to some of the guys from Oracle (not
going into specifics, to protect their jobs). When asked whether one of my
customers should be using Oracle RAC or Exadata, the answer came down to: "If
you think RAC might be sufficient, it usually is."

Exadata, however, is a really nice design, but throwing faster machines at a
problem should only be part of the solution.
I know someone who claims he can make a "standard" Oracle database outperform
an Exadata database. That claim is based on the (usually true) assumption that
databases are not designed for performance.
Mind you, if the same tricks were applied in an Exadata environment, you'd see
phenomenal performance.

> Neither my employer nor the big software provider
> in question is likely to attract top-notch DB talent (indeed, mine has
> steadily gotten rid of anybody who knows how to do anything in Oracle
> beyond creating schemas it seems,

Actively? Or by simply letting the good ones go while replacing them with
someone less clued up?

> though I can only imagine how much
> they pay annually in their license fees; and yes, I'm sure 99.9% of
> what they use Oracle (or SQL Server) for would work just fine in
> Postgres).

That is my feeling as well. The problem is that the likes of Informatica (one
of the leading ETL software vendors) don't actually support PostgreSQL, which
is a bit of a downside. I'd need to use ODBC (yes, that also works on non-MS-
Windows platforms) to connect.
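
For what it's worth, going through unixODBC/psqlODBC from a non-Windows box
is straightforward, e.g. with pyodbc; a small sketch (the driver name depends
on how it is registered in odbcinst.ini, and the connection details here are
just examples):

# minimal ODBC connection test via pyodbc; "PostgreSQL Unicode" is the
# common psqlODBC driver name, but check your odbcinst.ini
import pyodbc

conn = pyodbc.connect(
    "DRIVER={PostgreSQL Unicode};"
    "SERVER=dbserver;PORT=5432;DATABASE=dwh;UID=etl;PWD=secret")
cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone()[0])
conn.close()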

> > Only if you're a big (as in, spend a lot of money with them) customer.
>
> So, we are that (and I think a few of our IT execs used to be Oracle
> employees, which I'm sure isn't hurting their business).

I actually didn't join Oracle. I did, however, work for one of the companies
Oracle bought, and decided not to wait for the inevitable job cuts. In
hindsight, that one wasn't too bad, as they actually kept that part for nearly
8 years.

> I'll admit
> that Uber might not get the same attention. Seems like Oracle is the
> solution at work for everything from software that runs the entire
> company to software that hosts one table for 10 employees (well, when
> somebody notices and gets it out of Access).

Don't forget the Finance departments. They tend to use Excel files for
everything.

> Well, unless it involves
> an MS-oriented dev or Sharepoint, in which case somebody inevitably
> wants it on SQL Server. I did mention that we're not a world-class IT
> shop, didn't I?

I won't actually name companies, but I've seen plenty of big ones that would
fit your description. So I'm not sure what a "world-class" IT shop would look
like when it has to deal with the internal politics, bureaucracy and
procedures that come as standard with big companies.

--
Joost
