Gentoo Archives: gentoo-user

From: "J. Roeleveld" <joost@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: Recommendations for scheduler
Date: Tue, 05 Aug 2014 20:43:54
Message-Id: 144d00d7-f17e-4dd0-b34a-21a5c9a0abf0@email.android.com
In Reply to: [gentoo-user] Re: Recommendations for scheduler by James
1 On 5 August 2014 21:57:56 CEST, James <wireless@×××××××××××.com> wrote:
2 >Joost Roeleveld <joost <at> antarean.org> writes:
3 >
4 >
5 >> > Mesos looks promising for a variety of (Apache) reasons. Some key
6 >> > technologies folks may want to google that are related:
7 >> >
8 >> > Quincy (fair scheduler)
9 >> > Chronos (scheduler)
10 >> > Hadoop (scheduler)
11 >>
12 >> Hadoop is not a scheduler. It's a framework for a Big Data clustered
13 >> database.
14 >
15 >> > HDFS (clustered file system)
16 >> Unless it's changed recently, not suitable for anything else than
17 >Hadoop
18 >> and contains a single point of failure.
19 >
20 >I'm curious as to more information about this 'single point of failure'.
21 >Can
22 >you be more specific or provide links?
23 >
24 >On this resource:
25 >
26 >http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
27 >
28 >The JournalNode section talks about surviving faults:
29 >
30 >"increase the number of failures the system can tolerate, you should
31 >run an
32 >odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N
33 >JournalNodes, the system can tolerate at most (N - 1) / 2 failures and
34 >continue to function normally. "
35
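For reference, the quoted "(N - 1) / 2" rule can be worked out with a few lines of Python. This is only a sketch of the arithmetic; the function name is made up for illustration and is not part of Hadoop.

```python
def tolerated_failures(journal_nodes: int) -> int:
    """JournalNode failures a quorum of N can survive: floor((N - 1) / 2)."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    # Integer division implements the "at most (N - 1) / 2" rule from the docs.
    return (journal_nodes - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} JournalNodes tolerate {tolerated_failures(n)} failure(s)")
```

Note that 4 JournalNodes tolerate no more failures than 3, which is why the documentation recommends an odd number.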
36 Just read that part. Looks like they solved it partly since 2.2.
37 The problem lies with the NameNodes.
38 Prior to 2.2, you only had 1. If that one dies, you lose the entire cluster. If that one is unrecoverable, you lose all the data.
39
40 After 2.2, you can configure a standby NameNode. However, failover to it is manual by default; automatic failover needs additional ZooKeeper-based configuration.
41
42 Considering that Hadoop is most often running on old machines, chances for hardware failure are higher when compared with clusters using newer hardware.
43
44 I'm not sure how other cluster FSs deal with this, but I consider it a design flaw if the disappearance of a single machine in a 100+ node cluster leaves the entire cluster in a broken state.
45 It's like running a single RAID5 with 100+ drives.
46 Anyone stupid enough to do that deserves to lose their data.
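
To put rough numbers on the RAID5 comparison, here is a minimal sketch. It assumes drives fail independently and uses a made-up per-drive failure probability; both are assumptions for illustration, not measurements. RAID5 survives one failed drive, so data is lost once two or more drives fail in the same window.

```python
def p_at_least_two_fail(drives: int, p: float) -> float:
    """Probability that two or more of `drives` fail, assuming independent
    failures with probability `p` each (a simplifying assumption)."""
    none_fail = (1 - p) ** drives
    exactly_one_fails = drives * p * (1 - p) ** (drives - 1)
    return 1 - none_fail - exactly_one_fails


if __name__ == "__main__":
    p = 0.03  # hypothetical failure probability per drive in the window
    for drives in (4, 8, 100):
        print(f"{drives:3d} drives: P(data loss) = {p_at_least_two_fail(drives, p):.3f}")
```

The risk grows quickly with the number of drives in a single array. A lone NameNode is worse still, since there the failure of one specific machine is enough.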
47
48 >> > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common
49 >> >
50 >> > Zookeeper (Fault tolerance)
51 >> > SPARK (optimized for iterative jobs where a dataset is reused in
52 >many
53 >> > parallel operations; advanced math/science and many other apps.)
54 >> > https://spark.apache.org/
55 >> >
56 >> > Dryad Torque MPICH2 MPI
57 >> > Globus toolkit
58 >> >
59 >> > mesos_tech_report.pdf
60 >> >
61 >> > It looks as though Amazon, google, facebook and many others
62 >> > large in the Cluster/Cloud arena are using Mesos......?
63 >> >
64 >> > So let's all post what we find, particularly in overlays.
65 >>
66 >> Unless you are dealing with Big Data projects, like Google, Facebook,
67 >Amazon, big banks,... you don't have much use for those projects.
68 >
69 >Many scientific applications are using the cluster (cloud) or big data
70 >approach to all sorts of problems. Furthermore, as GPU and the new
71 >Arm systems with dozens and dozens of cpu cores inside one computer
72 >become
73 >readily available, the cluster-cloud (big data) approach will become
74 >much
75 >more pervasive in the next few years, imho.
76 >
77 >http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/
78 >
79 >There are thousands of small companies needing reservoir simulation,
80 >not to
81 >mention the millions of folks working on carbon sequestration.....
82 >Anything to do with Biological or Chemical Science is using or moving
83 >to the Cloud-Clustered world. For me, a Cluster is just a cloud
84 >internally
85 >managed, rather than outsourcing it to others; ymmv.
86
87 My apologies. I forgot the scientific research here. But that was mostly because they have been dealing with really large datasets and corresponding large compute clusters for decades.
88
89 The term Big Data is generally applied to financial and social media data.
90
91 >> Mesos looks like a nice project, just like Hadoop and related are
92 >also
93 >> nice. But for most people, they are as useful as using Exalytics.
94 >
95 >I'm not excited about an Oracle solution to anything. Many of the folks
96 >I know consult on moving technologies away from Oracle's sphere of
97 >influence,
98 >not limited to mysql; ymmv. I know of one very large communications
99 >company
100 >that went broke and had to merge because of those ridiculous Oracle
101 >fees.
102 >Caveat Emptor; long live PostgreSQL.
103
104 I'd be interested in the name of that company. Even offlist.
105
106 And I definitely agree. PostgreSQL is often a valid alternative. Unfortunately, it is rarely possible to use it as a back end to enterprise software as these are all designed to be used with databases from the usual suspects (Oracle, IBM, Microsoft, ....)
107
108 Same goes for OSS projects. The developers are often unable to properly code the SQL layer and end up simply using MySQL and its broken SQL implementation.
109
110 >> A scheduler should not have a large set of dependencies that you
111 >wouldn't
112 >> use otherwise. That makes Chronos a non-option to me.
113 >
114 >Those other technologies are often useful to folks who would be
115 >attracted to
116 >something like chronos.
117
118 If you already use Mesos, using Chronos makes sense.
119 If you're only interested in a scheduler, installing Mesos just to use Chronos doesn't make sense.
120
121 >> Martin's project looks promising, but doesn't store the schedules
122 >> internally. For repeating schedules, like what Alan was describing,
123 >you
124 >> need to put those into scripts and start those from an existing cron.
125 >> Of the 2, I think improving Martin's project is the most likely
126 >option
127 >> for me as it doesn't have additional dependencies and seems to be
128 >> easily implemented.
129 >> Joost
130 >
131 >Understood.
132 >Like others, I'll be curious to follow what develops out of Martin's
133 >work.
134
135 I believe Martin's scheduler will be very valuable. Even for me.
136 I am very likely going to start using this for some of my regular maintenance activities on the home network.
137
138 But as the rest of the thread shows, I wouldn't be able to use it as a scheduler for large projects where the schedules can get very complex very quickly.
139
140 The type of scheduler needed for these requires a different approach, which would be overkill for the home network environment where Martin's excels.
141
142 >For me Chronos, Mesos and the other aforementioned technologies look to
143 >be
144 >more viable; particularly if one is preparing for a clustered world
145 >with
146 >CPUs, GPUs, SoCs and Arm machines distributed about the ethernet as
147 >resources to be scheduled and utilized in a variety of schema. It's the
148 >quest for one-infrastructure to solve many problems where scenarios
149 >compete.
150
151 I fully agree, see my comment above where I state Chronos makes sense when Mesos does as well.
152
153 >Big data is not the only reason for cloud-clusters. Theoretically,
154 >(Clustered) systems can have a far greater resource utilization of
155 >networked
156 >resources than traditional (distributed) approaches. I grant you that
157 >this
158 >is a work in progress, but I personally know of dozens of
159 >mathematically
160 >complex distributed systems that are migrating to the clustered
161 >approach
162 >rather than something custom or traditionally distributed.
163
164 I still remember running SETI@home and similar programs in the past. Those were large clusters, but with a very badly designed network.
165
166 There are use cases for large, well-integrated clusters, for loosely coupled clusters and for big machines.
167
168 This is the difference between horizontal (many machines) and vertical (1 really big machine) clustering.
169 Vertical clustering only happens between different processes on the same machine.
170
171 >Granted, Cloud <--> Clustered <--> Distributed are all overlapping
172 >approaches
173 >to big problems. I do appreciate the candor of this thread.
174
175 They are. It started with distributed computing in a lab, then moved onto the internet.
176 Then people started to build a mini internet with a lot of old computers and Clusters were born.
177 Then that ended up back on the internet with clusters being made accessible online. And this is what is considered to be "The Cloud".
178 If you take the general definition of The Cloud, which is along the lines of "being able to access your data anywhere using any device", then running your own server and accessing the data on it from anywhere with internet access, using your laptop, smartphone, tablet,..., also counts as using the cloud.
179
180 If anyone is actually planning to implement Mesos and Chronos on Gentoo, I would be interested in joining the effort as it does sound like fun. I just don't have the time to do a lot of work on that at the moment.
181
182 --
183 Joost
184
185
186 --
187 Sent from my Android device with K-9 Mail. Please excuse my brevity.
