Joost Roeleveld <joost <at> antarean.org> writes:


> > Mesos looks promising for a variety of (Apache) reasons. Some key
> > technologies folks may want to google that are related:
> >
> > Quincy (fair scheduler)
> > Chronos (scheduler)
> > Hadoop (scheduler)
>
> Hadoop is not a scheduler. It's a framework for a Big Data clustered
> database.

> > HDFS (clustered file system)
>
> Unless it's changed recently, it's not suitable for anything other than
> Hadoop, and it contains a single point of failure.

I'm curious for more information about this 'single point of failure'. Can
you be more specific or provide links?

This resource:

http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

talks about JournalNode machines surviving faults:

"[To] increase the number of failures the system can tolerate, you should
run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with
N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and
continue to function normally."
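As a quick sanity check, the quoted quorum arithmetic can be worked through
for a few ensemble sizes (a minimal sketch of the formula only, not using any
Hadoop API):

```python
# HDFS quorum journaling: with N JournalNodes, the system tolerates
# at most (N - 1) // 2 failures while continuing to function.

def tolerated_failures(n_journalnodes: int) -> int:
    """Number of JournalNode failures a quorum of the given size survives."""
    return (n_journalnodes - 1) // 2

# Odd ensemble sizes, as the documentation recommends (3, 5, 7, ...):
for n in (3, 5, 7):
    print(f"{n} JournalNodes -> tolerates {tolerated_failures(n)} failure(s)")

# An even ensemble buys nothing extra: 4 JNs tolerate the same single
# failure that 3 do, which is why odd counts are recommended.
```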

> >
> > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common
> >
> > Zookeeper (fault tolerance)
> > Spark (optimized for iterative jobs where a dataset is reused in many
> > parallel operations: advanced math/science and many other apps.)
> > https://spark.apache.org/
> >
> > Dryad, Torque, MPICH2, MPI
> > Globus toolkit
> >
> > mesos_tech_report.pdf
> >
> > It looks as though Amazon, Google, Facebook, and many others
> > large in the Cluster/Cloud arena are using Mesos......?
> >
> > So let's all post what we find, particularly in overlays.
>
> Unless you are dealing with Big Data projects, like Google, Facebook,
> Amazon, big banks,... you don't have much use for those projects.

Many scientific applications are using the cluster (cloud) or big data
approach for all sorts of problems. Furthermore, as GPUs and the new
ARM systems with dozens and dozens of CPU cores inside one computer become
readily available, the cluster-cloud (big data) approach will become much
more pervasive in the next few years, imho.

http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/

There are thousands of small companies needing reservoir simulation, not to
mention the millions of folks working on carbon sequestration.....
Anything to do with biological or chemical science is using or moving to
the cloud-clustered world. For me, a cluster is just a cloud managed
internally, rather than outsourced to others; ymmv.


> Mesos looks like a nice project, just like Hadoop and related projects
> are also nice. But for most people, they are as useful as using Exalytics.

I'm not excited about an Oracle solution to anything. Many of the folks
I know consult on moving technologies away from Oracle's sphere of
influence, not limited to MySQL; ymmv. I know of one very large
communications company that went broke and had to merge because of those
ridiculous Oracle fees. Caveat emptor; long live PostgreSQL.


> A scheduler should not have a large set of dependencies that you wouldn't
> use otherwise. That makes Chronos a non-option for me.

Those other technologies are often useful to folks who would be attracted
to something like Chronos.

> Martin's project looks promising, but doesn't store the schedules
> internally. For repeating schedules, like what Alan was describing, you
> need to put those into scripts and start those from an existing cron.
> Of the two, I think improving Martin's project is the most likely option
> for me, as it doesn't have additional dependencies and seems easily
> implemented.
> Joost

Understood.
Like others, I'll be curious to follow what develops out of Martin's work.
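For what it's worth, the wrapper-script route described above (keeping the
repeating schedule in an existing cron, which just fires the script) might
look something like this; the job name, script path, and times are
hypothetical:

```shell
# /etc/cron.d/nightly-sync  (hypothetical example)
# min hour dom mon dow  user  command
  15  2    *   *   *    root  /usr/local/bin/nightly-sync.sh >> /var/log/nightly-sync.log 2>&1
```

The repetition logic lives in the crontab line; the script itself stays a
plain one-shot job.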

For me, Chronos, Mesos, and the other aforementioned technologies look
more viable, particularly if one is preparing for a clustered world with
CPUs, GPUs, SoCs, and ARM machines distributed about the ethernet as
resources to be scheduled and utilized in a variety of schemas. It's the
quest for one infrastructure to solve many problems where scenarios
compete.

Big data is not the only reason for cloud-clusters. Theoretically,
clustered systems can achieve far greater utilization of networked
resources than traditional distributed approaches. I grant you that this
is a work in progress, but I personally know of dozens of mathematically
complex distributed systems that are migrating to the clustered approach
rather than to something custom or traditionally distributed.

Granted, Cloud <--> Clustered <--> Distributed are all overlapping
approaches to big problems. I do appreciate the candor of this thread.


James