Joost Roeleveld <joost <at> antarean.org> writes:


> > Mesos looks promising for a variety of (Apache) reasons. Some key
> > technologies folks may want to google that are related:
> >
> > Quincy (fair scheduler)
> > Chronos (scheduler)
> > Hadoop (scheduler)
>
> Hadoop is not a scheduler. It's a framework for a Big Data clustered
> database.

> > HDFS (clustered file system)
>
> Unless it's changed recently, it's not suitable for anything other than
> Hadoop, and it contains a single point of failure.

I'm curious for more information about this 'single point of failure'. Can
you be more specific or provide links?

This resource:

http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

talks about JournalNode machines surviving faults:

"[To] increase the number of failures the system can tolerate, you should
run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with
N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and
continue to function normally."
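As a quick sanity check, the quoted quorum arithmetic can be worked through
for a few ensemble sizes (a minimal sketch of the formula only, not using any
Hadoop API):

```python
# HDFS quorum journaling: with N JournalNodes, the system tolerates
# at most (N - 1) // 2 failures while continuing to function.

def tolerated_failures(n_journalnodes: int) -> int:
    """Number of JournalNode failures a quorum of the given size survives."""
    return (n_journalnodes - 1) // 2

# Odd ensemble sizes, as the documentation recommends (3, 5, 7, ...):
for n in (3, 5, 7):
    print(f"{n} JournalNodes -> tolerates {tolerated_failures(n)} failure(s)")

# An even ensemble buys nothing extra: 4 JNs tolerate the same single
# failure that 3 do, which is why odd counts are recommended.
```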

> >
> > http://gpo.zugaina.org/sys-cluster/apache-hadoop-common
> >
> > Zookeeper (fault tolerance)
> > Spark (optimized for iterative jobs where a dataset is reused in many
> > parallel operations: advanced math/science and many other apps.)
> > https://spark.apache.org/
> >
> > Dryad, Torque, MPICH2, MPI
> > Globus toolkit
> >
> > mesos_tech_report.pdf
> >
> > It looks as though Amazon, Google, Facebook, and many others
> > large in the Cluster/Cloud arena are using Mesos......?
> >
> > So let's all post what we find, particularly in overlays.
>
> Unless you are dealing with Big Data projects, like Google, Facebook,
> Amazon, big banks,... you don't have much use for those projects.

Many scientific applications are using the cluster (cloud) or big data
approach for all sorts of problems. Furthermore, as GPUs and the new
ARM systems with dozens and dozens of CPU cores inside one computer become
readily available, the cluster-cloud (big data) approach will become much
more pervasive in the next few years, imho.

http://blog.rescale.com/reservoir-simulation-moves-to-the-cloud/

There are thousands of small companies needing reservoir simulation, not to
mention the millions of folks working on carbon sequestration.....
Anything to do with biological or chemical science is using or moving to
the cloud-clustered world. For me, a cluster is just a cloud managed
internally, rather than outsourced to others; ymmv.


> Mesos looks like a nice project, just like Hadoop and related projects
> are also nice. But for most people, they are as useful as using Exalytics.

I'm not excited about an Oracle solution to anything. Many of the folks
I know consult on moving technologies away from Oracle's sphere of
influence, not limited to MySQL; ymmv. I know of one very large
communications company that went broke and had to merge because of those
ridiculous Oracle fees. Caveat emptor; long live PostgreSQL.


> A scheduler should not have a large set of dependencies that you wouldn't
> use otherwise. That makes Chronos a non-option for me.

Those other technologies are often useful to folks who would be attracted
to something like Chronos.

> Martin's project looks promising, but doesn't store the schedules
> internally. For repeating schedules, like what Alan was describing, you
> need to put those into scripts and start those from an existing cron.
> Of the two, I think improving Martin's project is the most likely option
> for me, as it doesn't have additional dependencies and seems easily
> implemented.
> Joost

Understood.
Like others, I'll be curious to follow what develops out of Martin's work.
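For what it's worth, the wrapper-script route described above (keeping the
repeating schedule in an existing cron, which just fires the script) might
look something like this; the job name, script path, and times are
hypothetical:

```shell
# /etc/cron.d/nightly-sync  (hypothetical example)
# min hour dom mon dow  user  command
  15  2    *   *   *    root  /usr/local/bin/nightly-sync.sh >> /var/log/nightly-sync.log 2>&1
```

The repetition logic lives in the crontab line; the script itself stays a
plain one-shot job.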

For me, Chronos, Mesos, and the other aforementioned technologies look
more viable, particularly if one is preparing for a clustered world with
CPUs, GPUs, SoCs, and ARM machines distributed about the ethernet as
resources to be scheduled and utilized in a variety of schemas. It's the
quest for one infrastructure to solve many problems where scenarios
compete.

Big data is not the only reason for cloud-clusters. Theoretically,
clustered systems can achieve far greater utilization of networked
resources than traditional distributed approaches. I grant you that this
is a work in progress, but I personally know of dozens of mathematically
complex distributed systems that are migrating to the clustered approach
rather than to something custom or traditionally distributed.

Granted, Cloud <--> Clustered <--> Distributed are all overlapping
approaches to big problems. I do appreciate the candor of this thread.


James