Gentoo Archives: gentoo-user

From: "J. Roeleveld" <joost@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Clusters on Gentoo ?
Date: Mon, 18 Aug 2014 14:31:35
Message-Id: 1855316.WFR9YJczUb@andromeda
In Reply to: Re: [gentoo-user] Clusters on Gentoo ? by thegeezer
1 On Sunday, August 17, 2014 08:46:58 PM thegeezer wrote:
2 > there are many way to do clustering and one thing that i would consider
3 > a "holy grail" would be something like pvm [1]
4 > because nothing else seems to have similar horizontal scaling of cpu at
5 > the kernel level
6
7 PVM, from the webpage, looks more like a pre-built VM, not some kernel module
8 that distributes existing code to different nodes.
9 This kind of clustering also has no benefit for most uses. You really need to
10 design your tasks for this kind of environment.
11
12 > i would love to know the mechanism behind dell's equallogic san as it
13 > really is clustered lvm on steroids.
14 > GFS / orangefs / ocfs are not the easiest things to setup (ocfs is) and
15 > i've not found performance to be so great for writes.
16
17 I have seen weird issues when using Oracle's filesystems for anything not
18 Oracle. How important is reliability?
19
20 > DRBD is only 2 devices as far as i understand, so not really super scalable
21 > i'm still not convinced over the likes of hadoop for storage, maybe i
22 > just don't have the scale to "get" it?
23
24 I wouldn't use Hadoop for storage of files. It's only useful if you have a lot
25 (and I do mean a LOT) of data where a query only returns a very small amount.
26 Performance of a Hadoop cluster is high because the same query is sent to all
27 nodes at once and the answers get merged into a single answer along the way
28 back to the requestor. I don't see it as a valid system to actually store
29 important data you do not want to risk losing.
30
31 > the thing with clusters is that you want to be able to spin an extra
32 > node up and join it to the group and then you increase cpu / storage by
33 > n+1 but also you want to be able to spin nodes down dynamically and go
34 > down by n-1. i guess this is where hadoop is of benefit because that is
35 > not a happy thing for a typical file system.
36
37 Not necessarily. That is only one way to use a cluster.
38 It's also an "easy" and "cheap" method of increasing the available processing
39 power. This only works properly if the tasks can be distributed over multiple
40 nodes easily. Having the option to quickly add and remove nodes makes it
41 difficult to keep the data consistent. Hadoop especially prefers the nodes to
42 stay available as there is no single node containing all the data. There is
43 some redundancy, but remove a few nodes and you can easily lose data.
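
To put a rough number on "remove a few nodes and you can easily lose data", here is a minimal sketch. It assumes every block is stored as a fixed number of copies placed uniformly at random on distinct nodes, and that the nodes disappear at once, before anything can be re-replicated. Real placement policies (rack awareness, for instance) are ignored, and the function names are made up for this illustration.

```python
from math import comb


def block_loss_probability(nodes: int, replicas: int, removed: int) -> float:
    """Chance that all copies of one block sit on the removed nodes.

    Assumes copies are placed uniformly at random on distinct nodes.
    """
    if removed < replicas:
        # Fewer nodes removed than copies exist: at least one copy survives.
        return 0.0
    return comb(removed, replicas) / comb(nodes, replicas)


def expected_lost_blocks(blocks: int, nodes: int, replicas: int, removed: int) -> float:
    """Expected number of blocks whose copies are all gone."""
    return blocks * block_loss_probability(nodes, replicas, removed)


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 3 copies per block, 1 million blocks.
    for removed in (2, 3, 4, 5):
        lost = expected_lost_blocks(1_000_000, 10, 3, removed)
        print(f"{removed} nodes removed: about {lost:.0f} blocks lost")
```

Under these assumptions, removing fewer nodes than there are copies loses nothing, but once the number of removed nodes reaches the number of copies, some loss becomes likely as soon as the cluster holds a large number of blocks.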
44
45 > network load balancing is super easy, all info required is in each
46 > packet -- application load balancing requires more thought.
47 > this is where the likes of memcached can help but also why a good design
48 > of the cluster is better. localised data and tiered access etc... kind
49 > of why i would like to see a pvm kind of solution -- so that a page
50 > fault is triggered like swap memory which then fetches the relevant
51 > memory from the network:
52
53 That is going to kill performance...
54 Have a look into NUMA. It's always best to have the data where it is being
55 processed. Either by moving the data to the processing unit, or by using a
56 processing unit local to the data.
57 Moving data is always expensive with regard to performance.
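
A back-of-the-envelope sketch of why fetching pages over the network hurts. The 4 KiB page size and the 100 ns local memory access time are assumptions chosen for illustration, and the calculation counts only the time the page spends on a gigabit wire. Round-trip latency and protocol overhead are left out, so real numbers would be worse.

```python
PAGE_BYTES = 4096                     # assumed page size
LINK_BITS_PER_SECOND = 1_000_000_000  # gigabit networking
LOCAL_ACCESS_SECONDS = 100e-9         # assumed local memory access time


def wire_time_seconds(payload_bytes: int, link_bps: int) -> float:
    """Time needed just to push the payload through the link."""
    return payload_bytes * 8 / link_bps


def slowdown_factor(payload_bytes: int, link_bps: int, local_seconds: float) -> float:
    """How many local accesses fit into one remote page transfer."""
    return wire_time_seconds(payload_bytes, link_bps) / local_seconds


if __name__ == "__main__":
    t = wire_time_seconds(PAGE_BYTES, LINK_BITS_PER_SECOND)
    factor = slowdown_factor(PAGE_BYTES, LINK_BITS_PER_SECOND, LOCAL_ACCESS_SECONDS)
    print(f"one page on the wire: {t * 1e6:.1f} microseconds")
    print(f"upper bound on remote page faults per second: {1 / t:.0f}")
    print(f"versus one local access: {factor:.0f}x slower")
```

With these assumed figures a single remote page costs a few hundred local accesses, and the link saturates at a few tens of thousands of pages per second.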
58
59 This is how Hadoop clusters work: the data is processed on the node actually
60 having the data. The result (which is often less than 1% of the source-data)
61 is then sent over the network to another node, which, at this stage, merges
62 the result and passes it to another node. This then continues until all the
63 results are merged into a single result-set which is then returned to the
64 requesting application.
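
The flow described above can be sketched as a toy in plain Python. This is not Hadoop's API. The shard layout, the query (summing values for a set of wanted keys) and all function names are assumptions made up for the illustration. Each "node" is just a list of records, and only the small per-node summaries take part in the merge.

```python
from collections import Counter


def local_query(shard, wanted):
    """Runs where the data lives and returns only a small summary."""
    partial = Counter()
    for key, value in shard:
        if key in wanted:
            partial[key] += value
    return partial


def merge_tree(partials):
    """Merge partial results pairwise, level by level, until one remains."""
    level = list(partials)
    while len(level) > 1:
        level = [sum(level[i:i + 2], Counter()) for i in range(0, len(level), 2)]
    return level[0] if level else Counter()


def run_query(shards, wanted):
    """Send the same query to every shard, then merge the answers."""
    return merge_tree(local_query(shard, wanted) for shard in shards)


if __name__ == "__main__":
    # Three hypothetical nodes, each holding its own slice of the records.
    shards = [
        [("error", 1), ("ok", 1)],
        [("error", 1)],
        [("ok", 1), ("warn", 1), ("error", 1)],
    ]
    print(run_query(shards, {"error", "warn"}))
```

The raw records never leave their shard; only the per-node counters are combined, which is why the network traffic stays small compared to the source data.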
65
66 > bearing in mind that a computer can typically
67 > trigger thousands of page faults a second and that memory access is very
68 > very many times faster than gigabit networking!
69 >
70 > [1] http://www.csm.ornl.gov/pvm/pvm_home.html
71
72 Looks nice, but is not going to help with performance if the application is
73 not designed for distributed processing.
74
75 --
76 Joost
