On Wednesday, September 17, 2014 08:56:28 PM James wrote:
> Alec Ten Harmsel <alec <at> alectenharmsel.com> writes:
> > As far as HDFS goes, I would only set that up if you will use it for
> > Hadoop or related tools. It's highly specific, and the performance is
> > not good unless you're doing a massively parallel read (what it was
> > designed for). I can elaborate why if anyone is actually interested.
>
> Actually, from my research and my goal (one really big scientific
> simulation running constantly).
|
Out of curiosity, what do you want to simulate?
|
> Many folks are recommending to skip Hadoop/HDFS
> altogether
|
I agree, Hadoop/HDFS is for data analysis, like building a profile of
people based on the information that companies like Facebook, Google, NSA,
Walmart, governments, banks, ... collect about their
customers/users/citizens/slaves/...
|
> and go straight to mesos/spark. RDD (in-memory) cluster
> calculations are at the heart of my needs. The opposite end of the
> spectrum, loads of small files and small apps; I dunno about, but, I'm all
> ears.
> In the end, my (3) node scientific cluster will morph and support
> the typical myriad of networked applications, but I can take
> a few years to figure that out, or just copy what smart guys like
> you and joost do.....
|
Nope, I'm simply following what you do and providing suggestions where I
can.
Most of the clusters and distributed computing stuff I do is based on
adding machines to distribute the load. But the mechanisms for these are
implemented in the applications I work with, not in what I design
underneath.
|
The filesystems I am interested in are different from the ones you want.
I need to provide a VM server with access to software installation files,
and provide access to documentation which is created by the users.
The VM server is physically next to what I already mentioned as server A.
Access to the VM from the remote site will be via remote desktop
connections.
But to allow faster and easier access to the documentation, I need a
server B at the remote site which functions as described.
AFS might be suitable, but I need to be able to layer Samba on top of that
to allow seamless operation.
I don't want the laptops to have their own cache and then have to figure
out how to reconcile multiple different changes to documents containing
layouts (MS Word and OpenDocument files).
|
> > We use Lustre for our high performance general storage. I don't have any
> > numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
> > sounds familiar, but don't quote me on that).
>
> At UMich, you guys should test the FhGFS/btrfs combo. The folks
> at UCI swear by it, although they are only publishing a wee bit.
> (you know, water cooler gossip)...... Surely the Wolverines do not
> want those Californians getting up on them?
>
> Are you guys planning a mesos/spark test?
>
> > > Personally, I would read up on these and see how they work. Then,
> > > based on that, decide if they are likely to assist in the specific
> > > situation you are interested in.
>
> It's a ton of reading. It's not apples-to-apple_cider type of reading.
> My head hurts.....
|
Take a walk outside. Clear air should help you with the headaches :P
|
> I'm leaning to DFS/LFS
>
> (2) Lustre/btrfs and FhGFS/btrfs
>
> Thoughts/comments?
|
I have insufficient knowledge to advise on either of these.
One question: why BTRFS instead of ZFS?
|
My current understanding is:
- ZFS is production ready, but due to licensing issues it is not included
  in the kernel
- BTRFS is included, but not yet production ready with all planned features
|
For me, RAID6-like functionality is an absolute requirement, and the last I
heard is that it isn't implemented in BTRFS yet. Does anyone know when
that will be implemented and reliable? E.g. what time-frame are we talking
about?
|
--
Joost