Gentoo Archives: gentoo-user

From: "J. Roeleveld" <joost@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Clusters on Gentoo ?
Date: Tue, 19 Aug 2014 10:59:16
Message-Id: 4635101.9kPSixBvRE@andromeda
In Reply to: Re: [gentoo-user] Clusters on Gentoo ? by Rich Freeman
On Tuesday, August 19, 2014 06:33:29 AM Rich Freeman wrote:
> On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <joost@××××××××.org> wrote:
> > On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
> >> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
> >> > Hadoop is a very specialized tool. It does what it does very well,
> >> > but if you want to use it for something other than map/reduce then
> >> > consider carefully whether it is the right tool for the job.
> >>
> >> Agreed; unless you have decent hardware and can comfortably measure
> >> your data in TB, it'll be quicker to use something else once you factor
> >> in the administration time and learning curve.
> >
> > The benefit of clustering technologies is that you don't need high-end
> > hardware to start with. You can use the old hardware you found collecting
> > dust in the basement.
> >
> > The learning curve isn't as steep as it used to be. There are plenty of
> > tools to make it easier to start using Hadoop.
>
> As long as you're counting words and don't mind coding everything in Java.
> :)
>
> I found that if you want to avoid using Java, then the available
> documentation plummets, and I'm pretty sure the version I was
> attempting to use was buggy - it was losing records in the sort/reduce
> phase I believe. Or perhaps I was just using it incorrectly, but the
> same exact code worked just fine when I ran it on a single host with a
> smaller dataset and just piped map | sort | reduce without using
> Hadoop. The documentation was pretty sparse on how to get Hadoop to
> work via stdin/out with non-Java code and it is quite possible I
> wasn't quite doing things right. In the end my problem wasn't big
> enough to necessitate using Hadoop and I used GNU parallel instead.

No need for Java knowledge to develop against Hadoop.
For example, a commercial product:
http://www.informatica.com/Images/01603_powerexchange-for-hadoop_ds_en-US.pdf
It has a nice and easy graphical interface. The same "code" that works against
a relational database also works with Hadoop; the tool does the translation.

I would be surprised if there were no other tools that make it easier to
develop code to work with Hadoop. I just haven't had a reason to search for
those yet.
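
On the stdin/stdout route mentioned in the quoted text: Hadoop Streaming runs any executable as mapper or reducer, feeding it lines on stdin and reading tab-separated key/value lines from stdout. A minimal sketch of the "map | sort | reduce" pattern in Python follows, assuming a simple word-count job. The function names and the map/reduce/local argument are hypothetical, chosen only for illustration, and this has not been checked against a real cluster.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word (illustrative only)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Single-host stand-in for 'map | sort | reduce', no Hadoop involved."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Only act when a role is given on the command line (hypothetical interface).
if __name__ == "__main__" and len(sys.argv) > 1:
    role = sys.argv[1]
    if role == "map":
        records = map_lines(sys.stdin)
    elif role == "reduce":
        records = reduce_lines(sys.stdin)
    else:
        records = run_local(sys.stdin)
    for record in records:
        print(record)
```

Saved as, say, wc.py (a made-up name), the single-host check would be `python wc.py map < input.txt | sort | python wc.py reduce`. The reducer relies on equal keys arriving next to each other, which is what the sort step provides locally and what the framework's shuffle/sort phase is meant to provide on a cluster.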

--
Joost