On Tuesday, August 19, 2014 06:33:29 AM Rich Freeman wrote:
> On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <joost@××××××××.org> wrote:
> > On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
> >> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
> >> > Hadoop is a very specialized tool. It does what it does very well,
> >> > but if you want to use it for something other than map/reduce then
> >> > consider carefully whether it is the right tool for the job.
> >>
> >> Agreed; unless you have decent hardware and can comfortably measure
> >> your data in TB, it'll be quicker to use something else once you factor
> >> in the administration time and learning curve.
> >
> > The benefit of clustering technologies is that you don't need high-end
> > hardware to start with. You can use the old hardware you found collecting
> > dust in the basement.
> >
> > The learning curve isn't as steep as it used to be. There are plenty of
> > tools to make it easier to start using Hadoop.
>
> As long as you're counting words and don't mind coding everything in Java.
> :)
>
> I found that if you want to avoid using Java, then the available
> documentation plummets, and I'm pretty sure the version I was
> attempting to use was buggy - it was losing records in the sort/reduce
> phase I believe. Or perhaps I was just using it incorrectly, but the
> same exact code worked just fine when I ran it on a single host with a
> smaller dataset and just piped map | sort | reduce without using
> Hadoop. The documentation was pretty sparse on how to get Hadoop to
> work via stdin/out with non-Java code and it is quite possible I
> wasn't quite doing things right. In the end my problem wasn't big
> enough to necessitate using Hadoop and I used GNU parallel instead.

No need for Java knowledge to develop against Hadoop.
A commercial product:
http://www.informatica.com/Images/01603_powerexchange-for-hadoop_ds_en-US.pdf
Nice and easy graphical interface. The same "code" that works against a
relational database also works with Hadoop. The tool does the translation.

I would be surprised if there are no other tools that can make it easier to
develop code to work with Hadoop. I just haven't had a reason to search for
those yet.
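
For the stdin/stdout route mentioned in the quoted text, a minimal sketch of a
word-count mapper and reducer written as plain filters could look like the
following. The file name wc.py and the names map_lines, reduce_lines and main
are made up for this example. How such a script would be handed to Hadoop's
streaming support depends on the installation and is not shown here.

```python
#!/usr/bin/env python3
"""Word count as two stdin/stdout filters (hypothetical file name: wc.py)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts per word; the input must already be sorted by word."""
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # skip blank lines instead of failing on them
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(step_name):
    """Run the chosen step as a filter from stdin to stdout."""
    steps = {"map": map_lines, "reduce": reduce_lines}
    if step_name not in steps:
        sys.exit("usage: wc.py map|reduce")
    for record in steps[step_name](sys.stdin):
        print(record)


# Only act as a filter when a step name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

On a single host, with input.txt standing in for any text file, this can be
checked the same way as described above:
python3 wc.py map < input.txt | sort | python3 wc.py reduce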

--
Joost