On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <joost@××××××××.org> wrote:
> On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
>> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
>> > Hadoop is a very specialized tool. It does what it does very well,
>> > but if you want to use it for something other than map/reduce then
>> > consider carefully whether it is the right tool for the job.
>>
>> Agreed; unless you have decent hardware and can comfortably measure
>> your data in TB, it'll be quicker to use something else once you factor
>> in the administration time and learning curve.
>
> The benefit of clustering technologies is that you don't need high-end
> hardware to start with. You can use the old hardware you found collecting dust
> in the basement.
>
> The learning curve isn't as steep as it used to be. There are plenty of tools
> to make it easier to start using Hadoop.
>
|
As long as you're counting words and don't mind coding everything in Java. :)
|
I found that if you want to avoid using Java, then the available
documentation plummets, and I'm pretty sure the version I was
attempting to use was buggy - it was losing records in the sort/reduce
phase, I believe. Or perhaps I was just using it incorrectly, but the
exact same code worked just fine when I ran it on a single host with a
smaller dataset and just piped map | sort | reduce without using
Hadoop. The documentation was pretty sparse on how to get Hadoop to
work via stdin/stdout with non-Java code, and it is quite possible I
wasn't quite doing things right. In the end my problem wasn't big
enough to necessitate using Hadoop, and I used GNU parallel instead.
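
For illustration, here is a minimal sketch of the map | sort | reduce
pattern described above, run in a single process without Hadoop. It
assumes a word-count job; the function names (map_words, reduce_counts,
run_pipeline) are hypothetical, and this is not the code that was
actually used in the experiment mentioned here.

```python
import itertools


def map_words(lines):
    """Map step: emit one (word, 1) pair per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of each run of identical keys.

    This only works if the pairs arrive sorted by key, which is the
    job the `sort` stage does in a shell pipeline.
    """
    grouped = itertools.groupby(sorted_pairs, key=lambda pair: pair[0])
    for word, group in grouped:
        yield word, sum(count for _, count in group)


def run_pipeline(lines):
    """Chain the three stages in-process, like `map | sort | reduce`."""
    return list(reduce_counts(sorted(map_words(lines))))
```

Passing any iterable of text lines (an open file, or sys.stdin) to
run_pipeline gives the per-word totals. Skipping the sort step makes the
reducer emit the same key more than once, which is worth ruling out
when results differ between a local run and a cluster run.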
|
--
Rich