I recently had to learn Hadoop, and getting Hadoop running on Windows Vista was not as straightforward as I thought. I believe getting Hadoop up and running on Linux is much easier. I would like to use this blog to share my experiences, and hopefully it will help the next person who is trying to do the same thing.
My goal was to be able to:
1) Run some of the examples that came with Hadoop
2) Run a MapReduce Java program in Eclipse and be able to debug it
First Problem:
Hadoop comes with a set of shell scripts, so the first thing to do is download and install Cygwin. Since the scripts were written on some Unix variant, they will not work out of the box. When I tried the ‘bin/hadoop’ command, I got the following:
./bin/hadoop
./bin/hadoop: line 18: $'\r': command not found
: No such file or directory./bin
./bin/hadoop: line 21: $'\r': command not found
: No such file or directorydrive/c/tools/hadoop-0.14.4
Apparently this is a common problem, and it is related to the newline differences between Windows and Unix. Windows uses two characters (\r\n) and Unix uses one character (\n). Here is a link to a solution. Basically, you need to run the ‘dos2unix’ command on Hadoop’s scripts, or use your favorite Unix command to strip out the ‘\r’, e.g. sed -i 's/\r$//' <file name>
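If dos2unix is not installed, the sed approach can be applied to a whole directory of scripts at once. The following is only a minimal sketch: strip_cr is a helper name made up for this post, and it assumes GNU sed (the version Cygwin ships), since the in-place -i option and the \r escape are not guaranteed by every sed.

```shell
# strip_cr DIR: remove the trailing carriage return from each line of
# every regular file directly inside DIR (assumes GNU sed).
strip_cr() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue      # skip subdirectories and unmatched globs
    sed -i 's/\r$//' "$f"        # edit in place; only a CR at end of line is removed
  done
}

# Example, run from the Hadoop install directory:
# strip_cr bin
```

The pattern only removes a ‘\r’ that sits at the very end of a line, so files that already have Unix line endings are left unchanged and it is safe to run more than once.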
Once this problem was resolved, I was able to run the examples that came with Hadoop, like WordCount or Grep, inside the Cygwin shell.
Second Problem:
My second goal was to run one of Hadoop’s examples inside Eclipse. When I tried to do this, I got an exception while Hadoop was trying to create a process: ‘CreateProcess error=2’, and the command was something like ‘df -k’.
This was frustrating because I was able to run the examples in the Cygwin shell. It turned out that the MapReduce framework tries to execute the command ‘df -k’, and outside the Cygwin shell Windows could not find it (error 2 is the Windows "file not found" code). Once I added the Cygwin path to the Vista PATH environment variable, the problem went away. It was great! Now I can actually step through the code. As a developer, this is very important.
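To find out which directory needs to go on the Windows PATH, one option is to ask the Cygwin shell where df lives and convert that location to a Windows-style path. This is a rough sketch rather than a required step: df_dir is just an illustrative variable name, and cygpath is the path conversion utility that comes with Cygwin (the script skips that part if cygpath is not present).

```shell
# Locate the directory that contains the df executable.
df_dir=$(dirname "$(command -v df)")
echo "POSIX path: $df_dir"

# On Cygwin, cygpath -w converts a POSIX path to its Windows form,
# which is the value to append to the Windows PATH variable.
if command -v cygpath >/dev/null 2>&1; then
  echo "Windows path: $(cygpath -w "$df_dir")"
fi
```

Eclipse needs to be restarted after PATH is changed, because a running process keeps the environment it was started with.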
I am looking forward to sharing my Hadoop experience as I learn more about it.
Hadoop is a very powerful piece of technology and often power comes with complexity.
October 9, 2008 at 11:15 pm |
Thank you for your article! I’ve just started to study Hadoop and I have experience setting it up on Ubuntu Linux. I don’t have enough time to hack on Windows and Hadoop, so your article is very useful for me.
November 7, 2008 at 8:13 am |
Hi,
Thanks for the tip.
Simply adding Cygwin to the path solved the problem for me too.
Niels Basjes
March 20, 2009 at 7:19 pm |
Here is a tutorial on how to setup the Hadoop 0.19.1 Cluster on Windows using Cygwin and Eclipse
March 20, 2009 at 7:20 pm |
Sorry didn’t post the link in the first post
Here is a tutorial on how to setup the Hadoop 0.19.1 Cluster on Windows using Cygwin and Eclipse
http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html
July 10, 2009 at 6:58 am |
How to debug the examples