I recently had to learn Hadoop, and getting it running on Windows Vista was not as straightforward as I thought. I believe getting Hadoop up and running on Linux is much easier. I would like to use this blog to share my experiences, and hopefully they will help the next person who is trying to do the same thing.
My goal was to be able to:
1) Run some of the examples that came with Hadoop
2) Run a MapReduce Java program in Eclipse and be able to debug it
Hadoop comes with a set of Unix shell scripts, so the first thing to do is download and install Cygwin, which provides a bash shell and the usual Unix tools on Windows. Even under Cygwin, though, the scripts did not work out of the box for me. When I tried the ‘bin/hadoop’ command, I got the following:
./bin/hadoop: line 18: $'\r': command not found
: No such file or directory./bin
./bin/hadoop: line 21: $'\r': command not found
: No such file or directorydrive/c/tools/hadoop-0.14.4
Apparently this is a common problem, and it is caused by the difference in line endings between Windows and Unix: Windows ends each line with two characters (\r\n), while Unix uses one character (\n). Bash treats the leftover ‘\r’ as part of the command, which is where the ‘$'\r': command not found’ errors come from. Here is a link to a solution. Basically, you need to run the ‘dos2unix’ command on Hadoop’s scripts, or use your favorite Unix tool to strip out the ‘\r’ characters, e.g. sed -i 's/\r$//' <file name>
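If neither dos2unix nor sed is handy, the same conversion can be done from Java. This is a minimal sketch, assuming the scripts are plain text; the class name StripCarriageReturns is hypothetical (not part of Hadoop or Cygwin), it needs Java 7 or later, and it rewrites each file in place, so keep a copy of the originals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop or Cygwin): converts Windows
// line endings (\r\n) to Unix line endings (\n), similar to dos2unix.
public class StripCarriageReturns {

    // Replaces every "\r\n" pair with "\n"; a lone '\r' is left as-is.
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    // Rewrites the file in place and returns true if anything changed.
    // ISO-8859-1 maps each byte to exactly one char, so other bytes round-trip unchanged.
    static boolean convertFile(Path file) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        String converted = toUnixLineEndings(original);
        if (converted.equals(original)) {
            return false;
        }
        Files.write(file, converted.getBytes(StandardCharsets.ISO_8859_1));
        return true;
    }

    // Converts every file named on the command line and reports the result.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean changed = convertFile(Paths.get(name));
            System.out.println((changed ? "converted: " : "unchanged: ") + name);
        }
    }
}
```

From the Hadoop directory in a Cygwin shell, something like java StripCarriageReturns bin/* would process every script in bin. Running it a second time is harmless, since files without \r\n are reported as unchanged and not rewritten.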
Once this problem was resolved, I was able to run the examples that came with Hadoop, such as WordCount and Grep, inside the Cygwin shell.
My second goal was to run one of Hadoop’s examples inside Eclipse. When I tried this, I got an exception while Hadoop was trying to create a process: ‘CreateProcess error=2’, with a command along the lines of ‘df -k’.
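This was frustrating, because the same examples ran fine in the Cygwin shell. It turned out that the MapReduce framework runs the external command ‘df -k’ to check disk space, and error=2 is the Windows ‘file not found’ code: df is a Cygwin program, and the Cygwin bin directory was not on the Windows PATH that Eclipse uses. Once I added the Cygwin path to the Vista PATH environment variable, the problem went away. It was great!! Now I can actually step through the code, which, as a developer, is very important to me.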
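To confirm this diagnosis without starting a Hadoop job, you can search the PATH that the JVM actually inherited for df. This is a minimal sketch: PathCheck and findOnPath are hypothetical names, not Hadoop APIs, and the .exe check assumes Cygwin installs df as df.exe. Run it as a Java application from inside Eclipse so it sees the same environment as your MapReduce program.

```java
import java.io.File;

// Hypothetical diagnostic (not part of Hadoop): reports whether an
// executable such as "df" can be found on the PATH this JVM was started with.
public class PathCheck {

    // Looks for the command in each PATH directory, trying the bare name
    // and the name with ".exe" appended (Cygwin's tools are .exe files).
    static File findOnPath(String command, String pathValue) {
        if (pathValue == null) {
            return null;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String suffix : new String[] {"", ".exe"}) {
                File candidate = new File(dir, command + suffix);
                if (candidate.isFile()) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // On Windows, System.getenv looks up "PATH" case-insensitively.
        File df = findOnPath("df", System.getenv("PATH"));
        if (df == null) {
            System.out.println("df not found on PATH; add the Cygwin bin directory");
        } else {
            System.out.println("df found at " + df);
        }
    }
}
```

If it still reports that df is missing after you have edited PATH, restart Eclipse: a Windows process receives its environment when it starts, so an Eclipse instance that was already running does not see the change.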
I am looking forward to sharing my Hadoop experience as I learn more about it.
Hadoop is a very powerful piece of technology, and power often comes with complexity.