Setting Up Hadoop On Windows

I recently had to learn Hadoop, and getting it running on Windows Vista was not as straightforward as I expected.  I believe getting Hadoop up and running on Linux is much easier.  I would like to use this blog to share my experience, and hopefully it will help the next person who tries to do the same thing.

My goal was to be able to:
1) Run some of the examples that came with Hadoop
2) Run a MapReduce Java program in Eclipse and be able to debug it

First Problem:

Hadoop comes with a set of shell scripts, so the first thing to do is download and install Cygwin.  Since the scripts were written on some Unix variant, they will not work out of the box.  When I tried the ‘bin/hadoop’ command, I got the following:

./bin/hadoop: line 18: $'\r': command not found
: No such file or directory./bin
./bin/hadoop: line 21: $'\r': command not found
: No such file or directorydrive/c/tools/hadoop-0.14.4

Apparently this is a common problem, and it is caused by the difference in line endings between Windows and Unix.  Windows uses two characters (\r\n) and Unix uses one character (\n).  Here is a link to a solution.  Basically, you need to run the ‘dos2unix’ command on Hadoop’s scripts, or use your favorite Unix command to strip out the ‘\r’, e.g. sed -i 's/\r$//' <file name>
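
If dos2unix is not installed, the same cleanup can be sketched as a small shell function.  This is only a sketch: the name strip_cr is made up for this post, and it assumes the scripts are plain text, so deleting every carriage return is safe.

```shell
# strip_cr FILE...
# Remove every carriage return (\r) from each named file, in place.
# strip_cr is a hypothetical helper name, not something shipped with Hadoop.
strip_cr() {
    for f in "$@"; do
        # Skip anything that is not a regular file (directories, bad globs).
        [ -f "$f" ] || continue
        tmp="$f.tmp.$$"
        # Write the cleaned text to a temp file, then copy it back with cat
        # so the original file keeps its permissions (the execute bit).
        tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

From the Hadoop directory, something like strip_cr bin/* should cover the launcher scripts.  Any other script that fails with the same error would need the same treatment.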

Once this problem was resolved, I was able to run the examples that came with Hadoop, like WordCount or Grep, inside the Cygwin shell.

Second Problem:

My second goal was to run one of Hadoop’s examples inside Eclipse.  When I tried to do this, I got an exception while Hadoop was trying to create a process: ‘CreateProcess error=2’, for a command like ‘df -k’.

This was frustrating because I was able to run the examples in the Cygwin shell.  It turned out that the MapReduce framework was trying to execute the command ‘df -k’, and Windows could not find ‘df’ (error 2 means the file was not found) because the Cygwin directory was not on the PATH that Eclipse sees.  Once I added the Cygwin path to the Vista PATH environment variable, the problem went away.  It was great! Now I can actually step through the code.  As a developer, this is very important.
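
To find out which directory actually provides ‘df’, a quick check from inside the Cygwin shell can look like the sketch below.  The helper name cmd_dir is made up for this post, and the check only looks at the PATH of the shell it runs in, which may differ from the PATH Eclipse was started with.

```shell
# cmd_dir NAME
# Print the directory that provides an external command, or fail if NAME
# does not resolve to an executable file through PATH.
# cmd_dir is a hypothetical helper name, not part of Hadoop or Cygwin.
cmd_dir() {
    location=$(command -v "$1") || return 1
    case $location in
        /*) dirname "$location" ;;   # absolute path: a real file on PATH
        *)  return 1 ;;              # builtin, alias, or shell function
    esac
}

# Report where df comes from, if it can be found at all.
if dir=$(cmd_dir df); then
    echo "df is provided by: $dir"
else
    echo "df was not found on PATH"
fi
```

Inside Cygwin, cygpath -w "$dir" converts that directory to its Windows form, which is the kind of value that belongs in the Windows PATH.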

I am looking forward to sharing my Hadoop experience as I learn more about it.

Hadoop is a very powerful piece of technology and often power comes with complexity.

5 Responses to Setting Up Hadoop On Windows

  1. Nicholas says:

    Thank you for your article! I’ve just started to study Hadoop and I have experience setting it up on Ubuntu Linux. I don’t have enough time to hack on Windows and Hadoop, so your article is very useful for me 🙂

  2. Niels Basjes says:

    Thanks for the tip.
    Simply adding Cygwin to the path solved the problem for me too.
    Niels Basjes

  3. Vlad Korolev says:

    Here is a tutorial on how to setup the Hadoop 0.19.1 Cluster on Windows using Cygwin and Eclipse

  4. Vlad Korolev says:

    Sorry didn’t post the link in the first post

    Here is a tutorial on how to setup the Hadoop 0.19.1 Cluster on Windows using Cygwin and Eclipse

  5. Sanu says:

    How to debug the examples
