Simulating Network Problem Using iptables Command

October 19, 2009

In a distributed system, services communicate with each other all the time, and network issues or hiccups occur frequently.  A well-designed service consumer should be able to handle network connection or timeout issues gracefully.  The problem is magnified many times over if a service client is time-bound and expects responses to come back in milliseconds.  So what are the general guidelines or best practices for dealing with the issues outlined above?

Here is a short list of possible solutions.  One must carefully pick the right solution for one’s specific application needs.

  • Exponential backoff – more info
  • Denied access gate with background pinging thread
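
As a minimal sketch of the first option, exponential backoff retries a failing call with a delay that doubles after each failure, up to a cap. The names below (BackoffRetry, delayMillis, callWithBackoff) and all delay values are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.Callable;

final class BackoffRetry {
    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
    static long delayMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(attempt, 20); // bound the shift so the value stays sane
        return Math.min(delay, maxDelayMs);
    }

    // Calls the task until it succeeds or maxAttempts is reached, sleeping between failures.
    static <T> T callWithBackoff(Callable<T> task, int maxAttempts,
                                 long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, baseDelayMs, maxDelayMs));
                }
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```

A caller might wrap a remote call as callWithBackoff(() -> client.fetch(), 5, 100, 2000), where client.fetch() stands in for whatever service call is being protected.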

Regardless of which solution is chosen, what is a good way to simulate network issues?  This is where the “iptables” command comes in.  The “iptables” command is generally used by network administrators to administer the tables of IP packet filter rules.  The rule target that is useful for our purpose is “DROP”, which drops the packets on the floor.  To simulate a network connection problem, we can set up a filter rule for a specific host such that all packets destined for that host are dropped.

In short, here is the command to set up such a filter rule:

sudo iptables -A OUTPUT -p tcp -d <remote host ip> --dport <remote port> -j DROP

When testing is done, make sure to remove the packet filter rule with the following command:

sudo iptables -D OUTPUT -p tcp -d <remote host ip> --dport <remote port> -j DROP

Now we know how to simulate a network connection issue, which should help in testing the error handling code for connection problems.
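
To see the effect from the client side, here is a small sketch of a connection probe with an explicit timeout. ConnectProbe and canConnect are hypothetical names. With a DROP rule in place, the connect attempt should receive no reply at all and give up after the timeout with a SocketTimeoutException, which is a kind of IOException.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectProbe {
    // Returns true if a TCP connection is established within timeoutMs.
    // Any IOException (connection refused, timeout, unreachable host) counts as failure.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Always passing a connect timeout matters here: without one, a client whose packets are silently dropped can hang for as long as the operating system keeps retrying.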


Ten Things About Groovy Java Developers Should Know

July 15, 2009

In recent years, there has been an explosion of new scripting languages on the programming landscape, and as a developer one needs to be aware of what they are and of their strengths and weaknesses.  I recently started reading about Groovy, and below is a synopsis of what I have learned.

  1. Groovy is designed for Java developers and its foundation is the standard APIs of the Java Platform
  2. Features in Groovy are inspired by languages like Python, Ruby and Smalltalk
  3. Groovy brings advanced features such as closures, dynamic typing and the meta object protocol to the Java platform
  4. Groovy is Java’s newest best friend
    • Seamless integration with the Java Runtime Environment
    • Groovy syntax is aligned with Java
  5. Every Groovy type is a subtype of java.lang.Object
  6. A Groovy class is a Java class
  7. XML handling in Groovy is so easy that it makes Java developers drool
  8. Groovy brings the fun of programming back to Java developers
  9. Groovy classes are compiled into Java bytecode
  10. Groovy has integration with Ant and Maven and is well supported by major editors like Eclipse, IntelliJ and NetBeans

I am looking forward to diving into some of Groovy’s advanced features.


Google App Engine for Java

April 20, 2009

By now I am sure everyone has heard that Google App Engine now supports Java, which is totally exciting for Java developers, myself included.  Not only is Java supported, but one can also use JPA to store and retrieve data from the Google Datastore.  The current support covers most of what one needs to build a useful Java web application running on Google App Engine.

The next thing we need is an MVC framework for the presentation tier.  Surprisingly, a lot of folks are trying to use Spring MVC and the Spring Framework.  This seems to make sense because the Spring Framework provides both web-tier support and back-end support such as JPA integration and transaction management.  Maybe that’s why I haven’t seen many folks using Struts or JSF with Google App Engine.

There is a lot of buzz on the google-appengine-java-group forum. Folks are trying out different technologies and exchanging tips and tricks.

As for myself, I ported the JDO Guestbook example to use Spring MVC + JPA and it is working just fine.  Check it out when you get a chance.

From what I have been seeing, the following technologies are working on GAE:

Spring MVC, Tiles 2.0.7.

One of the most frustrating things that most folks encounter is that their GAE applications work on their local box but not when running on GAE production.  Most of the time this is related to security restrictions.  GAE allows only a subset of the JRE classes, and the white list is published on their site.

I have an application in mind and I will be busy in the next several weeks developing it.  Stay tuned.


Small Stuff That Matters – Ubuntu File Associations

March 10, 2009

Often there is a little thing that you would like to figure out how to do, and because it is little, you tend to put it off due to your busy schedule.

An example of such a little thing is figuring out how to associate an application with a file type on Ubuntu. In my case it is PDF files. By default, “Document Viewer” is used to display the content of a PDF file, and it works, but I prefer to use Acrobat Reader. Of course the solution is already out there if you search for it, but what amazes me is how happy people are when they find that solution; I know this because of their thank-you notes.

Following the DRY principle, here is the link to the solution that I found.


Hiring & Acceptance Test

March 9, 2009

I came across this posting on a Yahoo! group recently and it was too good not to pass along.  It is about writing an acceptance test for hiring a senior developer in the spirit of Extreme Programming.


import junit.framework.TestCase;

class SeniorDeveloperAcceptanceTest extends TestCase {
   Developer candidate;
   Collection team;

   public void setUp() {
      candidate = new Developer();
      team = YourCompany.getTeam();
   }

   public void testTechnicalSkills() {
      assertTrue(candidate.isJavaExpert());
      assertTrue(candidate.canDesignLargeApplication());
      assertTrue(candidate.canReduceTechnicalDebt());
      assertTrue(candidate.practiceTDD());
   }

   public void testTeachingSkills() {
      assertTrue(candidate.canImproveTeamSkills());
      assertTrue(candidate.canArgueAboutAgility());
   }

   public void testHumanBehavior() {
      assertTrue(candidate.canPairProgram());
      assertTrue(candidate.canIntegrateWith(team));
      assertTrue(candidate.hasPositiveAttitude());
   }

   public void testMethodologySkills() {
      assertTrue(candidate.knowExtremeProgramming());
      assertTrue(candidate.canImproveTeamVelocity());
   }
}


Java Concurrency Synchronizers

November 5, 2008

The Java Concurrency Utilities provide a number of powerful, high-performance threading utilities.  At a high level they can be grouped into four categories, and this article covers one of them.

  1. Thread Pools and Task Scheduling
  2. Concurrent Collections
  3. Locks and Synchronizers
  4. Atomic Variables

We all know that Java has supported synchronization since day one through the synchronized keyword, but the limitation is that this mechanism works at the block level and admits only a single thread at a time.  To go beyond that, a number of new synchronizers were introduced.  Among them are semaphore, barrier, latch and exchanger.

A Semaphore is used to control or limit the number of activities that can access a certain resource or perform a given action at the same time.  An easy way to understand and remember what a Semaphore does is to associate it with permits.  A semaphore maintains a set of permits, and a thread must acquire a permit from the semaphore before it can obtain a resource or perform a certain activity.  The permit is returned to the semaphore when the thread is done accessing the resource or performing the activity.  If all the permits have already been given out, the next thread that asks for a permit will block until one is returned.
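
The permit idea can be sketched with java.util.concurrent.Semaphore. ResourceGate and its methods are hypothetical names used only for this illustration. The blocking variant waits for a permit, and the non-blocking variant simply reports that none was available.

```java
import java.util.concurrent.Semaphore;

final class ResourceGate {
    private final Semaphore permits;

    ResourceGate(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    // Runs the task while holding a permit; blocks if all permits are taken.
    void run(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            task.run();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    // Non-blocking variant: returns false instead of waiting for a permit.
    boolean tryRun(Runnable task) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Releasing in a finally block is the important design choice: a task that throws must not leak its permit, or the gate slowly closes for everyone.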

A latch is used to allow one or more threads to wait for a set of threads to complete an action.  Once a latch has been released, it stays released.  A latch is commonly used to coordinate threads; a common use case is to start several threads and have them wait until a signal is received from a coordinating thread.  Another example: in a multiplayer game, you don’t want the game to start until all the players have joined.
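
Here is a minimal sketch of the coordinating-thread use case with java.util.concurrent.CountDownLatch. StartGate and runAll are hypothetical names. One latch holds the workers back until the signal is given, and a second latch lets the coordinator wait until every worker has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class StartGate {
    // Starts `workers` threads that all wait for one signal, then waits for them to finish.
    // Returns the number of workers that actually ran.
    static int runAll(int workers) throws InterruptedException {
        CountDownLatch startSignal = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger ran = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                try {
                    startSignal.await();   // wait for the coordinating thread
                    ran.incrementAndGet(); // stand-in for the real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();      // report completion either way
                }
            }).start();
        }

        startSignal.countDown(); // open the gate: all waiting threads proceed
        done.await();            // block until every worker has counted down
        return ran.get();
    }
}
```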

CyclicBarrier is used to create a barrier, and there are two ways to construct one.  The first takes just a number of threads; the other takes a number of threads and a barrier action.  The barrier action is a Runnable task that runs once all the threads have arrived at the barrier.  Basically, a barrier is used to stop a set of threads from proceeding until they have all reached a specified point.  Compared to a latch, which is used to release waiting threads to run, a barrier is used to hold a set of threads back until all of them have arrived.
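
The second kind of barrier, the one with a barrier action, can be sketched as follows. BarrierDemo and arrivalsSeenByAction are hypothetical names. Each thread records its arrival and then waits at the barrier; the action runs once, after the last thread arrives, so it should observe every arrival.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

final class BarrierDemo {
    // Returns how many arrivals the barrier action observed when it ran.
    static int arrivalsSeenByAction(int parties) throws InterruptedException {
        AtomicInteger arrived = new AtomicInteger();
        AtomicInteger seen = new AtomicInteger(-1);
        // The barrier action runs once all parties have called await().
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> seen.set(arrived.get()));

        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                try {
                    arrived.incrementAndGet(); // work done before the meeting point
                    barrier.await();           // block until all parties arrive
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return seen.get();
    }
}
```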

Of the four kinds of synchronizers, the exchanger is the unique one.  It is used to allow two threads to exchange data in a thread-safe manner.  In the producer-consumer problem, for example, an exchanger can be used to let the producer and consumer exchange a whole buffer of tasks in one shot, instead of the consumer picking one task at a time off the task queue.
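
A rough sketch of that buffer swap with java.util.concurrent.Exchanger is shown below. BufferSwap, swapOnce and the task names are hypothetical. The producer hands over a full buffer and the consumer hands over an empty one in a single exchange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

final class BufferSwap {
    // Producer fills a buffer and swaps it for the consumer's empty one.
    // Returns the buffer the consumer received.
    static List<String> swapOnce() throws InterruptedException {
        Exchanger<List<String>> exchanger = new Exchanger<>();

        Thread producer = new Thread(() -> {
            List<String> full = new ArrayList<String>();
            full.add("task-1");
            full.add("task-2");
            try {
                exchanger.exchange(full); // hand over the full buffer, get an empty one back
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // The consumer offers its empty buffer and receives the full one in one step.
        List<String> received = exchanger.exchange(new ArrayList<String>());
        producer.join();
        return received;
    }
}
```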

Next article will cover Atomic Variables.


My First Writable With Hadoop

October 20, 2008

Hadoop uses a simple and efficient serialization protocol to serialize data between the map and reduce steps.  There is a lot going on between these two steps, but this article is not about that.  Rather, it focuses on what a developer needs to know in order to write a custom Writable class.  For folks who are new to MapReduce in Hadoop: the OutputCollector in the Map and Reduce steps accepts only values of type Writable.

Writable is an interface in Hadoop and it has two methods: void readFields(DataInput in) and void write(DataOutput out).  If you browse the Hadoop Javadocs, there are roughly 43 classes in Hadoop that implement the Writable interface.

Depending on what your MapReduce application needs, one of the out-of-the-box Writables may do the job, but if none fits, it is fairly straightforward to write a custom Writable.  That’s what I had to do for my project.  What I discovered, and there isn’t much documentation on it, is that in addition to the two methods defined in the Writable interface, you also need to implement the toString() method if you want the data in your custom Writable to appear in the output file (this took me some time to figure out).

One of the interesting Writables is GenericWritable.  It comes in handy when the Map and Combiner output the same key type but different value types.  Hadoop requires that the values passed to the reducer all be of a single type.  GenericWritable helps by wrapping instances of different value types.  See the GenericWritable Javadoc for more details.
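
To make this concrete, here is a sketch of what a custom Writable can look like. So that it compiles without Hadoop on the classpath, it declares a local stand-in interface with the same two methods; in a real job you would import org.apache.hadoop.io.Writable instead. PageStats, its fields and roundTrip are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in mirroring the two methods of Hadoop's Writable (assumption: used
// here only so the sketch is self-contained).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding a URL and a hit count.
final class PageStats implements Writable {
    private String url = "";
    private long hits;

    PageStats() {} // no-arg constructor so an instance can be created before readFields()

    PageStats(String url, long hits) {
        this.url = url;
        this.hits = hits;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(hits);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();   // fields must be read in the same order they were written
        hits = in.readLong();
    }

    @Override
    public String toString() {
        return url + "\t" + hits; // the text form that ends up in the output file
    }

    // Serializes and deserializes a value, as a quick sanity check of the two methods.
    static PageStats roundTrip(PageStats original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        PageStats copy = new PageStats();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

A round trip like this is a cheap way to catch a mismatch between write() and readFields() before running a full job.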


Setting Up Hadoop On Windows

October 1, 2008

I recently had to learn Hadoop, and getting Hadoop running on Windows Vista was not as straightforward as I thought.  I believe getting Hadoop up and running on Linux is much easier.  I would like to use this blog to share my experiences, and hopefully it will help the next person who is trying to do the same thing.

My goal was to be able to:
1) Run some of the examples that came with Hadoop
2) Run a MapReduce Java program in Eclipse and be able to debug it

First Problem:

Hadoop comes with a set of shell scripts, so the first thing to do is download and install Cygwin.  Since the scripts were written on some Unix variant, they will not necessarily work out of the box.  When I tried the ‘bin/hadoop’ command, I got the following:


./bin/hadoop
./bin/hadoop: line 18: $'\r': command not found
: No such file or directory./bin
./bin/hadoop: line 21: $'\r': command not found
: No such file or directorydrive/c/tools/hadoop-0.14.4

Apparently this is a common problem, and it is related to the newline differences between Windows and Unix.  Windows uses two characters (\r\n) and Unix uses one character (\n).  Here is a link to a solution.  Basically, you need to run the ‘dos2unix’ command on Hadoop’s scripts or use your favorite Unix command to strip out the ‘\r’, e.g. sed -i 's/\r$//' <file name>

Once this problem was resolved, I was able to run the examples that came with Hadoop, like WordCount or Grep, inside the Cygwin shell.

Second Problem:

My second goal was to run one of Hadoop’s examples inside Eclipse.  When I tried to do this, I got an exception while Hadoop was trying to create a process: ‘CreateProcess error=2’, with a command something like ‘df -k’.

This was frustrating because I was able to run the examples in the Cygwin shell.  It turned out that the MapReduce framework was trying to execute the command ‘df -k’, which Windows could not find.  Once I added the Cygwin path to the Vista PATH environment variable, the problem went away.  It was great!! Now I can actually step through the code.  As a developer, this is very important.

I am looking forward to sharing my Hadoop experience as I learn more about it.

Hadoop is a very powerful piece of technology and often power comes with complexity.


Nutch & Lucene

September 11, 2008

Just discovered a good video about “Experiences with the Nutch search engine” and the presenter is the project founder himself, Doug Cutting.

He gave a great history of Lucene and then covered Nutch, Hadoop and their future. This is a great way to get an overview of these 3 very interesting and popular technologies.



Experiences with the Nutch search engine
– Doug Cutting


Lessons Learned From a Recent Project

September 11, 2008

Reflection is an important process in the journey of learning. This post tries to capture the lessons learned from leading a project. The project was not a large one and not extremely complicated, but it encompassed all the typical elements of a software project: for example, project planning, design, implementation, testing, task assignment, collaborating with other teams like QA, public perception, etc.

The lessons are categorized by the various aspects of a software project.

Conceptual Phase

This phase is all about thinking, analyzing and validating. It requires exercising a large number of neurons in your brain. Strong analytical skills and the ability to think on your feet are useful tools to have to be effective.

Like most other tasks, translating conceptual ideas into something more concrete requires some trial-and-error thinking, exploring options, validating new ideas and being open to other perspectives. One must be open to new ideas and not rush to judgment. Use cases are the guiding lights as well as your best friends.

This is probably the most challenging phase. Coming out of it, one may find that one needs to be humble and accept the fact that there are so many more things that one doesn’t know.

Technology Selection

Working with new technologies is always fun and challenging at the same time. The important thing is to make sure they can satisfy your requirements. Resist the urge to use a piece of new technology just for the sake of fun or seeing it as another bullet point on your resume.

Technologies are supposed to help you become more productive and sleep better at night. Therefore using familiar or mature technologies is a safer bet.

Building Phase and Pounding the Keyboard

This is all about following the blueprint that was laid out in the previous steps, taken with a grain of salt. When the rubber meets the road, it sometimes requires revisiting the initial thinking and coming up with an alternative solution.

The other important elements are avoiding shortcuts and being consistent. Shortcuts will come back to haunt you during the QA cycle or, worse, after the application is in production. Typically an application has a set of similar structures and functionalities.  It is important to be consistent in the implementation of these similar parts.

Consider unit testing as your friend. Test cases will provide confidence during refactoring or enhancement times. Writing test cases requires at least 40% of the overall development effort, so don’t underestimate this task.

Centralize error handling and other useful facilities that are needed across different parts of the application. This will help avoid committing the “duplicate code” sin.

Make it a habit to do code reviews. This is a great chance for sharing knowledge and catching bugs before they become real bugs. In addition, this will help maintain consistency, though it will not necessarily eliminate every inconsistency.

Teamwork and Communication

Small teams have better success in accomplishing something concrete in a reasonable amount of time. Nevertheless, make it a priority to communicate, via meetings, informal conversation in the hallways, or in someone’s cube.

This will also help catch wrong assumptions in a timely manner, leaving sufficient time to correct them. In addition, it will help build working relationships and make it more fun to work with your teammates.

Document the technical decisions that were made during the course of the development phase. Any time something requires more than one person to make a technical decision, it is important to document the details of the decision, such as what the agreement was, the reasoning, and possibly what other options were discussed. This is mainly because we have limited memory and the number of technical decisions is not small. The obvious benefits of doing this are that you don’t have to clutter your memory and you can easily defend your decision when someone asks about it 6 months after it was made.

Discipline

Being disciplined about following good programming practices throughout the development phase is an important aspect. Among them are code review and writing unit test cases.

Often during initial development, some shortcuts are taken or values are hard-coded. Make sure to remove them as soon as possible. Otherwise they will create surprises and may cause you to miss the deadline. In our project, we delayed integrating with corporate single sign-on until it was absolutely necessary. When we finally took the plunge, it required more time than expected. We had high confidence that it would work, so it was not a huge risk that we took.

Schedule

A project schedule provides a road map to march toward. The challenging part is that once the schedule is public, it is almost a mandate to accomplish your work according to that schedule. The key is possibly to have private and public dates, preferably with the latter coming after the former.

A couple of days before the committed public date arrives, it is important to evaluate where you are. If you think you are running late, it is important to communicate this with your manager and possibly your customer.

Sometimes you need to look at what the committed features are and possibly declare code complete without the minor features that your customers don’t care too much about or that can be delayed to a later release. By doing this, you will be able to make your customer happy and at the same time not lose the trust that others have in you.

Pacing

Keep a steady pace by breaking your development cycle down into a number of smaller cycles. This allows you to sprint and then take small breaks to re-energize. The problem with a long development cycle is that it creates a great opportunity to procrastinate, and eventually you will be caught by surprise. Procrastination is a part of human nature, and it is better to create an environment that discourages it.