Hadoop uses a simple and efficient serialization protocol to serialize data between the map and reduce steps. There is a lot going on between these two steps, but this article is not about that. Rather it focuses on what a developer needs to know in order to write a custom Writable class. Just for the folks that are new to MapReduce in Hadoop, the OutputCollector in the Map and Reduce step accepts only the value as of type Writable.
Writable is an interface in Hadoop and it has two methods: void readFields(DataInput in) and void write(DataOutput out). If you browse Hadoop Javadocs, ther are roughly about 43 classes in Hadoop that implements the Writable interface.
Depending on what your MapReduce application needs, one of the out-of-box Writables will do the job, but if there isn’t one, then it is fairly straight forward to write a custom Writable. That’s what I had to do for my project. What I discovered and there isn’t that much documentation on it is in addition to the two methods defined in the Writable interface, you also need to implement the toString() method if you want the data in your custom Writable to appear in the output file (this took me sometime to figure out).
One of the interesting Writables is the GenericWritable. This comes in handy when the Map and Combiner output the same key type, but different value type. The requirement in Hadoop is the values that are mapped out to reduce, only one value type is allowed. The GenericWritable basically helps you wrap instances of value of different types. See GenericWritable JavaDoc for more details.
Posted by fantastic