A recent new MR job needed a Writable object to represent a status, so with a wave of the hand I dashed off the implementation below:

public class Status implements WritableComparable<Status> {
    private int first;
    private int second;

    public void write(DataOutput out) throws IOException {
        out.write(first);
        out.write(second);
    }

    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    // compareTo and the remaining methods are omitted here
}

Since Status holds two counter variables, first and second, both the write and readFields methods have to write or read twice.

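Once it was written, though, the read in readFields kept failing at runtime with a baffling EOFException. I went over the code and searched the documentation for a long time without finding the problem.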

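Nothing else for it: at the critical moment you have to rely on yourself. After calming down and carefully rereading the API documentation for DataOutput, I realized I had made a rookie mistake and been tripped up by Java: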

public interface DataOutput {
    /**
     * Writes to the output stream the eight
     * low-order bits of the argument <code>b</code>.
     * The 24 high-order  bits of <code>b</code>
     * are ignored.
     *
     * @param      b   the byte to be written.
     * @throws     IOException  if an I/O error occurs.
     */
    void write(int b) throws IOException;

    /**
     * Writes an <code>int</code> value, which is
     * comprised of four bytes, to the output stream.
     * The byte values to be written, in the  order
     * shown, are:
     * <p><pre><code>
     * (byte)(0xff &amp; (v &gt;&gt; 24))
     * (byte)(0xff &amp; (v &gt;&gt; 16))
     * (byte)(0xff &amp; (v &gt;&gt; &#32; &#32;8))
     * (byte)(0xff &amp; v)
     * </code></pre><p>
     * The bytes written by this method may be read
     * by the <code>readInt</code> method of interface
     * <code>DataInput</code> , which will then
     * return an <code>int</code> equal to <code>v</code>.
     *
     * @param      v   the <code>int</code> value to be written.
     * @throws     IOException  if an I/O error occurs.
     */
    void writeInt(int v) throws IOException;

    // ... other methods omitted
}
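
To make the difference between the two methods concrete, here is a minimal sketch that uses only java.io, with no Hadoop dependency. The class name WriteVsWriteInt and its helper methods are made up for illustration; DataOutputStream is the standard JDK implementation of DataOutput.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original job.
public class WriteVsWriteInt {

    /** Returns the bytes produced by DataOutput.write(int) for v. */
    static byte[] viaWrite(int v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(v); // keeps only the low-order 8 bits
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns the bytes produced by DataOutput.writeInt(int) for v. */
    static byte[] viaWriteInt(int v) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(v); // writes all four bytes, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        int v = 0x12345678;
        System.out.println(viaWrite(v).length);    // prints 1
        System.out.println(viaWriteInt(v).length); // prints 4
    }
}
```

For 0x12345678, write emits the single byte 0x78, while writeInt emits 0x12, 0x34, 0x56, 0x78.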

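The write(int b) method writes only the low-order 8 bits of its int argument and simply discards the high-order 24 bits; the method that actually writes a whole integer is writeInt(int v). With write, each field contributes a single byte, so only two bytes reach the stream, while the two readInt calls in readFields expect eight, hence the EOFException. The object's write method should therefore be changed to: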

    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }
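
The failure and the fix can be reproduced outside Hadoop with a small round-trip check. This is only a sketch under assumptions: StatusRoundTrip, serialize, and deserialize are hypothetical names, and in-memory java.io streams stand in for whatever streams the framework actually passes to write and readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Status, kept free of Hadoop classes.
public class StatusRoundTrip {

    /** Writes two ints with writeInt (fixed) or write(int) (buggy). */
    static byte[] serialize(int first, int second, boolean useWriteInt) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            if (useWriteInt) {
                out.writeInt(first);
                out.writeInt(second);
            } else {
                out.write(first);  // one byte only
                out.write(second); // one byte only
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Mirrors readFields; returns null if the stream ends early. */
    static int[] deserialize(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new int[] { in.readInt(), in.readInt() };
        } catch (EOFException e) {
            return null; // fewer than 8 bytes were available
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(3, 7, false).length);        // prints 2
        System.out.println(deserialize(serialize(3, 7, false)));  // prints null
        System.out.println(deserialize(serialize(3, 7, true))[1]); // prints 7
    }
}
```

In a real job the stream usually holds many records back to back, so the exception may surface at a different point than it does in this isolated check. A round-trip test of this kind is still a cheap way to confirm that write and readFields agree on the byte layout.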


