How can I store and manipulate unsigned bytes in Java?

Robert Baruch

The wacky thing about Java is that bytes, shorts, ints, and longs are all signed. Thus, the code:

byte b = 0xAA;

will generate an error (rather than a warning), because 0xAA is the int value 170 and a byte only holds -128 to 127:

possible loss of precision
found   : int
required: byte
  byte b = 0xAA;
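
For what it's worth, an explicit cast does make the assignment compile, but the stored value is then negative. A minimal sketch (the class and method names are made up for illustration):

```java
class ByteLiteralDemo {
    // The narrowing cast keeps the low 8 bits of 0xAA (binary 10101010).
    // Read back as a signed byte, that bit pattern is -86.
    static byte castLiteral() {
        return (byte) 0xAA;
    }

    public static void main(String[] args) {
        byte b = castLiteral();
        System.out.println(b);         // prints -86
        System.out.println(b & 0xFF);  // prints 170
    }
}
```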

More's the bummer, there's no unsigned keyword in Java.

This causes great consternation and gnashing of teeth among C/C++ programmers who are used to reading in files as streams of unsigned bytes, generating unsigned bytes, or saving space by using unsigned bytes.

In the Java Virtual Machine, byte, short and int values are all handled as 32-bit quantities on the operand stack and in local variables. Hence, when you add two bytes together you are actually performing 32-bit arithmetic. And when you store the result back into a byte, the cast that the compiler insists on keeps only the low 8 bits, and bit 7 of those is treated as the sign bit: the next time the byte is used, it is sign-extended back to 32 bits.
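
A small sketch of that behavior, with hypothetical class and method names, shows what the addition and the store back into a byte actually produce:

```java
class BytePromotionDemo {
    // Both operands are promoted to int before the addition happens,
    // so the result of a + b is an int.
    static int addAsInt(byte a, byte b) {
        return a + b;
    }

    // Storing the sum in a byte requires a cast, which keeps the low 8 bits;
    // the result is then read back as a signed value.
    static byte addAsByte(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100, b = 100;
        System.out.println(addAsInt(a, b));   // prints 200
        System.out.println(addAsByte(a, b));  // prints -56
    }
}
```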

Believe it or not, the best way to represent an unsigned byte is to use a signed integer (not that there's any other kind of integer). Because the Java VM handles a byte variable as a 32-bit value anyway, you're not saving anything by declaring a single variable as a byte (large byte arrays are another matter). And then you really can initialize your "unsigned byte" to a value greater than 0x7F, and you can read in an unsigned byte stream using integers, since InputStream's read() method returns an int in the range 0 to 255 (or -1 at end of stream) and not a byte.
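
As a rough illustration, the sketch below reads a stream one unsigned byte at a time into an int. The class and method names are invented for the example, and a ByteArrayInputStream stands in for a real file:

```java
import java.io.ByteArrayInputStream;

class UnsignedStreamDemo {
    // Sums every byte of the data as an unsigned value (0..255).
    // read() returns an int in 0..255, or -1 at end of stream.
    static int sumUnsigned(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int sum = 0;
        int ub;  // the "unsigned byte", held in an int
        while ((ub = in.read()) != -1) {
            sum += ub;
        }
        return sum;
    }

    public static void main(String[] args) {
        int ub = 0xAA;  // legal: an int can hold 170
        System.out.println(ub);  // prints 170

        byte[] data = { (byte) 0xAA, (byte) 0xFF, 0x01 };
        System.out.println(sumUnsigned(data));  // prints 426
    }
}
```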

You can do all your byte arithmetic (+, -, /, *, %) on the integers and the result will come out the same, but you should bitwise-AND (&) the result with 0xFF to get your "real" unsigned byte. All of your bitwise operators (&, |) will work, too, since 0xAA as an unsigned byte is stored as 0x000000AA as a signed integer.
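
One way to sketch this is with a few helpers that mask each result back into the 0..255 range. The helper names are hypothetical, and the inputs are assumed to already be in that range:

```java
class UnsignedByteArithmetic {
    // Each helper does the arithmetic in int, then masks to the low 8 bits.
    static int add(int a, int b) { return (a + b) & 0xFF; }
    static int sub(int a, int b) { return (a - b) & 0xFF; }
    static int mul(int a, int b) { return (a * b) & 0xFF; }

    public static void main(String[] args) {
        System.out.println(add(0xF0, 0x20));  // prints 16  (0x110 wraps to 0x10)
        System.out.println(sub(0x10, 0x20));  // prints 240 (-16 wraps to 0xF0)
        System.out.println(mul(0x80, 2));     // prints 0   (0x100 wraps to 0x00)

        // Bitwise operators need no mask when both operands are in 0..255.
        System.out.println(0xAA | 0x55);      // prints 255
        System.out.println(0xAA & 0x0F);      // prints 10
    }
}
```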

Your shift operators (<<, >>, >>>) will also work as expected, except you have to be a little careful. You should bitwise-AND the result with 0xFF after a left shift, since 0xFF << 1 == 0x1FE, not 0xFE. Without the mask, (0xFF << 1) >> 1 == 0xFF, not the 0x7F that a true 8-bit value would give.
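
The difference is easy to see in a short sketch (shiftLeft is a hypothetical helper, not a library method):

```java
class UnsignedByteShift {
    // Left shift of an "unsigned byte" held in an int, masked back to 8 bits.
    static int shiftLeft(int ub, int n) {
        return (ub << n) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(0xFF << 1));               // prints 1fe
        System.out.println(Integer.toHexString(shiftLeft(0xFF, 1)));      // prints fe
        System.out.println(Integer.toHexString((0xFF << 1) >> 1));        // prints ff
        System.out.println(Integer.toHexString(shiftLeft(0xFF, 1) >> 1)); // prints 7f
    }
}
```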

Nor does it help to be clever and convert 0xFF to a byte first (either through a typecast or through a byte variable) and then shift left by 1. The signed byte 0xFF is really -1 (integer 0xFFFFFFFF). The result of the shift will be 0xFFFFFFFE, courtesy of the Java VM's sign-extended 32-bit bytes.
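
A brief sketch of the pitfall follows, with hypothetical method names. The second method shows one possible workaround if you are handed a byte anyway: mask it to an int first, then shift and mask again.

```java
class SignedByteShiftPitfall {
    // The byte is sign-extended to int before the shift, so -1 << 1 is -2.
    static int shiftByteLeft(byte b) {
        return b << 1;
    }

    // Masking with 0xFF first recovers the unsigned value 0..255;
    // masking again after the shift keeps the result within 8 bits.
    static int shiftUnsignedLeft(byte b) {
        return ((b & 0xFF) << 1) & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;  // -1 as a signed byte
        System.out.println(Integer.toHexString(shiftByteLeft(b)));      // prints fffffffe
        System.out.println(Integer.toHexString(shiftUnsignedLeft(b)));  // prints fe
    }
}
```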