388

Why doesn't Java include support for unsigned integers?

It seems to me to be an odd omission, given that they allow one to write code that is less likely to produce overflows on unexpectedly large input.

Furthermore, using unsigned integers can be a form of self-documentation, since they indicate that the value which the unsigned int was intended to hold is never supposed to be negative.

Lastly, in some cases, unsigned integers can be more efficient for certain operations, such as division.

What's the downside to including these?

Orkun Ozen
dsimcha
  • 143
    I don't know but it annoys the hell out of me; for example it is much harder to write network code this way. – Tamas Czinege Jan 10 '09 at 05:35
  • 21
    I wish there were only two types in the language/database/... world: number and string :) – Liao Jan 27 '10 at 04:51
  • 5
    Writing network code isn't much harder at all. BTW, InputStream.read() returns an unsigned byte, not a signed one, for example, so the network example is a confusion IMHO. It's only confusing if you assume that writing a signed value is any different from writing an unsigned one, i.e. if you don't actually know what is happening at the byte level. – Peter Lawrey Aug 27 '10 at 21:59
  • 19
    @ZachSaw - I also did a double-take when I saw a language designer make that quote. There is nothing simpler than an unsigned integer. Signed integers are complicated. Particularly when you consider the bit twiddling at the transistor level. And how does a signed integer shift? I had to conclude that the designer of Java has a serious issue understanding boolean logic. – PP. Feb 12 '12 at 21:58
  • 1
    @PP.: A major problem with unsigned types in C is that they represent a weird cross between integers and a cyclic group, since uint1-uint2 is defined as the value which, when added to uint2, yields uint1. The problem could be solved if a language allowed for implicit casts which could not participate in operator overloading (so that e.g. `someUInt32=someUInt16;` would be legal, but `if (someInt32==someInt16)` would not be legal without a typecast) but included operator overloads that made sense (e.g. adding a signed value to a uint value cycles the group by that amount). – supercat Sep 03 '13 at 04:26
  • 8
    To me, image processing becomes harder because a `byte` can't give a straight `140` gray level but gives `-116` instead, which you need to `& 0xff` to get the correct value. – Matthieu Mar 12 '14 at 11:52
  • 3
    While working with bytes I generally don’t care about the sign, it all works out just the way it should. I would however *love* a way to use unsigned byte literals, i.e. the possibility to write `byte b = 0x99;` . – Bombe Aug 21 '14 at 20:01
  • 4
    FWIW, unsigned APIs finally made an appearance in [Java 8](https://blogs.oracle.com/darcy/entry/unsigned_api) – phuclv Mar 20 '15 at 10:03
  • 2
    @PP. You don't use int for performing boolean logic. :P – Syed Aqeel Ashiq Dec 10 '15 at 09:12
  • @Matthieu maybe you can use int then? No? – Syed Aqeel Ashiq Dec 10 '15 at 09:22
  • 2
    @djaqeel that's what's happening internally when you `&0xff`. Loss of computing power... – Matthieu Dec 10 '15 at 15:37
  • 1
    @PP and Matthieu I didn't get you at all. When using boolean logic or processing bytes, I have to deal with bits rather than their integer value. And AFAIK, bits behave the same way and don't care about the sign. The sign and the min/max byte values matter to us humans. So if I need to show a byte value (a very rare case, though), I just make two port-in/port-out functions for int to byte and vice versa. – Syed Aqeel Ashiq Dec 11 '15 at 07:27
  • @PeterLawrey how can `InputStream.read()` *return* unsigned bytes if unsigned bytes don't exist in Java? Everywhere we have just bits, not signs; two's complement is a special interpretation of those bits which requires “special” treatment, that is, you need to know extra rules, which sometimes you don't want getting in the way. E.g., using bytes read with `InputStream.read()` to build a 32 bit value, `(b[3] << 24)|(b[2] << 16)|(b[1] << 8)|b[0]`… this always works iff you have unsigned bytes. Otherwise, sign extension gets in the way and you need masking. – ShinTakezou Jun 24 '18 at 10:23
  • @ShinTakezou the `read()` returns an `int` type which has the range of an unsigned byte (0 to 255) unless the end of file is reached. A more efficient way to read an `int` is as a 32-bit value rather than byte by byte. Note: you have to handle running out of bytes e.g. there is only 2 bytes when you actually want 4. – Peter Lawrey Jun 24 '18 at 10:37
  • 1
    @PeterLawrey I confused the overloads of `read`: I wrote the previous comment with `read(byte b[], ...)` in mind. Holding unsigned values in a bigger signed “container” is what I currently do, except in the case of `long`. Still, when converting C code into Java, you have to watch yourself more than you would if there were unsigned types. – ShinTakezou Jun 24 '18 at 10:47
  • 1
    @ShinTakezou personally I think Java should have unsigned types apart from char, esp. an unsigned byte, which is more natural than a signed one. – Peter Lawrey Jun 25 '18 at 08:01

16 Answers

203

This is from an interview with Gosling and others, about simplicity:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

Erik Kaplun
Uri
  • 231
    I'm going to have to disagree with Gosling here with a specific example (from CLR no less). What's more confusing giving an Array a signed integer length value or an unsigned length? It's impossible for an Array to have negative length yet our API indicates that's possible. – JaredPar Jan 10 '09 at 02:29
  • 19
    The argument of making Java simple is part of what got us into the whole mess with a lack of templates which they eventually brought into the language because the alternatives were so cumbersome. I do think that one could support unsigned ints with an appropriate class though, it doesn't need prims – Uri Jan 10 '09 at 05:12
  • 6
    @Uri true but it will almost certainly not be as performant as a primitive implementation in the JVM. – JaredPar Jan 10 '09 at 15:54
  • How often is that an issue though? I'm also not sure about how commands and types are encoded in bytecode, perhaps adding unsigned would have also affected the instruction set? – Uri Jan 10 '09 at 17:34
  • 8
    @Uri, it can be a large issue if you do math intensive work. Consider the JITing of a primitive type. Math instructions on primitives can be easily JIT'd into corresponding assembly. But if math is hand implemented it won't translate directly to assembly and hence performance will suffer. – JaredPar Jan 10 '09 at 17:51
  • 2
    @JaredPar: Well, modern (JIT) compilers are pretty clever and can optimize a lot. I guess in many cases a simple Unsigned class could be optimized down to native CPU operations on unsigned ints. So I don't think it absolutely has to be a primitive. As a matter of fact many people believe that primitives are (no longer) really necessary. Smalltalk e.g. has no primitives, only objects. – sleske Oct 12 '09 at 08:05
  • 60
    If Java needs unsigned integers because Array indices can't be negative, then it also needs subranges (a la Pascal) because an array index can't be greater than the array size. – Wayne Conrad Jan 27 '10 at 05:05
  • 2
    @Uri I believe generics (not templates) were supposed to be in Java, it's just that getting the specification (and implementation) right is difficult. Some people might even argue that more time should have been spent. – Tom Hawtin - tackline Jan 27 '10 at 06:38
  • 3
    There is a recording of Gosling telling the story at the last Javapolis (before it became Devoxx). Bloch points out that the example given is also valid for signed ints. Gosling gets confused by ints. Sensible languages should stick to arbitrary-size integers only. – Tom Hawtin - tackline Jan 27 '10 at 06:45
  • 86
    Okay, he just told the advantages of not having unsigned types. Now let's count the disadvantages... – Zippo Dec 07 '10 at 14:28
  • 89
    I prefer code simplicity over language simplicity. That's why I hate Java. – Pijusn Sep 16 '12 at 11:43
  • 4
    @Pius: Worse is language simplistic-ity, where a requirement that certain rules be applicable in all contexts forces the selection of sub-optimal rules. The *only* reason I can see to allow implicit float to double conversions but not vice versa is to avoid having `==` work as something other than an equivalence operator when applied to mixed floats and doubles; I would posit that a better way to avoid having `==` work as something other than an equivalence operator in such cases would be to disallow direct `float`-`double` comparisons, which are *almost always wrong*. – supercat Sep 03 '13 at 04:31
  • 9
    I guess this guy doesn't realize that writing to a binary file may cause a lot of pain when somebody has only 127 as the top value for a byte. – luke1985 Mar 21 '14 at 13:43
  • 4
    IMHO Java may be simpler than C++ for certain things, but it has some ugly things too. Many library classes are somewhat complex (Date and Calendar for example). And consider the funny fact that `A A(A A)` is a valid method definition :D – Paolo Apr 22 '14 at 14:58
  • 6
    As a developer I would say that a language should never dictate the way a person codes; it should provide the facilities and leave their use to the developer. And he just created a very good corner case by providing facilities for arrays with negative length. – ArunMKumar Oct 16 '14 at 11:39
  • 18
    As a developer I find it insulting that Gosling thinks I'm not smart enough to understand signed and unsigned integers. Was Java only intended for beginners? – Bernard Igiri Nov 10 '14 at 12:52
  • 8
    Hang your head, Gosling. This, not the Ask toolbar, is Java's deepest shame. – Chris Hatton Feb 21 '15 at 03:19
  • 1
    @JaredPar if you want types to actually constrain semantics like this, I think Java's lack of sum types and newtypes are much larger obstacles to correctness. – Kris Nuttycombe Apr 20 '15 at 17:54
  • 3
    It's unfair to compare Java with a language that allows and understands `unsigned long long`. Java could at least get its `==` to work; so far it's only slightly better than PHP's `==`. IMO, Java is getting obsolete with all this oversimplification, and C# is only expanding and destroying Java each day. It's such a shame that a language that doesn't know `unsigned` because it's supposedly "simple" has an 800+ page language spec, while C#, which allows both unsigned types and pointers, has a 511-page spec. This "simple" language is much more overcomplicated than C, yet refuses to have a simple `unsigned`. – Felype Apr 28 '15 at 01:22
  • 1
    @Felype: I don't mind Gosling's decision to disallow unsigned types whose values are not all representable as `int`. I don't consider his reasons adequate to justify the lack of an unsigned 8-bit type, nor an unsigned 16-bit type whose purpose is to store numbers, since the interaction of an unsigned byte type with `int` would be no worse than that of a signed byte type with `int`: either gets promoted to `int`. – supercat Jun 01 '15 at 19:31
  • 1
    @supercat When you promote a `byte` to `int` in Java, you're bound to expect a negative result, so you need to trim the value to the lowest 8 bits, e.g. `int n = (b[0] & 0xFF);`. **With unsigned types, sign extension doesn't occur.** – bit2shift Nov 03 '15 at 21:09
  • @bit2shift: The promotion is performed without sign-extension, rather than being performed with sign extension, but an expression like "someUnsignedByte > someSignedByte" would pose no problem, since sign-extending someSignedByte, but promoting someUnsignedByte without sign-extension, would yield the arithmetically-correct result in all cases. Such types don't have the problems of "full-sized" unsigned types, where interactions can only be handled sensibly by promoting to an even larger type. – supercat Apr 07 '16 at 19:59
  • 1
    @supercat Try `byte a = (byte)0xF0; int b = a; System.out.println(b);` and come back to me with the result. I'll give you a hint: it's `-16`. Also `someUnsignedByte > someSignedByte` would cause a warning `comparison between signed and unsigned integer expressions` in a language with unsigned types, e.g. C++. – bit2shift Apr 09 '16 at 16:15
  • 1
    @bit2shift: Java's "byte" type is signed. If there existed an unsigned byte type, promotion (of that type) would occur without sign extension. As for the latter comparison, I wouldn't expect a warning, since both values would get promoted to type "signed int", and the comparison would behave in the numerically-expected fashion. – supercat Apr 11 '16 at 14:51
  • 1
    I'm a C# developer, and a lot of the C# open source projects on GitHub that I've seen don't use unsigned types. Why should my car have a feature that I'll maybe remember to use once in a while? – Ayub Apr 12 '17 at 11:26
  • 1
    Quiz any developer about floating-point numbers, and pretty soon you discover that many developers don't actually fully understand what goes on with floating-point arithmetic... tbh I am starting to get really annoyed by the Java philosophy – 463035818_is_not_a_number May 16 '18 at 09:47
54

Reading between the lines, I think the logic was something like this:

  • generally, the Java designers wanted to simplify the repertoire of data types available
  • for everyday purposes, they felt that the most common need was for signed data types
  • for implementing certain algorithms, unsigned arithmetic is sometimes needed, but the kind of programmers that would be implementing such algorithms would also have the knowledge to "work round" doing unsigned arithmetic with signed data types

Mostly, I'd say it was a reasonable decision. Possibly, I would have:

  • made byte unsigned, or at least provided signed/unsigned alternatives, possibly with different names, for this one data type (making it signed is good for consistency, but when do you ever need a signed byte?)
  • done away with 'short' (when did you last use 16-bit signed arithmetic?)

Still, with a bit of kludging, operations on unsigned values up to 32 bits aren't too bad, and most people don't need unsigned 64-bit division or comparison.
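A minimal sketch of that kind of kludging, assuming a pre-Java 8 codebase (so no `Integer.*Unsigned` helpers are available): widen to `long` with a mask wherever the sign bit would change the result. The class and method names are illustrative only.

```java
// Pre-Java 8 style: treat an int's 32 bits as unsigned by widening to long.
final class Unsigned32Kludge {
    // Keeps the low 32 bits and clears the sign extension.
    private static final long MASK = 0xFFFFFFFFL;

    // Reads an int's bit pattern as a value in 0 .. 2^32 - 1.
    static long toUnsigned(int x) {
        return x & MASK;
    }

    // +, - and * need no special handling: two's complement gives the same
    // low 32 bits either way. Division and comparison do differ, so widen first.
    static int divide(int dividend, int divisor) {
        return (int) (toUnsigned(dividend) / toUnsigned(divisor));
    }

    static int compare(int a, int b) {
        return Long.compare(toUnsigned(a), toUnsigned(b));
    }

    public static void main(String[] args) {
        int big = (int) 3_000_000_000L;                      // stored as a negative int
        System.out.println(big);                             // -1294967296
        System.out.println(toUnsigned(big));                 // 3000000000
        System.out.println(toUnsigned(big + 1_000_000_000)); // 4000000000 (plain + works)
        System.out.println(divide(big, 3));                  // 1000000000
        System.out.println(compare(big, 1) > 0);             // true
    }
}
```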

Neil Coffey
  • I agree, esp. about having an unsigned byte and tossing short. – Lawrence Dol Jan 10 '09 at 05:07
  • 2
    I would love to have unsigned bytes, too, but I suspect the advantage of complete consistency among the integer types outweighs the convenience that unsigned bytes would bring. – Alan Moore Jan 10 '09 at 05:47
  • 70
    "For everyday purposes, they felt that the most common need was for signed data types". In my C++ code, I more than often find myself thinking "Why on earth am I using a signed integer here instead of an unsigned one?!". I have the feeling that "signed" is the exception rather than the rule (of course, it depends on the domain, but there is a reason why positive integers are called *natural* numbers ;-) ). – Luc Touraille Jun 27 '11 at 08:27
  • Java is old enough that arithmetic in shorts still made sense when the first java was written. – Joshua Aug 01 '11 at 04:08
  • Yeah they could have had specific types like in C99 for when you need them (file and network code): uint8, int8, uint16, int16, etc. And then an extra int type which is set to int32 or int64 depending on the platform. – Timmmm Jan 16 '12 at 11:21
  • 17
    Thumbs up for the call for unsigned bytes: when doing image processing, assuming bytes to be unsigned (as they should be) made me spend hours debugging. – Helin Wang Apr 30 '12 at 00:55
  • 7
    You'd be surprised how often `short` is used: deflate/gzip/inflate algorithms are 16-bit and rely heavily on shorts... or at least `short[]` [admittedly they are native, yet Java implementations of the algorithm carry terabytes of data]. The latter (`short[]`) has a significant advantage over `int[]` since it takes half the memory, and less memory = better caching properties, much better performance. – bestsss May 09 '12 at 10:43
  • 9
    Though in a particular application, you should *measure* whether using shorts gives you better performance rather than assuming it to be true. It is possible that the extra jiggery-pokery required to manipulate shorts rather than ints (which is usually the type that the processor 'likes to use') could actually be detrimental to performance in a particular application. Not always, but you should test, not assume. – Neil Coffey May 09 '12 at 16:47
  • @NeilCoffey: Java's int seems to be a feature that is doomed to obsolescence once 128-bit and 256-bit processors become common, because Java defines int to be exactly 32 bits everywhere, while in the future that size will no longer be any processor's preferred integer size, so we will all be porting our code to use "long long long long". I believe Java should never have had short in the first place; it made no sense even back then, and even less now. In a future-proof cross-platform language, arbitrary-sized arithmetic (BigInteger) should have been the default integer and have operator support. – Lie Ryan Jan 09 '13 at 08:55
  • 2
    Lie-- what you say is true, but so far, 64 bit processors still give some kind of "privileged" status to the 32 bit width. If we do move to e.g. 128 bit processors, it will be interesting to see if at that point 32 bits starts to become more obsolete. – Neil Coffey Jan 09 '13 at 16:21
  • 2
    128-bit and 256-bit processors won't become popular for MANY MANY years. Java will (hopefully) be long gone. – Miles Rout Feb 18 '13 at 08:07
  • @NeilCoffey: int16_t is the native format of the CD and is thus usable in audio apps. – user877329 Apr 11 '14 at 10:12
  • I couldn't agree more with your answer! Having a smaller set of available integer types, and making byte unsigned are both good points I wholeheartedly support. – Nayuki Apr 22 '16 at 23:41
  • @LieRyan in that case, the VM will simply deal with that, like it does currently with `byte` and `short` values in 32-bit implementations, or with `int` values in 64-bit ones. The worst-case scenario here would be a performance penalty. – DragShot Jul 04 '17 at 15:57
21

This is an older question, and pat did briefly mention char, but I thought I should expand on this for others who will look at it down the road. Let's take a closer look at the Java primitive types:

byte - 8-bit signed integer

short - 16-bit signed integer

int - 32-bit signed integer

long - 64-bit signed integer

char - 16-bit character (unsigned integer)

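Although Java has no unsigned arithmetic operators as such, char can essentially be treated as a 16-bit unsigned integer: its values are zero-extended when promoted to int, so arithmetic on them behaves as unsigned. You have to explicitly cast the results of ordinary arithmetic expressions back to char (compound assignments such as += do this implicitly), but it does provide you with a way to store unsigned numbers.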

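class CharAsUnsigned {
    public static void main(String[] args) {
        char a = 0;
        char b = 6;
        a += 1;                 // compound assignment casts back to char implicitly
        a = (char) (a * b);     // a * b is an int, so cast the result back
        a = (char) (a + b);
        a = (char) (a - 16);    // 12 - 16 wraps around to 65532
        b = (char) (b % 3);
        b = (char) (b / a);
        //a = -1; // Generates compiler error, must be cast to char
        System.out.println(a); // Prints U+FFFC (may appear as ? on the console)
        System.out.println((int) a); // Prints 65532
        System.out.println((short) a); // Prints -4
        short c = -4;
        System.out.println((int) c); // Prints -4, notice the difference with char
        a *= 2;
        a -= 6;
        a /= 3;
        a %= 7;
        a++;
        a--;
    }
}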

Yes, there isn't direct support for unsigned integers (obviously, I wouldn't have to cast most of my operations back to char if there were). However, there certainly exists an unsigned primitive data type. I would have liked to see an unsigned byte as well, but I guess doubling the memory cost and using char instead is a viable option.


Edit

With JDK8 there are new APIs for Long and Integer which provide helper methods when treating long and int values as unsigned values.

  • compareUnsigned
  • divideUnsigned
  • parseUnsignedInt
  • parseUnsignedLong
  • remainderUnsigned
  • toUnsignedLong
  • toUnsignedString
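A short sketch of how some of these methods behave. Every call below is a static method that exists in JDK 8 and later; the demo class itself is made up for illustration. Note that the listed methods are split between classes: `parseUnsignedInt` lives on `Integer`, `parseUnsignedLong` on `Long`, `toUnsignedLong` on `Integer` (and also on `Byte` and `Short`), and the rest on both `Integer` and `Long`.

```java
// Java 8+ unsigned helpers on Integer and Long.
final class Jdk8UnsignedDemo {
    public static void main(String[] args) {
        int x = -1;  // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        System.out.println(Integer.toUnsignedLong(x));          // 4294967295
        System.out.println(Integer.divideUnsigned(x, 2));       // 2147483647
        System.out.println(Integer.remainderUnsigned(x, 10));   // 5
        System.out.println(Integer.compareUnsigned(x, 1) > 0);  // true, although x < 1 as signed

        long y = -1L;  // 2^64 - 1 when read as unsigned
        System.out.println(Long.toUnsignedString(y));           // 18446744073709551615
        System.out.println(Long.divideUnsigned(y, 10));         // 1844674407370955161
        System.out.println(Long.parseUnsignedLong("18446744073709551615") == y); // true
    }
}
```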

Additionally, Guava provides a number of helper methods that do similar things for the integer types, which helps close the gap left by the lack of native support for unsigned integers.

Mr. Polywhirl
Jyro117
  • 2
    However, `char` is too small to support `long` arithmetic, for example. –  Feb 26 '13 at 13:03
  • 3
    This could be a disadvantage of Java –  Feb 27 '13 at 14:43
  • Hoping that they support unsigned values for bytes. It would make things easier. – mixturez May 22 '17 at 09:32
  • 2
    I ran headfirst into this when trying to read data from a hard drive that had been written by a C program, with specs in C structures. Not only was I forced to deal with endianness differences, but, to add insult to injury, I had to read everything using 64-bit integers and then apply shifting operations to get the correct data out, just because someone decided they didn't want to implement unsigned numbers. Sorry, but if they're allowed to say something like "to keep Java simple", then I'm going to say that's a lazy reason. – Dan Chase Nov 05 '20 at 20:20
17

Java does have unsigned types, or at least one: char is an unsigned short. So whatever excuse Gosling throws up, it's really just his ignorance that explains why there are no other unsigned types.

Also, short types: shorts are used all the time for multimedia. The reason is that you can fit 2 samples in a single 32-bit unsigned long and vectorize many operations. The same goes for 8-bit data and unsigned bytes: you can fit 4 or 8 samples in a register for vectorizing.
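As a rough illustration of the packing idea, here is a sketch that stores two 16-bit samples in one 32-bit Java `int`. Java exposes no SIMD registers here, so this only shows the bit layout, not the vectorized speed-up; the left/right channel layout and the class name are assumptions made for the example.

```java
// Packing two signed 16-bit audio samples into one 32-bit int and back.
final class SamplePacking {
    // Left sample in the high 16 bits, right sample in the low 16 bits.
    static int pack(short left, short right) {
        // Mask the low sample so its sign extension can't clobber the high half.
        return (left << 16) | (right & 0xFFFF);
    }

    static short left(int packed) {
        return (short) (packed >> 16);   // arithmetic shift keeps the sign
    }

    static short right(int packed) {
        return (short) packed;           // narrowing keeps the low 16 bits
    }

    public static void main(String[] args) {
        int p = pack((short) -1234, (short) 5678);
        System.out.println(left(p) + " " + right(p));   // -1234 5678
    }
}
```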

pat
  • 41
    Yeah, I'm sure Gosling is very ignorant about Java in comparison to you. – jakeboxer Dec 12 '09 at 16:55
  • Does Java allow arithmetic to be performed directly on unsigned-byte quantities, or do values always get promoted? Having an unsigned type for storage, but always performing arithmetic on a signed type which is large enough to accommodate it works out well semantically, but would cause operations on unsigned types that were the same size as "normal" integers to be more expensive. – supercat Jul 16 '12 at 16:23
  • 2
    It's bad style to use `char` for anything but characters. – starblue Dec 24 '12 at 08:34
  • 5
    @starblue Of course it is, but it's a hack to get around a limitation of the language – Basic Mar 31 '14 at 20:53
15

As soon as signed and unsigned ints are mixed in an expression things start to get messy and you probably will lose information. Restricting Java to signed ints only really clears things up. I’m glad I don’t have to worry about the whole signed/unsigned business, though I sometimes do miss the 8th bit in a byte.

Perception
Bombe
  • 14
    As to mixing signed/unsigned: You could have unsigned types, but disallow the mixing (or require explicit casts). Still, not clear whether it's necessary. – sleske Oct 12 '09 at 08:27
  • 2
    In C++ you have to sprinkle `static_cast`s around much to mix them. It is indeed messy. – Raedwald Aug 02 '11 at 12:40
  • 4
    The 8th bit is there, it just tries to hide itself as the sign. – starblue Nov 28 '12 at 14:01
  • Things only get messy with types 32 bits or bigger. I see no reason Java shouldn't have had `byte` be unsigned, as it was in Pascal. – supercat Apr 25 '14 at 22:06
  • 13
    Come see me when you're having issues with image processing in Java, where you expect bytes to be unsigned. Then you'll know that `& 0xFF`'ing every byte-to-int promotion makes the code even messier. – bit2shift Nov 03 '15 at 21:16
13

http://skeletoncoder.blogspot.com/2006/09/java-tutorials-why-no-unsigned.html

This author says it is because the C standard specifies that operations involving both unsigned and signed ints are performed as unsigned. This can make negative signed integers wrap around into large unsigned ints, potentially causing bugs.
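To see the wrap-around the article describes, Java 8's `Integer.compareUnsigned` can reproduce roughly what C does when a 32-bit `int` meets an `unsigned int` (assuming the common 32-bit sizes on the C side). A minimal sketch with a made-up class name:

```java
// C's "convert both operands to unsigned" rule, mimicked with Java 8 helpers.
final class MixedComparison {
    // Signed view: what Java's < operator does.
    static boolean signedLess(int a, int b) {
        return a < b;
    }

    // Unsigned view: roughly what C does for a 32-bit int compared with an unsigned int.
    static boolean unsignedLess(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(signedLess(-1, 0));            // true
        System.out.println(unsignedLess(-1, 0));          // false: -1 becomes 4294967295
        System.out.println(Integer.toUnsignedString(-1)); // 4294967295
    }
}
```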

akatakritos
  • 34
    Java signed integers roll around, too. I don't see your point. – foo Aug 18 '11 at 18:21
  • 9
    @foo: Signed integers have to get big before they cause problems. By contrast, in C, one can have problems comparing any negative integer--even `-1`--to any unsigned quantity--even zero. – supercat Jul 16 '12 at 16:27
  • It's too bad Java couldn't have included unsigned types, but with a limited set of conversions and mixed operators (somewhat analogous to the way that in C one can add 5 to a pointer, but one can't compare a pointer to 5). The idea that using an operator on mixed types when an implicit cast exist, should force the implicit use of that cast (and use the consequent type as the result type) lies at the heart of a lot of dubious design decisions in both .NET and Java. – supercat Sep 03 '13 at 04:38
  • 4
    Not to rant on your answer, but having `-1` as "unknown" age (as the article suggests) is one of the **classic examples of "code smell"**. For instance, if you want to compute "how much Alice is older than Bob?", and A=25 and B=-1, you will get an answer of `±26` which is simply wrong. The proper handling of unknown values is some kind of `Option` when `Some(25) - None` would return `None`. – bytebuster Dec 04 '14 at 16:05
12

I think Java is fine as it is; adding unsigned would complicate it without much gain. Even with the simplified integer model, most Java programmers don't know how the basic numeric types behave; just read the book Java Puzzlers to see what misconceptions you might hold.

As for practical advice:

  • If your values are somewhat arbitrary size and don't fit into int, use long. If they don't fit into long use BigInteger.

  • Use the smaller types only for arrays when you need to save space.

  • If you need exactly 64/32/16/8 bits, use long/int/short/byte and stop worrying about the sign bit, except for division, comparison, right shift, and casting.
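A sketch of those four exceptions for a 32-bit `int` and an 8-bit `byte`. The unsigned helpers used here are the JDK 8+ `Integer` methods, and the class name is arbitrary. The last part shows why `>>>` alone is not enough for a `byte`: the operand is sign-extended to `int` before the shift.

```java
// The places where int's sign bit matters when you treat an int as 32 raw bits.
final class SignBitExceptions {
    public static void main(String[] args) {
        int bits = 0xF0000000;  // 4026531840 as unsigned, -268435456 as signed

        // 1. Division: '/' is signed; use divideUnsigned (Java 8+).
        System.out.println(bits / 16);                             // -16777216
        System.out.println(Integer.divideUnsigned(bits, 16));      // 251658240

        // 2. Comparison: '<' is signed; use compareUnsigned (Java 8+).
        System.out.println(bits < 1);                              // true
        System.out.println(Integer.compareUnsigned(bits, 1) < 0);  // false

        // 3. Right shift: '>>' copies the sign bit, '>>>' shifts in zeros.
        System.out.println(Integer.toHexString(bits >> 4));        // ff000000
        System.out.println(Integer.toHexString(bits >>> 4));       // f000000

        // 4. Casting/widening: byte -> int sign-extends unless you mask.
        byte b = (byte) 0xF0;
        System.out.println((int) b);                               // -16
        System.out.println(b & 0xFF);                              // 240

        // For byte (and short), mask before shifting, because the operand is
        // widened to int with sign extension first.
        System.out.println((byte) (b >>> 4));                      // -1
        System.out.println((byte) ((b & 0xFF) >>> 4));             // 15
    }
}
```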

See also this answer about "porting a random number generator from C to Java".

Nayuki
starblue
  • 5
    Yes, for shifting right you have to choose between `>>` and `>>>` for signed and unsigned, respectively. Shifting left is no problem. – starblue Oct 20 '11 at 06:48
  • 1
    @starblue Actually `>>>` doesn't work for `short` and `byte`. For example, `(byte)0xff>>>1` yields `0x7fffffff` rather than `0x7f`. Another example: `byte b=(byte)0xff; b>>>=1;` will result in `b==(byte)0xff`. Of course you can do `b=(byte)((b & 0xff) >> 1);` but this adds one more operation (bitwise &). – CITBL Nov 05 '13 at 21:07
  • 8
    "...Even with the simplified model most Java programmers don't know how the basic numeric types behave..." Something in me just resents a language aimed at the lowest common denominator. – Basic Mar 31 '14 at 16:58
  • The opening line in your answer, about more complication and little gain, is precisely what I elaborated on in my article 6 years later: https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java – Nayuki Apr 22 '16 at 23:44
  • 1
    @Nayuki Your article is really nice. Only a small remark, I would use addition of 0x80000000 for comparison operators instead of XOR, because it explains why it works, it shifts the contiguous region where the comparison occurs up from -MAXINT to 0. Bitwise its effect is exactly the same. – starblue Aug 02 '17 at 12:06
  • @starblue Thanks for the compliments. Btw I edited your answer a bit to improve the facts and wording, hope you like it. Regarding 0x80000000, why use addition? XOR is a simpler operation that has no carry. Also I have equivalently seen subtraction in some other code for compareUnsigned(). As for explaining, I didn't explain any of the other operators in low level detail, so for consistency I can't explain the comparison hack... – Nayuki Aug 02 '17 at 15:33
8

I know this post is quite old; however, for your interest, in Java 8 and later you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32 - 1. Use the Integer class to treat the int data type as an unsigned integer: static methods like compareUnsigned(), divideUnsigned() etc. have been added to the Integer class to support arithmetic operations on unsigned integers.
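A minimal sketch of that range in practice (Java 8+; the class name is arbitrary): the same 32 bits print as `-1` through the signed view and as `4294967295` through the unsigned helpers, and `parseUnsignedInt` rejects values above 2^32 - 1.

```java
// An int used as an unsigned 32-bit value covers 0 .. 2^32 - 1 (Java 8+).
final class UnsignedIntRange {
    public static void main(String[] args) {
        int min = 0;
        int max = Integer.parseUnsignedInt("4294967295");     // 2^32 - 1, stored as -1

        System.out.println(Integer.toUnsignedString(min));     // 0
        System.out.println(Integer.toUnsignedString(max));     // 4294967295
        System.out.println(max);                               // -1 (signed view of the same bits)

        // Going past the top wraps back to 0, like C's unsigned int.
        System.out.println(Integer.toUnsignedString(max + 1)); // 0

        // Values outside the unsigned range are rejected.
        try {
            Integer.parseUnsignedInt("4294967296");
        } catch (NumberFormatException e) {
            System.out.println("out of range");
        }
    }
}
```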

Nayuki
Morteza Adi
6

With JDK8 it does have some support for them.

We may yet see full support of unsigned types in Java despite Gosling's concerns.

John Hascall
  • 12
    aka "So people really do use it and we were wrong not to include it to start with" - but we still don't quite trust Java devs to know whether a variable is signed or not - so we're not going to implement them in the VM or as types equivalent to their signed cousins. – Basic Nov 11 '13 at 17:49
5

I once took a C++ course with someone on the C++ standards committee who implied that Java made the right decision to avoid having unsigned integers because (1) most programs that use unsigned integers can do just as well with signed integers, and this is more natural in terms of how people think, and (2) using unsigned integers results in lots of issues that are easy to create but difficult to debug, such as integer arithmetic overflow and losing significant bits when converting between signed and unsigned types. If you mistakenly subtract 1 from 0 using signed integers, it often causes your program to crash more quickly, which makes the bug easier to find than if the value wraps around to 2^32 - 1; and compilers, static analysis tools and runtime checks have to assume you know what you're doing since you chose to use unsigned arithmetic. Also, negative numbers like -1 can often represent something useful, like a field being ignored/defaulted/unset, while if you were using unsigned you'd have to reserve a special value like 2^32 - 1 or something similar.

Long ago, when memory was limited and processors did not automatically operate on 64 bits at once, every bit counted a lot more, so having signed vs unsigned bytes or shorts actually mattered a lot more often and was obviously the right design decision. Today just using a signed int is more than sufficient in almost all regular programming cases, and if your program really needs to use values bigger than 2^31 - 1, you often just want a long anyway. Once you're into the territory of using longs, it's even harder to come up with a reason why you really can't get by with 2^63 - 1 positive integers. Whenever we go to 128 bit processors it'll be even less of an issue.

Jonathan
5

I've heard stories that they were to be included close to the original Java release. Oak was the precursor to Java, and some of its spec documents mentioned unsigned values. Unfortunately these never made it into the Java language. As far as anyone has been able to figure out, they just didn't get implemented, likely due to time constraints.

Rob Ottaway
  • This would be fine ... except the evidence from the Gosling interview implies that unsigned integers (apart from `char`) were left out because the designers thought they were a bad idea ... given the goals of the language. – Stephen C Jun 30 '12 at 10:12
  • It is a good idea never to put too much value in eyewitness statements, if documentary evidence is also at hand. – user7610 Sep 07 '18 at 10:50

Your question is "Why doesn't Java support unsigned ints?"

And my answer to your question is that Java wants all of its primitive integer types, byte, short/char, int and long, to be treated as byte, word, dword and qword respectively, exactly like in assembly. Java's operators perform signed operations on all of these types except char, which is the only unsigned one, and only 16 bits wide.
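A quick sketch of the widths involved (the `*.SIZE` constants are standard JDK fields; the demo class and methods are hypothetical). It also shows that `char` is the only unsigned integral type: the all-ones 16-bit pattern reads as 65535 in a `char` but as -1 in a `short`:

```java
// Hypothetical demo class for the integral primitive widths.
final class PrimitiveWidths {
    // The all-ones 16-bit pattern stored in a char (unsigned) vs a short (signed)
    static int allOnesAsChar()  { return (char) -1; }  // widened to int: 65535
    static int allOnesAsShort() { return (short) -1; } // widened to int: -1

    public static void main(String[] args) {
        // Width in bits of each integral primitive
        System.out.println("byte  " + Byte.SIZE);      // 8, signed
        System.out.println("short " + Short.SIZE);     // 16, signed
        System.out.println("char  " + Character.SIZE); // 16, unsigned
        System.out.println("int   " + Integer.SIZE);   // 32, signed
        System.out.println("long  " + Long.SIZE);      // 64, signed
        System.out.println(allOnesAsChar());           // 65535
        System.out.println(allOnesAsShort());          // -1
    }
}
```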

So static methods are supposed to provide the unsigned operations, for both 32 and 64 bits.

You need a final class whose static methods can be called for the unsigned operations.

You can create this final class, give it whatever name you want and implement its static methods.

If you have no idea how to implement the static methods, then this link may help you.
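A minimal sketch of such a final class, assuming Java 8 or later. The class and method names are hypothetical; the methods it delegates to (`divideUnsigned`, `remainderUnsigned`, `compareUnsigned`, `toUnsignedLong`, `toUnsignedString` on `Integer` and `Long`) are standard JDK methods:

```java
// Hypothetical utility class: names are illustrative, the JDK methods it calls are real (Java 8+).
final class UnsignedOps {
    private UnsignedOps() {} // static methods only, no instances

    // 32-bit unsigned operations on int bit patterns
    static int divide(int a, int b)    { return Integer.divideUnsigned(a, b); }
    static int remainder(int a, int b) { return Integer.remainderUnsigned(a, b); }
    static int compare(int a, int b)   { return Integer.compareUnsigned(a, b); }
    static long toLong(int a)          { return Integer.toUnsignedLong(a); }
    static String format(int a)        { return Integer.toUnsignedString(a); }

    // 64-bit unsigned operations on long bit patterns
    static long divide(long a, long b)    { return Long.divideUnsigned(a, b); }
    static long remainder(long a, long b) { return Long.remainderUnsigned(a, b); }
    static int compare(long a, long b)    { return Long.compareUnsigned(a, b); }
    static String format(long a)          { return Long.toUnsignedString(a); }
}
```

Addition, subtraction and multiplication need no unsigned variants: in two's complement they produce the same bits either way, so the ordinary `+`, `-` and `*` operators serve both.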

In my opinion, Java is not similar to C++ at all, since it supports neither unsigned types nor operator overloading, so I think Java should be treated as a completely different language from both C++ and C.

It also has a completely different name, by the way.

So I don't recommend writing Java code that looks like C or C++, because in Java you won't be able to do what you would do next in C++, i.e. the code can't stay C++-like, and to me it's bad to change style in the middle like that.

I recommend writing and using static methods for the signed operations too, so your code doesn't mix operators and static methods for signed and unsigned operations, unless you need only signed operations, in which case it's okay to use operators only.

I also recommend avoiding the short, int and long primitive types and using word, dword and qword respectively instead, if you are going to call static methods for unsigned and/or signed operations instead of using operators.

If you are going to do signed operations only and use operators only, then it's okay to use the primitive types short, int and long.

Actually, word, dword and qword don't exist in the language, but you can create a new class for each, and the implementation of each should be very easy:

The class word holds only the primitive type short, the class dword holds only int and the class qword holds only long. You can then implement all the unsigned and signed methods, static or not as you choose, in each class: all the 16-bit operations, both unsigned and signed, with meaningful names in the word class; all the 32-bit operations in the dword class; and all the 64-bit operations in the qword class.
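A sketch of the dword idea, assuming Java 8+. The class is named `DWord` here only because Java convention capitalizes class names; it is immutable, and its method names are illustrative:

```java
// Hypothetical immutable wrapper around a 32-bit pattern ("dword"); method names are illustrative.
final class DWord {
    private final int bits;

    DWord(int bits) { this.bits = bits; }

    int bits() { return bits; }

    // Addition gives the same bits whether the operands are read as signed or unsigned
    DWord plus(DWord o) { return new DWord(bits + o.bits); }

    // Division, comparison and printing differ, so each has a signed and an unsigned variant
    DWord divideSigned(DWord o)   { return new DWord(bits / o.bits); }
    DWord divideUnsigned(DWord o) { return new DWord(Integer.divideUnsigned(bits, o.bits)); }
    int compareSigned(DWord o)    { return Integer.compare(bits, o.bits); }
    int compareUnsigned(DWord o)  { return Integer.compareUnsigned(bits, o.bits); }
    String toSignedString()       { return Integer.toString(bits); }
    String toUnsignedString()     { return Integer.toUnsignedString(bits); }
}
```

For example, `new DWord(-2).divideUnsigned(new DWord(2))` holds 2147483647, whereas `divideSigned` gives -1. A word or qword class could follow the same pattern around `short` or `long` (for `short`, via `Short.toUnsignedInt`, since `Short` has no unsigned division helper).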

If you don't like giving too many different names to the methods, you can always use overloading in Java; good to know that Java didn't remove that too!

If you want methods rather than operators for 8-bit signed operations, and methods for 8-bit unsigned operations, which have no operators at all, then you can create a Byte class (note the capital 'B', so this is not the primitive type byte) and implement the methods in this class.
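A minimal 8-bit sketch. It is called `UByte` rather than `Byte` here because a class named `Byte` in your own package would shadow `java.lang.Byte` for unqualified uses in that package; the name and methods are hypothetical, while `Byte.toUnsignedInt` is a JDK method (Java 8+):

```java
// Hypothetical 8-bit wrapper, named UByte so it does not shadow java.lang.Byte.
final class UByte {
    private final byte bits;

    UByte(int value) { this.bits = (byte) value; } // keeps only the low 8 bits

    int signedValue()   { return bits; }                     // -128 .. 127
    int unsignedValue() { return Byte.toUnsignedInt(bits); } // 0 .. 255, same as bits & 0xFF

    UByte plus(UByte o) { return new UByte(bits + o.bits); } // wraps modulo 256

    boolean unsignedLessThan(UByte o) { return unsignedValue() < o.unsignedValue(); }
}
```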

About passing by value and passing by reference:

If I am not wrong, as in C#, primitives are passed by value, while for class objects it is the reference that gets copied, so a method receiving a Byte, word, dword or qword object works on the caller's object, not on a copy of it. I wish Java had structs as C# has, so that Byte, word, dword and qword could be implemented as structs instead of classes and be passed by value by default, like the primitive types and any struct in C#. But Java is worse than C# in this respect and we have to deal with it: there are only classes and interfaces, whose objects are always reached through references. So if you want to pass Byte, word, dword and qword objects by value, as with any other class object in Java (and in C#), you simply have to use a copy constructor, and that's it.
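To illustrate the point about references and copy constructors, here is a sketch with a hypothetical mutable `QWord` class: the callee gets a copy of the reference, so mutating the object is visible to the caller while reassigning the parameter is not, and a copy constructor gives an independent object:

```java
// Hypothetical mutable wrapper, used only to show how Java passes object references.
final class QWord {
    long bits;

    QWord(long bits) { this.bits = bits; }
    QWord(QWord other) { this.bits = other.bits; } // copy constructor

    // The callee gets a copy of the reference, so this mutates the caller's object
    static void increment(QWord q) { q.bits++; }

    // Reassigning the parameter changes only the callee's copy of the reference
    static void replace(QWord q) { q = new QWord(0); }

    static long[] demo() {
        QWord original = new QWord(41);
        increment(original);              // original.bits is now 42
        replace(original);                // original is unaffected
        QWord copy = new QWord(original); // independent object holding 42
        increment(copy);                  // copy.bits is 43, original.bits stays 42
        return new long[] { original.bits, copy.bits };
    }
}
```

Making such wrappers immutable, as the standard boxed types like `Long` are, is an alternative to copying: a shared immutable object can't be changed behind the caller's back.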

That's the only solution I can think of. I just wish I could typedef the primitive types to word, dword and qword, but Java supports neither typedef nor anything like C#'s using alias directive, which is roughly equivalent to C's typedef.

About output:

The same sequence of bits can be printed in many ways: as binary, as unsigned decimal (like %u in C printf), as octal (like %o), as hexadecimal (like %x) and as signed decimal (like %d).

Note that C printf doesn't know the types of the arguments passed to it; it learns the type of each argument only from the format string (the char* passed as its first parameter).

So in each of the classes Byte, word, dword and qword you can implement a print method and get printf-like functionality: even though the underlying primitive type is signed, you can still print it as unsigned by following an algorithm based on logical (mask) and shift operations to get the digits to print.
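In Java much of this already exists: `Integer.toUnsignedString` gives the `%u` view, `String.format` with `%x` or `%o` prints a negative `int` as its unsigned two's-complement bits, and `Integer.toBinaryString` gives binary. The sketch also builds hex digits by hand with masks and logical shifts, as a stand-in for such a print method (the class and method names are hypothetical):

```java
// Hypothetical demo class; the formatting calls are standard JDK methods.
final class PrintDemo {
    // Hex digits of a 32-bit pattern built by hand, using only masks and logical shifts
    static String toHexManually(int bits) {
        final String digits = "0123456789abcdef";
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(digits.charAt(bits & 0xF)); // the lowest 4 bits select one digit
            bits >>>= 4;                          // logical shift: zeros come in from the left
        } while (bits != 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        int x = -1; // all 32 bits set
        System.out.println(String.format("%d", x));      // -1
        System.out.println(Integer.toUnsignedString(x)); // 4294967295
        System.out.println(String.format("%x", x));      // ffffffff
        System.out.println(String.format("%o", x));      // 37777777777
        System.out.println(Integer.toBinaryString(x));   // 32 ones
        System.out.println(toHexManually(x));            // ffffffff
    }
}
```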

Unfortunately the link I gave you doesn't show how to implement these print methods, but I am sure you can google the algorithms you need.

That's all I can say in answer to your question and suggest.

  • MASM (Microsoft assembler) and Windows define BYTE, WORD, DWORD, QWORD, as unsigned types. For MASM, SBYTE, SWORD, SDWORD, SQWORD are the signed types. – rcgldr Apr 22 '19 at 07:21

Because unsigned type is pure evil.

The fact that in C unsigned - int produces unsigned is even more evil.

Here is a snapshot of the problem that burned me more than once:

```cpp
// We have an odd positive number of rays, 
// consecutive ones at angle delta from each other.
assert( rays.size() > 0 && rays.size() % 2 == 1 );

// Get a set of rays at delta angle between them.
for( size_t n = 0; n < rays.size(); ++n )
{
    // Compute the angle between nth ray and the middle one.
    // The index of the middle one is (rays.size() - 1) / 2,
    // the rays are evenly spaced at angle delta, therefore
    // the magnitude of the angle between nth ray and the 
    // middle one is: 
    double angle = delta * fabs( n - (rays.size() - 1) / 2 ); 

    // Do something else ...
}
```

Have you noticed the bug yet? I confess I only saw it after stepping in with the debugger.

Because `n` is of the unsigned type `size_t`, the entire expression `n - (rays.size() - 1) / 2` evaluates as unsigned. That expression is intended to be the signed position of the nth ray relative to the middle one: the 1st ray to the left of the middle one would have position -1, the 1st one to the right would have position +1, etc. After taking the absolute value and multiplying by the delta angle I would get the angle between the nth ray and the middle one.

Unfortunately for me, the above expression contained the evil unsigned, and instead of evaluating to, say, -1, it evaluated to 2^32-1 (with a 32-bit `size_t`; with a 64-bit one it would be 2^64-1). The subsequent conversion to double sealed the bug.
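For comparison, a Java sketch of the same computation (names hypothetical). With signed `int` arithmetic, `n - middle` is -1 for the ray just left of the middle; the second method reproduces what the C++ code effectively did with a 32-bit `size_t`, by reading the difference as unsigned before converting to `double`:

```java
// Hypothetical helpers mirroring the ray computation above.
final class RayAngles {
    // Angle between ray n and the middle ray, for an odd number of evenly spaced rays
    static double angle(int n, int rayCount, double delta) {
        int middle = (rayCount - 1) / 2;
        return delta * Math.abs(n - middle); // signed: n - middle may be negative
    }

    // The unsigned wrap-around: the difference is taken modulo 2^32 before conversion
    static double wrappedAngle(int n, int rayCount, double delta) {
        int middle = (rayCount - 1) / 2;
        return delta * Integer.toUnsignedLong(n - middle);
    }
}
```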

After a bug or two caused by misuse of unsigned arithmetic, one has to start wondering whether the extra bit one gets is worth the extra trouble. I am trying, as much as feasible, to avoid any use of unsigned types in arithmetic, although I still use them for non-arithmetic operations such as binary masks.
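Java keeps that non-arithmetic use available even without unsigned types: `&` masks and the logical shift `>>>` treat an `int` as a plain bit pattern. A sketch, assuming a packed 32-bit ARGB pixel layout purely as an example (the names are hypothetical):

```java
// Hypothetical helper: unpacks the four 8-bit channels of a packed ARGB int.
final class MaskDemo {
    static int[] unpackArgb(int argb) {
        return new int[] {
            argb >>> 24,          // alpha: logical shift, so the sign bit is not copied in
            (argb >>> 16) & 0xFF, // red
            (argb >>> 8) & 0xFF,  // green
            argb & 0xFF           // blue
        };
    }

    public static void main(String[] args) {
        int pixel = 0xFF336699;
        System.out.println(pixel >> 24);  // -1: arithmetic shift smears the sign bit
        System.out.println(pixel >>> 24); // 255
    }
}
```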

Michael
  • Adding "unsigned long" to Java would be awkward. Adding smaller unsigned types, however, should have posed no problem. Especially types smaller than "int" could have been easily handled by having them promote to "int" in the numerically-obvious fashion, and "unsigned int" could have been handled by saying that operations involving a signed int and an unsigned int will promote both operands to "long". The only problem situation would be operations involving an unsigned long and a signed quantity, since there would be no type capable of representing all values of both operands. – supercat Sep 26 '16 at 17:20
  • @supercat: if `unsigned` gets converted to `int` at every operation what's the use of `unsigned`? It won't have any functionality distinguishable from `short`. And if you convert to `int` only on mixed operations, such as `unsigned+int` or `unsigned+float`, then you still have the problem of `((unsigned)25-(unsigned)30)*1.0 > 0`, which is a major cause of `unsigned`-related bugs. – Michael Sep 26 '16 at 20:56
  • Many operations on unsigned types would promote to "long". Requiring explicit casts when storing the result back to unsigned types would cause much the same annoyances as exist with short and byte, but if the type is mainly a storage format rather than a computation format that shouldn't be a problem. In any case, unsigned types shorter than "int" should simply be able to promote to "int" without difficulty. – supercat Sep 26 '16 at 22:18
  • I dislike this answer because it uses the argument "unsigned integers are evil and should not exist because they can never be signed". Anyone trying to subtract from an unsigned integer should know this already. And as for readability, C is not exactly known for being easy to follow. Furthermore, the (semi-)argument "the extra bit is not worth the extra trouble" is very weak as well. Is error handling instead of `exit(1);` really 'worth the extra trouble'? Is not being able to open large files really worth the security that less experienced java programmers will not mess up using `unsigned`? – yyny Oct 14 '16 at 20:15
  • cont. I understand why Java chose to exclude unsigned integers, but I do not think it is for one of these reasons. To improve readability, maybe, to make the JVM easier to implement, sure, but I do not think they chose to exclude them solely because they can never be unsigned (in which case they could choose to make subtractions to unsigned signed for some reason) or because the extra bit is not worth it (which it most definitely is, I have found multiple java programs crash when trying to process large files). – yyny Oct 14 '16 at 20:24
  • cont. Furthermore, even if the programs do not crash, they can severely restrict flexibility, as with e.g. [the Minecraft command block character limit](http://minecraft.gamepedia.com/Command_Block#History). – yyny Oct 14 '16 at 20:26
  • The only evil thing I see in this code is `n - (rays.size() - 1) / 2`. You should always bracket binary operators because the reader of the code should not need to assume anything about order of operations in a computer program. Just because we conventionally say a+b*c = a+(b*c) does not mean you can assume this when reading code. Furthermore, the computation should be defined outside the loop so that it can be tested without the loop present. This is a bug in not making sure your types line up rather than a problem of unsigned integers. In C it's up to you to make sure your types line up. – Dmitry Oct 30 '16 at 21:07
  • In a C program, it is the epitome of stupidity to perform a binary operation on two things of different types. This is no different than having {int a = 2; int *b = &a; int c = a + b} Then complaining that a+b does not give you 4: It's up to you to make sure that your types line up; do not rely on the compiler to coerce your types the way you want them to be coerced. – Dmitry Oct 30 '16 at 21:09
  • I repeat; it is not the compiler being evil; it is your code being disgustingly misleading. angle is a double, delta is a god knows what type, n is an unsigned int, rays.size() is god knows what type, 1 is an int, 2 is an int. fabs takes a double and returns a double. (+) takes an a => (a, a) and returns an a, (/) takes an a => (a, a) and returns an a. naturally, you have two constructs which we don't even know what types of them are, on top of that, you mix unsigned and signed without acknowledging that binary operators are polymorphic. – Dmitry Oct 30 '16 at 21:23
  • I see this all the time in the world of "Math" expressions, and it disgusts me. If you don't understand basic lambda calculus/Haskell/THE IDEA THAT TYPES MUST LINE UP, you should not be writing any production code that contains chains of transformations; Even if unsigned behaved correctly, the mental state that wrote this code would cause other bugs shortly after. You must ALWAYS be aware of the world of types you are operating in, and prefer to put them into a single world of a single type(in this case, a world of doubles). – Dmitry Oct 30 '16 at 21:26
  • @Dmitry: the snippet is simple enough. Why don't you re-write right here in what you deem be the "right way". – Michael Oct 31 '16 at 19:22
  • @Michael it can't even be rewritten, there is information in the code sample that we don't know the implementation of. We don't know what ray is and what its size returns. We don't know what delta's type is. It is dishonest to attempt to rewrite something containing dependencies you aren't aware of. – Dmitry Nov 02 '16 at 15:51
  • @Dmitry, my point was that following strictly theoretical rule of how the code "should be" makes it inefficient in real life. If you try to implement something like that functional or OOP style you'll implicitly add a couple of unnecessary instructions. I've seen 2x memory and runtime penalties for seemingly harmless OOP code, which of course is a tremendous marketing disadvantage for commercial code. – Michael Nov 02 '16 at 16:41
  • @Dmitry: Here's one of those: `class Point { float x, y; public: virtual ~Point() {} };` Can you spot why the commercial software with that class used 2x more memory than the competitor with exactly the same functionality? That's right: making `Point::~Point()` virtual made it OOP-friendly (which was unnecessary in that context), but that also added 8 additional bytes to the 8-byte structure (which was crucial for memory footprint). – Michael Nov 02 '16 at 16:47
  • @Michael I'm not sure what you are talking about. I'm not talking about the structure; I'm talking about the fact that you are using different non explicit types in a single formula and are expecting a correct result. If you add three variables of completely different types, you can't be surprised if the result makes no sense, you must place them into the same typespace first(doubles for example). don't perform binary (a -> a -> a) operations on three variables of different types, first make their types the same. Even if the language does coercion, you want to be in control of this coercion. – Dmitry Nov 02 '16 at 22:07
  • @Michael also I am not sure what you mean about making destructor virtual more oop friendly. OOP is not about subclassing, it's about giving you the control over the direction of dependencies by strategically placing interfaces in between source code dependencies to make the source code dependencies be plugins into your system rather than coupled to your system. You only need virtual destructors if you expect the class to be subclassed, and Point has no need to ever be subclassed by any sane system, you can delegate it if you wish though. – Dmitry Nov 02 '16 at 22:13

There are a few gems in the C spec that Java dropped for pragmatic reasons but which are slowly creeping back with developer demand (closures, etc.).

I mention the first one because it's related to this discussion: the adherence of pointer values to unsigned integer arithmetic. And, in relation to this thread's topic, the difficulty of maintaining unsigned semantics in the signed world of Java.

I would guess that if a Dennis Ritchie alter ego were to advise Gosling's design team, he would have suggested giving signed values a "zero at infinity", so that every address offset request would first add its ALGEBRAIC RING SIZE to obviate negative values.

That way, no offset thrown at the array can ever generate a SEGFAULT (in Java terms, an ArrayIndexOutOfBoundsException). For example, consider an encapsulated class of doubles, which I call RingArray, that needs unsigned behaviour in a "self rotating loop" context:

```java
// ...
// Housekeeping state variable
long entrycount;     // A sequence number
int cycle;           // Number of loops cycled
int size;            // Active size of the array because size<modulus during cycle 0
int modulus;         // Maximal size of the array

// Ring state variables
private int head;   // The 'head' of the Ring
private int tail;   // The ring iterator 'cursor'
// tail may get the current cursor position
// and head gets the old tail value
// there are other semantic variations possible

// The Array state variable
double[] darray;    // The array of doubles

// somewhere in constructor
public RingArray(int modulus) {
    super();
    this.modulus = modulus;
    tail =  head =  cycle = 0;
    darray = new double[modulus];
// ...
}
// ...
double getElementAt(int offset){
    return darray[(tail+modulus+offset%modulus)%modulus];
}
//  remember, the above is treating steady-state where size==modulus
// ...
```

The above RingArray would never 'get' from a negative index, even if a malicious requester tried to. Remember, there are also many legitimate requests for prior (negative) index values.

NB: The outer %modulus de-references legitimate requests whereas the inner %modulus masks out blatant malice from negatives more negative than -modulus. If this were to ever appear in a Java +..+9 || 8+..+ spec, then the problem would genuinely become a 'programmer who cannot "self rotate" FAULT'.
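A small sketch that checks the double-modulo expression from `getElementAt` (pulled out into a hypothetical static helper) against `Math.floorMod` (Java 8+), which also returns a value in `[0, modulus)` for a positive modulus. Reducing `offset % modulus` first keeps the intermediate sum small for a ring of moderate size, so neither version overflows even for extreme offsets:

```java
// Hypothetical helpers comparing two ways of computing a non-negative ring index.
final class RingIndex {
    // The expression from getElementAt
    static int doubleMod(int tail, int offset, int modulus) {
        return (tail + modulus + offset % modulus) % modulus;
    }

    // Same result with Math.floorMod, which is never negative for a positive modulus
    static int floorMod(int tail, int offset, int modulus) {
        return Math.floorMod(tail + offset % modulus, modulus);
    }
}
```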

I'm sure the so-called Java unsigned int 'deficiency' can be made up for with the above one-liner.

PS: Just to give context to above RingArray housekeeping, here's a candidate 'set' operation to match the above 'get' element operation:

```java
void addElement(long entrycount, double value) { // to be called only by the keeper of entrycount
    this.entrycount = entrycount;
    cycle = (int) (entrycount / modulus);      // divide before narrowing to int
    // start-up (cycle 0) is when the ring is being populated the first time around,
    // so size counts the elements written so far, including this one
    size = (cycle == 0) ? (int) entrycount + 1 : modulus;
    head = tail;                               // head gets the old tail value
    tail = (int) ((entrycount + 1) % modulus); // the cursor moves one slot past the new element
    darray[head] = value;                      // always overwrite old tail
}
```
MKhomo

I can think of one unfortunate side-effect. In Java embedded databases, the number of non-negative ids you can have with a 32-bit id field is 2^31, not 2^32 (~2 billion, not ~4 billion).
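If the ids are only ever generated and compared in Java, one possible workaround is to keep the 32-bit field and read it as unsigned. A minimal sketch with hypothetical names; it assumes every consumer of the field (including the database's own sorting and range queries) agrees to that unsigned interpretation, which a signed integer column will not do on its own:

```java
// Hypothetical id helpers; assumes every reader of the field uses the unsigned interpretation.
final class IdDemo {
    // Next id from a 32-bit counter: after Integer.MAX_VALUE the int wraps to Integer.MIN_VALUE
    static int nextId(int current) {
        return current + 1;
    }

    // Reading the stored bits as unsigned turns those wrapped values into 2^31 .. 2^32 - 1
    static long asUnsignedId(int stored) {
        return Integer.toUnsignedLong(stored);
    }

    // Ordering ids as unsigned, so wrapped ids sort after Integer.MAX_VALUE
    static int compareIds(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }
}
```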

mike g
  • He's probably thinking of arrays and not being able to use negative integers as indices. Probably. – SK9 Jan 19 '11 at 07:36
  • When auto-increment fields in databases overflow they often go wacko. – Joshua Aug 01 '11 at 04:10

The reason, IMHO, is that they are/were too lazy to implement/correct that mistake. Suggesting that C/C++ programmers do not understand unsigned, structures, unions, bit flags... is just preposterous.

Either you were talking with a BASIC/bash/Java programmer on the verge of beginning to program à la C, without any real knowledge of this language, or you are just talking out of your own mind. ;)

When you deal every day with formats coming from either files or hardware, you begin to question what in the hell they were thinking.

A good example here would be trying to use an unsigned byte as a self rotating loop. For those of you who do not understand the last sentence: how on earth do you call yourself a programmer?
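Reading "self rotating loop" as a counter that wraps from 255 back to 0 (an interpretation; the phrase isn't standard), here is a Java sketch of how that is done without an unsigned byte type, using either a mask or a narrowing cast; the names are hypothetical:

```java
// Hypothetical helpers for an 8-bit counter that wraps from 255 back to 0.
final class ByteRing {
    // Using an int and a mask: keeps only the low 8 bits
    static int next(int counter) {
        return (counter + 1) & 0xFF;
    }

    // Using Java's signed byte: the narrowing cast wraps, and Byte.toUnsignedInt reads it as 0 .. 255
    static byte nextByte(byte counter) {
        return (byte) (counter + 1);
    }

    public static void main(String[] args) {
        byte b = (byte) 254;
        for (int i = 0; i < 4; i++) {
            System.out.print(Byte.toUnsignedInt(b) + " "); // 254 255 0 1
            b = nextByte(b);
        }
        System.out.println();
    }
}
```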

DC

Denis Co
  • Just for kicks, Google the phrase "self rotating loop". **Clearly**, Denis Co is the only person in the world worthy of calling himself / herself a programmer :-) – Stephen C Jun 30 '12 at 10:16
  • This answer is so bad that it's funny – Nayuki Apr 22 '16 at 23:46