Why does BufferedInputStream copy a field to a local variable rather than use the field directly

Question

When I read the source code from java.io.BufferedInputStream.getInIfOpen(), I am confused about why it wrote code like this:

/**
 * Check to make sure that underlying input stream has not been
 * nulled out due to close; if not return it;
 */
private InputStream getInIfOpen() throws IOException {
    InputStream input = in;
    if (input == null)
        throw new IOException("Stream closed");
    return input;
}

Why does it use the alias instead of use the field variable in directly like below:

/**
 * Check to make sure that underlying input stream has not been
 * nulled out due to close; if not return it;
 */
private InputStream getInIfOpen() throws IOException {
    if (in == null)
        throw new IOException("Stream closed");
    return in;
}

Can someone give a reasonable explanation?

In `Eclipse`, you can't pause a debugger on an `if` statement. *Might* be a reason for that alias variable. Just wanted to throw that out there. I speculate, of course. — Debosmit Ray, Mar 26 '16 at 03:07
@rkosegi On my version of Eclipse, the problem is similar to [this one](http://stackoverflow.com/questions/20413287/bug-with-eclipse-debuggers-step-over). Might not be a very common occurrence. And anyway, I didn't meant it on a light note (clearly a bad joke). :) — Debosmit Ray, Mar 27 '16 at 02:43

score 118 · Accepted Answer · edited Mar 26 '16 at 17:42

118

If you look at this code out of context there is no good explanation for that "alias". It is simply redundant code or poor code style.

But the context is that BufferedInputStream is a class that can be subclassed, and that it needs to work in a multi-threaded context.

The clue is that in is declared in FilterInputStream is protected volatile. That means that there is a chance that a subclass could reach in and assign null to in. Given that possibility, the "alias" is actually there to prevent a race condition.

Consider the code without the "alias"

private InputStream getInIfOpen() throws IOException {
    if (in == null)
        throw new IOException("Stream closed");
    return in;
}

Thread A calls getInIfOpen()
Thread A evaluates in == null and sees that in is not null.
Thread B assigns null to in.
Thread A executes return in. Which returns null because a is a volatile.

The "alias" prevents this. Now in is read just once by thread A. If thread B assigns null after thread A has in it doesn't matter. Thread A will either throw an exception or return a (guaranteed) non-null value.

edited Mar 26 '16 at 17:42

Peter O.

28,965
14
72
87

answered Mar 26 '16 at 02:53

Stephen C

632,615
86
730
1,096

11

Which shows why `protected` variables are evil in a multi-threaded context. – Mick Mnemonic Mar 26 '16 at 03:23
2

It does indeed. However, AFAIK these classes go all the way back to Java 1.0. This is just another example of a poor design decision that could not be fixed for fear of breaking customer code. – Stephen C Mar 26 '16 at 03:26
2

@StephenC Thanks for the detailed explanation +1. So does that mean, we shouldn't be using `protected` variables in our code if it is multi-threaded? – Madhusudana Reddy Sunnapu Mar 26 '16 at 03:29
@MadhusudanaReddySunnapu - Well basically because it means that the superclass needs to do things like this to protect against race conditions that could be triggered when a subclass assigns to the protected variable. If the variable was private then the superclass would be able to "mediate" the assignment in some way. – Stephen C Mar 26 '16 at 03:37
1

@StephenC But then, I don't think that it is a problem because the variable is declared `protected` in the super class. The same problem/race condition can occur even with a standalone class (i.e., no super class for it). I mean, variable is declared in this single class and there are two methods in the class on that does operation similar to `close()` method i.e., nullifying the variable and the other method that does similar to `getInIfOpen` method i.e., checks for null and returns it. Or did I get it wrong? – Madhusudana Reddy Sunnapu Mar 26 '16 at 03:55
Well yes, that is correct. But in this case, it is down to the fact that the variable is protected and a subclass (which you can't see) might assign to it. – Stephen C Mar 26 '16 at 04:05
3

@MadhusudanaReddySunnapu The overall lesson is that in an environment where multiple threads may access the same state, you need to control that access somehow. That may be a private variable only accessible via setter, it could be a local guard like this, it could be by making the variable write-once in a threadsafe way. – Chris Hayes Mar 26 '16 at 04:13
That makes sense, Stephen & Chris. – Madhusudana Reddy Sunnapu Mar 26 '16 at 04:25
1

I'm surprised... does Java's thread model really work like this? Usually you have to perform an acquire-read (or volatile read, etc. depending on the language) in most languages for a variable that can be written to by another thread... is this not the case in Java? – user541686 Mar 26 '16 at 06:52
@Mehrdad Which languages? I can't think of many that *force* you to acquire a lock in order to read a shared variable, though there are many where you'd want to do that for safety's sake. Perhaps I'm misunderstanding your comment. – Chris Hayes Mar 26 '16 at 07:05
@ChrisHayes: Oh, by "have to" I meant "if you want desirable results", not "if you want your code to compile" =P For example, C++ has `atomic::load`, C# has `VolatileRead`, Rust has `atomic.load`, etc. – user541686 Mar 26 '16 at 07:08
@Mehdrad - There are other ways to implement shared state in Java. This Q&A is not intended to be a general tutorial on this topic. – Stephen C Mar 26 '16 at 07:15
@Mehrdad Oh, gotcha. In this case the class member in question is declared `volatile`, so it is a volatile read. So while it may not actually be the true value at the time the function returns, it is a valid value which was present at the start of the function. – Chris Hayes Mar 26 '16 at 07:23
@ChrisHayes: Ohh I didn't know it was volatile, that explains it, thanks. – user541686 Mar 26 '16 at 07:24
Quoting from my answer: *"The clue is that `in` is declared in `FilterInputStream` is `protected volatile`. "* – Stephen C Mar 26 '16 at 07:52
1

This does not cover all race conditions and states available to streams. This answer is misleading and disrespectful of the original code. – Sam Mar 26 '16 at 22:21
3

@sam - 1) It doesn't need to explain all race conditions and states. The aim of the answer is to point out why this *seemingly inexplicable* code is in fact necessary. 2) How so? – Stephen C Mar 27 '16 at 01:08
1

@StephenC If you do not use `input` then after you check `in` for non-null it may be assigned to `null` and returned. It is not poor style (which is why I feel your post was disrespectful) but necessary to guarantee the contract in all cases. Volatile will not protect you from two separate reads. You must copy the reference so what you test is what you return. – Sam Mar 28 '16 at 15:30
@sam - Did you read past the first two sentences of my answer? Where I explained all of the stuff? Do you understand what *"If you look at this code out of context ..."* means? – Stephen C Mar 28 '16 at 15:48
1

@StephenC, I read your first response. You inverted its meaning. I agree with your current answer. That you've condescended to me, spoke insultingly of the original developer, and stand rewarded for this behavior can't be edited. – Sam Mar 28 '16 at 20:47
But the fact remains that you did NOT read the version of the answer that you commented on. Your bad. The condescension was deserved. – Stephen C Mar 30 '16 at 03:10

Stefan Dollase · Answer 2 · 2016-03-26T12:27:20.563

This is because the class BufferedInputStream is designed for multi-threaded usage.

Here, you see the declaration of in, which is placed in the parent class FilterInputStream:

protected volatile InputStream in;

Since it is protected, its value can be changed by any subclass of FilterInputStream, including BufferedInputStream and its subclasses. Also, it is declared volatile, which means that if any thread changes the value of the variable, this change will immediately be reflected in all other threads. This combination is bad, since it means the class BufferedInputStream has no way to control or know when in is changed. Thus, the value can even be changed between the check for null and the return statement in BufferedInputStream::getInIfOpen, which effectively makes the check for null useless. By reading the value of in only once to cache it in the local variable input, the method BufferedInputStream::getInIfOpen is safe against changes from other threads, since local variables are always owned by a single thread.

There is an example in BufferedInputStream::close, which sets in to null:

public void close() throws IOException {
    byte[] buffer;
    while ( (buffer = buf) != null) {
        if (bufUpdater.compareAndSet(this, buffer, null)) {
            InputStream input = in;
            in = null;
            if (input != null)
                input.close();
            return;
        }
        // Else retry in case a new buf was CASed in fill()
    }
}

If BufferedInputStream::close is called by another thread while BufferedInputStream::getInIfOpen is executed, this would result in the race condition described above.

I agree, inasmuch as we're seeing things like `compareAndSet()`, `CAS`, etc. in the code and in the comments. I also searched the `BufferedInputStream` code and found numerous `synchronized` methods. So, it is intended for multi-threaded use, though I sure have never used it that way. Anyway, I think your answer is correct! — sparc_spread, Mar 26 '16 at 03:07
This probably makes sense because `getInIfOpen()` is only called from `public synchronized` methods of `BufferedInputStream`. — Mick Mnemonic, Mar 26 '16 at 03:11

score 6 · Answer 3 · answered Mar 26 '16 at 02:59

6

This is such a short code, but, theoretically, in a multi-threaded environment, in may change right after the comparison, so the method could return something it didn't check (it could return null, thus doing the exact thing it was meant to prevent).

answered Mar 26 '16 at 02:59

acdcjunior

114,460
30
289
276

Am I correct if I say that the reference `in` *might* change between the time you call the method and return of the value (in a multi-threaded environment)? – Debosmit Ray Mar 26 '16 at 03:03
Yes, you can say that. Ultimately the likelihood will really depend on the concrete case (all we know for a fact is that `in` can change anytime). – acdcjunior Mar 26 '16 at 03:14

score 4 · Answer 4 · answered Mar 26 '16 at 03:02

I believe capturing the class variable in to the local variable input is to prevent inconsistent behavior if in is change by another thread while getInIfOpen() is running.

Notice that the owner of in is the parent class and does not mark it as final.

This pattern is replicated in other parts of the class and seems to be reasonable defensive coding.

Why does BufferedInputStream copy a field to a local variable rather than use the field directly

4 Answers4

Linked

Related