3

I recently used Apache Commons-IO's class CountingInputStream. It just keeps track how many bytes were read by updating a counter field.

I noticed that it updates its counter with the synchronized keyword. The best source that I could find is IO-201, but it doesn't explain why.

I read in a few places that the Apache Commons code is good quality, but I'm now wondering why they have synchronized methods in an InputStream. I don't believe that thread safety is a useful property on a stream, nor do the commentors of IO-201.

Given that I'm not accessing an InputStream concurrently, is there any valid reason to have its methods synchronized? Or is there a valid use case to access an InputStream concurrently that will not produce data garbage?

TriskalJM
  • 2,123
  • 1
  • 16
  • 18
Markus Malkusch
  • 7,201
  • 35
  • 61
  • I cannot wrap my mind how one could read an InputStream concurrently without getting data garbage. So I assume concurrency can't be a motivation for this `synchronized`. But maybe it's just my limited imagination. I have no concern, I just want to understand the motivation to learn something out of that. – Markus Malkusch Jan 17 '17 at 22:23
  • Ok I see. Another thread might call `getCount()`. Without any synchronization action the memory model kicks in and the other thread might never see an update. Ok I'm satisfied. Volatile instead of synchronized would probably also do the same job, right? – Markus Malkusch Jan 17 '17 at 22:32
  • No, incrementing a `volatile` variable is not thread safe. – Sean Bright Jan 17 '17 at 22:33
  • It doesn't make sense for multiple threads to concurrently read this stream, the `read` method is not synchronized. IMO, the only reason `synchronized` is used was to ensure safe publication of the counter. `volatile` is not enough here because we are doing `read-update-write`, `AtomicLong` is probably a valid alternative. – xiaofeng.li Jan 17 '17 at 22:34
  • Let's assume updating the counter is externally synchronized as it happens only during the stream's read operations. So it is externally protected against race conditions. Only reading the counter might happen by another thread (getCount() without any updates). Therefore volatile might be sufficient as a happens-before rule, no? Actually that's also what the author of [IO-201](https://issues.apache.org/jira/browse/IO-201) claims: "If only one thread reads the stream, then the count field could be made volatile. This would allow other threads to read the count safely." – Markus Malkusch Jan 17 '17 at 22:44
  • If only one thread was `read()`ing - and by extension mutating `count` - then yes, it would be safe assuming `count` was `volatile`. That being said, all of that would have to be documented so consumers of the API knew what they could and couldn't do. The code as-is appears to be fine, I don't see a compelling reason for it to change. – Sean Bright Jan 17 '17 at 22:58

1 Answers1

4

Given that I'm not accessing an InputStream concurrently, is there any valid reason to have its methods synchronized?

No, there is no valid reason for you to use synchronized methods on objects that you are only accessing from a single thread. That being said, you are using third party code that does synchronize to protect mutable state (the count field), so you don't have much of an option other than implementing it yourself.

Or is there a valid use case to access an InputStream concurrently which will not produce data garbage?

An InputStream? Probably not. A CountingInputStream? Sure... in one thread I call CountingInputStream.read() repeatedly to consume the stream, and in another thread I call CountingInputStream.getCount() repeatedly to update my UI to show the number of lines I have read.

Sean Bright
  • 109,632
  • 17
  • 131
  • 138