
I am familiar with many of the mechanisms and idioms surrounding concurrency in Java. Where I am confused is with a simple concept: concurrent access of different members of the same object.

I have a set of variables which can be accessed by two threads, in this case concerning graphical information within a game engine. I need to be able to modify the position of an object in one thread and read it off in another. The standard approach to this problem is to write the following code:

private int xpos;
private final Object xposAccess = new Object();

public int getXpos() {
    int result;
    synchronized (xposAccess) {
        result = xpos;
    }
    return result;
}

public void setXpos(int xpos) {
    synchronized (xposAccess) {
        this.xpos = xpos;
    }
}

However, I'm writing a real-time game engine, not a 20 questions application. I need things to work fast, especially when I access and modify them as often as I do the position of a graphical asset. I want to remove the synchronized overhead. Even better, I'd like to remove the function call overhead altogether.

private int xpos;
private int bufxpos;
...

public void finalize() {
    bufxpos = xpos;
    ...
}

Using locks, I can make the threads wait on each other, and then call finalize() while the object is neither being accessed nor modified. After this quick buffering step, both threads are free to act on the object, with one modifying/accessing xpos and one accessing bufxpos.

I have already had success using a similar method where the information was copied into a second object, and each thread acted on a separate object. However, both members are still part of the same object in the above code, and some funny things begin to happen when both my threads access the object concurrently, even when acting on different members. Unpredictable behaviour, phantom graphical objects, random errors in screen position, etc. To verify that this was indeed a concurrency issue, I ran the code for both threads in a single thread, where it executed flawlessly.

I want performance above all else, and I am considering buffering the critical data into separate objects. Are my errors caused by concurrent access of the same objects? Is there a better solution for concurrency?

EDIT: If you are doubting my valuation of performance, I should give you more context. My engine is written for Android, and I use it to draw hundreds or thousands of graphic assets. I have a single-threaded solution working, but I have seen a near doubling in performance since implementing the multi-threaded solution, despite the phantom concurrency issues and occasional uncaught exceptions.

EDIT: Thanks for the fantastic discussion about multi-threading performance. In the end, I was able to solve the problem by buffering the data while the worker threads were dormant, and then allowing them each their own set of data within the object to operate on.
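A minimal sketch of what that final scheme might look like (the class and method names are illustrative, not the actual engine code):

```java
// Hypothetical sketch of the buffering scheme described above: while both
// worker threads are dormant, a coordinating thread copies the live fields
// into a read-only snapshot; afterwards each thread touches only its own set.
public class Asset {
    private int xpos;     // written by the update thread only
    private int bufxpos;  // read by the render thread only

    // Called from the coordinating thread while both workers are dormant,
    // so no synchronization is needed for the copy itself.
    void swapBuffers() {
        bufxpos = xpos;
    }

    void setXpos(int x) { xpos = x; }
    int getBufXpos()    { return bufxpos; }
}
```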

Boston Walker
  • What happens if you make xpos and bufxpos volatile? – Eric Stein Aug 28 '13 at 18:22
  • What do you mean with _call finalize() while the object is neither being accessed nor modified_? That's not possible without using a synchronization mechanism. – nosid Aug 28 '13 at 18:25
  • @nosid I don't think his finalize method is intended to override Object#finalize(). OP, you probably want a better name, since there's a method called finalize() on all Object instances. That method is Special and you really shouldn't be clobbering it. – Eric Stein Aug 28 '13 at 18:28
  • What if the writer thread constructs an _immutable object_ with the necessary data then publishes it to the reader thread? As far as I know, immutable objects can be used without synchronization. – Katona Aug 28 '13 at 18:36
  • Why do you need to have the synchronize in the read access anyway? If only one thread is writing, the others may read without any synchronization. The only problem is that they see some milliseconds old data. (This is similar to Katona's comment) – Christian Fries Aug 28 '13 at 18:58
  • @Katona This is the exact method I have had success with. The problem is it necessitates a complicated mechanism which adds performance overhead. – Boston Walker Aug 28 '13 at 19:38
  • @Christian My idiom might be off, but my point is that the synchronize block is part of what I am trying to avoid for performance reasons. – Boston Walker Aug 28 '13 at 19:39
  • I'm not sure I understand why you don't simply make `xpos` volatile and get rid of `xposAccess` and the synchronized block… – assylias Aug 28 '13 at 20:04
  • @assylias The synchronized block is my example of a textbook method, where the second bit of code was my actual implementation. And unfortunately, volatility adds a large performance overhead I would care to avoid for a game engine. – Boston Walker Aug 28 '13 at 20:06
  • @BostonWalker a volatile read is as efficient as a normal read on x86 - a volatile write can be 50x slower, but we are talking of 50 nanoseconds difference max on a recent PC, which should not be a problem if you target a few 10s of frames per second. This is unlikely to be the bottleneck... If that really is an issue then the alternative is to do that specific operation in a single thread environment (which can be efficient when done properly). – assylias Aug 28 '13 at 20:10
  • I should add that volatile can also prevent some JIT optimisations, which can be a "hidden" cost – assylias Aug 28 '13 at 20:17
  • @assylias Ah, but I am not talking about x86, recent PCs, or a few 10s of frames per second. I am talking about thousands of objects drawn 60 times a second on Cortex or ARM chip. I have had past success getting good performance by doing these tasks in a single thread, but not the performance I was looking for. If you ignore the phantom concurrency issues, this method is vastly superior because the Android VM gives me twice as many cores to play with, and I have seen a near doubling of performance. – Boston Walker Aug 28 '13 at 20:19
  • @BostonWalker: You say you're doing this on android. Android's UI is single threaded anyway. So what is it that you're actually trying to do? What are your different threads doing that you think you need to thread this out? – Falmarri Aug 28 '13 at 20:30
  • You should be more specific as to what you are trying to achieve and your environment. – assylias Aug 28 '13 at 20:57
  • @Falmarri I already implement two worker threads on top of the UI thread successfully by buffering data between them. By adding the third thread though, the Dalvik machine is much more likely to allow the application to use a second processor core, hence huge performance boosts in most cases (35 -> 55 FPS). – Boston Walker Aug 28 '13 at 21:00
  • @assylias Actually, I would prefer to keep this question general to keep irrelevant discussion about the value of multithreading, etc. away, as well as to allow the question to be more informative to more people. This is really a Java question, it goes beyond Android. – Boston Walker Aug 28 '13 at 21:02

3 Answers


If you are dealing with just individual primitives, the atomic classes, such as AtomicInteger with its compareAndSet operation, are a great fit. They are non-blocking, they give you a good deal of atomicity, and you can fall back to blocking locks when needed.

For atomically setting and accessing variables or objects, you can leverage non-blocking locks, falling back to traditional locks when needed.

However, the simplest step forward from where you are in your code is to keep using synchronized, but not on the implicit this object: use several different member lock objects, one per partition of members that need atomic access together. With members private final Object partition1 = new Object(); and private final Object partition2 = new Object();, you write synchronized (partition1) { /* ... */ }, synchronized (partition2) { /* ... */ }, etc.

However, if the members cannot be partitioned, then some operations must acquire more than one lock. If so, use the Lock object linked earlier, but make sure that every operation acquires the locks it needs in one universal order, otherwise your code might deadlock.
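A sketch of such partitioning, with a hypothetical Sprite class (the fields and the choice of partitions are illustrative, not from the question):

```java
// A hypothetical Sprite with two lock partitions: position and texture.
// Threads touching different partitions never block each other.
public class Sprite {
    private final Object positionLock = new Object(); // guards x, y
    private final Object textureLock  = new Object(); // guards textureId

    private int x, y;
    private int textureId;

    public void moveTo(int newX, int newY) {
        synchronized (positionLock) { // texture accessors are not blocked
            x = newX;
            y = newY;
        }
    }

    public int getX() {
        synchronized (positionLock) { return x; }
    }

    public void setTexture(int id) {
        synchronized (textureLock) { // position accessors are not blocked
            textureId = id;
        }
    }

    // An operation touching both partitions must take the locks in one
    // universal order (here: position before texture) to avoid deadlock.
    public void reset() {
        synchronized (positionLock) {
            synchronized (textureLock) {
                x = 0;
                y = 0;
                textureId = 0;
            }
        }
    }
}
```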

Update: Perhaps it is genuinely not possible to increase the performance if even volatile presents an unacceptable hit. The fundamental issue, which you cannot work around, is that mutual exclusion necessarily trades away the substantial benefits of the memory hierarchy, i.e. caches. The fastest per-processor-core cache cannot hold variables that you are synchronizing on. Processor registers are arguably the fastest "cache", and even if the processor is sophisticated enough to keep its closest caches consistent, synchronization still precludes keeping values in registers. Hopefully this helps you see that it is a fundamental block to performance; there is no magic wand.

In the case of mobile platforms, the platform is deliberately designed against letting arbitrary apps run as fast as possible, because of battery-life concerns. It is not a priority to let any one app exhaust the battery in a couple of hours.

Given the first factor, the best thing to do would be to redesign your app so that it doesn't need as much mutual exclusion. For example, consider tracking x-pos inconsistently unless two objects come close to each other, say within a 10x10 box: lock on a coarse grid of 10x10 boxes, and as long as an object stays within its box, track its position without synchronization. Not sure if that applies or makes sense for your app, but it is just an example to convey the spirit of an algorithm redesign rather than a search for a faster synchronization method.
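For instance, the 10x10-box idea could be sketched as one monitor per coarse cell, so two objects contend only when they occupy the same cell (the class and sizes are illustrative):

```java
// Hypothetical coarse-grained locking: one monitor per 10x10 cell of the
// world, so objects in different cells never contend on the same lock.
public class CoarseGrid {
    private static final int CELL = 10; // world units per cell side
    private final Object[][] cellLocks;

    public CoarseGrid(int widthCells, int heightCells) {
        cellLocks = new Object[widthCells][heightCells];
        for (int i = 0; i < widthCells; i++) {
            for (int j = 0; j < heightCells; j++) {
                cellLocks[i][j] = new Object();
            }
        }
    }

    // Map a position to its cell's monitor; callers then do
    // synchronized (grid.lockFor(x, y)) { ...update the position... }
    public Object lockFor(int xpos, int ypos) {
        return cellLocks[xpos / CELL][ypos / CELL];
    }
}
```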

necromancer
  • Your answer is a great summary of the many possible solutions, and for that I upvoted you. I would appreciate, however, if you could shed some light on my initial two questions, namely the cause of the concurrency glitches on what should be a safe method, and the performance of a solution which must cope with 10K-100K reads and writes each second. – Boston Walker Aug 28 '13 at 21:46
  • Lock splitting is a great idea but can incur a lot of context switching in practice, making it a not so efficient solution. – assylias Aug 28 '13 at 22:19
  • @BostonWalker I cannot understand what you are doing with the "finalize" in your question. `finalize` is typically related to garbage collection and it is not clear to me what the connection is to your mutual exclusion scheme. I will still update my answer with the performance concern. – necromancer Aug 29 '13 at 00:32

I don't think that I get exactly what you mean, but generally

Is there a better solution for concurrency?

Yes, there is:

  • the `Lock` API in `java.util.concurrent.locks`
  • atomic variables such as `AtomicInteger` in `java.util.concurrent.atomic`
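For example, AtomicInteger gives lock-free reads, writes, and compound updates; the wrapper class below is illustrative, not from the question:

```java
import java.util.concurrent.atomic.AtomicInteger;

// A lock-free position sketch built on AtomicInteger.
public class Position {
    private final AtomicInteger xpos = new AtomicInteger();

    public int getXpos()           { return xpos.get(); } // volatile-read cost
    public void setXpos(int value) { xpos.set(value); }   // volatile-write cost

    // A compound read-modify-write stays atomic without a lock by retrying
    // a compare-and-set until no other thread interfered.
    public void moveBy(int dx) {
        int old;
        do {
            old = xpos.get();
        } while (!xpos.compareAndSet(old, old + dx));
    }
}
```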

Salah Eddine Taouririt
  • I already use the Java Lock API to control high-level thread timing. It is indeed a good suggestion, but it becomes tricky when used hundreds or thousands of times per frame. – Boston Walker Aug 28 '13 at 20:24
  • As for Atomic variables, I don't know much about their performance costs. I imagine it would be higher than my current solution, but please correct me if I'm wrong. – Boston Walker Aug 28 '13 at 20:25
  • No, using `AtomicInteger` is more efficient than using `int` or `Integer` and surrounding all the code paths with the `synchronized` keyword; see for yourself — [AtomicInteger](http://hg.openjdk.java.net/aarch64-port/jdk8/jdk/file/ddd3675163c0/src/share/classes/java/util/concurrent/atomic/AtomicInteger.java) doesn't use any kind of lock. – Salah Eddine Taouririt Aug 28 '13 at 20:39
  • It may be true that a synchronized block doesn't perform well, but I am well aware of that and I have no intention of implementing one in this case. I am more interested in how the AtomicInteger compares in performance to buffering the data. – Boston Walker Aug 28 '13 at 20:58
  • An AtomicInteger is essentially a wrapper around a volatile int with additional CAS operations. – assylias Aug 28 '13 at 20:59
  • @assylias Using volatile ints already incurs significant performance costs, which I am trying to avoid. – Boston Walker Aug 28 '13 at 21:03

I think synchronization or any kind of locking can be avoided here by using an immutable object for inter-thread communication. Let's say the message to be sent looks like this:

public final class ImmutableMessage {
    private final int xPos;
    // ... other fields, adhering to the rules of immutability

    public ImmutableMessage(int xPos /*, other arguments */) {
        this.xPos = xPos;
    }

    public int getXPos() { return xPos; }
}

Then somewhere in the writer thread:

sharedObject.message = new ImmutableMessage(1);

The reader thread:

ImmutableMessage message = sharedObject.message;
int xPos = message.getXPos();

The shared object (public field for the sake of simplicity):

public class SharedObject {

    public volatile ImmutableMessage message;
}
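Put together, a self-contained sketch of the pattern (the classes are repeated so the example compiles on its own, and the thread bodies are only illustrative):

```java
// A self-contained version of the immutable-message pattern above.
public class ImmutableMessageDemo {

    static final class ImmutableMessage {
        private final int xPos;
        ImmutableMessage(int xPos) { this.xPos = xPos; }
        int getXPos() { return xPos; }
    }

    static class SharedObject {
        // volatile publication: a reader that sees the reference also sees
        // the fully initialized final fields of the message
        volatile ImmutableMessage message;
    }

    public static void main(String[] args) throws InterruptedException {
        final SharedObject shared = new SharedObject();

        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int x = 0; x < 1000; x++) {
                    shared.message = new ImmutableMessage(x); // publish a snapshot
                }
            }
        });

        Thread reader = new Thread(new Runnable() {
            public void run() {
                ImmutableMessage m = shared.message; // read the field once
                if (m != null) {
                    System.out.println("read xPos=" + m.getXPos());
                }
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```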

I guess things change rapidly in a real-time game engine, which might end up creating a lot of ImmutableMessage objects, and that may degrade performance in the end, but maybe it is balanced by the non-locking nature of this solution.

Finally, if you have one free hour for this topic, I think it's worth watching this video about the Java Memory Model by Angelika Langer.

Katona
  • @nosid, even if that's only a simple assignment/read of a field in the accessor methods? – Katona Aug 28 '13 at 19:25
  • @nosid Good point, I should mention that sequential consistency is important in this application. – Boston Walker Aug 28 '13 at 19:30
  • @nosid modified it to use `volatile`, to be at least correct, if not necessarily highly performant due to the implicit memory barrier that `volatile` implies – Katona Aug 28 '13 at 19:36
  • I have watched the first 25 minutes of the mentioned video. I think it's misleading. You should look for videos from Hans Boehm. He talks about the C++ Memory Model, which is the origin of the _fixed_ Java Memory Model. – nosid Aug 28 '13 at 22:20
  • @nosid I thought the Java memory model is "older" than the C++ one, see [this](http://stackoverflow.com/questions/7363462/what-are-the-similarities-between-the-java-memory-model-and-the-c11-memory-mod) and [this](http://stackoverflow.com/questions/6319146/c11-introduced-a-standardized-memory-model-what-does-it-mean-and-how-is-it-g) – Katona Aug 29 '13 at 11:12