
Summary:

It seems to me that:

  1. wrapping fields representing a logical state into a single immutable consumable object
  2. updating the object's authoritative reference with a call to Interlocked.CompareExchange<T>
  3. and handling update failures appropriately

provides a kind of concurrency that renders the "lock" construct not only unnecessary, but truly misleading: a construct that dodges certain realities about concurrency and introduces a host of new problems as a result.

Problem Discussion:

First, let's consider the main problems with using a lock:

  1. Locks incur a performance hit, and must be used consistently for both reading and writing.
  2. Locks block thread execution, hindering concurrency and risking deadlocks.

Consider the ridiculous behavior inspired by the "lock". When the need arises to update a logical set of resources concurrently, we "lock" the set of resources via a loosely associated but dedicated locking object, which otherwise serves no purpose (red flag #1).

We then use the "lock" pattern to mark off a region of code where a logically consistent state change on a SET of data fields occurs. Yet we shoot ourselves in the foot by mixing those fields with unrelated fields in the same object, leaving them all mutable, and then forcing ourselves into a corner (red flag #2) where we must also use locks when reading these various fields, so that we don't catch them in an inconsistent state.

Clearly, there's a serious problem with that design. It's somewhat unstable, because it requires careful management of the lock objects (locking order, nested locks, coordination among threads, blocking/waiting on a resource in use by another thread that's waiting for you to do something, etc.), all of which depends on context. We also hear people talk about how avoiding deadlock is "hard", when it's actually very straightforward: don't steal the shoes of a person you plan on asking to run a race for you!

Solution:

Stop using "lock" altogether. Properly roll your fields into an incorruptible/immutable object representing a consistent state or schema. Perhaps it's simply a pair of dictionaries for converting to and from display-names and internal-identifiers, or maybe it's a head node of a queue containing a value and a link to the next object; whatever it is, wrap it into its own object and seal it for consistency.
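
As a concrete illustration of the kind of wrapper meant here, a minimal sketch of the display-name/identifier example; the type and member names (NameMapState, WithEntry) are hypothetical, not from any library:

```csharp
using System.Collections.Generic;

// A sealed, immutable snapshot of one logical state: two dictionaries
// that must always be observed together as a consistent pair.
public sealed class NameMapState
{
    private readonly Dictionary<string, string> _displayToId;
    private readonly Dictionary<string, string> _idToDisplay;

    public NameMapState(IDictionary<string, string> displayToId,
                        IDictionary<string, string> idToDisplay)
    {
        // Defensive copies: once constructed, the snapshot never changes.
        _displayToId = new Dictionary<string, string>(displayToId);
        _idToDisplay = new Dictionary<string, string>(idToDisplay);
    }

    public string IdFor(string displayName) { return _displayToId[displayName]; }
    public string DisplayFor(string id) { return _idToDisplay[id]; }

    // "Updating" means building a brand-new snapshot; this one is untouched.
    public NameMapState WithEntry(string displayName, string id)
    {
        var d2i = new Dictionary<string, string>(_displayToId);
        var i2d = new Dictionary<string, string>(_idToDisplay);
        d2i[displayName] = id;
        i2d[id] = displayName;
        return new NameMapState(d2i, i2d);
    }
}
```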

Recognize write or update failure as a possibility, detect it when it occurs, and make a contextually informed decision to retry immediately (or later) or do something else instead of blocking indefinitely.

While blocking seems like a simple way to queue a task that seems like it must be done, not all threads are so dedicated and self-serving that they can afford to do such a thing at the risk of compromising the entire system. Not only is it lazy to serialize things with a "lock", but as a side effect of trying to pretend a write shouldn't fail, you block/freeze your thread, so it sits there unresponsive and useless, forsaking all other responsibilities in its stubborn wait to accomplish what it set out to do some time earlier, ignorant of the fact that assisting others is sometimes necessary for fulfilling its own responsibilities.

Race conditions are normal when independent, spontaneous actions are occurring simultaneously, but unlike uncontrolled Ethernet collisions, as programmers we have total control over our "system" (i.e. deterministic digital hardware), over its inputs (no matter how random, and how random can a zero or one really be?) and outputs, and over the memory that stores our system's state, so livelock should be a non-issue. Furthermore, we have atomic operations with memory barriers that resolve the fact that many processors may be operating concurrently.

To summarize:

  1. Grab the current state object, consume its data, and construct a new state.
  2. Realize that other active threads will be doing the very same thing, and may beat you to it, but all observe an authoritative reference point representing the "current" state.
  3. Use Interlocked.CompareExchange to atomically check whether the state object you based your work on is still the most current state and, if so, replace it with your new one; otherwise fail (because another thread finished first) and take appropriate corrective action, as sketched below.
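
A minimal sketch of that loop, reusing the hypothetical NameMapState snapshot from the Solution section; the NameMap holder and AddEntry method are likewise mine, not from any library:

```csharp
using System.Collections.Generic;
using System.Threading;

public static class NameMap
{
    // The single authoritative reference point for the "current" state.
    private static NameMapState _current =
        new NameMapState(new Dictionary<string, string>(),
                         new Dictionary<string, string>());

    // Readers never lock: any snapshot they grab is internally consistent.
    public static NameMapState Current { get { return _current; } }

    public static void AddEntry(string displayName, string id)
    {
        while (true)
        {
            // 1. Grab the current state and construct a new state from it.
            NameMapState snapshot = _current;
            NameMapState updated = snapshot.WithEntry(displayName, id);

            // 2./3. Publish only if no other thread beat us to it: the swap
            // succeeds exactly when _current still refers to our snapshot.
            if (Interlocked.CompareExchange(ref _current, updated, snapshot) == snapshot)
                return;

            // Another thread finished first; the corrective action here is
            // simply to retry against the newer state.
        }
    }
}
```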

The most important part is how you handle the failure and get back on your horse. This is where we avoid livelock: thinking too much, and not doing enough or not doing the right thing. I would say that locks create the illusion that you'll never fall off your horse, despite riding in a stampede, and while a thread daydreams in such a fantasy land, the rest of the system can fall apart and crash and burn.


So, is there something the "lock" construct can do that can't be achieved (better, or in a less fragile fashion) with a lock-free implementation using CompareExchange and immutable logical state objects?

All of this is a realization I've come to on my own after dealing with locks intensely, but after some searching, in another thread, Is lock free multithreaded programming making anything easier?, someone mentions that lock-free programming is going to be very important when we face highly parallel systems with hundreds of processors, where we cannot afford to use highly contended locks.

Triynko
  • It sounds like you are describing http://en.wikipedia.org/wiki/Software_transactional_memory. – Michael Sep 16 '09 at 00:01
  • How would you implement a lock free FIFO queue? – Remus Rusanu Sep 16 '09 at 00:06
  • @Michael, you know, that's interesting, because I've been doing a LOT with SQL Server lately, in particular with its complex contextual error handling, scope/transaction aborts, etc... and that's where all this seems to be coming from. There's certainly something to it all. It does sound VERY similar. – Triynko Sep 16 '09 at 00:48
  • FIFO? Easy, quantum entanglement. In a computer, things can be in two places at once. After that, they diverge into two separate tails, where one thread can add to one tail, while the other eats at the other end, but in reality, they are one and the same. There is truly only ONE current state... and what you SEE is what is REAL. – Triynko Sep 16 '09 at 00:51
  • Let me be more specific. Queue x = new Queue(); [Queuing thread: read x, update x, compare/exchange x.] [Dequeuing thread: read x, create new Queue(), compare/exchange x with new Queue()]. So you see, the dequeuing thread periodically STEALS the queue and replaces it with an empty one, which queuing thread(s) steadily stream into (always a winner!). The dequeuing thread MERGES the queues it has stolen, which it keeps private, and distributes tasks to sleeping threads. See my other comments on multi-threaded queuing (e.g. default state is "wait for work", not "contend for work"). – Triynko Sep 16 '09 at 01:20
  • T1 enqueues A, T2 enqueues B. T3 dequeues, steals the queue (A, B). T4 enqueues C. T5 dequeues, steals the queue (C). T3 merges back B. T5 had just dequeued C out of order. About wait-for-work vs. contend-for-work: you just replaced the mutex lock with a thread doing protected access. Arguably, a worse bottleneck than a lock. – Remus Rusanu Sep 16 '09 at 17:26
  • I forgot to mention here (see other comments) that there is only ONE dequeuing thread, which steals the head of the queue and hands tasks to worker threads that are not busy. This thread's purpose is to monitor worker threads and give them jobs to do. The thread is not a bottleneck, and it's not doing protected access... it's doing an atomic Interlocked.CompareExchange to steal/replace the entry point of the queue. The enqueuing process reads the entry node, constructs a new one pointing to the old node, and then attempts a CompareExchange to effectively insert an item in the queue. – Triynko Sep 17 '09 at 17:37
  • Failure is expected and handled, both for inserting a new head for the queue and for the dequeuing process, where the head of a "new queue" is inserted. Stealing the queue, and adding to the queue, are then essentially identical operations. We can use an atomic "counter" to function as a turn-taker, so order is still preserved, but no thread is blocked, so they can do extra work that may need to be done up-front, while still observing order. Since they can do invariant work up-front, retries can be quicker than original attempts. The point is that "blocking" is a waste of cycles. – Triynko Sep 17 '09 at 17:42
  • short version: "shared mutable state bad. messages good." – kyoryu Jul 27 '10 at 04:11
  • Yes! Kyoryu understands! The locking/blocking approach is inherently INCOMPATIBLE with massively parallel systems that by their very nature operate completely independently for some period of time, then need to come back, check in, and make a sudden and valid alteration to some shared global state. Because things can change while they're working on their own, they sometimes need to go back to the drawing board and try again later... but all is not always lost... and rapid re-attempts at publishing their contribution can occur as long as it's determined to be valid! – Triynko Jul 27 '10 at 22:20
  • This is how CONSCIOUSNESS WORKS. It's a globally consistent state that's constructed through underlying parallel subconscious processes, which are constantly doing work, then testing it against the current global state for validity! – Triynko Jul 27 '10 at 22:22

6 Answers


Your compare-exchange suggestion has one big drawback: it is not fair, because it favors short tasks. If there are many short tasks in a system, the chances for a long task to ever complete may get very low.

Daniel Brückner
  • Nothing is left to chance in deterministic hardware (how random can a zero or one really be?). Everything can be handled. Some part of the system can be reduced to a lock-free queue, in which case the "long" operations we're talking about probably need to be serialized anyway. If not, then again, it's the "what to do when the update fails" question. Depending on the type of work done up front, checking whether the work is still valid may in fact require just a few simple checks, so while the original attempt was a long operation, the RETRY may in fact be a very short operation. – Triynko Sep 16 '09 at 00:17
  • Computers are not deterministic if they operate in an uncontrolled environment - a large range of influences, from radiation corrupting a bit in memory and triggering error-correction hardware or software, over variations of oscillator frequencies due to temperature changes, up to (user) input that arrives from mice, keyboards, microphones, cameras, or network connections at unpredictable times, makes it practically impossible to have a completely deterministic system. – Daniel Brückner Sep 16 '09 at 00:41
  • Take the following example. There is a list, and some threads insert items into the list while other threads remove items matching a condition. The insert could be very fast while removal could be linear. In consequence, your solution would favor adding new items, and this will in turn make removal take even longer because the list grows, which again favors inserts even more. – Daniel Brückner Sep 16 '09 at 01:03
  • If I smash the computer, I don't have a computer any more than if radiation corrupts a bit. However, as long as we DO have a computer, it is deterministic, including its inputs. For example... for every BIT that enters the system, I KNOW it's a zero or one. With a branch, I handle BOTH possible inputs. When you scale this up to multiple bits, the principle doesn't go away. All bugs... stack overflows, buffer overflows, ARE avoidable; some programmers are just ignorant. It IS that simple. If 0-0xFFFFFFFF are not ALL valid, then don't use "int"; create an immutable constrained data class and pass it. – Triynko Sep 16 '09 at 01:11
  • Computers are not zeros and ones - they are analog. No bit is ever zero or one. It is below 1.2 volts or above. And these values are influenced by environmental factors you have no control over. You could - in theory - try to build a system that handles all states correctly, but this is practically impossible. You have to ensure the correct behavior of billions of transistors, millions of lines of code in operating systems grown over decades, ... – Daniel Brückner Sep 16 '09 at 01:21
  • Of course. At some point, at some layer we've built on that analog system, we have to assume a digital system for writing software. If we can't depend on mathematical operations being well-defined, then we're moving into the realm of unrecoverable errors. Most errors are not fatal in such a way that they invalidate the design of the software. The question is... what's within the range of expected values, and is the software designed to account for all possibilities? If it's not, then the design is incomplete, and unhandled conditions can cause crashes or undefined behaviors. – Triynko Sep 17 '09 at 17:22
  • Oh, sorry, I guess I DID use the phrase "deterministic hardware". I actually meant "system", so I was referring to the systems we write software on, not directly to the hardware layer, which is, of course, analog. – Triynko Sep 17 '09 at 17:25
  • In any truly asynchronous environment, there's bound to be some level of non-determinism. That's a big part of what makes multithreading hard. – kyoryu Jul 27 '10 at 03:24
  • Yes, there is some level of non-determinism, because the threads are running independently of one another; however, the whole point of having locks and other structures is to RESOLVE to a shared, consistent state in a deterministic manner... thereby eliminating any level of non-determinism. – Triynko Jul 28 '10 at 03:57

There are four conditions for a race to occur.

  1. The first condition is that there are memory locations that are accessible from more than one thread. Typically, these locations are global/static variables, or are heap memory reachable from global/static variables.
  2. The second condition is that there is a property (often called an invariant) associated with these shared memory locations that must be true, or valid, for the program to function correctly. Typically, the property needs to hold true before an update occurs for the update to be correct.
  3. The third condition is that the invariant property does not hold during some part of the actual update. (It is transiently invalid or false during some portion of the processing).
  4. The fourth and final condition that must occur for a race to happen is that another thread accesses the memory while the invariant is broken, thereby causing inconsistent or incorrect behavior.

    1. If you don't have a shared memory location which is accessible from multiple threads, or you can write your code to either eliminate that shared memory variable, or restrict access to it to only one thread, then there is no possibility of a race condition, and you don't need to worry about anything. Otherwise, the lock statement, or some other synchronization routine is absolutely necessary and cannot be safely ignored.

    2. If there is no invariant (let's say all you do is write to this shared memory location and nothing in the thread operation reads its value), then again, there is no problem.

    3. If the invariant is never invalid, again no problem. (Say the shared memory is a datetime field storing the last time the code ran; it can't be invalid unless a thread fails to write it at all.)

    4. To eliminate condition 4, you have to restrict write access to the block of code that accesses the shared memory from more than one thread at a time, using lock or some comparable synchronization methodology.

The "concurrency hit" is in this case not only unavoidable but absolutely necessary. Intelligent analysis of what exactly is the shared memory, and what exactly is your critical "invariant" allows you to code the system to minimize this concurrency "Hit". (i.e., maximize concurrency safely. )

Charles Bretana
  • I agree. I actually cache temporal object hierarchies, object schemas, and object permission sets in my code to perform hierarchical permission checks on parented objects, then resolve the permissions list, checking for deny, then favoring the most specific permission on the object itself, then up the hierarchy. The concurrency "hit" is unavoidable; but I think "lock" takes the hit worse than Compare/Exchange, because it's easy to design it wrong and run into deadlock, which is essentially an infinite loop waiting for a condition that will never occur. – Triynko Sep 16 '09 at 02:00
  • I cached that stuff, because trying to hit the database every time is multiple orders of magnitude slower. The caching and use of the cache takes milliseconds, whereas the same process without the cache takes 30 or more seconds. It's a HUGE gain. When an object is reparented and a new entry goes into the temporal hierarchy, these caches are reconstructed anew in the background, then compared/exchanged instantly. Threads can continue to work with old copies, but it's FINE, because when they write to the database, it will FAIL if it actually breaks the relational integrity. It's nice. – Triynko Sep 16 '09 at 02:05
  • @Triynko, deadlocks can only occur if two different threads in different blocks of code attempt to 'lock' two different objects in the opposite order. (i.e., one block of code locks A, then B, and the other locks B, then A...) Leaving aside the issue of designing synchronization code to use two different lock references (which is the whole complex topic of synchronization object granularity), this is not hard to avoid. Understanding the nature of your shared memory and invariants, and their relationship, is sufficient to design systems which are deadlock-safe. – Charles Bretana Sep 16 '09 at 02:10
  • @triynko, RI in the DB is not the only shared memory that can cause concurrency issues. In fact, DB concurrency is a completely different, (and much easier to solve) problem than concurrency issues caused by multi-threading code on a client. caching data is not a solution if multiple threads have concurrent unsynchronized access to the cache. – Charles Bretana Sep 16 '09 at 02:14
  • Yeah, avoiding deadlock is easy, that's one thing I always stress when people complain about it. It's just that in principle, locking and blocking is a waste of time; there's a much more efficient way to handle concurrency. – Triynko Sep 16 '09 at 03:36
  • One of my caches consists of three dictionaries. Since I'm concerned about the integrity of the set, not how current the set is (e.g. I don't care if permission changes don't affect an operation that started before they were changed), I put all three dictionaries in one container object, and when updating, I grab the original object, do the update, then I reconstruct the cache as a new container object with 3 new dictionaries, and then update the container reference with Interlocked.CompareExchange. An internal version number determines whether the update was successful and what happens next. – Triynko Sep 16 '09 at 03:47
  • That will be ok as long as the update to the dictionaries' data is not in any way dependent on the pre-update value of the dictionary... If it is, then the read operation to determine the 'current' value, which will be used in determining the 'new' value, must be synchronized with all other threads that might try to read it. This is the more common situation: some shared memory must be read, and used to determine a new value to be written back into that shared memory. Between the read and the write some defined invariant is broken, and no other threads can access it safely. – Charles Bretana Sep 16 '09 at 13:30
  • That view only holds when three specific assumptions are true: 1. The data structure is mutable, 2. an update always succeeds, and 3. the structure can momentarily represent an inconsistent state. I am destroying those assumptions by declaring that an immutable data structure is used representing a consistent state, and an atomic compare-exchange (an inherently conditional operation that may or may not succeed) is used to publish the update. If the compare-exchange succeeds, then the pre-update data structure upon which the update is based has not changed and the update is valid. – Triynko Jul 27 '10 at 21:50
  • I'll talk about an unsuccessful update next, but first consider the consequences of a successful update. Instantaneously after a thread successfully makes an update... all other threads working with the same data are suddenly working with outdated, although logically consistent, data. The only time the outdatedness matters to the overall system state is when it attempts to publish the work done. Now... assume the update FAILS. All is not lost. First... had locks been used, the other threads would have been blocked doing NO WORK, potentially wasting processor cycles. – Triynko Jul 27 '10 at 21:58
  • Second, because of the logically consistent, although outdated, state of the structure operated upon, SOME OR ALL of the work done may still be valid, which puts this approach a step ahead of the locking/blocking approach which would have left multiple cores waiting around doing nothing at all. The result of this design is ultimately rapid bursts of successful updates to the shared state... thanks to having lots of work done in parallel on immutable structures, and rapid re-attempts at publishing valid work done to the shared global state. – Triynko Jul 27 '10 at 22:04
  • If you REALLY think about it... this approach parallels how millions of independent thinkers operate in reality, advancing the state of the humanity. We don't all wait around for someone else to solve a problem... we all work on it. Sometimes we find that someone has beaten us to the solution, but the work we did does not necessarily overlap, and therefore does not necessarily conflict and is not necessarily redundant. As a result, a SECOND rapid publication to the global state can occur, whereas alternatively we'd have been doing nothing, waiting for the first advance to be published. – Triynko Jul 27 '10 at 22:08
  • Finally, the immutable nature of the data structure means its always consistent and always available for consumption as long as timeliness isn't critical... which is the case in a lot of situations. Like any kind of publication, even though updates are constantly being made... at some point, you need to grab a snapshot of the information, burn it or print it and ship it out for consumption. – Triynko Jul 27 '10 at 22:13
  • 1. How can the data in the dictionary be immutable if the purpose of this function is to update it?? Your other comments betray a fundamental failure to grok the issue here. Which, again, is that the accuracy of the value you are writing back into the dictionary is compromised by the fact that the algorithm used to calculate it did not have access to all the information it needed, to wit, the updates made to the value after your algorithm "read" the preceding value. – Charles Bretana Jul 28 '10 at 13:44
  • The simplest example is the bank account balance... If you are incrementing it, and you read a curr value of $50, and add $10, and write back the sum ($60) but while you were calculating some other thread has written $20 in here, in doing so you overwrite that value of $20 ... The correct value should have been $30, and you are writing $60 into the data.. which is wrong. – Charles Bretana Jul 28 '10 at 13:45
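
To connect that example back to the question's approach: a compare-exchange loop detects exactly this conflict, because the write of $60 succeeds only if the balance still reads $50. A minimal sketch with hypothetical names:

```csharp
using System.Threading;

public static class Account
{
    private static int _balanceCents;

    public static void Deposit(int amountCents)
    {
        while (true)
        {
            int observed = _balanceCents;         // e.g. read $50
            int updated = observed + amountCents; // compute $60
            // Succeeds only if no other thread changed the balance meanwhile;
            // if another thread wrote $20 in between, this fails and we retry
            // from $20, arriving at the correct $30.
            if (Interlocked.CompareExchange(ref _balanceCents, updated, observed) == observed)
                return;
        }
    }
}
```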
  • "rapid work done in immutable structures" ?? immutable means unchangeable, impossible to update, in other words, permanently locked. You cannot "do work" on an immutable data structure, all you can do is read it... If you are referring to the data structure your process is actually modifying or updating, then it is by definition, not immutable. The fact that you discuss how "current" this data structure is betrays the fact that it cannot be immutable.. Anything that can become non-current cannot be immutable. If it was immutable, it would, by definition always be current, forever. – Charles Bretana Jul 28 '10 at 13:51
  • No, you're completely missing what I said. The data set (a state) is composed of three immutable dictionaries, so the state is an immutable snapshot that doesn't require locking for reading. The state has an ID. When you want to update the state, you read it, work with the data, and construct a new state. Because the state (snapshot) was immutable, your work done is guaranteed to be consistent, but NOT NECESSARILY correct. – Triynko Sep 22 '10 at 15:31
  • Its correctness depends on whether the state (snapshot) has been updated since your work began, which can be checked by an atomic compare/exchange on the state (snapshot). If it succeeds, the data you worked with to produce a new state is still current, so your new state is valid. Now here's what I meant about the potential for failure. Because other threads could be doing work, the state may have been updated, causing all other operations to be invalidated. HOWEVER, my point was that the work done in those other operations may be partially valid, so when they re-calculate... – Triynko Sep 22 '10 at 15:34
  • ...and try to re-publish, the operation could be much shorter, so that not all work was done in vain. This is because, by checking whether the work is still valid (by COMPARING the original source state with the newly updated source state), it can be QUICKLY determined that the work done (that was theoretically invalid)... is actually still valid... or is partially valid and needs a slight alteration. In your very simple contrived example, of course adding 50 + 10 and overwriting a value of 20 would be invalid, and that simple calculation would have to be redone, because IT WOULD FAIL... – Triynko Sep 22 '10 at 15:36
  • ...using this system. It wouldn't overwrite the 20... it would see that 20 is there, whereas a 50 was there when the calculation started. Because the state is an immutable object, an ENTIRE STATE is represented by a single object ID, so it's easy to compare/exchange the state such that only valid new states are published. The fact that calculated new states are ALLOWED to fail (rather than blocking doing nothing), and the fact that invalid new states can be QUICKLY validated and altered without redoing ALL the work, is the advantage of this. Refer back to my research metaphor earlier. – Triynko Sep 22 '10 at 15:42
  • again, immutable means unchangeable. If the state is immutable, it is a contradiction to even talk about updating it.. you are either not accurately communicating what you are thinking (perhaps you are talking about 2 different things, one of which is imutable and the other is not?) or you don't understand what immutable means – Charles Bretana Sep 22 '10 at 18:02
  • Perhaps all along here you have been talking about an object (a dictionary) whose elements are themselves mutable, but whose presence in the dictionary cannot be changed. The dictionary copuld accurately be described as read-only, since the elements in it cannot be removed and new elements cannot be added, but to say that the dictionary is immutable when those elements themselves can be modified or updated is at best misleading, and at worst, not even wrong. – Charles Bretana Sep 22 '10 at 19:50
  • It's been almost 5 years, but this concept has proven very useful to me in theoretical massively parallel indexing systems (with a memory/prediction architecture). To clarify, the "immutable" object was immutable in the way a string is immutable. To make this concrete, consider a JSON string that represents data in three arrays. That is essentially identical to an actual object with four members. Three readonly members which are all immutable dictionaries, and another readonly member which is a GUID. – Triynko Mar 31 '15 at 07:41
  • Although the object instance itself cannot be changed, it can be "read" by multiple threads, and those threads can perform complex calculations independently and in parallel. When finished, each thread has a complete new copy of a new immutable object, and it will try to overwrite the old shared object with a new one, which is another immutable object constructed based on the old one. If the shared object's GUID has not changed, it is replaced (e.g. with a `CompareExchange` call). Assuming one update succeeds, the other thread's attempt will fail. – Triynko Mar 31 '15 at 07:44
  • Now, at that moment, something incredible happens. Whereas in a normal synchronized context those two threads would have to run serially, they were allowed to run in parallel, with the understanding that some or possibly all of their work could be invalidated by a parallel update. However, because these two threads are performing different operations based on reading different sections of the same shared state object, the hope is that expensive work done can be validated with a few simple checks. – Triynko Mar 31 '15 at 07:47
  • For example, consider shared immutable object S. Thread 1 acquires a reference to S and operates on dictionary S.D1. Thread 2 acquires a reference to S as well, and operates on dictionary S.D2. Thread 1 finishes with a new/updated shared immutable object of the same type as S and successfully compare/exchanges it with the main reference. Thread 2 then tries to do the same thing, but its compare/exchange fails because the old object has been replaced. – Triynko Mar 31 '15 at 08:00
  • Thread 2 then re-acquires the *new* shared immutable object (containing the work done by thread 1), and again constructs its own new version (e.g. from a memory-managed pool of this state), quickly applying its already-completed computationally-expensive results in an instant. In the end, we see that complex calculations ran in parallel instead of serially, so instead of the runtime's lower bound being (Thread 1 time + Thread 2 time), it becomes max(Thread 1 time, Thread 2 time) + the verification time of the failing thread. – Triynko Mar 31 '15 at 08:03
  • Suppose thread 1 runs for 500 seconds, and thread 2 runs for 600 seconds. In a synchronized system, the runtime would be 500 + 600 = 1100 seconds. However, in the proposed system, it would be 600 seconds + (some relatively fast remediation step). The remediation step would involve grabbing the new object, testing whether data the calculation depends on had changed, and if not, reconstructing a new state from the updated state and trying to compare/exchange it again. The point is, if the remediation steps are much faster than the calculations themselves, total runtime shrinks significantly. – Triynko Mar 31 '15 at 08:08
  • once again.... "new/updated shared immutable object " is a contradiction. For a race to occur, there must be a logical invariant property, whose value, at some point during the processing, is invalid or inconsistent. If all data manipulated or used by the process is immutable, how can any property value ever be invalid or inconsistent ? This can only occur if the process involved CHANGES the value of something during the process. This is the very definition of mutable. – Charles Bretana Mar 31 '15 at 14:00

I'd like to know how you would perform this task using your lock-free programming style. You have a number of worker threads, all periodically hitting a shared list of tasks for the next job to perform. (Currently) they lock the list, find the item at the head, remove it, and unlock the list. Please take into account all error conditions and possible data races so that no two threads can end up working on the same task, and no task is accidentally skipped.

I suspect that the code to do this may suffer from a problem of over-complexity and have a possibility of poor performance in the case of high contention.

1800 INFORMATION
  • This is quite easy ... take the queue, make a copy, dequeue one work item, and replace the original queue with the modified copy (if the original queue is still present). There is of course quite a bit of overhead and every thread might fail several times (up to infinity) because other threads concurrently dequeued a work item. – Daniel Brückner Sep 16 '09 at 00:08
  • My point being that this code is somewhat more complex (thus more susceptible to error), and not really solving the issue of performance under load, and suffering from some serious fairness issues. – 1800 INFORMATION Sep 16 '09 at 00:12
  • The compare-exchange-solution has the advantage that it will not block all other threads if a thread halts (while it owns the lock). Skipping a work item might happen in both versions, if a thread halts after it has dequeued a work item. – Daniel Brückner Sep 16 '09 at 00:12
  • Sorry just a question of terminology, what do you mean by "halt"? Do you mean as in the OS interrupting because of the end of timeslice, or something more terminal? – 1800 INFORMATION Sep 16 '09 at 00:17
  • I wanted to avoid "if a thread crashes" because I wanted to include everything - a thread abort, the end of a time slice without obtaining a new one for a long time, waiting for I/O, ... The case of skipping a task occurs of course only when aborting or suspending a thread forever. While this seems unavoidable, there are solutions where other threads complete the work of halted threads (in a limited fashion). – Daniel Brückner Sep 16 '09 at 00:33
  • Easy! Reverse it. Let the default thread state be "wait to be assigned work" instead of "contend for work". Have an assignment thread which dequeues an item, sets it aside, and then signals the worker to have at it. The only possible contention then is between dequeuing and enqueuing items, BUT since this happens at opposite ends of the queue, no contention will occur as long as the queue has items in it. Meanwhile, queuing incoming requests is a non-issue as well, because there will always be a winner and therefore a steady flow into the queue: prep prep prep, try queue, try queue, prep prep... – Triynko Sep 16 '09 at 00:34
  • You are just serializing the access to the queue by using a single thread to enqueue or dequeue the items. This has (almost) no advantage over using a lock and adds additional complexity by adding a new coordination thread. Further you are left with the problem of multiple threads trying to enqueue new work items - how would you handle this in an easy way? – Daniel Brückner Sep 16 '09 at 00:48
  • Access to a queue is serial by nature...first in, first out. Not first in, second out. So I'm not sure what you mean. Many threads will be processing tasks, but just one thread handles reading the queue and activating other threads. It's a more economical use of processor cycles to have one thread perform the dequeuing and enlist other threads, than it is to have ALL the threads wasting time checking for stuff to do, when they could be resting (freeing processor cycles for busy threads) or doing actual work. – Triynko Sep 16 '09 at 01:01
  • All threads waiting for a lock don't continuously check the lock state - they are waiting in a queue managed by the runtime or operating system and in consequence don't consume any processor cycles. So your activation thread is already there in the form of the lock manager and thread scheduler. And what about the second problem - multiple threads trying to enqueue work items? Would you suggest to use one queue per thread? – Daniel Brückner Sep 16 '09 at 01:11
  • Multiple threads trying to enqueue: read x, add item, try compare/exchange, repeat. One will always be first. Items will be added as fast as atomic exchanges can be made, in a steady stream. Queue x = new Queue() [Queuing thread: read x, update x, compare/exchange x] [Dequeuing thread: read x, create new Queue(), compare/exchange x with new Queue()]. So you see, the dequeuing thread periodically STEALS the queue and replaces it with an empty one. The dequeuing thread MERGES the queues it has stolen, which it keeps private, and uses to activate sleeping threads in the meantime. – Triynko Sep 16 '09 at 01:25
  • Stealing the queue is a good idea to lower contention that I have never thought about before. But I assume the enqueue process could easily livelock under high contention and with a long queue requiring some time to create a copy. You should prototype and profile that. – Daniel Brückner Sep 16 '09 at 01:36
  • In general, one can always use PULL instead of PUSH, to keep things fair. Suppose hardware can buffer a few thousand requests and present 4 at a time in "slots". Four processors each work with one slot, synchronized with the queuing hardware via clock cycle, so they take turns [test empty/present]--[test empty/read]. So one real "thread" per queue. The coordinating hardware or thread then, in a fair fashion, reads and assigns available items from each of the four slots, one at a time, sending each to an available worker, then loops. – Triynko Sep 16 '09 at 01:40
  • Any parallel system must eventually reduce to a single management thread in order to subdue chaos. In the brain, it's consciousness. The system CAN and WILL function without it, but not as efficiently, and livelocks can and will occur otherwise. Of course, as I've said, there will always be one winner, so as fast as compare/exchange can occur, there will be a steady flow into the queue. Stealing the queue makes it smaller, so even though there are two different VIEWS of it... there is truly only one REAL queue. – Triynko Sep 16 '09 at 01:52
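
For what it's worth, a minimal sketch of the "steal the queue" idea discussed in these comments, using an immutable linked list instead of an actual Queue; all names are hypothetical. Producers CAS-push onto the shared head, and the single dequeuing thread swaps the whole chain out atomically, then reverses it privately to restore FIFO order:

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class StealableQueue<T>
{
    private sealed class Node
    {
        public readonly T Item;
        public readonly Node Next;
        public Node(T item, Node next) { Item = item; Next = next; }
    }

    private Node _head; // shared entry point; most recently added item first

    // Many producers: read the entry node, construct a new node pointing at
    // the old one, and attempt a CompareExchange; on failure, simply retry.
    public void Enqueue(T item)
    {
        while (true)
        {
            Node observed = _head;
            Node updated = new Node(item, observed);
            if (Interlocked.CompareExchange(ref _head, updated, observed) == observed)
                return;
        }
    }

    // One consumer: STEAL the whole queue, leaving an empty one behind, then
    // reverse the stolen chain privately to hand out items in FIFO order.
    public List<T> StealAll()
    {
        Node stolen = Interlocked.Exchange(ref _head, null);
        var items = new List<T>();
        for (Node n = stolen; n != null; n = n.Next)
            items.Add(n.Item);
        items.Reverse();
        return items;
    }
}
```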

The big advantage of a lock over a CAS operation like Interlocked.CompareExchange is that you can modify multiple memory locations within a lock and all the modifications will be visible to other threads / processes at the same time.

With CAS, only a single variable is atomically updated. Lock-free code is usually significantly more complex, because not only can you present the update of only a single variable (or two adjacent variables with CAS2) to other threads at a time, you also have to be able to handle "fail" conditions when CAS doesn't succeed. Plus you need to handle ABA issues and other possible complications.
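
On the ABA point: a plain CAS can succeed even though the location changed from A to B and back to A in between. One common mitigation, analogous to Java's AtomicStampedReference, pairs the value with a version stamp inside a single immutable holder so the CAS target is still one reference (a minimal sketch; the names are mine):

```csharp
using System.Threading;

public sealed class StampedRef<T> where T : class
{
    private sealed class Pair
    {
        public readonly T Value;
        public readonly int Stamp;
        public Pair(T value, int stamp) { Value = value; Stamp = stamp; }
    }

    private Pair _pair;

    public StampedRef(T initial) { _pair = new Pair(initial, 0); }

    public T Read(out int stamp)
    {
        Pair p = _pair;
        stamp = p.Stamp;
        return p.Value;
    }

    // Succeeds only if neither the value nor the stamp changed since Read,
    // so an A -> B -> A sequence is caught by the advancing stamp.
    public bool CompareAndSet(T expected, T updated, int expectedStamp)
    {
        Pair observed = _pair;
        if (!ReferenceEquals(observed.Value, expected) || observed.Stamp != expectedStamp)
            return false;
        Pair next = new Pair(updated, expectedStamp + 1);
        return Interlocked.CompareExchange(ref _pair, next, observed) == observed;
    }
}
```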

There are a variety of methods, like low-lock techniques, fine-grained locking, striped locks, reader-writer locks, etc., that can make simple locking code much more multicore friendly.

That said, there are plenty of interesting uses for both locking and lock-free code. However, unless you REALLY know what you're doing, creating your own lock-free code is not for a beginner. Use lock-free code or algorithms that have been well proven, and test them thoroughly, because the edge conditions that cause failure in many lock-free attempts are very hard to find.

Adisak
  • "Variable" and "memory location" are not synonymous. A single "memory location" can be designated as a pointer, such that it actually represents an *immutable* data structure with an arbitrary number of "variables". That way, replacing the pointer (in a single memory location) to the data structure through a CAS operation effectively makes a whole new data structure (with an abritrary number of "variables") available to all threads instantly. Threads are even free to work with earlier "snapshots" of data structures while new versions of the structures are being built if synchronized properly. – Triynko Oct 30 '09 at 20:41
  • Snapshots of data structures do not work for modifications that need to be done serially. Also, there are many data structures referenced through a single "pointer" that don't lend themselves well to this swap methodology. For example, a red-black tree has a single pointer as a root node, but it makes much more sense to lock it and work on the tree than to generate an entirely new tree and update with a CAS pointer swap. The CAS-swapped snapshot works best on data that is either read-only or predominantly read. You also need garbage collection or reference counting for cleanup. – Adisak Oct 30 '09 at 22:11

I would say it is no more obsolete than saying, in general, that pessimistic concurrency is obsolete given optimistic concurrency, or that pattern A is obsolete because of pattern B. I think it's about context. Lock-free is powerful, but there may not be a point in applying it unilaterally, because not every problem is perfectly suited to it. There are trade-offs. That said, it would be good to have a general-purpose lockless, optimistic approach where it hasn't been realized traditionally. In short, yes, lock can do something that can't be achieved with the other approach: represent a potentially simpler solution. Then again, it may be that both have the same result, if certain things don't matter. I suppose I'm equivocating just a bit...

Kit

In theory, if there is a fixed amount of work to be done, a program which uses Interlocked.CompareExchange will manage to do it all without locking. Unfortunately, in the presence of high contention, a read/compute-new/compareExchange loop can end up thrashing so badly that 100 processors each trying to perform one update to a common data item may end up taking longer--in real time--than would a single processor performing 100 updates in sequence. Parallelism wouldn't improve performance--it would kill it. Using a lock to guard the resource would mean that only one CPU at a time could update it, but would improve performance to match the single-CPU case.

The one real advantage of lock-free programming is that system functionality will not be adversely affected if a thread gets waylaid for an arbitrary amount of time. One can maintain that advantage while avoiding the performance pitfalls of purely-CompareExchange-based programming by using a combination of locks and timeouts. The basic idea is that in the presence of contention, the resource switches to lock-based synchronization, but if a thread holds a lock for too long, a new lock object will be created and the earlier lock will be ignored. This will mean that if that former thread was still trying to do a CompareExchange cycle, it will fail (and have to start all over again), but later threads will not be blocked, nor will correctness be sacrificed.
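
A minimal sketch of just the "replace a stuck lock" piece of that idea; the type and names are hypothetical, and all the surrounding arbitration is deliberately omitted. Waiters take whatever lock object is current, and on timeout they CAS in a fresh lock object so later threads are not held hostage (a stalled holder's eventual update must then fail, e.g. via CompareExchange on the protected state):

```csharp
using System.Threading;

public sealed class AbandonableLock
{
    private object _gate = new object();

    // Returns the gate actually acquired; the caller passes it to Release.
    public object Acquire(int timeoutMs)
    {
        while (true)
        {
            object gate = _gate;
            if (Monitor.TryEnter(gate, timeoutMs))
            {
                // If the gate was replaced while we waited, release the stale
                // one and try again with the current gate.
                if (ReferenceEquals(gate, _gate))
                    return gate;
                Monitor.Exit(gate);
                continue;
            }
            // The holder exceeded the timeout: install a fresh gate and
            // ignore the old one (only if nobody else replaced it already).
            Interlocked.CompareExchange(ref _gate, new object(), gate);
        }
    }

    public void Release(object gate)
    {
        Monitor.Exit(gate);
    }
}
```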

Note that the code required to arbitrate all of the above will be complicated and tricky, but if one wants to have a system be robust in the presence of certain fault conditions, such code may be required.

supercat
  • What if you had an infinite number of realities to work with (or some large number, to be realistic)? Then complex operations could be performed on each of them in parallel, after which conflicts could be quickly resolved and collapse into a single reality. If your "merge conflicting realities" operation is much faster than the calculations themselves (**even if the merge is serialized**), then it's still faster to throw parallel computation at it. It all depends on the speed of the remediation/merge step and the nature of the calculations, which must be deterministic. – Triynko Mar 31 '15 at 08:23