
Can someone explain how atomicModifyIORef works? In particular:

(1) Does it wait for a lock, or optimistically try and retry if there's contention (like TVar)?
(2) Why is the signature of atomicModifyIORef different to the signature of modifyIORef? In particular, what is this extra type variable b?

Edit: I think I've figured out the answer to (2): b is a value to be extracted (it can simply be () if not needed). In a single-threaded program, knowing the value is trivial, but in a multithreaded program one may want to know what the previous value was at the time the function was applied. I assume this is why modifyIORef doesn't have this extra return value (any use of modifyIORef that needed such a return value should probably use atomicModifyIORef anyway). I'm still interested in the answer to (1), though.
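A minimal sketch of the two uses of b described above: returning the value that was current when the function was applied, and using () when nothing needs to be returned. It uses atomicModifyIORef' (the strict variant exported by Data.IORef); the helper names fetchAndAdd and increment are hypothetical.

```haskell
import Data.IORef

-- Atomically add n to the counter and return the value it held
-- immediately before this update (a "fetch-and-add").
fetchAndAdd :: IORef Int -> Int -> IO Int
fetchAndAdd ref n = atomicModifyIORef' ref (\old -> (old + n, old))

-- When no result is needed, b can simply be ().
increment :: IORef Int -> IO ()
increment ref = atomicModifyIORef' ref (\old -> (old + 1, ()))

main :: IO ()
main = do
  counter <- newIORef 10
  prev <- fetchAndAdd counter 5
  increment counter
  now <- readIORef counter
  print (prev, now)   -- (10,16)
```

In a multithreaded program, prev is guaranteed to be the value the addition was actually applied to, which a separate readIORef followed by modifyIORef cannot promise.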

Don Stewart
Clinton

2 Answers


Does it wait for a lock, or optimistically try and retry if there's contention (like TVar).

atomicModifyIORef uses a locking instruction of the underlying hardware architecture to swap the pointer to an allocated Haskell object atomically.

On x86 it uses the compare-and-swap instruction (cmpxchg with a lock prefix), wrapped by the RTS function cas. This is exposed to the language through the primitive atomicModifyMutVar#, which is implemented as a runtime service in Cmm:

stg_atomicModifyMutVarzh
{
...

 retry:
   x = StgMutVar_var(mv);
   StgThunk_payload(z,1) = x;
#ifdef THREADED_RTS
   (h) = foreign "C" cas(mv + SIZEOF_StgHeader + OFFSET_StgMutVar_var, x, y) [];
   if (h != x) { goto retry; }
#else
   StgMutVar_var(mv) = y;
#endif
...
}

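That is, in the threaded RTS it tries to do the swap and, if another thread changed the variable in the meantime (h != x), goes back and retries; in the non-threaded RTS it simply writes the new value.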
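The same read / try-to-swap / retry structure can be mirrored in Haskell. This is only a sketch under stated assumptions: casModel and modifyOptimistic are hypothetical names, and casModel compares values with Eq (built on atomicModifyIORef' purely for illustration), whereas the RTS cas compares raw pointers. It illustrates the optimistic loop, not GHC's actual implementation.

```haskell
import Data.IORef

-- Hypothetical compare-and-swap, modelled with atomicModifyIORef' and Eq.
-- The real RTS primitive compares heap pointers, not values.
casModel :: Eq a => IORef a -> a -> a -> IO Bool
casModel ref expected new =
  atomicModifyIORef' ref $ \cur ->
    if cur == expected
      then (new, True)    -- nobody changed it: publish the new value
      else (cur, False)   -- someone got there first: leave it alone

-- Optimistic update: read the current value, compute the new one,
-- try to swap it in, and go round again if the swap failed.
modifyOptimistic :: Eq a => IORef a -> (a -> (a, b)) -> IO b
modifyOptimistic ref f = loop
  where
    loop = do
      old <- readIORef ref
      let (new, result) = f old
      swapped <- casModel ref old new
      if swapped then return result else loop

main :: IO ()
main = do
  ref <- newIORef (41 :: Int)
  old <- modifyOptimistic ref (\x -> (x + 1, x))
  new <- readIORef ref
  print (old, new)   -- (41,42)
```

No thread ever waits for another here: a thread whose swap fails just recomputes from the fresh value, which is the "optimistic" behaviour asked about in (1).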

The implementation of cas as a primitive shows how we get down to the metal:

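/*
 * Compare-and-swap.  Atomically does this:
 *
 *   r = *p;
 *   if (r == o) { *p = n; }
 *   return r;
 */
EXTERN_INLINE StgWord cas(StgVolatilePtr p, StgWord o, StgWord n);

/*
 * CMPXCHG - the single-word atomic compare-and-exchange instruction.  Used
 * in the STM implementation.
 */
EXTERN_INLINE StgWord
cas(StgVolatilePtr p, StgWord o, StgWord n)
{
#if i386_HOST_ARCH || x86_64_HOST_ARCH
    /* lock cmpxchg: if *p equals o (held in eax), store n into *p;
     * either way eax ends up holding the old value of *p */
    __asm__ __volatile__ (
      "lock\ncmpxchg %3,%1"
          :"=a"(o), "=m" (*(volatile unsigned int *)p)
          :"0" (o), "r" (n));
    return o;
#elif arm_HOST_ARCH && defined(arm_HOST_ARCH_PRE_ARMv6)
    /* no CAS instruction: emulate it under a spin lock */
    StgWord r;
    arm_atomic_spin_lock();
    r  = *p;
    if (r == o) { *p = n; }
    arm_atomic_spin_unlock();
    return r;
#elif !defined(WITHSMP)
    /* non-SMP build: a plain read and conditional write suffice */
    StgWord result;
    result = *p;
    if (result == o) {
        *p = n;
    }
    return result;
#else
    /* branches for other architectures are omitted from this excerpt */
#error "cas() unimplemented on this architecture"
#endif
}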

So on Intel it can use a single atomic instruction, while other architectures use different mechanisms (on pre-ARMv6 ARM, for example, a spin lock around the comparison). Either way, if the swap fails, the runtime retries.

Don Stewart

atomicModifyIORef takes an r :: IORef a and a function f :: a -> (a, b) and does the following:

It reads the value of r and applies f to it, yielding (a', b). Then r is updated with the new value a', and b is returned. This read and write access is done atomically.
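To see that atomicity in practice, here is a minimal sketch (concurrentCount is a hypothetical name) in which several threads increment one shared IORef, each using atomicModifyIORef' with () as b. Because every read-modify-write is a single atomic step, no increment is lost, whether or not the threads actually run in parallel.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM_)
import Data.IORef

-- Start `threads` workers, each bumping a shared counter `perThread`
-- times via atomicModifyIORef', then return the final count.
concurrentCount :: Int -> Int -> IO Int
concurrentCount threads perThread = do
  counter <- newIORef (0 :: Int)
  dones <- forM [1 .. threads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ perThread $
        atomicModifyIORef' counter (\n -> (n + 1, ()))
      putMVar done ()          -- signal that this worker has finished
    return done
  forM_ dones takeMVar         -- wait for every worker
  readIORef counter

main :: IO ()
main = concurrentCount 8 10000 >>= print   -- 80000
```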

Of course, this atomicity only holds if all accesses to r are done via atomicModifyIORef. You can find this information in the documentation [1].

[1] https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-IORef.html#v:atomicModifyIORef

Nikos Baxevanis
Peter
  • Does it perform locking or is it optimistic? The GHC version just seems to call a GHC primitive. – Clinton Apr 11 '12 at 09:25
  • Note that due to laziness, `atomicModifyIORef` only has to change the current value to point to a thunk, with the actual work being delayed until it's read at some later time. AFAIK, it compiles to something like a CAS on most platforms. – hammar Apr 11 '12 at 09:32
  • Optimistic, via an interlocked exchange (cas) loop. https://github.com/ghc/ghc/blob/45740c29b24ea78b885d3b9f737a8bdc00265f7c/rts/PrimOps.cmm#L364 – Nathan Howell Apr 11 '12 at 09:36
  • So what happens if two threads want to perform a modification and send the result to the user as a webpage immediately (so the work isn't delayed)? Do both threads perform their modification, and the one that finishes first causes the other thread to recalculate based on the new value? Or does one wait for the other, and if so, does it busy-wait, or just sleep until its turn? – Clinton Apr 11 '12 at 10:17
  • @Clinton: The threads will create thunks representing the updates in some order. What happens next depends on what those thunks look like, but if the new value depends on the old one, whichever thread comes first will evaluate it and overwrite the thunk with the result. If both threads try evaluating it at the same time, the result depends on something called "blackholing", where a thunk that's being evaluated may be marked as a "black hole". If another thread tries to evaluate the black hole, it will block until the first thread is done evaluating it. – hammar Apr 11 '12 at 11:47
  • [cont'd] It's also possible that both threads may evaluate a thunk at the same time. Since this is pure code, this is safe, and it isn't usually a big deal since thunks tend to be small, but it's still possible to tweak this with compiler flags to enable "eager blackholing", which means that all thunks under evaluation get marked as black holes, whereas normally this only happens if the original thread blocks for some reason, or when doing garbage collection. – hammar Apr 11 '12 at 11:53
  • @hammar: Thanks for the great explanation! – Clinton Apr 11 '12 at 12:42
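The laziness point from the comments can be observed directly: the lazy atomicModifyIORef only installs a thunk, so an update whose new value would fail still "succeeds", and the failure surfaces when the value is later forced. The strict atomicModifyIORef' from Data.IORef forces the new value during the call. A small sketch; lazyUpdate, strictUpdate and threw are illustrative names.

```haskell
import Control.Exception (ArithException, evaluate, try)
import Data.IORef

-- True if running the action raised an arithmetic exception.
threw :: IO a -> IO Bool
threw act = either isArith (const False) <$> try act
  where
    isArith :: ArithException -> Bool
    isArith _ = True

-- Lazy: the swap stores an unevaluated thunk for (x `div` 0), so the
-- update itself succeeds; forcing the stored value later fails.
lazyUpdate :: IO (Bool, Bool)
lazyUpdate = do
  ref <- newIORef (1 :: Int)
  updateFailed <- threw (atomicModifyIORef ref (\x -> (x `div` 0, ())))
  readFailed <- threw (readIORef ref >>= evaluate)
  return (updateFailed, readFailed)

-- Strict: atomicModifyIORef' forces the new value, so the division
-- runs (and fails) during the update call itself.
strictUpdate :: IO Bool
strictUpdate = do
  ref <- newIORef (1 :: Int)
  threw (atomicModifyIORef' ref (\x -> (x `div` 0, ())))

main :: IO ()
main = do
  lazyUpdate >>= print     -- (False,True)
  strictUpdate >>= print   -- True
```

The lazy version keeps the time spent inside the CAS loop minimal, at the cost of building up thunks if the IORef is updated repeatedly without being read; the strict variant is the usual choice for counters and similar accumulators.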