8

I am sharing some data across multiple processes by using shared memory; I use inter-process mutexes to achieve synchronization.

My question is the following: is it possible to use lock-free data structures AND/OR atomic operations to achieve faster synchronization between two processes, without using mutexes?

If not, do you know what the main reason for this is?

As far as I know, these are used only to synchronize threads of the same process. Are these concepts portable to processes as well? If they aren't, do you know of any faster method to share/synchronize data across processes?

Abruzzo Forte e Gentile
  • You will probably find this just makes performance worse. That said, don't use inter-process synchronization primitives for threads in the same process. The synchronization primitives are already pretty close to the lightest you can possibly have on x86 while still getting any synchronization at all. – David Schwartz Nov 17 '11 at 00:04

2 Answers

9

Are these concepts portable to processes as well?

Yes, atomic operations are universal for both threads and processes, provided the memory that is operated on atomically is shared.

An atomic operation is an instruction of the processor itself and knows nothing about threads or processes; it is just an all-or-nothing (indivisible) sequence of actions (read; compare; store) with a low-level hardware implementation.

So, you can set up shared memory between processes and put an atomic_t into it.
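
For illustration, here is a minimal sketch (not from the original answer) of that setup using POSIX shared memory and a C11 atomic counter. The object name `/demo_counter` is invented for this example, and it assumes the platform's `atomic_int` is lock-free (as it is on x86 Linux):

```c
/* Sketch: an atomic counter in POSIX shared memory, shared across processes.
 * Build with: cc -std=c11 demo_counter.c -lrt (older glibc needs -lrt). */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Every participating process opens the same named object. */
    int fd = shm_open("/demo_counter", O_CREAT | O_RDWR, 0600);
    if (fd == -1) return 1;
    if (ftruncate(fd, sizeof(atomic_int)) == -1) return 1;  /* zero-filled on creation */

    atomic_int *counter = mmap(NULL, sizeof(atomic_int),
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) return 1;

    /* One indivisible read-modify-write (a `lock xadd` on x86): concurrent
     * increments from other processes cannot be lost. */
    int previous = atomic_fetch_add(counter, 1);
    printf("counter was %d\n", previous);

    munmap(counter, sizeof(atomic_int));
    close(fd);
    return 0;
}
```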

lock-free

Yes, as long as the lock-free structure is implemented using only atomic operations (as it should be).
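
As one concrete (hypothetical) illustration, here is a sketch of a single-producer/single-consumer ring buffer built only from C11 atomics; the names (`spsc_ring`, `RING_SLOTS`) are invented for this example. Because it stores an inline array plus indices rather than pointers, the struct can be placed directly in a shared-memory segment:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 1024                   /* must be a power of two */

struct spsc_ring {
    _Atomic size_t head;                  /* advanced only by the consumer */
    _Atomic size_t tail;                  /* advanced only by the producer */
    int            data[RING_SLOTS];
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct spsc_ring *r, int value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SLOTS)
        return false;                     /* full */
    r->data[tail & (RING_SLOTS - 1)] = value;
    /* release: the slot write becomes visible before the new tail does */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct spsc_ring *r, int *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                     /* empty */
    *out = r->data[head & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```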

data structures

When the shared memory is used to store pointers (in data structures), you should check that it is mapped to the same address in both processes.

If the memory is mapped at a different address, the pointers will be broken in the other process. In that case you need to store relative addresses instead and do a simple translation on access.
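
Here is a minimal sketch of that translation (the `shm_node` type and helper names are invented for this example): store offsets from the base of the mapping instead of raw pointers, and convert on each access:

```c
#include <stddef.h>
#include <stdint.h>

struct shm_node {
    int      value;
    uint64_t next_off;   /* offset of the next node from the segment base; 0 plays the role of NULL */
};

/* Offset -> pointer valid in *this* process. */
static inline struct shm_node *node_at(void *shm_base, uint64_t off)
{
    return off ? (struct shm_node *)((char *)shm_base + off) : NULL;
}

/* Pointer -> offset, safe to store in the shared segment. */
static inline uint64_t offset_of(void *shm_base, const struct shm_node *n)
{
    return n ? (uint64_t)((const char *)n - (const char *)shm_base) : 0;
}

/* Traversal then works no matter where the segment is mapped. */
static int sum_list(void *shm_base, uint64_t first_off)
{
    int sum = 0;
    for (struct shm_node *n = node_at(shm_base, first_off); n; n = node_at(shm_base, n->next_off))
        sum += n->value;
    return sum;
}
```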

inter-process mutexes

And I should say that glibc > 2.4 (NPTL) uses a futex combined with atomic operations for the non-contended lock path of process-shared mutexes (= inter-process mutexes). So you already use atomic operations in shared memory.
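
For comparison, here is a minimal sketch (the helper name is invented) of the process-shared mutex this refers to: a pthread mutex placed in shared memory and marked PTHREAD_PROCESS_SHARED. On Linux/NPTL its uncontended lock and unlock are just atomic operations on that word; the futex syscall is entered only under contention.

```c
#include <pthread.h>

/* 'm' is assumed to point into a MAP_SHARED region visible to every
 * participating process; exactly one process runs this before the others
 * start locking it. */
static int init_shared_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```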

osgx
  • Thanks osgx. Do you think it might work if I store simple integers in shared memory and read/write them via the __sync_fetch_and_xxx operations? If I am not wrong, atomic_t is no longer available the way it used to be in atomic.h. – Abruzzo Forte e Gentile Nov 17 '11 at 00:09
  • Yes, the __sync functions are atomic builtins. I don't know about atomic_t availability, but you should know which data sizes are allowed in atomic operations and what their required alignment is (this varies from one CPU to another, even between Intel Pentium 3/4 and Core 2). – osgx Nov 17 '11 at 00:20
  • 2
    Hmm, [__sync are legacy](http://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html) and [__atomic should be used](http://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html) – osgx Nov 17 '11 at 00:23
2

On x86 with NPTL, most of the synchronization primitives have as their fast path just a single interlocked operation with a full memory barrier. Since x86 platforms don't really have anything lighter than that, they are already about the best you can do. Unless the existing atomic operations do exactly what you need to do, there will be no performance boost to pay back the costs of using the semantically lighter primitive.
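
To illustrate the distinction (the variable names below are invented, not from the answer): when the whole critical section is a single arithmetic update, one atomic builtin replaces the lock/update/unlock sequence; anything more involved is better left under the mutex.

```c
#include <pthread.h>
#include <stdint.h>

uint64_t        stats_hits;   /* assumed to live in the shared segment          */
pthread_mutex_t stats_lock;   /* assumed initialized as PTHREAD_PROCESS_SHARED  */

/* Mutex version: one interlocked operation on the lock fast path. */
void hit_with_mutex(void)
{
    pthread_mutex_lock(&stats_lock);
    stats_hits++;
    pthread_mutex_unlock(&stats_lock);
}

/* Atomic version (use one or the other, not both): the increment itself
 * is the only interlocked operation. */
void hit_with_atomic(void)
{
    __atomic_fetch_add(&stats_hits, 1, __ATOMIC_SEQ_CST);
}
```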

David Schwartz
  • If he uses a mutex to protect some shared data, there will be a lock and an unlock of the mutex for each access. And if the data is small (this is not true for all types of usage, but it is for some), it can be changed with a single atomic operation, which is half the weight of the fast path (not counting the default mutex logic: the switch between mutex types; the jumps selecting the fast/slow path). – osgx Nov 17 '11 at 00:17
  • Hi David. Let me check if I understood correctly: you are saying that if I strictly have to "lock"/"synchronize" an area of code, the Linux POSIX locking primitives are already the fastest/best solutions available, BUT if I strictly want to do an add/xor/and etc., the atomic __sync_fetch_and_xxx operations are faster. Am I correct? – Abruzzo Forte e Gentile Nov 17 '11 at 00:20
  • @osgx It's the same. He'd have one interlocked operation for the lock and none for the unlock. The rest of the code optimizes to nearly nothing. – David Schwartz Nov 17 '11 at 00:33
  • @AbruzzoForteeGentile If the atomic operations do *exactly* what you need, they will be faster. But if you have to fake the fact that they don't, locks will usually be faster. – David Schwartz Nov 17 '11 at 00:34
  • 1
    "The rest of the code optimizes to nearly nothing"... Optimized by the compiler? By the CPU? How is the PLT optimized out? How are the jmps optimized out? How can a mutex (atomic instruction + 2 branches + PLT call) be faster than a bare atomic instruction? It can be the same order of speed, but it is a bit slower. – osgx Nov 17 '11 at 00:50
  • "He'd have one interlocked operation for the lock and none for the unlock" – as in "Mutex, Take 3" there is at least one atomic operation for the lock and one for the unlock. Both return values are used in a conditional branch. – osgx Nov 17 '11 at 00:52
  • @osgx: There's no atomic operation needed for unlock in the fast path. You just do, basically, 'mutex=unlocked; if(waiters) ...' – David Schwartz Nov 17 '11 at 03:27
  • David Schwartz, what operation do you mean? I think you are talking about very abstract code which is not real. I, in turn, mean the code from Drepper, http://www.akkadia.org/drepper/futex.pdf, page 8, section "6 Mutex, Take 3". As far as I know this is the actual futex usage in glibc (Drepper is a co-author of glibc/NPTL and the author of the linked paper). In this usage of futex/atomics, the unlock must be atomic too, because there are 3 possible states of val in total and 2 possible in the unlock. If val == 2, there are sleeping waiters and they must be woken up using futex_wake (otherwise an endless wait is possible). – osgx Nov 17 '11 at 03:44
  • David Schwartz, if you don't trust my comments, here is the code: [glibc-ports-20090518/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h:220](https://www.google.com/codesearch#xy1xtVWIKOQ/pub/glibc/snapshots/glibc-ports-latest.tar.bz2%7CDNu48aiJSpY/glibc-ports-20090518/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.h&q=lll_unlock&exact_package=ftp://sources.redhat.com/pub/glibc/snapshots/glibc-ports-latest.tar.bz2&type=cs&l=220) `#define __lll_unlock` ... `atomic_exchange_rel`. My case is about a real mutex that puts waiters to sleep on a futex; your theory only holds for a special mutex with busy-waiting. – osgx Nov 17 '11 at 03:58
  • OK, there is an even faster path since 2007. If the lock is private and there is only a single thread in the process, the lock prefix will not be used by lll_unlock (glibc 2.7; not on every arch), so no atomic. But if any additional thread is started, the atomic operations in lll_lock/lll_unlock are restored. – osgx Nov 17 '11 at 04:33