
I'm confused because, from reading the wiki page, it seems to be just a check/validate-and-commit system for loads and stores. Is its purpose to solve synchronization problems? Is it a software technique built on top of current hardware, or is it a hardware implementation exposed via an ISA? What's the difference between the hardware and software implementations?

Thanks.

JDS

3 Answers


Transactional Memory is the concept of using transactions rather than locks to synchronise processes that execute in parallel and share memory.

At a very simplified level, to synchronise with locks you identify sections of code (called critical sections) that must not be executed simultaneously by different threads and acquire and release locks around the critical sections. Since each lock can only be held by one thread at a time, this guarantees that once one thread enters a critical section, all of the section's operations will have been completed before another thread enters a critical section protected by the same lock(s).
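
A minimal sketch of the lock-based version in C, using a pthread mutex (the names here are purely illustrative): two threads increment a shared counter, and the lock makes the read-modify-write a critical section that only one thread executes at a time.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&counter_lock);   /* enter the critical section */
            counter++;                           /* only one thread at a time here */
            pthread_mutex_unlock(&counter_lock); /* leave the critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);      /* always 200000 */
        return 0;
    }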

Transactional memory instead lets you designate sections of code as transactions. The transactional memory system (which can be implemented in hardware, software, or both) then attempts to give you the guarantee that any run of a program in which multiple threads execute transactions in parallel will be equivalent to a different run of the program in which the transactions all executed one after another, never at the same time.
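
As a concrete example of "designating a section of code as a transaction": one (software) implementation you can actually try is GCC's experimental TM support, shipped since GCC 4.7 (compile with -fgnu-tm). A sketch, assuming that support is available; notice that no lock is named anywhere, and how conflicts are detected and retried is left entirely to the TM runtime:

    /* Sketch using GCC's experimental -fgnu-tm support (GCC >= 4.7). */
    static long counter = 0;

    void increment(void)
    {
        __transaction_atomic {
            counter++;   /* atomic with respect to all other transactions */
        }
    }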

The transactional memory system does this by allowing transactions to execute in parallel and monitoring their accesses to transaction variables. If the system detects a conflict between two transactions' accesses to the same variable, it causes one of them to abort and "roll back" to the beginning of the transaction it was running; it then automatically restarts the transaction, and the overall state of the system is as if the earlier run had never started.
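
Here is a toy single-word analogue of that abort-and-retry loop, using C11 atomics. A real TM system validates whole read and write sets rather than one variable, but the shape is the same: take a snapshot, do the work, and commit only if nothing conflicted; otherwise start over.

    #include <stdatomic.h>

    static _Atomic long balance;

    void deposit(long amount)
    {
        long snapshot, updated;
        do {
            snapshot = atomic_load(&balance);  /* "begin": take a snapshot    */
            updated  = snapshot + amount;      /* do the work on the snapshot */
            /* "commit": publish only if nobody changed balance meanwhile;
               a failed compare-exchange is the "conflict detected, roll
               back and restart" case */
        } while (!atomic_compare_exchange_weak(&balance, &snapshot, updated));
    }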


One goal of transactional memory is ease of programming and safety: a properly implemented TM system that can enforce that transactions are used correctly gives hard guarantees that there are no parallelism bugs (deadlocks, race conditions, etc.) in the program. All the programmer has to do is designate the transactions (and sometimes the transaction variables, if the system doesn't implicitly treat all of memory as transaction variables), without needing to identify exactly which locks are needed, acquire them in the correct order to prevent deadlock, and so on. "Transactions are used correctly" means that threads never share data except through transaction variables, that transactional data is never accessed outside transactions, and that no "un-rollbackable" operations (such as I/O) occur inside transactions. Library-based software transactional memory systems for imperative languages like C or Java are generally unable to enforce all of this, which reintroduces the possibility of some of these parallelism bugs.
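
A sketch of the kind of misuse that last condition rules out (GCC's -fgnu-tm syntax again, illustrative names): if the transaction below aborts and retries, any I/O inside it has already happened and cannot be rolled back. GCC's __transaction_atomic rejects calls to functions not declared transaction-safe at compile time, which is precisely the enforcement that a library-based STM for C cannot provide.

    extern long counter;

    void unsafe_example(void)
    {
        __transaction_atomic {
            counter++;
            /* printf("updated\n");  <-- an un-rollbackable side effect: on a
               retry the line could print several times, and the output can
               never be undone; GCC refuses such calls inside an atomic
               transaction unless they are declared transaction-safe */
        }
    }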

Another goal of transactional memory is increasing parallelism; if you have a whole bunch of parallel operations which access some data structure, all of which might write to it but few of which actually do, then lock-based synchronisation typically requires that all of the operations run serially to avoid the chance of data corruption. Transactional memory would allow almost all of the operations to run in parallel, only losing parallelism when some process actually does write to the data structure.
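
A sketch of that read-mostly situation (illustrative names, GCC -fgnu-tm syntax): every call *might* write, so one lock around the structure would serialize all of them; under TM, the common read-only executions run in parallel, and only the rare call that actually writes can force others to retry.

    static int cache[1024];

    /* Usually just reads; writes only when the slot is still empty. */
    int lookup_or_fill(int slot, int value)
    {
        int result;
        __transaction_atomic {
            result = cache[slot];
            if (result == 0) {          /* rare path: the only real write */
                cache[slot] = value;
                result = value;
            }
        }
        return result;
    }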

In practice (as of when I researched my honours project a few years ago), hardware-based transactional memory hasn't really taken off, and current software transactional memory systems have significant overheads. So software transactional memory is more aimed at "reasonable performance that scales with the available processors moderately well and is pretty easy to code", rather than giving you absolute maximal performance.

There's a lot of variability between different transactional memory systems though; I'm speaking at quite an abstract and simplified level here.

Ben
  • I think you're overselling here. TM algorithms are subtle, just as much as are traditional synchronization techniques. And the resulting bugs can be equally inscrutable. In any case you generally need a working traditional variant anyway as a fallback for transaction collisions, so there's no free lunch. STM has largely failed to catch on for these reasons. HTM in Haswell looks promising, but more so from a performance perspective than an ease of programming one. – Andy Ross Jun 29 '12 at 04:46
  • **If** the STM system is aiming to provide strong safety guarantees **and** there is a mechanism for enforcing that (a) transactionally shared data is only accessed from within transactions and (b) transactions never have side effects, then STM programs are guaranteed to be free of race conditions and deadlocks. Having worked on STM implementations, I can say this is true. Poor use of such an STM system can give you abysmal performance due to contention, but it can't give you deadlocks or race conditions that only cause problems under extremely precise timing conditions. – Ben Jun 29 '12 at 07:31
  • Those preconditions aren't true of library based STM implementations for C though, because without language integration there's no way to enforce (a) and (b), and with C it's pretty hard to enforce anything anyway. I don't know as much about HTM, but my understanding is that you again require language integration to give safety guarantees, so using HTM from C or assembler doesn't come with the free lunch, no. – Ben Jun 29 '12 at 07:33
  • That's fine. Just recognize that your perspective is one of an "STM fan". But STM, quite frankly, has largely failed in the market. I think you need to rethink this. HTM has very clear and practical (if limited in scope) performance advantages and is worth investigating. But no, neither is a panacea. – Andy Ross Jun 29 '12 at 14:46
  • HTM is irrelevant to me while the computers I have, and those available to the users I want to run my programs, don't have HTM hardware. But I'm not so much an "STM fan" as a "safe parallel computation" fan. If HTM gives me that, then awesome; what I have seen of the field suggests it will not any time soon (but it may well speed up the *implementations* of safe STM systems, which would be nifty). – Ben Jul 01 '12 at 04:02
  • As my programming projects are **far** more limited by available programmer time than by execution speed, I'm more interested in ease-of-programming technologies than performance technologies. STM systems that provide the safety guarantees I'm talking about actually exist now and are usable by me now. – Ben Jul 01 '12 at 04:07
  • I'd welcome you editing in some more information from an "HTM fan" perspective to my answer. I was trying to avoid talking about any specific systems, but still give a very rough overview of the current "state of play". I'm aware that my knowledge of the HTM side is well below that of the STM side, and also that others *are* interested in systems with a performance advantage that don't provide safety guarantees. – Ben Jul 01 '12 at 04:11
  • @AndyRoss: What do you mean by "failed in the market"? It has never really been *in* the market in the first place. It *is* fairly widely used in a few languages (Clojure and Haskell have it built in), but more generally, it has not failed, because it has not really been attempted at a large scale yet. If that constitutes "failed in the market", then I'd like to hear your case for HTM, because that's even less used today. An STM system implemented on top of C is going to be fairly useless, yes, but that doesn't mean "all STM implementations in all languages ever are doomed to fail". – jalf Jul 13 '12 at 16:57
  • @AndyRoss: as for your first comment, you might need to elaborate. STM algorithms are a lot easier to work with than traditional synchronization techniques. I can see how HTM-based programming could get a bit trickier, but for STM? I don't see it. I also don't see why you'd need a "traditional" fallback. Any reasonably well designed STM implementation won't need that. I suspect the problem is that *because* STM hasn't *yet* caught on, what we mean by it differs quite a bit. You are obviously thinking of some of the terrible low-tech C implementations which... are a pain to work with – jalf Jul 13 '12 at 17:05
  • But STM can be done *a lot* better, which seems to be what Ben and I have in mind. – jalf Jul 13 '12 at 17:05

At the implementation level, transactional memory is part of the cache layer. It allows software to "try" some operations on memory, and then "commit" them later only if no other processor in the system has modified any of the memory that was read or written. In highly parallel SMP environments where most accesses don't collide, this can be faster than having all threads lock the same (highly contended) synchronization primitives.

It makes the task of the application programmer more difficult, though, because the software has to be able to recover ("rollback") the transaction if the commit fails.
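
A hedged sketch of what that looks like from software, using Intel's TSX/RTM intrinsics (Haswell and later; compile with -mrtm). Everything here except _xbegin/_xend/_xabort and _XBEGIN_STARTED is illustrative. Because the hardware attempt is only best-effort, the code needs the conventional-lock fallback described above, and the transaction must read the fallback lock so the two paths can't race each other:

    #include <immintrin.h>

    static long counter = 0;
    static volatile int fallback_lock = 0;   /* 0 = free, 1 = held */

    void increment(void)
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            if (fallback_lock)        /* pull the lock into our read set...   */
                _xabort(0xff);        /* ...and abort if someone holds it     */
            counter++;                /* buffered in the cache until commit   */
            _xend();                  /* commit; a conflict aborts us instead */
        } else {
            /* the transaction aborted (conflict, cache capacity, interrupt,
               ...): recover by redoing the work under a conventional lock;
               a production version would usually retry the transaction a
               few times before giving up */
            while (__sync_lock_test_and_set(&fallback_lock, 1))
                ;                     /* spin until the lock is acquired */
            counter++;
            __sync_lock_release(&fallback_lock);
        }
    }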

Andy Ross

From the gcc Wiki:

In general, implementations come in two forms: a Software Transactional Memory (STM) system uses locks or other standard atomic instructions to do its job. A Hardware Transactional Memory (HTM) system uses multi-word synchronization operations of the CPU to implement the requirements of the transaction directly (e.g., see the Rock processor). Because most HTM systems are likely to be best-effort facilities (i.e., not all transactions can be executed using HTM), practical TM implementations that incorporate HTM also have an STM component and are thus termed Hybrid Transactional Memory systems.

starbolin