I am looking for a queue algorithm that fulfills the following properties:
- Processes communicate using only a shared dictionary (key-value-store)
- Does not use any atomic operations other than load and store (no CAS, for example)
- Supports multiple producers
- Supports a single consumer
- Producers can die at any time and queue must remain operational
- The consumer can also die at any time and be restarted later, but there will never be more than one consumer-process running at a time
This is meant as a general question about a suitable algorithm, since I'd like to use it in a couple of different scenarios. But to help visualize the requirements, here is an example use-case:
- I have a website with two pages: producer.html and consumer.html
- producer.html can be opened in multiple tabs simultaneously
- Each producer.html adds events to the queue
- One copy of consumer.html is open and consumes these events (to aggregate and stream them to a webserver, for example)
If the multiple producer-tabs are opened by the user rather than the page, these tabs do not have references to each other available, so the usual communication methods (postMessage or calling directly into the other tab's JS code) are out. One of the ways they can still communicate with each other is via LocalStorage as suggested here: Javascript; communication between tabs/windows with same origin. But LocalStorage is not "thread-safe" as detailed here.
Note: There may be other ways to implement cross-tab communication in the browser (Flash, ...), but these are NOT the aim of this question as they won't translate to my other use-cases. This is really just an example use-case for the general queue algorithm that I am trying to find.
A couple more parameters:
- The number of producers will never be very large (10s or 100s maybe), so the scaling of the number of reads and writes needed with respect to the number of producers is not really a concern.
- I don't know before hand how many producers I might have and there is no immediately obvious way to assign a number or index to them. (Many mutex algorithms (Lamport's Bakery, Eisenberg&McGuire, Szymański's, ...) maintain an array of state for each process, which wouldn't necessarily be a natural approach here, although I do not want to exclude these approaches ex ante, if they can be implemented using the shared dictionary in some way...)
- The algorithm should be 100% reliable. So, I'd like to avoid things like the delay in Lamport's first Fast Mutex algorithm (page 2 in the PDF) since I don't have any kind of real-time guarantees.
- It would be very helpful if the queue was FIFO, but it's not strictly required.
- The algorithm should not be encumbered by any patents, etc.
Update:
The Two-Lock Concurrent Queue Algorithm by Michael and Scott looks like it could work, but I would need two things to implement it:
- A locking mechanism using the shared dictionary that can survive the crash of a lock-holder
- A reliable way to allocate a new node (if I move the allocation into the locked section, I could just generate new random keys until I find one that's not in use yet, but there might be a better way?)
Update 2:
It seems, I wasn't being specific enough about the dictionary:
It's really nothing more than a trivial key-value-store. It provides the functions get(key)
to read the value of a key, put(key, value)
to change the value of a key, and delete(key)
to remove a key. In some of my use-cases, I can also iterate over keys, but if possible, I'd like to avoid it for generality. Keys are arbitrary and the producers and consumers can create or calculate them as needed. The dictionary does not provide any facilities for automatically generating unique keys.
Examples are HTML LocalStorage, Google AppEngine's Datastore, a Java Map, a Python dictionary, or even a file-system with only a single directory (where the keys would be the file-names and the values the content of the files).