
Based on the discussion in this bug report, and a related SO question:

When using shared_memory in a subprocess the resource_tracker needs to be inherited from the parent process. If it is not then each subprocess erroneously gets its own resource_tracker.

I don't instantiate a resource_tracker anywhere in my code. What does it mean for a resource_tracker to be inherited? How do I instantiate the resource_tracker in the main process prior to creating the new subprocesses so that the resource_tracker gets inherited by the subprocesses?

David Parks

1 Answer


When using shared_memory in a subprocess the resource_tracker needs to be inherited from the parent process. If it is not then each subprocess erroneously gets its own resource_tracker.

This statement is quite flawed given the current implementations of both ResourceTracker and SharedMemory. The former is implemented as a separate Python process that communicates with the process that started it (i.e. the process that created the shared memory objects) via a pipe. The resource tracker holds the read end of the pipe, while the process creating the shared memory objects holds the write end. So, any time the starting process creates a SharedMemory object, it sends a message through the pipe telling the resource tracker to register the created resource. Similarly, if a resource needs to be removed, the starting process uses the pipe again to send an unregister message.

As a result, the only way a child process could truly inherit the resource tracker of its parent is if it sent messages directly to that resource tracker using the write end of the pipe (which it should have access to). However, since the current implementation of SharedMemory creates a resource tracker even when a process is only consuming an already created shared memory object, your child processes would have to communicate with two separate resource trackers: the one started by their parent (via the same pipe), and the one that gets started when they instantiate a SharedMemory object for the first time.

With that out of the way, let's tackle your questions:

I don't instantiate a resource_tracker anywhere in my code. What does it mean for a resource_tracker to be inherited?

First, you do not instantiate a resource tracker; one is instantiated for you when you instantiate a SharedMemory object for the first time. Currently, it does not matter whether you are producing or consuming a shared memory object: a resource tracker is always created for the process that instantiated the SharedMemory object.
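To make the create/attach distinction concrete, here is a minimal sketch of the life cycle of one shared memory block, run in a single process for brevity. The block size and the `hello` payload are arbitrary choices for illustration. Nothing here touches the resource tracker directly; per the description above, it is started behind the scenes by the first SharedMemory instantiation.

```python
from multiprocessing import shared_memory

# Producer side: create=True allocates a new, named block.
block = shared_memory.SharedMemory(create=True, size=16)
try:
    block.buf[:5] = b"hello"

    # Consumer side: attach to the existing block by name only.
    view = shared_memory.SharedMemory(name=block.name)
    payload = bytes(view.buf[:5])  # copy the data out before closing
    view.close()                   # releases this handle; the block survives
finally:
    block.close()   # release the producer's handle
    block.unlink()  # destroy the block itself; do this exactly once
```

The point to notice is that `close()` and `unlink()` are separate steps: every handle gets closed, but only the owner unlinks.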

Second, inheriting a resource tracker is really not a thing in the current implementation. Again, consuming processes shouldn't have to worry about the life cycle of shared memory objects. All they need to do is make sure that the object actually exists, which they can do by handling a FileNotFoundError or OSError exception. If the current implementation of SharedMemory were not buggy, a consuming process that is done with a resource would only need to call SharedMemory.close and move on to something else.
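The consumer-side behaviour described above can be sketched as follows. The helper name `read_block` and its `length` parameter are hypothetical, chosen only for this example; the sketch catches FileNotFoundError, which is a subclass of OSError.

```python
from multiprocessing import shared_memory


def read_block(name, length):
    """Copy `length` bytes out of an existing block, or return None if it is gone."""
    try:
        shm = shared_memory.SharedMemory(name=name)  # attach only, no create
    except FileNotFoundError:
        # The block was never created or has already been unlinked.
        return None
    try:
        return bytes(shm.buf[:length])  # copy, so nothing references the buffer
    finally:
        shm.close()  # a consumer closes its handle but never unlinks
```

A caller would treat a `None` result as "the producer has not created this block, or has already cleaned it up".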

How do I instantiate the resource_tracker in the main process prior to creating the new subprocesses so that the resource_tracker gets inherited by the subprocesses?

I think the issue here is that your design is flipped. You should have your main process create the shared memory object and let the child processes consume it. The idea behind shared memory objects is that multiple separate processes can use the same chunks of memory, which should in turn limit the amount of resources used by your parallel program. The code in the linked SO post does the reverse. Since shared memory objects are kernel-persistent resources, it makes sense to have as few of them as possible.

So, if you employ a "one producer, multiple consumers" design, you can have your main process create the shared memory object along with its associated resource tracker, and then let the child processes consume the memory. In this scenario, you can get some work done in the child processes without having to worry about the resource trackers associated with them. Just make sure that the child processes don't unlink the shared memory object before the parent process gets around to doing it. Better yet, if the fix in the bug report gets implemented, making it unnecessary for consuming processes to spawn resource trackers, you can be confident that your main process will be the only entity unlinking the shared memory object.
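A minimal sketch of that "one producer, multiple consumers" layout, assuming each child only needs to write a single byte into its own slot. The function names `fill_slot` and `run_demo` and the one-byte-per-worker layout are made up for illustration; they are not from the linked post.

```python
import multiprocessing as mp
from multiprocessing import shared_memory


def fill_slot(name, index, value):
    """Child side: attach by name, write one byte, close. Never unlink."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        shm.buf[index] = value
    finally:
        shm.close()


def run_demo(n_workers=4):
    """Main side: create the block, run the children, then clean up."""
    block = shared_memory.SharedMemory(create=True, size=n_workers)
    try:
        workers = [
            mp.Process(target=fill_slot, args=(block.name, i, i + 1))
            for i in range(n_workers)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return list(block.buf[:n_workers])  # copy results out before closing
    finally:
        block.close()
        block.unlink()  # only the main process unlinks
```

When trying this out, call `run_demo()` from under an `if __name__ == "__main__":` guard, since some multiprocessing start methods re-import the main module in each child.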

In sum, your child processes are not going to inherit their parent's resource tracker, as far as the current implementation goes. If those child processes end up actually creating shared memory objects, they will get their own resource trackers. But if efficiency is the goal, you would want your main process to create the shared memory object(s) that your child processes will then consume. In such a scenario, your main process, via its associated resource tracker, will be in charge of the cleanup step. And, if the fix is implemented, you can always be safe in assuming that only the main process will be unlinking the resources.
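For readers on newer interpreters: Python 3.13 added a keyword-only `track` parameter to `SharedMemory`, and passing `track=False` asks it not to register the block with a resource tracker. Whether that matches the fix proposed in the bug report exactly is an assumption here. The helper name `attach_untracked` is hypothetical, and on older versions the sketch simply falls back to a plain attach.

```python
import sys
from multiprocessing import shared_memory


def attach_untracked(name):
    """Attach to an existing block, skipping tracker registration when possible."""
    if sys.version_info >= (3, 13):
        # `track` only exists on Python 3.13 and later.
        return shared_memory.SharedMemory(name=name, track=False)
    # Older versions have no such switch; this attach is tracked as usual.
    return shared_memory.SharedMemory(name=name)
```

The caller is still responsible for calling `close()` on the returned object, and the owning process is still the one that calls `unlink()`.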

Dharman
Abdou
  • Brilliant answer, I really appreciate it. I think your suggestion of creating the shared memory object in the parent process will work in my case, I know the size of the object. However, it wouldn't be very convenient in many producer-consumer patterns in which the producer process generates arbitrarily sized objects. – David Parks Jul 22 '20 at 03:48
  • Generating arbitrarily sized objects should be possible without any errors or warnings, as long as the consuming processes don't spawn resource trackers. But if you're able to consume the generated shared memory objects without errors, then the warnings can be ignored (but still logged hopefully). – Abdou Jul 22 '20 at 18:14
  • You're suggesting that a single MAIN process create the shared memory object that the producer subprocesses use. If MAIN is the consumer, and it doesn't know how large the data will be, it cannot create the fixed-size shared memory object. I.e. the producers (subprocesses) need to create the shared memory objects. But herein lies the catch-22: now we have a whole host of trackers running around that are going to be generating a lot of errors. I'm still not seeing a general-purpose solution for the producer-consumer pattern here. I'm not saying that as a complaint, just as an observation. – David Parks Jul 22 '20 at 18:48
  • I think my comment was mostly that consuming processes should avoid unlinking things, since child processes (producers in your case) would take care of the blocks when they exit. You can also have the child processes send the arbitrary sizes to the main process to create the shared memory objects. Once created, the names of said shared memory objects can be sent back to the child processes to do work. Let's chat [here](https://chat.stackoverflow.com/rooms/info/218384/python-shared-memory-objects-and-resource-trackers?tab=general) if it's not clear. – Abdou Jul 22 '20 at 19:57
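The size-request idea from the last comment could look roughly like the sketch below, shown from the main process's side only. Everything here is an assumption for illustration: the names `serve_allocations` and `release_all`, the use of `None` as a stop signal, and the single reply queue. With several children, each child would likely need its own reply queue so that names reach the right requester.

```python
from multiprocessing import shared_memory


def serve_allocations(requests, replies, owned):
    """Main-process loop: read sizes, create blocks, reply with their names.

    `requests` and `replies` are queue-like objects (for example
    multiprocessing.Queue). A size of None ends the loop.
    """
    while True:
        size = requests.get()
        if size is None:
            break
        block = shared_memory.SharedMemory(create=True, size=size)
        owned.append(block)      # keep a handle so main can unlink it later
        replies.put(block.name)  # the child attaches using this name


def release_all(owned):
    """Main process closes and unlinks every block it created."""
    for block in owned:
        block.close()
        block.unlink()
    owned.clear()
```

A child would put the size it needs on `requests`, wait for a name on `replies`, attach with `SharedMemory(name=...)`, fill the block, and close its handle without unlinking.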