Create a new thin process, fork or threads?

Question

I'm working on a small project and I'm looking for advice.

Basically, I have a main process that is a server with a variety of dynamically growing data structures and some sockets here and there.

In some cases I need to create processes that run a small loop for as long as necessary (they don't need any data structure, fd or socket of the main server). An important requirement is that they be fast, light and durable (they must keep running even if the main server is killed).

fork: I get durability over time, but copying the main server means copying all its data structures, fds, etc., which would weigh down the new process unnecessarily.

thread: light and fast, but not durable and, above all, very unstable (if a thread generates an error for some reason, it could block everything).

The ideal thing would be a magic system call that creates a brand-new (ex novo) process with a function as its entry point, but I don't think anything like that exists.

Do you have any advice for me?

Since you said that "they don't need any data structure, fd or socket of the main server", forking a new process won't be too expensive. Fork doesn't copy everything up front; it uses [copy on write](https://stackoverflow.com/questions/4597893/specifically-how-does-fork-handle-dynamically-allocated-memory-from-malloc/4597931#4597931). — Tianjiao Huang, May 09 '18 at 20:24
@TianjiaoHuang: The parent's memory will also be marked copy-on-write, so `fork` is probably not all that cheap. — Nemo, May 09 '18 at 20:27
`fork` is the only choice for "durable". Is it an option to keep some slave processes running and give them commands via IPC? (pipe, socket, shared memory...) — Nemo, May 09 '18 at 20:27
If the main process is killed, then what should these sub-whatevers do? What is orchestrating them? What are they communicating with? — Oliver Charlesworth, May 09 '18 at 20:29
Just look at your design wrt. inter-whatever comms. If your design relies on frequent signaling of large data items, then threads will be much more efficient because you can signal pointers instead of copying bulk data. If, however, there is a considerable degree of data isolation, then there is little advantage to threads over processes. Horses for courses. — Martin James, May 09 '18 at 20:32
There's no way to achieve durability without `fork`. If you don't want to use it, you have to rethink your problem and find another way. It's not true that a thread that "generates" an error could block everything: with enough isolation between threads and enough checks in your code, you can easily manage errors. — simo-r, May 09 '18 at 20:32
...or you could just take the pragmatic approach of designing your system so that you can easily try both, and then just test/measure. — Martin James, May 09 '18 at 20:36
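Nemo's suggestion of keeping worker processes running and sending them commands over IPC could look roughly like this. It is a minimal sketch assuming Linux/POSIX with a pipe as the channel; `struct worker`, `start_worker` and `worker_main` are hypothetical names, and the "commands" are plain `int`s that the worker sums as a stand-in for real work.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one pre-forked worker: its pid plus the pipe ends
   the server keeps. */
struct worker {
    pid_t pid;
    int cmd_fd;   /* server writes commands here */
    int reply_fd; /* server reads the result here */
};

/* Runs in the child: read int commands until the server closes its end of the
   command pipe, then report their sum and exit. */
static void worker_main(int cmd_fd, int reply_fd)
{
    long sum = 0;
    int cmd;
    while (read(cmd_fd, &cmd, sizeof cmd) == (ssize_t)sizeof cmd)
        sum += cmd;
    if (write(reply_fd, &sum, sizeof sum) != (ssize_t)sizeof sum)
        _exit(1);
    _exit(0);
}

/* Forks one worker connected by two pipes. Returns 0 on success, -1 on error. */
static int start_worker(struct worker *w)
{
    int cmd[2], reply[2];
    if (pipe(cmd) == -1)
        return -1;
    if (pipe(reply) == -1) {
        close(cmd[0]);
        close(cmd[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(cmd[0]);
        close(cmd[1]);
        close(reply[0]);
        close(reply[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the command read end and the reply write end,
           otherwise read() would never see EOF on the command pipe. */
        close(cmd[1]);
        close(reply[0]);
        worker_main(cmd[0], reply[1]); /* never returns */
    }
    /* Parent: keep the opposite ends. */
    close(cmd[0]);
    close(reply[1]);
    w->pid = pid;
    w->cmd_fd = cmd[1];
    w->reply_fd = reply[0];
    return 0;
}
```

Forking such workers early, before the server has built up its large data structures, also keeps each worker's inherited address space small. Note that the child still inherits any other descriptors the server had open at that point.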

score 2 · Accepted Answer · answered May 09 '18 at 21:49

> fork: I get durability over time, but copying the main server means copying all its data structures, fds, etc., which would weigh down the new process unnecessarily.

Not as much as you may think. Linux fork() has long been implemented with copy-on-write pages. The child starts out with an identical copy of the parent's address space, but it never gets its own physical copies of pages that neither process modifies. Moreover, the cost of copying modified pages is spread out over time rather than paid up front, so the initial fork() is pretty cheap.
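To make the copy-on-write point concrete, here is a minimal POSIX sketch that runs a function in a forked child, which is close to the "process with a function as entry point" the question asks for. `spawn_function`, `bump_counter` and `counter` are hypothetical names. The child's writes to `counter` land in its own copy of that page, so the parent still sees 0 afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example state living in the parent's data segment. */
static int counter = 0;

/* Hypothetical helper: run fn(arg) in a forked child and return the child's pid
   (or -1 if fork failed). fn's return value becomes the child's exit status,
   of which only the low 8 bits are visible to waitpid(). */
static pid_t spawn_function(int (*fn)(void *), void *arg)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(fn(arg)); /* _exit skips atexit handlers and stdio flushing */
    return pid;
}

/* Example worker: a small loop that writes to `counter`. The first write gives
   the child a private copy of that page; the parent's value is untouched. */
static int bump_counter(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        counter++;
    return counter;
}
```

Using `_exit()` rather than `exit()` in the child avoids flushing stdio buffers that were copied from the parent, which could otherwise produce duplicated output.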

> thread: light and fast, but not durable and, above all, very unstable (if a thread generates an error for some reason, it could block everything).

Given that analysis, threads are not a real option after all. Durability and stability are functional requirements. Minimum weight and to some extent even speed are efficiency issues. The former category trumps the latter pretty much every time.
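On the durability side, a forked child is not terminated when its parent dies; it is reparented (to init or a subreaper) and keeps running. If "durable" should also cover signals sent to the server's whole process group, such as Ctrl-C in a terminal, one option is to call `setsid()` in the child. The sketch below assumes that interpretation; `spawn_detached` and `check_session` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a worker into its own session, so signals sent to
   the server's process group do not reach it. Returns the child's pid or -1. */
static pid_t spawn_detached(int (*fn)(void *), void *arg)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* A freshly forked child is never a process group leader,
           so setsid() is expected to succeed here. */
        if (setsid() == -1)
            _exit(127);
        _exit(fn(arg));
    }
    return pid;
}

/* Example worker: exits with 0 if it really leads a new session. */
static int check_session(void *arg)
{
    (void)arg;
    return getsid(0) == getpid() ? 0 : 1;
}
```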

> The ideal thing would be a magic system call that creates a brand-new (ex novo) process with a function as its entry point, but I don't think anything like that exists.

Since you're targeting Linux, have you considered clone()? It does exactly what you describe, though I'm doubtful that what you said fully captures the semantics you imagine for such a feature.
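A minimal sketch of that idea with the glibc `clone()` wrapper, assuming Linux and a downward-growing stack (true on x86-64 and most other architectures). `spawn_with_clone`, `square_worker` and `CHILD_STACK_SIZE` are hypothetical names. Passing only `SIGCHLD` as flags requests no extra sharing, so memory and descriptors are copied much as with `fork()`, and the child can be reaped with an ordinary `waitpid()`.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Hypothetical helper: start fn(arg) as a new process via clone(). Without
   CLONE_VM the child gets its own copy of memory; SIGCHLD as the termination
   signal makes it waitable like a forked child. Returns the pid or -1. */
static pid_t spawn_with_clone(int (*fn)(void *), void *arg)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* The stack grows downward, so pass the top of the allocation. */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, SIGCHLD, arg);
    /* Without CLONE_VM the child runs on its own copy of this buffer,
       so the parent can release its copy right away. */
    free(stack);
    return pid;
}

/* Example entry point: its return value becomes the child's exit status. */
static int square_worker(void *arg)
{
    int x = *(const int *)arg;
    return x * x;
}
```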

Alternatively, have you considered fork + exec? That would probably require some refactoring, but by performing an exec the child would shed the context shared with its parent as much as possible, right after the (cheap) initial fork.
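A minimal POSIX fork + exec sketch; `spawn_program` is a hypothetical name. After a successful `execv()` the child no longer carries the server's heap or data structures, although descriptors not marked close-on-exec are still inherited.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, then replace the child's image with the program
   at `path`. Returns the child's pid, or -1 if fork failed. */
static pid_t spawn_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127); /* only reached if execv failed */
    }
    return pid;
}
```

To avoid shipping a separate executable, one Linux-specific option is to exec the server's own binary through `/proc/self/exe` with an argument that selects a worker mode; that argument convention would be your own design, not something the system provides.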

clone: it could be a solution, but which flags should I use to inherit as little as possible? From what I understand, clone behaves similarly to fork: it replicates the process, with the option to choose which parts of its execution context to share. The fork + exec solution seems to be the best in my case, but I would like to avoid generating separate executable files. — Phocs, May 09 '18 at 22:10
As I said, @alessiovolpe, I doubt that what you wrote fully captures the semantics you have in mind. Clone *does* behave similarly to fork. If you read through the flag list, you will see that pretty much all of them request more sharing with the original process, so specifying no flags gives you as little sharing as possible. But no sharing means *copying* instead (with copy-on-write where applicable, as with fork). What you do get is the specified function as the entry point, and returning from it terminates the child. — John Bollinger, May 09 '18 at 22:19
