39

I'm writing a Unix domain socket server for Linux.

A peculiarity of Unix domain sockets I quickly found out is that, while creating a listening Unix socket creates the matching filesystem entry, closing the socket doesn't remove it. Moreover, until the filesystem entry is removed manually, it's not possible to bind() a socket to the same path again : bind() fails with EADDRINUSE if the path it is given already exists in the filesystem.

As a consequence, the socket's filesystem entry needs to be unlink()'ed on server shutdown to avoid getting EADDRINUSE on server restart. However, this cannot always be done (i.e.: server crash). Most FAQs, forum posts, Q&A websites I found only advise, as a workaround, to unlink() the socket prior to calling bind(). In this case however, it becomes desirable to know whether a process is bound to this socket before unlink()'ing it.

Indeed, unlink()'ing a Unix socket while a process is still bound to it and then re-creating the listening socket doesn't raise any error. As a result, however, the old server process is still running but unreachable : the old listening socket is "masked" by the new one. This behavior has to be avoided.

Ideally, using Unix domain sockets, the socket API should have exposed the same "mutual exclusion" behavior that is exposed when binding TCP or UDP sockets : "I want to bind socket S to address A; if a process is already bound to this address, just complain !" Unfortunately this is not the case...

Is there a way to enforce this "mutual exclusion" behavior ? Or, given a filesystem path, is there a way to know, via the socket API, whether any process on the system has a Unix domain socket bound to this path ? Should I use a synchronization primitive external to the socket API (flock(), ...) ? Or am I missing something ?

Thanks for your suggestions.

Note : Linux's abstract namespace Unix sockets seem to solve this issue, as there is no filesystem entry to unlink(). However, the server I'm writing aims to be generic : it must be robust against both types of Unix domain sockets, as I am not responsible for choosing listening addresses.

Simon Malinge
  • 493
  • 4
  • 5

2 Answers2

25

I know I am very late to the party and that this was answered a long time ago but I just encountered this searching for something else and I have an alternate proposal.

When you encounter the EADDRINUSE return from bind() you can enter an error checking routine that connects to the socket. If the connection succeeds, there is a running process that is at least alive enough to have done the accept(). This strikes me as being the simplest and most portable way of achieving what you want to achieve. It has drawbacks in that the server that created the UDS in the first place may actually still be running but "stuck" somehow and unable to do an accept(), so this solution certainly isn't fool-proof, but it is a step in the right direction I think.

If the connect() fails then go ahead and unlink() the endpoint and try the bind() again.

Kean
  • 552
  • 5
  • 8
10

I don't think there is much to be done beyond things you have already considered. You seem to have researched it well.

There are ways to determine if a socket is bound to a unix socket (obviously lsof and netstat do it) but they are complicated and system dependent enough that I question whether they are worth the effort to deal with the problems you raise.

You are really raising two problems - dealing with name collisions with other applications and dealing with previous instances of your own app.

By definition multiple instances of your pgm should not be trying to bind to the same path so that probably means you only want one instance to run at a time. If that's the case you can just use the standard pid filelock technique so two instances don't run simultaneously. You shouldn't be unlinking the existing socket or even running if you can't get the lock. This takes care of the server crash scenario as well. If you can get the lock then you know you can unlink the existing socket path before binding.

There is not much you can do AFAIK to control other programs creating collisions. File permissions aren't perfect, but if the option is available to you, you could put your app in its own user/group. If there is an existing socket path and you don't own it then don't unlink it and put out an error message and letting the user or sysadmin sort it out. Using a config file to make it easily changeable - and available to clients - might work. Beyond that you almost have to go some kind of discovery service, which seems like massive overkill unless this is a really critical application.

On the whole you can take some comfort that this doesn't actually happen often.

Duck
  • 25,228
  • 3
  • 57
  • 86
  • Thanks for your answer. Using a traditional lockfile system is admittedly the safest way to go. Also, as to whether a service discovery system is overkill or not : ironically enough, this server is planned to be part of a service discovery system by itself (service "registration" system seems more appropriate). This should answer your question ;-) – Simon Malinge Sep 14 '11 at 12:25