1

I have a requirement where many threads will call the same shell script to perform some work and then write their output (a single line of text) to a common text file. Since many threads will try to write to the same file, my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.

psaha4
  • "UNIX", meaning the POSIX standard? Because there are common tools, but none shell-accessible through means that POSIX itself promises will be available. – Charles Duffy Dec 23 '15 at 23:20
  • If you only care about commonly-available-in-Linux, that's much easier. :) – Charles Duffy Dec 23 '15 at 23:20
  • BTW, this arguably could be a duplicate of http://stackoverflow.com/questions/9926616/is-echo-atomic-when-writing-single-lines – Charles Duffy Dec 23 '15 at 23:29
  • @CharlesDuffy Can you please elaborate? I am not very good at scripts and am trying to understand Unix systems. – psaha4 Dec 23 '15 at 23:31
  • Elaborate on which point? "UNIX" is a very vague term; there are many different flavors, and answers will vary depending on which one you're targeting. POSIX is a set of standards for minimum requirements that any UNIX should conform with, but everyone goes above and beyond it in various ways -- and for the functionality you're asking for here to be accessible from shell, you *need* extensions to that baseline standard. – Charles Duffy Dec 23 '15 at 23:32
  • Just write the lines to some socket; the listener could easily manage the correct order of writes to the file. Or configure `syslog` and send the lines to syslog. – jm666 Dec 23 '15 at 23:57
  • @jm666 *Just write the lines to some socket; the listener could easily manage the correct order of writes to the file.* That's only guaranteed to work for `PIPE_BUF` bytes or less. And if you're at or below that, just `open` the file with `O_APPEND` and use one low-level call to `write`. – Andrew Henle Dec 24 '15 at 00:00
  • @AndrewHenle The server bound to some port gets a distinct peer at each `accept`; nothing is mixed together. And the server could manage the connections and the received lines... so you probably misunderstood my comment, or I wrote it too briefly... – jm666 Dec 24 '15 at 00:07
  • @jm666 What happens in your example if the writing process writes enough bytes that one write operation is split into multiple packets? It's not that simple; you can't just toss the bytes at a socket and expect the problem of interleaved data to disappear unless you're at or below whatever size is guaranteed not to be split. And once you're below that size, a `write()` to the file with `O_APPEND` set is almost certainly going to work also - without the overhead of the pipes, sockets, and the process to handle them. The only advantage of pipes is that processing doesn't block if done properly. – Andrew Henle Dec 25 '15 at 18:31
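A minimal sketch of the single-listener idea from this comment thread, using a named pipe rather than a network socket (the FIFO path and the worker's message are invented for illustration). Writes of `PIPE_BUF` bytes or less to a FIFO are atomic, so each worker's line arrives intact:

mkfifo /tmp/collector.pipe

# reader: re-open the FIFO after each round of writers closes it,
# appending everything received to the shared file
while cat /tmp/collector.pipe >> /path/to/shared-file; do :; done &

# each worker emits its single line through the FIFO
echo "result from worker $$" > /tmp/collector.pipe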

2 Answers

4

Performing a short single write to a file opened for append is mostly atomic; you can get away with it most of the time (depending on your filesystem). But if you want a guarantee that your writes won't interrupt each other, or to write arbitrarily long strings, or to perform multiple writes, or to perform a block of writes and be assured that their contents will sit next to each other in the resulting file, then you'll want to lock.

While not part of POSIX (unlike the C library call for which it's named), the `flock` tool provides the ability to perform advisory locking ("advisory" -- as opposed to "mandatory" -- meaning that other potential writers need to voluntarily participate):

(
  flock -x 99 || exit # lock the file descriptor
  echo "content" >&99 # write content to that locked FD
) 99>>/path/to/shared-file

The use of file descriptor #99 is completely arbitrary -- any unused FD number can be chosen. Similarly, one can safely put the lock on a different file than the one to which content is written while the lock is held.
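For instance, a sketch of that variant, holding the lock on a dedicated lock file while appending to the shared file (the lock-file path is invented for illustration):

(
  flock -x 99 || exit                      # lock the dedicated lock file
  echo "content" >>/path/to/shared-file    # append to the real file while holding the lock
) 99>/path/to/shared-file.lock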

The advantage of this approach over several conventional mechanisms (such as using exclusive creation of a file or directory) is automatic unlock: If the subshell holding the file descriptor on which the lock is held exits for any reason, including a power failure or unexpected reboot, the lock will be automatically released.

Charles Duffy
  • So do you mean your suggested approach will not work on plain POSIX? If I want this approach to work, do I need to install `flock` in my Unix environment? Please suggest. I believe that if this works, it will be the best solution for my small requirement. – psaha4 Dec 26 '15 at 20:30
  • All POSIX-compliant systems support the `flock` call, but not all have a `flock` command to give shell access to that call, so yes, you may need to install tooling. – Charles Duffy Dec 26 '15 at 22:49
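As a quick way to find out whether that tooling is present (the package name is an assumption; on most Linux distributions the `flock` command ships with util-linux):

command -v flock >/dev/null || echo "flock not found; install util-linux or an equivalent port"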
3

*my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.*

In general, no. At least not something that's guaranteed to work. But there are other ways to solve your problem, such as `lockfile`, if you have it available:

Examples

Suppose you want to make sure that access to the file "important" is serialised, i.e., no more than one program or shell script should be allowed to access it. For simplicity's sake, let's suppose that it is a shell script. In this case you could solve it like this:

...
lockfile important.lock
...
access_"important"_to_your_hearts_content
...
rm -f important.lock
...

Now if all the scripts that access "important" follow this guideline, you will be assured that at most one script will be executing between the 'lockfile' and the 'rm' commands.
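A minimal concrete version of that pattern (the `-r 5` retry count, the output line, and the trap-based cleanup are additions for illustration; `lockfile` here is the utility shipped with procmail):

#!/bin/sh
lockfile -r 5 important.lock || exit 1   # retry the lock up to 5 times, then give up
trap 'rm -f important.lock' EXIT         # release the lock even if the script is interrupted
echo "one line of output" >> important   # the serialised access to "important"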

But there's actually a better way, if you can use C or C++: use the low-level `open` call to open the file in append mode, and call `write()` to write your data, with no locking necessary. Per the `write()` man page:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

Like this:

#include <fcntl.h>     // open, O_WRONLY, O_APPEND
#include <string.h>    // strlen
#include <unistd.h>    // write

// process-wide global file descriptor
int outputFD = open( fileName, O_WRONLY | O_APPEND, 0600 );
   .
   .
   .
// write a string to the file
ssize_t writeToFile( const char *data )
{
    return( write( outputFD, data, strlen( data ) ) );
}

In practice, you can write anything to the file - it doesn't have to be a NUL-terminated character string.

That's supposed to be atomic on writes up to PIPE_BUF bytes, which is usually something like 512, 4096, or 5120. Some Linux filesystems apparently don't implement that properly, so you may in practice be limited to about 1K on those file systems.
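A self-contained sketch of that approach (the file name and message are invented, and error handling is kept minimal); running several copies of it concurrently should leave every line intact:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main( void )
{
    // O_CREAT added so the demo works on a fresh file
    int fd = open( "shared.txt", O_WRONLY | O_APPEND | O_CREAT, 0600 );
    if ( fd < 0 ) { perror( "open" ); return 1; }

    char line[ 64 ];
    // build the whole line first so it leaves in a single write()
    int len = snprintf( line, sizeof( line ), "result from pid %ld\n", (long) getpid() );

    // one write() per line; O_APPEND makes the seek-and-write atomic
    if ( write( fd, line, (size_t) len ) != len )
        perror( "write" );

    close( fd );
    return 0;
}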

Andrew Henle
  • `lockfile` is just an `O_EXCL` creation? If so, that leaves some questions open about cleanup on power loss or other unexpected failure while the lock is held. – Charles Duffy Dec 24 '15 at 00:00
  • @CharlesDuffy *that leaves some questions open about cleanup on power loss or other unexpected failure while the lock is held.* Yes, that is certainly a problem. I couldn't think of any other way under the OP's requested "script" solution. Hence my "better way". – Andrew Henle Dec 24 '15 at 00:03
  • Well -- `>>file` uses `O_APPEND`, so one doesn't strictly need C for that solution. :) – Charles Duffy Dec 24 '15 at 00:04
  • @CharlesDuffy - Only if you trust that none of the `write` system calls deep inside whatever command you've spawned are ever split in some way. For example, `fprintf()` will tend to make numerous separate `write()` calls for each `fprintf()` call the application makes. If multiple processes are writing to the file, every split is an opportunity for output from more than one process to wind up interleaved in the file. – Andrew Henle Dec 25 '15 at 18:22
  • `bash` has a `sprintf` equivalent in the `-v` argument to `printf`, so combining content into a single string to be emitted atomically is simple. – Charles Duffy Dec 26 '15 at 06:06
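For example, a small sketch of that `printf -v` technique (the variable name and the line content are invented for illustration):

printf -v line '%s %s\n' "$(date +%T)" "result from worker $$"   # build the full line in a variable
printf '%s' "$line" >> /path/to/shared-file                      # emit it in a single printf call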