13

I know this pattern to read the umask in Python:

import os

def get_umask():
    current_umask = os.umask(0)  # line1
    os.umask(current_umask)      # line2
    return current_umask         # line3

But this is not thread-safe.

Any other thread that runs between line1 and line2 sees a umask of 0 instead of the real one, so files it creates in that window get the wrong permissions.

Is there a thread-safe way to read the umask in Python?

Related: https://bugs.python.org/issue35275

guettli
  • Why do you need to call `os.umask()` in the first place? You usually do not always *need* to know the current umask. – Martijn Pieters Nov 12 '18 at 18:25
  • (And on a separate note: using `os.umask(0)` in a threaded environment not only runs the risk of a race condition, it also opens your app up to security bugs. You should *at least* set a restrictive mask, like `os.umask(0o777)`.) – Martijn Pieters Nov 12 '18 at 19:18
  • @MartijnPieters Why I want to know the umask? I want to compare two environments and check what's the difference between both. I want to improve my tool dumpenv: https://github.com/guettli/dumpenv – guettli Nov 13 '18 at 09:03
  • That's a much better reason than most to inspect the current umask! – Martijn Pieters Nov 13 '18 at 09:08

4 Answers

10

If your system has a Umask field in /proc/[pid]/status, you can read it from there:

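import os

def getumask():
    pid = os.getpid()
    with open(f'/proc/{pid}/status') as f:
        for line in f:
            if line.startswith('Umask'):
                # The second token on the Umask line is the mask in octal.
                return int(line.split()[1], base=8)
        # No Umask field found in the status file.
        return None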

Tested under CentOS 7.5 and Debian 9.6.

Alternatively, you could guard the os.umask() calls with a thread lock :)
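The lock idea could be sketched roughly as follows. This is only a minimal sketch, assuming every piece of code in the process that calls os.umask() goes through the same lock; the names `_umask_lock` and `get_umask_locked` are made up for illustration. It serializes the read-and-restore, but a thread that creates files without taking the lock can still observe the temporary mask, which is why the sketch uses a restrictive temporary value rather than 0.

```python
import os
import threading

# Hypothetical module-level lock; every os.umask() caller must share it.
_umask_lock = threading.Lock()

def get_umask_locked():
    """Read the umask while holding the lock (sketch under the assumptions above)."""
    with _umask_lock:
        # Temporarily set a restrictive mask so a racing file creation
        # ends up with fewer permissions rather than more.
        current = os.umask(0o777)
        # Restore the original mask before releasing the lock.
        os.umask(current)
    return current
```

A call such as `print("umask {:03o}".format(get_umask_locked()))` would then print the mask in octal.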

georgexsh
5

The umask is inherited by child processes. You could create a pipe, fork a child process, read the umask there, and write the result to the pipe so the parent can read it.

This is quite expensive, but it has no special requirements such as the /proc virtual filesystem. Below is an example that uses only low-level OS calls (all async-safe) and does no error checking:

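import os
import struct

def get_umask():
    # The child reports its inherited umask to the parent through a pipe.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: changing the umask here does not affect the parent.
        os.close(read_fd)
        umask = os.umask(0)
        os.write(write_fd, struct.pack('H', umask))
        os.close(write_fd)
        os._exit(0)
    else:
        # Parent: read the 2-byte value, then reap the child.
        os.close(write_fd)
        value = os.read(read_fd, 2)
        os.close(read_fd)
        os.waitpid(pid, 0)
        return struct.unpack('H', value)[0]

print("umask {:03o}".format(get_umask()))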
VPfB
  • Nice solution! I like this. – guettli Nov 14 '18 at 11:35
  • This code is not safe. From `man fork(2)`: "After a `fork()` in a multithreaded program, the child can safely call only async-signal-safe functions (see `signal-safety(7)`) until such time as it calls `execve(2)`." The problem is that the Python interpreter itself is not async-signal-safe. In particular, nearly any Python code may cause the interpreter to allocate memory, and most memory allocation functions are not async-signal-safe. Therefore, it is never safe to call `os.fork()` from a multithreaded program. – abacabadabacaba Nov 16 '18 at 16:50
  • @abacabadabacaba I'm not sure. Interpreter's own memory operations should be protected by the GIL. – VPfB Nov 17 '18 at 08:16
  • @VPfB The implementation of the GIL is not async-signal-safe either. – abacabadabacaba Nov 17 '18 at 13:25
  • @abacabadabacaba But without an async-signal-safe GIL, Python could not handle signals. – VPfB Nov 18 '18 at 09:09
  • @VPfB The low-level signal handler installed by the Python interpreter [doesn't run Python code](https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers) or touch the GIL. It only sets a flag. The main interpreter loop checks this flag periodically and runs the high-level signal handler if the flag is set. – abacabadabacaba Nov 18 '18 at 15:43
  • @abacabadabacaba What you have described (GIL+flag can handle async signals) means that Python is capable of running a signal handler when the interpreter is ready to do that, i.e. "between two Python statements", with some simplification. That is also the case in a multi-threaded program. But running an `os.fork` statement is no different: it also happens "between two statements", and it obviously also runs when the interpreter is ready. That is my understanding. I'm not an expert; I'm just explaining why I cannot follow your explanation. (to be continued)... – VPfB Nov 18 '18 at 18:51
  • @abacabadabacaba Could you please link an authoritative source for the claim that absolutely no form of fork in a multithreaded Python program is safe? If not, I will post a direct question. I want to know that for sure. – VPfB Nov 18 '18 at 18:51
  • @VPfB Well, I am not aware of such a source. However, these are the facts that I know: 1. A number of bug reports were filed against Python for deadlocks caused by using subprocess.Popen in multithreaded programs. Eventually, the Python developers changed subprocess.Popen to stop using os.fork and instead use a function written in C that does both fork and exec. 2. The GIL is implemented using pthread mutexes and condition variables. None of the pthread functions for working with mutexes and condition variables are async-signal-safe. – abacabadabacaba Nov 18 '18 at 21:57
  • 3. Python attempts to make os.fork safe, even reinitializing various important locks inside the child process. This may actually provide safety with some pthread implementations, such as the one used by glibc on Linux. However, this is not guaranteed with all implementations, because no use of functions that are not async-signal-safe is guaranteed to be safe after fork. – abacabadabacaba Nov 18 '18 at 21:57
  • Forking at a random point is indeed not a good idea; for a deadlock example, see [bug 6721](https://bugs.python.org/issue6721). However, as the child process doesn't do anything special, I guess it is safe here, and this is a legit solution. – georgexsh Nov 19 '18 at 06:12
1

It is possible to determine the umask by creating a temporary file and checking its permissions. This should work on all *nix systems:

def get_umask():
    import os, os.path, random, tempfile
    while True:
        # Generate a random name
        name = 'test'
        for _ in range(8):
            name += chr(random.randint(ord('a'), ord('z')))
        path = os.path.join(tempfile.gettempdir(), name)
        # Attempt to create a file with full permissions
        try:
            fd = os.open(path, os.O_RDONLY|os.O_CREAT|os.O_EXCL, 0o777)
        except FileExistsError:
            # File exists, try again
            continue
        try:
            # Deduce umask from the file's permission bits
            return 0o777 & ~os.stat(fd).st_mode
        finally:
            os.close(fd)
            os.unlink(path)
abacabadabacaba
  • @MartijnPieters `mkstemp` doesn't let you specify new file's permissions, so it doesn't work. – abacabadabacaba Nov 16 '18 at 22:53
  • Please use the Python library "tempfile" rather than "TMPDIR = os.environ.get('TMPDIR', '/tmp')": https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir – guettli Nov 19 '18 at 10:28
1

The only truly, unambiguously thread-safe way I know is to spawn a new process.

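import subprocess
import sys

# The child prints its inherited umask in decimal; 0o777 is valid octal syntax in Python 3.
umask_cmd = (sys.executable, '-c', 'import os; print(os.umask(0o777))')
umask = int(subprocess.check_output(umask_cmd))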

Note that if you have bash or another shell, you could call that instead. Because this code might run on an unusual system, I chose a Python subprocess for umask_cmd, since Python is certain to be available. On an ordinary *nix system, you can use sh or bash instead.
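The shell variant mentioned above might look like the following. This is a minimal sketch, assuming a POSIX `sh` is on the PATH; the function name `get_umask_via_sh` is made up for illustration. The shell's `umask` builtin, called without arguments, prints the mask in octal, so the output is parsed with base 8.

```python
import subprocess

def get_umask_via_sh():
    # `umask` is a shell builtin; with no arguments it prints the
    # inherited mask in octal, for example "0022".
    out = subprocess.check_output(['sh', '-c', 'umask'])
    return int(out.decode().strip(), 8)
```

As with the Python subprocess, the parent's own umask is never modified, so other threads are unaffected.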

Pi Marillion