21

Is there a way to extract the moments of historic leap seconds from the time-zone database that is distributed with most Linux distributions? I am looking for a solution in Python, but anything that works on the command line would be fine too.

My use case is to convert between GPS time (which is basically the number of seconds since the GPS epoch, 1980-01-06 00:00:00 UTC) and UTC or local time. UTC is adjusted for leap seconds every now and then, while GPS time increases linearly. This is equivalent to converting between UTC and TAI. TAI also ignores leap seconds, so TAI and GPS time always differ by a constant offset (19 seconds). At work, we use GPS time as the time standard for synchronizing astronomical observations around the world.
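
As a rough illustration of the conversion described above, here is a minimal sketch. The names `GPS_EPOCH`, `GPS_MINUS_UTC` and `gps_to_utc` are made up for this example, and the table is only a three-entry excerpt, so it is valid only for times from 2006 onward and up to the next leap second after 2012. A leap second itself (23:59:60) cannot be represented by `datetime` and comes out as 00:00:00 of the next day.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# (UTC instant from which the offset applies, GPS - UTC in seconds).
# Illustrative excerpt only; a real table needs every leap second since 1980.
GPS_MINUS_UTC = [
    (datetime(2006, 1, 1, tzinfo=timezone.utc), 14),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 15),
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 16),
]

def gps_to_utc(gps_seconds):
    """Convert seconds since the GPS epoch to an aware UTC datetime."""
    # GPS time read as if it were UTC, i.e. not yet corrected for leap seconds
    uncorrected = GPS_EPOCH + timedelta(seconds=gps_seconds)
    # Try the newest offset first and fall back to older ones
    for start, offset in reversed(GPS_MINUS_UTC):
        utc = uncorrected - timedelta(seconds=offset)
        if utc >= start:
            return utc
    raise ValueError("GPS time precedes the first table entry")

print(gps_to_utc(1025136016))
```

The interesting part is the lookup: the offset that applies depends on the UTC result, so the loop checks each candidate offset against the instant from which it is valid.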

I have working functions that convert between GPS time and UTC, but I had to hard-code a table of leap seconds, which I get here (the file tzdata2013xx.tar.gz contains a file named leapseconds). I have to update this file by hand every few years when a new leap second is announced. I would prefer to get this information from the standard tzdata, which is automatically updated via system updates several times a year.

I am pretty sure the information is hidden in some binary files somewhere in /usr/share/zoneinfo/. I have been able to extract some of it using struct.unpack (man tzfile gives some info about the format), but I never got it working completely. Are there any standard packages that can access this information? I know about pytz, which seems to get the standard DST information from the same database, but it does not give access to leap seconds. I also found tai64n, but looking at its source code, it just contains a hard-coded table.

EDIT

Inspired by steveha's answer and some code in pytz/tzfile.py, I finally got a working solution (tested on py2.5 and py2.7):

from struct import unpack, calcsize
from datetime import datetime

def print_leap(tzfile = '/usr/share/zoneinfo/right/UTC'):
    with open(tzfile, 'rb') as f:
        # read header
        fmt = '>4s c 15x 6l'
        (magic, format, ttisgmtcnt, ttisstdcnt,leapcnt, timecnt,
            typecnt, charcnt) =  unpack(fmt, f.read(calcsize(fmt)))
        assert magic == 'TZif'.encode('US-ASCII'), 'Not a timezone file'
        print 'Found %i leapseconds:' % leapcnt

        # skip over some uninteresting data
        fmt = '>%(timecnt)dl %(timecnt)dB %(ttinfo)s %(charcnt)ds' % dict(
            timecnt=timecnt, ttinfo='lBB'*typecnt, charcnt=charcnt)
        f.read(calcsize(fmt))

        #read leap-seconds
        fmt = '>2l'
        for i in xrange(leapcnt):
            tleap, nleap = unpack(fmt, f.read(calcsize(fmt)))
            print datetime.utcfromtimestamp(tleap-nleap+1)

with result

In [2]: print_leap()
Found 25 leapseconds:
1972-07-01 00:00:00
1973-01-01 00:00:00
1974-01-01 00:00:00
...
2006-01-01 00:00:00
2009-01-01 00:00:00
2012-07-01 00:00:00

While this does solve my question, I will probably not go for this solution. Instead, I will include leap-seconds.list with my code, as suggested by Matt Johnson. This seems to be the authoritative list used as a source for tzdata, and is probably updated by NIST twice a year. This means I will have to do the update by hand, but this file is straightforward to parse and includes an expiration date (which tzdata seems to be missing).
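
For reference, parsing that file could look roughly like the sketch below. It assumes the usual layout of leap-seconds.list: comment lines start with `#`, the expiration date is on the line starting with `#@`, and each data line holds an NTP timestamp (seconds since 1900-01-01) followed by the TAI - UTC value. The function name `parse_leap_seconds_list` and the `SAMPLE` text are made up for illustration; a real program would read the actual file.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

# Illustrative sample in the layout of leap-seconds.list, not the full file
SAMPLE = """\
# comment lines start with '#'
#@	3597177600
2272060800	10	# 1 Jan 1972
2287785600	11	# 1 Jul 1972
"""

def parse_leap_seconds_list(lines):
    """Return (expiration, [(utc_datetime, tai_minus_utc), ...])."""
    expires = None
    table = []
    for line in lines:
        line = line.strip()
        if line.startswith("#@"):
            # expiration date of the file, as an NTP timestamp
            expires = NTP_EPOCH + timedelta(seconds=int(line[2:].split()[0]))
        elif line and not line.startswith("#"):
            fields = line.split()
            start = NTP_EPOCH + timedelta(seconds=int(fields[0]))
            table.append((start, int(fields[1])))
    return expires, table

expires, table = parse_leap_seconds_list(SAMPLE.splitlines())
print(expires, table)
```

Checking `expires` against the current date before using the table is what makes it possible to notice a stale copy.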

Bas Swinckels
  • 2
    I know they are also published [here](https://github.com/eggert/tz/blob/master/leap-seconds.list), and I also know that they are compiled with `zic`, so they should be in the tzdata updates. As you noticed, [tzfile](http://man7.org/linux/man-pages/man5/tzfile.5.html) shows it in `tzh_leapcnt`, so you might be able to get it that way. I don't have a more direct answer for you at this time. Maybe someone else will. – Matt Johnson-Pint Oct 24 '13 at 17:02
  • tzdata stores offsets from UTC. Why would it contain leapseconds? – mattexx Oct 28 '13 at 22:53
  • 1
    @mattexx Don't ask me why, but the binary files of tzdata do contain leap-second information, maybe precisely to do the kind of time conversions I am interested in. The people maintaining this [database](http://en.wikipedia.org/wiki/Olson_database) are very meticulous when it comes to recording historic changes in time definitions, sometimes providing updates 10 times per year because some crazy dictator moved daylight savings time by a day. Keeping track of leapseconds is much easier, since IERS gives out regular bulletins, and they are usually announced half a year in advance. – Bas Swinckels Oct 28 '13 at 23:36
  • Do you know why you needed to add 4 to the offset in my program? I just double-checked the man page and I don't see where that 4 comes from. I'm glad you got a working solution of course! – steveha Oct 30 '13 at 20:06
  • 1
    The man page is a bit vague, but looking at my working code (which is a direct copy/paste from `pytz/tzfile.py`) and at some random [tzfile.h](https://www.opensource.apple.com/source/Libc/Libc-498.1.1/stdtime/tzfile.h), it seems you were missing `charcnt` bytes (which was indeed 4 for this file). – Bas Swinckels Oct 30 '13 at 20:36
  • Yeah, that was it. I even found the text in the man page that I overlooked, which documents this. I'm updating my answer, for the sake of anyone who looks at it in the future. – steveha Oct 30 '13 at 20:55
  • 1
    unrelated: here's [how to convert GPS time to UTC using "right" timezone without extracting leap seconds on Unix explicitly](http://stackoverflow.com/q/33415475/4279) – jfs Oct 30 '15 at 16:31
  • @J.F.Sebastian Thanks, interesting. That is indeed related to my original goal of converting gps time to UTC or local time. – Bas Swinckels Oct 30 '15 at 17:57

2 Answers

10

I just did man 5 tzfile and computed an offset that would find the leap seconds info, then read the leap seconds info.

You can uncomment the "DEBUG:" print statements to see more of what it finds in the file.

EDIT: program updated to now be correct. It now uses the file /usr/share/zoneinfo/right/UTC and it now finds leap-seconds to print.

The original program wasn't skipping the timezone abbreviation characters, which are documented in the man page but sort of hidden ("...and tt_abbrind serves as an index into the array of timezone abbreviation characters that follow the ttinfo structure(s) in the file.").

import datetime
import struct

TZFILE_MAGIC = 'TZif'.encode('US-ASCII')

def leap_seconds(f):
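    """
    Return a list of tuples of this format: (timestamp, number_of_seconds)
        timestamp: a 32-bit timestamp, seconds since the UNIX epoch
            (in the "right" files this count includes earlier leap seconds)
        number_of_seconds: total number of leap seconds in effect
            after timestamp (a running total, not a per-event count)

    """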
    fmt = ">4s c 15x 6l"
    size = struct.calcsize(fmt)
    (tzfile_magic, tzfile_format, ttisgmtcnt, ttisstdcnt, leapcnt, timecnt,
        typecnt, charcnt) =  struct.unpack(fmt, f.read(size))
    #print("DEBUG: tzfile_magic: {} tzfile_format: {} ttisgmtcnt: {} ttisstdcnt: {} leapcnt: {} timecnt: {} typecnt: {} charcnt: {}".format(tzfile_magic, tzfile_format, ttisgmtcnt, ttisstdcnt, leapcnt, timecnt, typecnt, charcnt))

    # Make sure it is a tzfile(5) file
    assert tzfile_magic == TZFILE_MAGIC, (
            "Not a tzfile; file magic was: '{}'".format(tzfile_magic))

    # comments below show struct codes such as "l" for 32-bit long integer
    offset = (timecnt*4  # transition times, each "l"
        + timecnt*1  # indices tying transition time to ttinfo values, each "B"
        + typecnt*6  # ttinfo structs, each stored as "lBB"
        + charcnt*1)  # timezone abbreviation chars, each "c"

    f.seek(offset, 1) # seek offset bytes from current position

    fmt = '>{}l'.format(leapcnt*2)
    #print("DEBUG: leapcnt: {}  fmt: '{}'".format(leapcnt, fmt))
    size = struct.calcsize(fmt)
    data = struct.unpack(fmt, f.read(size))

    lst = [(data[i], data[i+1]) for i in range(0, len(data), 2)]
    assert all(lst[i][0] < lst[i+1][0] for i in range(len(lst)-1))
    assert all(lst[i][1] == lst[i+1][1]-1 for i in range(len(lst)-1))

    return lst

def print_leaps(leap_lst):
    # leap_lst is tuples: (timestamp, num_leap_seconds)
    for ts, num_secs in leap_lst:
        print(datetime.datetime.utcfromtimestamp(ts - num_secs+1))

if __name__ == '__main__':
    zoneinfo_fname = '/usr/share/zoneinfo/right/UTC'
    with open(zoneinfo_fname, 'rb') as f:
        leap_lst = leap_seconds(f)
    print_leaps(leap_lst)
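
The program above reads the version-1 (32-bit) block of the file. As far as I know, version-2 and later TZif files repeat the header and data with 64-bit timestamps after the version-1 block, and files written in the "slim" style by newer `zic` releases may carry only a stub version-1 block with no leap records. The sketch below is a variant that prefers the 64-bit block when one is present. The names `read_leap_records` and `_read_header` are made up for this example, and it has only been reasoned through against the documented layout, not tested on every tzdata build.

```python
import struct

HEADER = struct.Struct(">4s c 15x 6l")  # same header layout as above

def _read_header(f):
    (magic, version, isutcnt, isstdcnt,
     leapcnt, timecnt, typecnt, charcnt) = HEADER.unpack(f.read(HEADER.size))
    if magic != b"TZif":
        raise ValueError("not a TZif file")
    return version, (isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt)

def read_leap_records(f):
    """Return [(timestamp, total_correction), ...] from an open TZif file."""
    version, counts = _read_header(f)
    time_size = 4
    if version != b"\x00":
        # Version 2+: skip the whole 32-bit data block, then read the
        # second header, which describes the 64-bit data block.
        isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt = counts
        f.seek(timecnt * 5 + typecnt * 6 + charcnt
               + leapcnt * 8 + isstdcnt + isutcnt, 1)
        version, counts = _read_header(f)
        time_size = 8
    isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt = counts
    # Skip transition times and their type indices, ttinfo structs,
    # and the abbreviation characters.
    f.seek(timecnt * (time_size + 1) + typecnt * 6 + charcnt, 1)
    record = struct.Struct(">ql" if time_size == 8 else ">ll")
    return [record.unpack(f.read(record.size)) for _ in range(leapcnt)]
```

The returned list has the same shape as the one from `leap_seconds(f)`, so it can be passed to `print_leaps` unchanged.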
steveha
  • One can have a file with the number of leap seconds since 1980 and then whenever a leap second occurs based on your `leap_second` function, one can increment the number. On Ubuntu, in the launchpad page for the tzdata package, one can download old versions of the package. – Ramchandra Apte Oct 30 '13 at 05:33
  • Thanks steveha, that was the kind of solution I was looking for. I don't have time now, I will have a better look at your solution in the next days. – Bas Swinckels Oct 30 '13 at 09:34
  • The file `/usr/share/zoneinfo/right/UTC` does include the leap-seconds. Running your code on that file gives an assertion-error, I guess because your `f.seek` is off by a few bytes. Changing this to `f.seek(offset + 4, 1)` appears to solve this. I added some working code to my question. Thanks for pointing me in the right direction. – Bas Swinckels Oct 30 '13 at 17:02
  • @BasSwinckels: `datetime()` doesn't support leap seconds. All leap seconds are 23:59:60 in UTC (so far) i.e., you should [display `2012-06-30 23:59:60Z` instead of `2012-07-01 00:00:00Z`](https://gist.github.com/zed/a912498be36c6a947c33) that is **different** moment in time. – jfs Sep 01 '14 at 10:59
  • @BasSwinckels: I've written [`utc_to_tai()` function](https://gist.github.com/zed/a912498be36c6a947c33#file-utc_to_tai-py) using @steveha's `leap_seconds()` function that allows to compute elapsed SI seconds between two events if their posix timestamps are known (normally, "seconds since Epoch" are not true elapsed SI seconds due to intercalary leap seconds). – jfs Sep 05 '14 at 23:14
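
The two points raised in the last comments can be sketched as follows: labelling the inserted second as 23:59:60, and counting true elapsed seconds between two POSIX timestamps. This is a minimal illustration, not the code from the linked gist. The function names are made up, it assumes only positive leap seconds, and the two sample records are assumed to match the first entries of `right/UTC`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# (timestamp, running total) pairs as returned by leap_seconds();
# assumed to be the first two records of right/UTC
SAMPLE_LEAPS = [(78796800, 1), (94694401, 2)]

def leap_second_label(tleap, nleap):
    """UTC label of the inserted second itself (always 23:59:60 so far)."""
    # tleap - nleap is the POSIX timestamp of 23:59:59, the second before
    last_normal = EPOCH + timedelta(seconds=tleap - nleap)
    return last_normal.strftime("%Y-%m-%d") + " 23:59:60"

def leaps_before(posix_ts, leap_lst):
    """Count leap seconds inserted at or before the POSIX timestamp."""
    return sum(1 for tleap, nleap in leap_lst
               if tleap - nleap + 1 <= posix_ts)

def elapsed_si_seconds(posix_start, posix_end, leap_lst):
    """True elapsed seconds between two POSIX timestamps."""
    return ((posix_end - posix_start)
            + leaps_before(posix_end, leap_lst)
            - leaps_before(posix_start, leap_lst))

for tleap, nleap in SAMPLE_LEAPS:
    print(leap_second_label(tleap, nleap))
```

Because only the difference of the two counts matters, the constant TAI - UTC offset that existed before the first table entry drops out of `elapsed_si_seconds`.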
3

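PyEphem has a delta_t function which returns the difference between Terrestrial Time and Universal Time (in seconds). Since TT - TAI is fixed at 32.184 s, you can subtract 32.184 from it to get an approximation of the accumulated leap seconds, TAI - UTC (ref). The result is not an integer, because delta_t is measured against UT1 rather than UTC.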

import ephem, datetime
ephem.delta_t(datetime.datetime.now()) - 32.184
Out[2]: 35.01972996360122
mattexx
  • 6
    Thanks for the link, but looking at the source code of PyEphem, it seems to get its leap-seconds information from `libastro`. Looking at [that source](https://github.com/brandon-rhodes/pyephem/blob/master/libastro-3.7.5/deltat.c), there is again a hard-coded table. The latest version seems to be from 2011, which is out of date, since the last leap-second was in July 2012! That is why I wanted to use something based directly on tzdata, since it is actively updated several times per year. – Bas Swinckels Oct 28 '13 at 23:29
  • 1
    This computation actually gives the difference TT - UT1, not TT - UTC. – JPaget Jan 08 '15 at 23:17