
I have a binary file, written by an application, whose contents I need to read and convert to a CSV file. I've been given the format documentation, but I'm not sure how to read the file in Python.

Here are some of the details about how the dataset was serialized:

Datasets.bin is a list of DataSet classes serialized using Qt's QDataStream serialization using version QDataStream::Qt_4_7.

The format of the datasets.bin file is:

quint32   Magic Number      0x46474247
quint32   Version           1
quint32   DataSet Marker    0x44415441
qint32    # of DataSets     n
DataSet   DataSet 1
DataSet   DataSet 2
   .
   .
   .
DataSet   DataSet n
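
As a starting point, the fixed-size file header described above can be checked without Qt at all. The following is a minimal sketch that assumes the file was written with QDataStream's default big-endian byte order; `parse_header` is a hypothetical helper name, and the sample bytes are synthetic rather than taken from a real datasets.bin.

```python
import struct

MAGIC_NUMBER = 0x46474247    # file magic number from the format description
DATASET_MARKER = 0x44415441  # dataset marker from the format description

def parse_header(data):
    """Return (version, dataset_count) from the first 16 bytes of the file."""
    # '>' selects big-endian (assumed: QDataStream's default byte order);
    # 'I' is an unsigned 32-bit integer (quint32), 'i' is signed (qint32).
    magic, version, marker, count = struct.unpack_from('>IIIi', data, 0)
    if magic != MAGIC_NUMBER:
        raise ValueError('invalid magic number: 0x%08X' % magic)
    if marker != DATASET_MARKER:
        raise ValueError('invalid dataset marker: 0x%08X' % marker)
    return version, count

# Synthetic header for illustration only: version 1, three datasets.
sample = struct.pack('>IIIi', MAGIC_NUMBER, 1, DATASET_MARKER, 3)
print(parse_header(sample))
```

With a real file, the same function could be applied to the first 16 bytes read from the file opened in binary mode.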


The format of each DataSet is:

quint32            Magic Number    0x53455455
QString            Name
quint32            Flags           Bit field (Set Table)
QString            Id              [Optional]
QColor             Color           [Optional]
qint32             Units           [Optional]
QStringList        Creator Ids     [Optional]
bool               Hidden          [Optional]
QList<double>      Thresholds      [Optional]
QString            Source          [Optional]
qint32             Role            [Optional]
QVector<QPointF>   Data points

I've been looking into the PyQt4 QDataStream documentation, but I can't find any specific examples. Any help pointing me in the right direction would be great.

ekhumoro
Michael Bawol
  • Would it be easiest to use QDataStream, e.g. with Qt Python bindings? – Kevin Krammer Nov 23 '16 at 11:07
  • https://dl.dropboxusercontent.com/u/28824868/datasets.bin here is a link to the datasets file if anyone wants to test it out – Michael Bawol Nov 24 '16 at 14:58
  • @MichaelBawol. I tried to read that file with C++, and it fails at the first `Source` entry. So either the format is incomplete/wrong or the file is corrupted. Where are you getting the files from? Do you have a small toy example with a known set of values? – ekhumoro Nov 24 '16 at 22:04
  • There is an xml file that is stored with each dataset bin. https://dl.dropboxusercontent.com/u/28824868/session.xml The dataset comes from a custom software that I had built to download and manipulate sensor data. The company has since folded and provided me with the format. Here is the complete formatting documentation I was given. https://dl.dropboxusercontent.com/u/28824868/Spin%20Review%20File%20Format.zip – Michael Bawol Nov 25 '16 at 14:46
  • @MichaelBawol. I was able to create a C++ tool which can read the dataset file included with the formatting documentation. However, it cannot read the other dataset file, which I am now certain is either corrupted or in a different format. I have updated my answer based on what I have learned, but I still cannot see how to read the `QList/QVector` types properly in PyQt. I suspect that it just may not be possible. However, the current code in my answer does at least show *how* to read the format correctly. – ekhumoro Nov 25 '16 at 20:26
  • @MichaelBawol. As I suspected (see my updated answer), it is not possible to read these particular files with PyQt because of the use of template classes. – ekhumoro Nov 26 '16 at 18:56
  • @ekhumoro thanks for your help – Michael Bawol Nov 27 '16 at 19:48

1 Answer


PyQt cannot read all of the data the same way as in C++, because it cannot handle template classes (like QList<double> and QVector<QPointF>), which would require language-specific support that is not available in Python. This means a work-around must be used. Fortunately, the datastream format is quite straightforward, so reading arbitrary template classes can be reduced to a simple algorithm: read the length as a uint32, then iterate over a range and read the contained elements one-by-one into a list:

points = []
length = stream.readUInt32()
for index in range(length):
    point = QtCore.QPointF()
    stream >> point
    points.append(point)
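
The same length-then-elements idea can also be sketched in plain Python with the struct module, which may help when checking what the stream contains byte by byte. This is only an illustration under stated assumptions: big-endian byte order (the QDataStream default), each double stored as 8 bytes, and each QPointF stored as two consecutive doubles (x, then y). The helper names `read_double_list` and `read_point_list` are hypothetical, and the sample bytes are synthetic.

```python
import struct

def read_double_list(data, offset=0):
    """Read a quint32 count followed by that many big-endian doubles.

    Returns (values, offset_of_next_field).
    """
    (length,) = struct.unpack_from('>I', data, offset)
    offset += 4
    values = list(struct.unpack_from('>%dd' % length, data, offset))
    return values, offset + 8 * length

def read_point_list(data, offset=0):
    """Read a quint32 count followed by that many (x, y) pairs of doubles.

    Assumes each QPointF is serialized as two 8-byte doubles.
    Returns (points, offset_of_next_field).
    """
    (length,) = struct.unpack_from('>I', data, offset)
    offset += 4
    flat = struct.unpack_from('>%dd' % (2 * length), data, offset)
    points = list(zip(flat[0::2], flat[1::2]))  # pair up x and y values
    return points, offset + 16 * length

# Synthetic bytes for illustration only: two points, (1.0, 2.0) and (3.0, 4.0).
sample = struct.pack('>I4d', 2, 1.0, 2.0, 3.0, 4.0)
print(read_point_list(sample))
```

Returning the next offset alongside the values lets each reader be chained after the previous one when walking through a record.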

Below is a script that shows how to read the whole dataset format correctly:

from PyQt4 import QtCore, QtGui

FLAG_HASSOURCE = 0x0001
FLAG_HASROLE = 0x0002
FLAG_HASCOLOR = 0x0004
FLAG_HASID = 0x0008
FLAG_COMPRESS = 0x0010
FLAG_HASTHRESHOLDS = 0x0020
FLAG_HASUNITS = 0x0040
FLAG_HASCREATORIDS = 0x0080
FLAG_HASHIDDEN = 0x0100
FLAG_HASMETADATA = 0x0200

MAGIC_NUMBER = 0x46474247
FILE_VERSION = 1
DATASET_MARKER = 0x44415441
DATASET_MAGIC = 0x53455455

def read_data(path):
    infile = QtCore.QFile(path)
    if not infile.open(QtCore.QIODevice.ReadOnly):
        raise IOError(infile.errorString())

    stream = QtCore.QDataStream(infile)
    magic = stream.readUInt32()
    if magic != MAGIC_NUMBER:
        raise IOError('invalid magic number')
    version = stream.readUInt32()
    if version != FILE_VERSION:
        raise IOError('invalid file version')
    marker = stream.readUInt32()
    if marker != DATASET_MARKER:
        raise IOError('invalid dataset marker')
    count = stream.readInt32()
    if count < 1:
        raise IOError('invalid dataset count')

    stream.setVersion(QtCore.QDataStream.Qt_4_7)

    rows = []
    while not stream.atEnd():
        row = []

        magic = stream.readUInt32()
        if magic != DATASET_MAGIC:
            raise IOError('invalid dataset magic number')

        row.append(('Name', stream.readQString()))

        flags = stream.readUInt32()
        row.append(('Flags', flags))

        if flags & FLAG_HASID:
            row.append(('ID', stream.readQString()))
        if flags & FLAG_HASCOLOR:
            color = QtGui.QColor()
            stream >> color
            row.append(('Color', color))
        if flags & FLAG_HASUNITS:
            row.append(('Units', stream.readInt32()))
        if flags & FLAG_HASCREATORIDS:
            row.append(('Creators', stream.readQStringList()))
        if flags & FLAG_HASHIDDEN:
            row.append(('Hidden', stream.readBool()))
        if flags & FLAG_HASTHRESHOLDS:
            thresholds = []
            length = stream.readUInt32()
            for index in range(length):
                thresholds.append(stream.readDouble())
            row.append(('Thresholds', thresholds))
        if flags & FLAG_HASSOURCE:
            row.append(('Source', stream.readQString()))
        if flags & FLAG_HASROLE:
            row.append(('Role', stream.readInt32()))

        points = []
        length = stream.readUInt32()
        for index in range(length):
            point = QtCore.QPointF()
            stream >> point
            points.append(point)
        row.append(('Points', points))
        rows.append(row)

    infile.close()

    return rows

rows = read_data('datasets.bin')

for index, row in enumerate(rows):
    print('Row %s:' % index)
    for key, data in row:
        if isinstance(data, list) and len(data):
            print('  %s = [%s ... ] (%s items)' % (
                  key, repr(data[:3])[1:-1], len(data)))
        else:
            print('  %s = %s' % (key, data))
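
Since the goal in the question is a CSV file, the rows returned by read_data could then be flattened into one line per data point. The following is a rough sketch, assuming Python 3 and the row layout used by the script above (a list of (key, value) pairs with 'Name' and 'Points' entries, where each point has x() and y() methods, as QPointF does). The function name `write_points_csv` and the column names are hypothetical choices, not part of the documented format.

```python
import csv

def write_points_csv(rows, path):
    """Write one CSV line per data point: dataset name, x, y."""
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['dataset', 'x', 'y'])
        for row in rows:
            fields = dict(row)  # each row is a list of (key, value) pairs
            name = fields.get('Name', '')
            for point in fields.get('Points', []):
                writer.writerow([name, point.x(), point.y()])
```

It would be called as write_points_csv(rows, 'datasets.csv') after read_data has returned. The other fields (colour, thresholds and so on) are left out here because they do not map naturally onto a per-point table.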
ekhumoro
  • "I don't know whether it will be able to read QList or QVector, because PyQt cannot directly support C++ template classes." - I think this is a good question on its own. – Trilarion Nov 24 '16 at 08:37
  • This somewhat worked. It accessed the contents and appended some information to the row list. The information was sparse, but I can see how it would be difficult without the file itself, so I've included it above. I'm still digging into this code to understand it better. – Michael Bawol Nov 24 '16 at 15:01