When I read a binary MKV file, the ID of a cluster's timestamp is the byte E7 and the timestamp is an unsigned int value, but when I read it I don't get the correct timestamp.

double mkVSParser::get_clusters_timestamps(char *&package,unsigned long &size)
{
      uint8_t *data_to_find = new uint8_t;
      *data_to_find=0xE7;//the id
      char * buffer = new char[sizeof (uint8_t)];
      uint8_t current_data[sizeof (uint8_t)];

      for(int i=0;i<size;i++)//finde the first 0xE7 in an cluster
      {
          memcpy(&buffer[0],&package[i],sizeof (uint8_t));

          memcpy(&current_data[0],buffer,sizeof (uint8_t));

          if (memcmp(data_to_find, current_data, sizeof (uint8_t)) == 0)
          {
              unsigned int timestemp;
              std::cout<<"position of byte =="<<i<<"and id =="<<(unsigned int)package[i]<<std::endl;

              memcpy(&timestemp,&package[i+1],sizeof(unsigned int));

              std::cout<<"cluster timestemp ="<<timestemp<<std::endl;
              return 0;
          }

            }

      return 0;
}

Is there something that I missed?

    Unrelated to your problem, but you have *other* problems as well. Like *memory leaks*. There's really no need to allocate `data_to_find` dynamically. Just use `uint8_t data_to_find = 0xE7;`. Also no need to create a single-element `current_data` or `buffer` either. Just declare them as single `uint8_t` variables. Use plain assignment to copy the values to the variables. And do a direct comparison (`current_data == data_to_find`) instead of calling `memcmp`. And unless you want to modify the pointer `package` or `size`, then don't pass them by reference (and pass `package` as `const char*`). – Some programmer dude Mar 09 '20 at 07:19
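
    For illustration, the suggestions in this comment could look roughly like the sketch below. The helper name `find_first_e7` and its return convention (-1 when nothing is found) are assumptions made for this example, not part of the question. Note that a plain byte scan can also match an 0xE7 byte that happens to occur inside other data.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the question): returns the index of the first
// byte equal to 0xE7, or -1 if there is none.
// No heap allocation, no memcpy/memcmp; the buffer is taken as const and
// the size is passed by value.
long find_first_e7(const char* package, std::size_t size)
{
    const std::uint8_t data_to_find = 0xE7; // the ID byte

    for (std::size_t i = 0; i < size; ++i) {
        // Plain assignment and direct comparison replace memcpy and memcmp.
        const std::uint8_t current_data = static_cast<std::uint8_t>(package[i]);
        if (current_data == data_to_find)
            return static_cast<long>(i);
    }
    return -1;
}
```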

1 Answer

MKV binary data is in EBML format, and an unsigned integer may be variable in size: a variable-size integer may consist of a varying number of octets (bytes).

Each Variable Size Integer starts with a VINT_WIDTH followed by a VINT_MARKER. VINT_WIDTH is a sequence of zero or more bits of value 0, and is terminated by the VINT_MARKER, which is a single bit of value 1. The total length in bits of both VINT_WIDTH and VINT_MARKER is the total length in octets of the Variable Size Integer.

The single bit 1 starts a Variable Size Integer with a length of one octet. The sequence of bits 01 starts a Variable Size Integer with a length of two octets. 001 starts a Variable Size Integer with a length of three octets, and so on, with each additional 0-bit adding one octet to the length of the Variable Size Integer.

The position of the first '1' bit in the first byte of a variable-size integer gives its size in bytes. If it is in the first position

1XXXXXXX (I use 'X' for other bits of the number here, besides the length part)

then the integer is one byte long, and the bits after the first '1' bit (the 7 lower bits in this case) are the binary representation of the number. A variable-size integer that starts with

0000001X XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

is seven bytes long, since the first '1' bit is in the seventh position.

So first read the first byte of the number and find the position N of the first '1' bit, then read the whole N-byte number, ignoring that first '1' bit (treating it as a zero bit).

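#include <cstdint>

// Mask with a single bit set at the VINT_MARKER position for the given length (1..8).
constexpr uint8_t VarSizeIntLenMark(int length)
{
    return static_cast<uint8_t>(1u << (8 - length));
}

// Length in bytes of the variable-size integer starting at data,
// or 0 if the first byte has no '1' bit.
int VarSizeIntLen(const uint8_t* data)
{
    for (int i = 1; i <= 8; i++)
        if (VarSizeIntLenMark(i) & data[0]) return i;
    return 0;
}

uint64_t ReadVariableSizeInt(const uint8_t* data)
{
    int length = VarSizeIntLen(data); // pass the pointer, not data[0]
    if (length == 0) return 0; // no VINT_MARKER found in the first byte
    uint64_t parsedValue = data[0] & ~VarSizeIntLenMark(length); // clear the VINT_MARKER bit
    for (int i = 1; i < length; i++) // read the remaining bytes, most significant first
        parsedValue = (parsedValue << 8) + data[i];
    return parsedValue;
}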
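
To tie this back to the question, here is a minimal sketch of reading the timestamp. It assumes the element layout as I understand it from the EBML specification (RFC 8794): the ID byte E7, then the data size stored as a variable-size integer, then the payload, which for an unsigned-integer element is plain big-endian bytes. Under that reading, the byte right after E7 is the size rather than the value, and a `memcpy` into an `unsigned int` would also use the host byte order. The helper names `vint_length`, `read_vint` and `read_cluster_timestamp` are hypothetical and written out here only so the block compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: length in bytes of the variable-size integer whose
// first byte is `first`, or 0 if no marker bit is set.
int vint_length(std::uint8_t first)
{
    for (int len = 1; len <= 8; ++len)
        if (first & (0x80u >> (len - 1))) return len;
    return 0;
}

// Hypothetical helper: decode a variable-size integer of the given length,
// dropping the width bits and the marker bit from the first byte.
std::uint64_t read_vint(const std::uint8_t* data, int length)
{
    std::uint64_t value = data[0] & (0xFFu >> length);
    for (int i = 1; i < length; ++i)
        value = (value << 8) | data[i];
    return value;
}

// Assumption: p points at the 0xE7 ID byte and `avail` bytes are readable.
// Returns true and stores the value in `timestamp` on success.
bool read_cluster_timestamp(const std::uint8_t* p, std::size_t avail,
                            std::uint64_t& timestamp)
{
    if (avail < 2 || p[0] != 0xE7) return false;

    const int size_len = vint_length(p[1]);          // width of the size field
    if (size_len == 0 || avail < 1u + size_len) return false;

    const std::uint64_t payload_len = read_vint(p + 1, size_len);
    if (payload_len > 8 || avail < 1u + size_len + payload_len) return false;

    const std::uint8_t* payload = p + 1 + size_len;
    timestamp = 0;
    for (std::uint64_t i = 0; i < payload_len; ++i)  // big-endian payload
        timestamp = (timestamp << 8) | payload[i];
    return true;
}
```

For example, with the bytes `E7 82 03 E8`, the size field `82` is a one-byte variable-size integer with value 2, and the two payload bytes `03 E8` give a timestamp of 1000.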
Oliort