8

I try to create fragmented MP4 from raw H264 video data so I could play it in internet browser's player. My goal is to create live streaming system, where media server would send fragmented MP4 pieces to browser. The server would buffer input data from RaspberryPi camera, which sends video as H264 frames. It would then mux that video data and make it available for client. The browser would play media data (that were muxed by server and sent i.e. through websocket) by using Media Source Extensions.

For testing purposes I wrote the following pieces of code (using many examples I found on the internet):

C++ application using avcodec which muxes raw H264 video to fragmented MP4 and saves it to a file:

#define READBUFSIZE 4096
#define IOBUFSIZE 4096
#define ERRMSGSIZE 128

#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

extern "C"
{
    #include <libavformat/avformat.h>
    #include <libavutil/error.h>
    #include <libavutil/mathematics.h>
    #include <libavutil/opt.h>
}

//H264 NAL unit type codes (low 5 bits of the byte following the start code)
//that carry stream metadata rather than picture data.
enum NalType : uint8_t
{
    SEQ_PARAM_SET = 7, //SPS - sequence parameter set
    PIC_PARAM_SET = 8  //PPS - picture parameter set
};

//Accumulates every byte the muxer emits through the custom AVIO context.
std::vector<uint8_t> outputData;

//AVIO write callback: appends the muxed bytes to outputData and reports
//the whole chunk as written.
int mediaMuxCallback(void *opaque, uint8_t *buf, int bufSize)
{
    outputData.reserve(outputData.size() + bufSize);
    for(int idx = 0; idx < bufSize; ++idx)
        outputData.push_back(buf[idx]);
    return bufSize;
}

//Translates a libav error code into a human-readable message string.
std::string getAvErrorString(int errNr)
{
    char errMsg[ERRMSGSIZE] = {};
    av_strerror(errNr, errMsg, sizeof(errMsg));
    return errMsg;
}

int main(int argc, char **argv)
{
    if(argc < 2)
    {
        std::cout << "Missing file name" << std::endl;
        return 1;
    }

    std::fstream file(argv[1], std::ios::in | std::ios::binary);
    if(!file.is_open())
    {
        std::cout << "Couldn't open file " << argv[1] << std::endl;
        return 2;
    }

    std::vector<uint8_t> inputMediaData;
    do
    {
        char buf[READBUFSIZE];
        file.read(buf, READBUFSIZE);

        int size = file.gcount();
        if(size > 0)
            inputMediaData.insert(inputMediaData.end(), buf, buf + size);
    } while(!file.eof());
    file.close();

    //Initialize avcodec
    av_register_all();
    uint8_t *ioBuffer;
    AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext *codecCtxt = avcodec_alloc_context3(codec);
    AVCodecParserContext *parserCtxt = av_parser_init(AV_CODEC_ID_H264);
    AVOutputFormat *outputFormat = av_guess_format("mp4", nullptr, nullptr);
    AVFormatContext *formatCtxt;
    AVIOContext *ioCtxt;
    AVStream *videoStream;

    int res = avformat_alloc_output_context2(&formatCtxt, outputFormat, nullptr, nullptr);
    if(res < 0)
    {
        std::cout << "Couldn't initialize format context; the error was: " << getAvErrorString(res) << std::endl;
        return 3;
    }

    if((videoStream = avformat_new_stream( formatCtxt, avcodec_find_encoder(formatCtxt->oformat->video_codec) )) == nullptr)
    {
        std::cout << "Couldn't initialize video stream" << std::endl;
        return 4;
    }
    else if(!codec)
    {
        std::cout << "Couldn't initialize codec" << std::endl;
        return 5;
    }
    else if(codecCtxt == nullptr)
    {
        std::cout << "Couldn't initialize codec context" << std::endl;
        return 6;
    }
    else if(parserCtxt == nullptr)
    {
        std::cout << "Couldn't initialize parser context" << std::endl;
        return 7;
    }
    else if((ioBuffer = (uint8_t*)av_malloc(IOBUFSIZE)) == nullptr)
    {
        std::cout << "Couldn't allocate I/O buffer" << std::endl;
        return 8;
    }
    else if((ioCtxt = avio_alloc_context(ioBuffer, IOBUFSIZE, 1, nullptr, nullptr, mediaMuxCallback, nullptr)) == nullptr)
    {
        std::cout << "Couldn't initialize I/O context" << std::endl;
        return 9;
    }

    //Set video stream data
    videoStream->id = formatCtxt->nb_streams - 1;
    videoStream->codec->width = 1280;
    videoStream->codec->height = 720;
    videoStream->time_base.den = 60; //FPS
    videoStream->time_base.num = 1;
    videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    formatCtxt->pb = ioCtxt;

    //Retrieve SPS and PPS for codec extdata
    const uint32_t synchMarker = 0x01000000;
    unsigned int i = 0;
    int spsStart = -1, ppsStart = -1;
    uint16_t spsSize = 0, ppsSize = 0;
    while(spsSize == 0 || ppsSize == 0)
    {
        uint32_t *curr =  (uint32_t*)(inputMediaData.data() + i);
        if(*curr == synchMarker)
        {
            unsigned int currentNalStart = i;
            i += sizeof(uint32_t);
            uint8_t nalType = inputMediaData.data()[i] & 0x1F;
            if(nalType == SEQ_PARAM_SET)
                spsStart = currentNalStart;
            else if(nalType == PIC_PARAM_SET)
                ppsStart = currentNalStart;

            if(spsStart >= 0 && spsSize == 0 && spsStart != i)
                spsSize = currentNalStart - spsStart;
            else if(ppsStart >= 0 && ppsSize == 0 && ppsStart != i)
                ppsSize = currentNalStart - ppsStart;
        }
        ++i;
    }

    videoStream->codec->extradata = inputMediaData.data() + spsStart;
    videoStream->codec->extradata_size = ppsStart + ppsSize;

    //Write main header
    AVDictionary *options = nullptr;
    av_dict_set(&options, "movflags", "frag_custom+empty_moov", 0);
    res = avformat_write_header(formatCtxt, &options);
    if(res < 0)
    {
        std::cout << "Couldn't write container main header; the error was: " << getAvErrorString(res) << std::endl;
        return 10;
    }

    //Retrieve frames from input video and wrap them in container
    int currentInputIndex = 0;
    int framesInSecond = 0;
    while(currentInputIndex < inputMediaData.size())
    {
        uint8_t *frameBuffer;
        int frameSize;
        res = av_parser_parse2(parserCtxt, codecCtxt, &frameBuffer, &frameSize, inputMediaData.data() + currentInputIndex,
            inputMediaData.size() - currentInputIndex, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if(frameSize == 0) //No more frames while some data still remains (is that even possible?)
        {
            std::cout << "Some data left unparsed: " << std::to_string(inputMediaData.size() - currentInputIndex) << std::endl;
            break;
        }

        //Prepare packet with video frame to be dumped into container
        AVPacket packet;
        av_init_packet(&packet);
        packet.data = frameBuffer;
        packet.size = frameSize;
        packet.stream_index = videoStream->index;
        currentInputIndex += frameSize;

        //Write packet to the video stream
        res = av_write_frame(formatCtxt, &packet);
        if(res < 0)
        {
            std::cout << "Couldn't write packet with video frame; the error was: " << getAvErrorString(res) << std::endl;
            return 11;
        }

        if(++framesInSecond == 60) //We want 1 segment per second
        {
            framesInSecond = 0;
            res = av_write_frame(formatCtxt, nullptr); //Flush segment
        }
    }
    res = av_write_frame(formatCtxt, nullptr); //Flush if something has been left

    //Write media data in container to file
    file.open("my_mp4.mp4", std::ios::out | std::ios::binary);
    if(!file.is_open())
    {
        std::cout << "Couldn't open output file " << std::endl;
        return 12;
    }

    file.write((char*)outputData.data(), outputData.size());
    if(file.fail())
    {
        std::cout << "Couldn't write to file" << std::endl;
        return 13;
    }

    std::cout << "Media file muxed successfully" << std::endl;
    return 0;
}

(I hardcoded a few values, such as video dimensions or framerate, but as I said this is just a test code.)


Simple HTML webpage using MSE to play my fragmented MP4

<!DOCTYPE html>
<html>
<head>
    <title>Test strumienia</title>
</head>
<body>
    <!-- Player surface that the MediaSource will be attached to -->
    <video width="1280" height="720" controls>
    </video>
</body>
<script>
var vidElement = document.querySelector('video');

// Feature-detect MSE, then hand the <video> element a MediaSource via an object URL.
if (!window.MediaSource) {
  console.log("The Media Source Extensions API is not supported.")
} else {
  var mediaSource = new MediaSource();
  vidElement.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', sourceOpen);
}

// Runs once the MediaSource opens: fetch the fragmented MP4 and append it
// to a SourceBuffer declared with the stream's codec string.
function sourceOpen(e) {
  URL.revokeObjectURL(vidElement.src);
  var mime = 'video/mp4; codecs="avc1.640028"';
  var mediaSource = e.target;
  var sourceBuffer = mediaSource.addSourceBuffer(mime);
  var videoUrl = 'my_mp4.mp4';
  fetch(videoUrl)
    .then(function(response) {
      return response.arrayBuffer();
    })
    .then(function(arrayBuffer) {
      // Close the stream only after the single append has been fully processed.
      sourceBuffer.addEventListener('updateend', function(e) {
        if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
      });
      sourceBuffer.appendBuffer(arrayBuffer);
    });
}
</script>
</html>

Output MP4 file generated by my C++ application can be played e.g. in MPC, but it doesn't play in any web browser I tested it with. It also doesn't have any duration (MPC keeps showing 00:00).

To compare output MP4 file I got from my C++ application described above, I also used FFMPEG to create fragmented MP4 file from the same source file with raw H264 stream. I used the following command:

ffmpeg -r 60 -i input.h264 -c:v copy -f mp4 -movflags empty_moov+default_base_moof+frag_keyframe test.mp4

This file generated by FFMPEG is played correctly by every web browser I used for tests. It also has correct duration (but also it has trailing atom, which wouldn't be present in my live stream anyway, and as I need a live stream, it won't have any fixed duration in the first place).

MP4 atoms for both files look very similar (they have identical avcc section for sure). What's interesting (but not sure if it's of any importance), both files have different NALs format than input file (RPI camera produces video stream in Annex-B format, while output MP4 files contain NALs in AVCC format... or at least it looks like it's the case when I compare mdat atoms with input H264 data).

I assume there is some field (or a few fields) I need to set for avcodec to make it produce video stream that would be properly decoded and played by browsers players. But what field(s) do I need to set? Or maybe problem lies somewhere else? I ran out of ideas.


EDIT 1: As suggested, I investigated binary content of both MP4 files (generated by my app and FFMPEG tool) with hex editor. What I can confirm:

  • both files have identical avcc section (they match perfectly and are in AVCC format, I analyzed it byte after byte and there's no mistake about it)
  • both files have NALs in AVCC format (I looked closely at mdat atoms and they don't differ between both MP4 files)

So I guess there's nothing wrong with the extradata creation in my code - avcodec takes care of it properly, even if I just feed it with SPS and PPS NALs. It converts them by itself, so no need for me to do it by hand. Still, my original problem remains.

EDIT 2: I achieved partial success - MP4 generated by my app now plays in Firefox. I added this line to the code (along with rest of stream initialization):

videoStream->codec->time_base = videoStream->time_base;

So now this section of my code looks like this:

//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->time_base = videoStream->time_base;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;
brasofilo
  • 23,940
  • 15
  • 86
  • 168
PookyFan
  • 610
  • 1
  • 6
  • 20

4 Answers4

3

I finally found the solution. My MP4 now plays in Chrome (while still playing in other tested browsers).

In Chrome chrome://media-internals/ shows MSE logs (of a sort). When I looked there, I found a few of following warnings for my test player:

ISO-BMFF container metadata for video frame indicates that the frame is not a keyframe, but the video frame contents indicate the opposite.

That made me think and encouraged to set AV_PKT_FLAG_KEY for packets with keyframes. I added following code to section with filling AVPacket structure:

    //Check if keyframe field needs to be set
    int allowedNalsCount = 3; //In one packet there would be at most three NALs: SPS, PPS and video frame
    packet.flags = 0;
    for(int i = 0; i < frameSize && allowedNalsCount > 0; ++i)
    {
        uint32_t *curr =  (uint32_t*)(frameBuffer + i);
        if(*curr == synchMarker)
        {
            uint8_t nalType = frameBuffer[i + sizeof(uint32_t)] & 0x1F;
            if(nalType == KEYFRAME)
            {
                std::cout << "Keyframe detected at frame nr " << framesTotal << std::endl;
                packet.flags = AV_PKT_FLAG_KEY;
                break;
            }
            else
                i += sizeof(uint32_t) + 1; //We parsed this already, no point in doing it again

            --allowedNalsCount;
        }
    }

A KEYFRAME constant turns out to be 0x5 in my case (Slice IDR).

PookyFan
  • 610
  • 1
  • 6
  • 20
0

MP4 atoms for both files look very similar (they have identical avcc section for sure)

Double check that, The code supplied suggests otherwise to me.

What's interesting (but not sure if it's of any importance), both files have different NALs format than input file (RPI camera produces video stream in Annex-B format, while output MP4 files contain NALs in AVCC format... or at least it looks like it's the case when I compare mdat atoms with input H264 data).

It is very important, mp4 will not work with annex b.

szatmary
  • 27,213
  • 7
  • 39
  • 54
  • "The code supplied suggests otherwise to me" - what part of my code suggests that, exactly? And how? Could you please give some more details? "mp4 will not work with annex b" - But it works in case of MP4 file generated by ffmpeg tool... – PookyFan Jan 10 '19 at 08:14
  • Extradata is more than just sps/pps. It has a header, nalsizelength, etc. I don’t see that in your code. Also ffmpeg will not write annex b to the mp4. That’s why it works. You can look at the mdat ffmpeg produces, you will find no start codes. – szatmary Jan 10 '19 at 08:28
  • Ok, I'll double check with MP4 analysis tool when I get back home, but I'm pretty sure that both MP4 files have identical avcc section and their mdat atoms contain NALs in AVCC format. But just to be extra sure I'll confirm it once more. For the time being, do you have any ideas how to improve my C++ code? Do I need to set extradata like for AVCC format? For now it's set to SPS and PPS content because that's how it should be done in case of Annex-B. I know there are more thing to set when it's AVCC format, but then again - my input format is Annex-B, not AVCC... – PookyFan Jan 10 '19 at 08:35
0

You need to fill in extradata with AVC Decoder Configuration Record, not just SPS/PPS

Here's how the record should look like: AVCDCR

  • Are you sure I need to set it like this even though my input NAL format is Annex-B? According to my research, extradata format you suggest applies to AVCC NAL format (explained here for example: http://aviadr1.blogspot.com/2010/05/h264-extradata-partially-explained-for.html ). I know the output format is AVCC in my case, but if avcodec changed my NAL format during muxing, wouldn't it automatically convert extdata to correct format as well? – PookyFan Jan 10 '19 at 10:31
  • I'm not sure. But perhaps you could take a look at the hex dump of your mp4 fragments and look for avcc atom. If its contents start with 0x01 then it means that ffmpeg managed to convert your SPS/PPS pair to AVCC. – Sokolio Jan 10 '19 at 10:36
0

We can find this explanation in the [Chrome source code](https://chromium.googlesource.com/chromium/src/+/refs/heads/master/media/formats/mp4/mp4_stream_parser.cc#799) (Chrome's MP4 stream parser used by Media Source):

// Copyright 2014 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.


  // Use |analysis.is_keyframe|, if it was actually determined, for logging
  // if the analysis mismatches the container's keyframe metadata for
  // |frame_buf|.
  if (analysis.is_keyframe.has_value() &&
      is_keyframe != analysis.is_keyframe.value()) {
    LIMITED_MEDIA_LOG(DEBUG, media_log_, num_video_keyframe_mismatches_,
                      kMaxVideoKeyframeMismatchLogs)
        << "ISO-BMFF container metadata for video frame indicates that the "
           "frame is "
        << (is_keyframe ? "" : "not ")
        << "a keyframe, but the video frame contents indicate the "
           "opposite.";
    // As of September 2018, it appears that all of Edge, Firefox, Safari
    // work with content that marks non-avc-keyframes as a keyframe in the
    // container. Encoders/muxers/old streams still exist that produce
    // all-keyframe mp4 video tracks, though many of the coded frames are
    // not keyframes (likely workaround due to the impact on low-latency
    // live streams until https://crbug.com/229412 was fixed).  We'll trust
    // the AVC frame's keyframe-ness over the mp4 container's metadata if
    // they mismatch. If other out-of-order codecs in mp4 (e.g. HEVC, DV)
    // implement keyframe analysis in their frame_bitstream_converter, we'll
    // similarly trust that analysis instead of the mp4.
    is_keyframe = analysis.is_keyframe.value();
  }

As the code comment show, chrome trust the AVC frame's keyframe-ness over the mp4 container's metadata. So nalu type in H264/HEVC should be more important than mp4 container box sdtp & trun description.

gemstone
  • 1
  • 1
  • Gemstone, it's better to post here what the link is about and what solution it gives. That link may be dead tomorrow and your answer loses all of its value. I took the liberty to copy the whole code block here (adding their top copyright notice). And if you can explain why this info is relevant, it'd be super. – brasofilo Jun 12 '19 at 02:44
  • You can [edit] your answer and incorporate this info, then we can clean our comments and all is fine and dandy (y) – brasofilo Jun 12 '19 at 05:02
  • That's cool gemstone, but shouldn't video play just fine even if there is a mismatch between keyframe-ness in MP4 metadata and AVC keyframe indicator? I see that is_keyframe variable is set to whatever AVC has to say about keyframe-ness, yet my video wouldn't play unless I set it manually so there would be no mismatch. – PookyFan Jun 16 '19 at 13:06