
There does not seem to be any explanation online of what these are, even though people talk about them a lot. I just want to know what they are and why they are significant. When using -video_track_timescale, how would I determine a value for it? Is it random? Should it be 0?

llogan
Please Help

1 Answer


Modern containers govern the time component of presentation of video (and audio) frames using timestamps, rather than framerate. So, instead of recording a video as 25 fps, and thus implying that each frame should be drawn 0.04 seconds apart, they store a timestamp for each frame e.g.

 Frame      pts_time
   0          0.00
   1          0.04
   2          0.08
   3          0.12
   ...
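For a constant frame rate, the pts_time column above is simply the frame index divided by the frame rate. A minimal Python sketch, assuming a constant 25 fps stream; the helper name `pts_times` is hypothetical and used only for illustration:

```python
from fractions import Fraction

def pts_times(frame_count: int, fps: Fraction) -> list[Fraction]:
    """Presentation time (seconds) of each frame at a constant frame rate (hypothetical helper)."""
    # Frame n is presented n / fps seconds after the start of the stream.
    return [Fraction(n) / fps for n in range(frame_count)]

times = pts_times(4, Fraction(25))
print([float(t) for t in times])  # [0.0, 0.04, 0.08, 0.12]
```

Exact fractions are used so that the times are not subject to floating-point drift; they are only converted to floats for display.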

For the sake of precise resolution of these time values, a timebase is used i.e. a unit of time which represents one tick of a clock, as it were. So, a timebase of 1/75 represents 1/75th of a second. The Presentation TimeStamps are then denominated in terms of this timebase. Timescale is simply the reciprocal of the timebase. FFmpeg shows the timescale as the tbn value in the readout of a stream.

Timebase = 1/75; Timescale = 75
 Frame        pts           pts_time
   0          0          0 x 1/75 = 0.00
   1          3          3 x 1/75 = 0.04 
   2          6          6 x 1/75 = 0.08
   3          9          9 x 1/75 = 0.12
   ...
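The table can be reproduced by storing integer pts ticks and multiplying them by the timebase to get pts_time. A minimal sketch, assuming a constant 25 fps stream and the 1/75 timebase from the example; `ticks_between_frames` and `pts_for_frame` are hypothetical names used only for illustration:

```python
from fractions import Fraction

FPS = Fraction(25)          # assumed constant frame rate
TIMEBASE = Fraction(1, 75)  # one clock tick = 1/75 s
TIMESCALE = 1 / TIMEBASE    # reciprocal of the timebase: 75 ticks per second

def ticks_between_frames(fps: Fraction, timebase: Fraction) -> Fraction:
    """Number of timebase ticks separating consecutive frames."""
    return 1 / (fps * timebase)

def pts_for_frame(n: int, fps: Fraction, timebase: Fraction) -> Fraction:
    """pts of frame n, expressed in timebase units."""
    return n * ticks_between_frames(fps, timebase)

print("timescale =", TIMESCALE)
for n in range(4):
    pts = pts_for_frame(n, FPS, TIMEBASE)
    pts_time = pts * TIMEBASE  # convert ticks back to seconds
    print(n, pts, float(pts_time))
```

With these values each frame lasts 3 ticks, so the loop prints pts 0, 3, 6, 9 and pts_time 0.0, 0.04, 0.08, 0.12, matching the table.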

This method of regulating time allows variable frame-rate video, since the interval between consecutive timestamps does not have to be constant.
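Because every frame carries its own timestamp, frames need not be evenly spaced. A minimal sketch with made-up pts values (hypothetical, not taken from any real file) in a 1/75 timebase, deriving how long each frame stays on screen from the gap to the next timestamp:

```python
from fractions import Fraction

TIMEBASE = Fraction(1, 75)

# Hypothetical variable-frame-rate stream: the gaps between pts values differ.
pts_list = [0, 3, 6, 15, 18]

def frame_durations(pts: list[int], timebase: Fraction) -> list[Fraction]:
    """Duration in seconds of every frame except the last, which has no successor."""
    return [(b - a) * timebase for a, b in zip(pts, pts[1:])]

for pts, dur in zip(pts_list, frame_durations(pts_list, TIMEBASE)):
    print(f"pts={pts:3d}  pts_time={float(pts * TIMEBASE):.2f}  shown for {float(dur):.2f} s")
```

Here the frame at pts 6 is held for 9 ticks (0.12 s) while the others last 3 ticks (0.04 s), something a single fixed framerate value could not express.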

Gyan
  • Thank you so much, your explanation helped me a lot! But in the second part, you chose the numbers 0, 3, 6 and 9 for the PTS and the timebase 1/75. 1) Can I just pick any PTS scheme I want, like incrementing by 1000 or by 1? 2) Did you choose 1/75 because it complements 25/1 FPS, or can I pick any number, like 1000 or 1/90000? Should it be related to the FPS? Why a numerator/denominator and not a simple number? What should I consider when picking these numbers? – Leandro Moreira Nov 11 '17 at 22:53
  • The timebase can be any value fine enough to represent the frame rate, i.e. for 25 fps it should be 1/25 or finer. If it is 1/15, then depending on the muxer, ffmpeg will either drop frames or change the output frame rate to 15. Frame rates can be fractional, hence a rational number. Timebases are rational because they represent fractions of a second. – Gyan Nov 12 '17 at 05:23
  • The reason for the typical use of 90,000 as a common base of calculation is that it is a number which is divisible by 24, by 25, and by 30 (in each case the result is an integer - there is no remainder, decimal or fraction), thus the maths is equally suitable for handling 24 frames per second, 25 fps, and 30 fps. – Ed999 Dec 03 '17 at 03:43
  • @Ed999 is correct that 90000 is an integral multiple of 24, 25 and 30, but that is not the reason. 600 suffices for that purpose, and QuickTime writers typically use that value for the timescale. – Gyan Jun 26 '19 at 14:13
  • The H.222.0 standard, which defines MPEG-TS where this timebase is used, states: "The value of the system clock frequency is measured in Hz and shall meet the following constraints: 27 000 000 – 810 ≤ system_clock_frequency ≤ 27 000 000 + 810" and later on: – Gyan Jun 26 '19 at 14:13
  • "For notational convenience, equations in which PCR, PTS, or DTS appear, lead to values of time which are accurate to some integral multiple of (300 × 2^33/system_clock_frequency) seconds. This is due to the encoding of PCR timing information as 33 bits of 1/300 of the system clock frequency plus 9 bits for the remainder, and encoding as 33 bits of the system clock frequency divided by 300 for PTS and DTS." – Gyan Jun 26 '19 at 14:13
  • 27 MHz / 300 = 90000 Hz. – Gyan Jun 26 '19 at 14:13
  • @Gyan - One recognised benefit of using a value of 90,000 is that it is divisible by 48,000 without raising any problem over the remainder, hence the success of 48,000 Hz as an audio sampling frequency (where the popular sampling frequency of 44,100 Hz can cause loss of sync because of the impossibility of the calculation: 90,000 divided by 44,100 equals a very dodgy old remainder, that makes it impossible to exactly match the audio and video frame rates, leading to sync problems). – Ed999 Jun 29 '19 at 02:00
  • @Gyan - I take the point about the relationship of 27 million Hz and 90,000. But if the frequency can drift across a range of 1,620 Hz (i.e. plus or minus 810 Hz) I am not quite clear how the relationship to the (fixed) value of 90,000 represents the necessary degree of mathematical precision. – Ed999 Jun 29 '19 at 02:03
  • MPEG-TS is a transport stream, not a file format. The receiver runs its own 27 MHz clock and establishes a phase sync with the counter in the received stream. Jitter within the limits given is then expected to be accommodated. – Gyan Jun 29 '19 at 05:51
  • The hell people went through to get it to work! You have to consider the 27 MHz clock, you have to consider different frame rates, and somehow fit everything together. – sanmai Apr 27 '21 at 23:15
  • Just came across this and wanted to add a link to a related question, answered within 15 minutes from a mathematics perspective: https://stackoverflow.com/questions/47117098/why-is-27000-a-magic-number-for-video-point-frame-description-in-full-numbers?rq=1 – Harry May 05 '21 at 17:46
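The arithmetic discussed in the comments above can be checked directly. A minimal Python sketch; the helper names are hypothetical, and the rescaling step uses Python's `round()` (round-half-to-even), which is an assumption and not necessarily how any particular ffmpeg muxer rounds. It shows (1) how many ticks one frame lasts at 24, 25 and 30 fps with timescales 600 and 90000, (2) that 27 MHz / 300 gives the 90 kHz PTS clock and how long a 33-bit PTS counter at that rate runs before wrapping around, and (3) what happens when 25 fps timestamps are squeezed into a coarser 1/15 timebase:

```python
from fractions import Fraction

def frame_duration_ticks(timescale: int, fps: Fraction) -> Fraction:
    """Ticks per frame; an integer result means every frame lands exactly on a tick."""
    return Fraction(timescale) / fps

# (1) Both 600 and 90000 give a whole number of ticks per frame at 24, 25 and 30 fps.
for timescale in (600, 90_000):
    for fps in (24, 25, 30):
        t = frame_duration_ticks(timescale, Fraction(fps))
        print(f"timescale {timescale:>6}, {fps} fps -> {t} ticks/frame")

# (2) MPEG-TS: PTS/DTS count the 27 MHz system clock divided by 300, in 33 bits.
SYSTEM_CLOCK_HZ = 27_000_000
PTS_CLOCK_HZ = SYSTEM_CLOCK_HZ // 300  # 90000
wrap_seconds = 2**33 / PTS_CLOCK_HZ    # time until the 33-bit counter wraps around
print(PTS_CLOCK_HZ, f"{wrap_seconds / 3600:.1f} h")

# (3) Rescale 25 fps timestamps into a 1/15 timebase (rounding rule is an assumption).
def rescale(pts: int, src_tb: Fraction, dst_tb: Fraction) -> int:
    return round(pts * src_tb / dst_tb)

src = list(range(10))  # frames 0..9, one tick each in a 1/25 timebase
dst = [rescale(p, Fraction(1, 25), Fraction(1, 15)) for p in src]
print(dst)
print("collisions:", len(dst) - len(set(dst)))
```

In part (3), ten frames map onto only six distinct ticks, so four frames share a timestamp with a neighbour. This is one way to see why a timebase coarser than the frame rate forces a muxer either to drop frames or to change the output frame rate.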