15

I'm updating an application in which measuring the time at which a stimulus is presented on screen requires the greatest possible accuracy. It is currently written with DirectDraw, which was put out to pasture a long while ago, so we need to update our graphics library.

The way in which we measure the presentation time relies on detecting the end of the vertical blank period. Specifically, I need to know, with the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen. Detecting the scan line can increase the certainty of that measurement, but I could work with only detecting when the vertical blank period ended immediately after Flip or Present was called.

Direct3D 9 has the IDirect3DDevice9::GetRasterStatus method, which returns a D3DRASTER_STATUS struct that includes an InVBlank boolean describing whether the device is in a vertical blank, as well as the current scan line. DirectDraw has similar functions (IDirectDraw::GetVerticalBlankStatus; also IDirectDraw::GetScanLine, which returns DDERR_VERTICALBLANKINPROGRESS during the vertical blank and so can be used to detect it).
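For reference, the D3D9-era measurement loop amounts to the following. This is a portable sketch, not real D3D9 code: `RasterStatus` mirrors D3DRASTER_STATUS, and the `poll`/`now` callbacks are hypothetical stand-ins for IDirect3DDevice9::GetRasterStatus and QueryPerformanceCounter.

```cpp
#include <cstdint>

// Mirrors the fields of D3DRASTER_STATUS that matter here.
struct RasterStatus {
    bool     inVBlank;
    uint32_t scanLine;
};

// Spin until the vertical blank ends, i.e. until we observe the transition
// from inVBlank == true to inVBlank == false, then timestamp that edge.
// Returns the clock reading at which scan-out of the top line has just begun.
template <typename PollFn, typename ClockFn>
uint64_t WaitForVBlankEnd(PollFn poll, ClockFn now) {
    while (!poll().inVBlank) { /* wait for the blank to begin */ }
    while (poll().inVBlank)  { /* wait for the blank to end */ }
    return now();
}
```

The two-loop structure matters: waiting only for `inVBlank == false` would return immediately if called mid-frame, so we first wait for the blank to begin and then for it to end.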

However, I have not been able to find any similar function in Direct3D 11. Does anyone know whether this functionality was moved or removed between Direct3D 9 and Direct3D 11, and if the latter, why?

Rob
  • I don't know about DirectX, but can't you just flush the pipeline as in OpenGL, so that it is ensured that all calls to DX11 are executed? And doesn't Present() only return after it has copied/flipped the buffers? – Pillum Apr 17 '12 at 23:55
  • You can use [IDXGIOutput::WaitForVBlank](http://msdn.microsoft.com/en-us/library/bb174559%28v=vs.85%29) to wait for the vertical sync on D3D 10 and 11. Maybe this can help. – pearcoding May 20 '12 at 10:05

4 Answers

8

Sorry for the late reply, but I notice there is still no accepted answer, so perhaps you never found one that worked. Nowadays on Windows, the Desktop Window Manager service (dwm.exe) coordinates everything and can't really be bypassed. Ever since Windows 8, this service can't be disabled.

So DWM is always going to control the frame rate, render queue management, and final composition for all of the various IDXGISurface(n) objects and IDXGIOutput(n) monitors and there isn't much use in tracking VSync for an offscreen render target, unless I'm missing something (no sarcasm intended). As for your question, I wasn't sure if your goal was to:

  1. obtain extremely precise timing info, but just for diagnostic, profiling, or informational use, or
  2. whether the app was then going to (attempt to) use those results to (attempt to) schedule its own present cycles.

If it's the latter, I believe you can effectively only do this if the D3D app is running in full-screen exclusive mode. That's the only case where the DWM (in the guise of DXGI) will truly trust a client to handle its own Present timing.

The (barely) good news here is that if your interest in VSync is informational only—which is to say that you fall into bullet category (1.) from above—then you can indeed get all the timing data you'd ever want, and at QueryPerformanceFrequency resolution, which is typically around 320 ns.¹

Here's how to get that high-res video timing info. But again, just to be clear: despite the apparent success in obtaining the information shown below, any attempt to use these interesting results (for example, to condition some deterministic, and thus potentially useful, outcome on the readings you obtain) is destined to fail, that is, to be entirely thwarted by DWM intermediation:

DWM_TIMING_INFO

Specifies Desktop Window Manager (DWM) composition timing information. Used by the DwmGetCompositionTimingInfo function.

typedef struct _DWM_TIMING_INFO
{
    UINT32    cbSize;                 // size of this DWM_TIMING_INFO structure
    URATIO    rateRefresh;            // monitor refresh rate
    QPC_TIME  qpcRefreshPeriod;       // monitor refresh period
    URATIO    rateCompose;            // composition rate
    QPC_TIME  qpcVBlank;              // query performance counter value before the vertical blank
    CFRAMES   cRefresh;               // DWM refresh counter
    UINT      cDXRefresh;             // DirectX refresh counter
    QPC_TIME  qpcCompose;             // query performance counter value for a frame composition
    CFRAMES   cFrame;                 // frame number that was composed at qpcCompose
    UINT      cDXPresent;             // DirectX present number used to identify rendering frames
    CFRAMES   cRefreshFrame;          // refresh count of the frame that was composed at qpcCompose
    CFRAMES   cFrameSubmitted;        // DWM frame number that was last submitted
    UINT      cDXPresentSubmitted;    // DirectX present number that was last submitted
    CFRAMES   cFrameConfirmed;        // DWM frame number that was last confirmed as presented
    UINT      cDXPresentConfirmed;    // DirectX present number that was last confirmed as presented
    CFRAMES   cRefreshConfirmed;      // target refresh count of the last frame confirmed as completed by the GPU
    UINT      cDXRefreshConfirmed;    // DirectX refresh count when the frame was confirmed as presented
    CFRAMES   cFramesLate;            // number of frames the DWM presented late
    UINT      cFramesOutstanding;     // number of composition frames that have been issued but have not been confirmed as completed
    CFRAMES   cFrameDisplayed;        // last frame displayed
    QPC_TIME  qpcFrameDisplayed;      // QPC time of the composition pass when the frame was displayed
    CFRAMES   cRefreshFrameDisplayed; // vertical refresh count when the frame should have become visible
    CFRAMES   cFrameComplete;         // ID of the last frame marked as completed
    QPC_TIME  qpcFrameComplete;       // QPC time when the last frame was marked as completed
    CFRAMES   cFramePending;          // ID of the last frame marked as pending
    QPC_TIME  qpcFramePending;        // QPC time when the last frame was marked as pending
    CFRAMES   cFramesDisplayed;       // number of unique frames displayed
    CFRAMES   cFramesComplete;        // number of new completed frames that have been received
    CFRAMES   cFramesPending;         // number of new frames submitted to DirectX but not yet completed
    CFRAMES   cFramesAvailable;       // number of frames available but not displayed, used, or dropped
    CFRAMES   cFramesDropped;         // number of rendered frames that were never displayed because composition occurred too late
    CFRAMES   cFramesMissed;          // number of times an old frame was composed when a new frame should have been used but was not available
    CFRAMES   cRefreshNextDisplayed;  // frame count at which the next frame is scheduled to be displayed
    CFRAMES   cRefreshNextPresented;  // frame count at which the next DirectX present is scheduled to be displayed
    CFRAMES   cRefreshesDisplayed;    // total number of refreshes that have been displayed for the application since the DwmSetPresentParameters function was last called
    CFRAMES   cRefreshesPresented;    // total number of refreshes that have been presented by the application since DwmSetPresentParameters was last called
    CFRAMES   cRefreshStarted;        // refresh number when content for this window started to be displayed
    ULONGLONG cPixelsReceived;        // total number of pixels DirectX redirected to the DWM
    ULONGLONG cPixelsDrawn;           // number of pixels drawn
    CFRAMES   cBuffersEmpty;          // number of empty buffers in the flip chain
}
DWM_TIMING_INFO;

(Note: to compress the above source code horizontally for display on this website, assume the following typedefs are prepended:)

typedef UNSIGNED_RATIO URATIO;
typedef DWM_FRAME_COUNT CFRAMES;

Now for apps running in windowed mode, you can certainly grab this detailed information as often as you like. If you only need it for passive profiling, then getting the data from DwmGetCompositionTimingInfo is the modern way to do it.
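For illustration, here's the kind of passive arithmetic you could run on those readings. This is a sketch, not real dwmapi code: `MiniTimingInfo` mirrors only the two DWM_TIMING_INFO fields used, and in a real app you'd fill the full struct via DwmGetCompositionTimingInfo (with cbSize set first) and timestamp with QueryPerformanceCounter.

```cpp
#include <cstdint>

// Mirrors just the DWM_TIMING_INFO fields used below.
struct MiniTimingInfo {
    uint64_t qpcVBlank;        // QPC value before the last reported vertical blank
    uint64_t qpcRefreshPeriod; // monitor refresh period, in QPC ticks
};

// Predict the QPC tick of the next vblank boundary at or after qpcNow,
// by extrapolating whole refresh periods from the last reported vblank.
uint64_t PredictNextVBlank(const MiniTimingInfo& ti, uint64_t qpcNow) {
    if (qpcNow <= ti.qpcVBlank)
        return ti.qpcVBlank;
    uint64_t elapsed = qpcNow - ti.qpcVBlank;
    // Round up to the next whole period boundary.
    uint64_t periods = (elapsed + ti.qpcRefreshPeriod - 1) / ti.qpcRefreshPeriod;
    return ti.qpcVBlank + periods * ti.qpcRefreshPeriod;
}
```

Again, this kind of prediction is only good for passive profiling; scheduling your own presents from it will be overridden by the DWM.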

And speaking of modern, since the question hinted at modernizing, you'll want to consider using an IDXGISwapChain1 obtained from IDXGIFactory2::CreateSwapChainForComposition to enable the use of the newer DirectComposition component.

DirectComposition enables rich and fluid transitions by achieving a high framerate, using graphics hardware, and operating independently of the UI thread. DirectComposition can accept bitmap content drawn by different rendering libraries, including Microsoft DirectX bitmaps, and bitmaps rendered to a window (HWND bitmaps). Also, DirectComposition supports a variety of transformations, such as 2D affine transforms and 3D perspective transforms, as well as basic effects such as clipping and opacity.

Anyway, it seems unlikely that detailed timing information can usefully inform an app's runtime behavior; maybe it will help you predict your next VSync, but one does wonder what significance "keen awareness of the blanking period" might have for some particular DWM-subjugated offscreen swap chain.

Because your app's surface is just one of many that the DWM is juggling, the DWM is going to be doing all kinds of dynamic adaptation of its own, under an assumption of each client behaving consistently. Unpredictable adaptations are uncooperative in such a regime, and will likely just end up confounding both parties.




Notes:
1. The resolution of QPC is many orders of magnitude higher than that of the DateTime tick, despite the latter's suggestive use of a 100 ns unit denomination. Think of DateTime.Now.Ticks as a repackaging of the (millisecond-denominated) Environment.TickCount, but converted to 100 ns units. For the highest possible resolution, use the static method Stopwatch.GetTimestamp() instead of DateTime.Now.Ticks.
Glenn Slayden
  • *I found this to be extremely useful information* The OP's language suggests an application that, like my own, presents a stimulus and measures a physiologic response that's time-locked to the stimulus. Decoding brainstem visual evoked potentials, for example, depends on knowing when photons hit the retina with microsecond accuracy. Predicting vsync is of no interest in this case, so this long-awaited answer is spot-on! – Craig.Feied May 10 '18 at 19:15
  • You didn't mention whether your app is running in full-screen exclusive mode or not. As noted, that's the only way to properly achieve your goal. If windowed in DWM, you might be able to get by if you configure the Windows display settings with a high refresh rate, turn off various meddlesome 'features' in your graphics card, and then have your program predict passively based on what the DWM reports to you. Obviously, this route will be extremely hacky. – Glenn Slayden May 11 '18 at 00:25
  • Glenn, when running in full-screen exclusive mode, do you know if there is a way to get DirectX vblank information while running a Unity app? – tofutim Aug 15 '18 at 20:47
  • @tofutim If you're asking about how to bypass some Unity limitation, I have no idea about that, never looked into Unity at all. Myself being DWM-bound, I also don't know much about fullscreen exclusive mode but would assume that full hardware access and/or DirectX capabilities are unlocked and available in that scenario. – Glenn Slayden Aug 17 '18 at 20:02
4

Another alternative:

There's D3DKMTGetScanLine() which works with D3D9, D3D10, D3D11, D3D12, and even OpenGL.

It's actually a GDI32 function, so you piggyback off the window's existing graphics hAdapter to poll the VBlank/scan line; there's no need to create a Direct3D frame buffer. That's why this API works fine with OpenGL, Mantle, and other non-Direct3D renderers too, despite the D3D prefix of the API call.

It also tells you VBlank status & Raster scan line.

It's useful for beam racing in supremely latency-critical applications. Some virtual reality renderers use beam racing, where even a mere 20 ms of lag can mean the difference between pleasant VR and dizzying/pukeworthy VR.

Beam racing is rendering on the fly, following the scanout of a display. In specialized latency-critical applications, it can reduce the latency from Direct3D Present() to pixels hitting your eyeballs to an absolute minimum (as little as 3 ms).

To understand what beam racing is, see https://www.wired.com/2009/03/racing-the-beam/ -- it was common back in the day when graphics chips had no frame buffers, making beam racing necessary for improved graphics on the Atari 2600, Nintendo, Commodore 64, etc.

For a more modern implementation of beam racing, see Lagless VSYNC ON Algorithm for Emulators.
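To give a flavor of the arithmetic involved, here's a minimal, portable sketch of the slice scheduling at the heart of beam racing. The numbers and helper names are illustrative only; a real implementation would read the raster via GetRasterStatus() or D3DKMTGetScanLine() instead of estimating it from a clock, and would calibrate against the display's actual vertical total.

```cpp
#include <cstdint>

// Illustrative display timing. totalLines includes the vertical blanking
// interval (e.g. 1125 total lines for 1080 visible on many modes).
struct BeamState {
    uint32_t visibleLines;   // e.g. 1080
    uint32_t totalLines;     // visible + vblank lines, e.g. 1125
    uint64_t refreshTicks;   // duration of one full refresh, in clock ticks
};

// Estimated scan line at `ticks` elapsed since the end of the last vblank.
uint32_t EstimatedScanLine(const BeamState& s, uint64_t ticks) {
    return (uint32_t)((ticks * s.totalLines / s.refreshTicks) % s.totalLines);
}

// Which of `sliceCount` horizontal frame slices to render next: one slice
// ahead of the slice the raster is currently scanning out, so the raster
// never catches the renderer.
uint32_t NextSliceToRender(const BeamState& s, uint64_t ticks, uint32_t sliceCount) {
    uint32_t line = EstimatedScanLine(s, ticks);
    if (line >= s.visibleLines)
        return 0; // in vblank: race for the top slice of the next frame
    uint32_t current = line * sliceCount / s.visibleLines;
    return (current + 1) % sliceCount;
}
```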

Mark Rejhon
0

"Specifically I need to know with, the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen."

Good luck.

There is actually no guarantee that anything you put into the present queue will ever be shown on screen (!!); you can manually drop frames with buffer-sequencing present flags, or NVIDIA can do it for you (... thanks?)

Buffer Sequencing in DXGI

The DXGI swapchain's flip queue is generally FIFO, but popular new driver overrides (i.e. FastSync), which users concerned with latency will most assuredly have enabled, favor CPU-side throughput over such trivial things as displaying any of the frames you draw :)

Normally you could count on IDXGISwapChain::Present (...) to begin blocking when the swapchain is full of undisplayed images and the driver is staging commands n-many frames ahead of the GPU, but with FastSync forced, Present never blocks and the render-ahead-queue flushes its work by overwriting any completed frames in the Swapchain that are waiting on VBLANK.
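To make that overwrite behavior concrete, here's a toy model of the FastSync/mailbox-style policy (purely illustrative; not driver code or a real DXGI interface):

```cpp
#include <cstdint>
#include <optional>

// Toy model of mailbox-style presentation: a single slot holds the frame
// waiting on VBLANK. Present never blocks; a newly completed frame simply
// replaces whatever was waiting, and the replaced frame is silently dropped
// without ever scanning out. (A FIFO swapchain, by contrast, blocks Present
// when full, so every presented frame eventually displays.)
class MailboxQueue {
public:
    // Present a completed frame. Returns the id of the frame this present
    // overwrote (i.e. dropped), if any.
    std::optional<uint64_t> Present(uint64_t frameId) {
        std::optional<uint64_t> dropped = pending_;
        pending_ = frameId;
        return dropped;
    }

    // At vblank, the single pending frame (if any) scans out.
    std::optional<uint64_t> OnVBlank() {
        std::optional<uint64_t> shown = pending_;
        pending_.reset();
        return shown;
    }

private:
    std::optional<uint64_t> pending_;
};
```

Two presents between vblanks means the first frame's relationship to VBLANK is meaningless: it never reaches the screen at all.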

Back-to-back presents that complete quicker than screen refresh are under no obligation to (and will not) scan-out, thus their status in relation to VBLANK is meaningless.

Unless you implement rate limiting yourself to prevent the CPU from immediately staging the next frame after any call to Present, you need a different paradigm for measuring frame status altogether.

D3D9Ex / DXGI Supports Presentation Statistics in Flip / Fullscreen Exclusive:

Frames do not actually present to a user unless the following APIs say they do:

IDXGISwapChain::GetFrameStatistics (...) and IDXGISwapChain::GetLastPresentCount (...)

You can use frame stats to compute the length of the render queue / present latency in real-time, and your timing goals likely can be satisfied by tracking a present # against the accounting information for successfully sync'd frames.
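A sketch of that accounting follows. `FrameStats` mirrors just the DXGI_FRAME_STATISTICS fields used here; in a real app you'd obtain them from IDXGISwapChain::GetFrameStatistics and the present count from IDXGISwapChain::GetLastPresentCount.

```cpp
#include <cstdint>

// Mirrors the DXGI_FRAME_STATISTICS fields used below.
struct FrameStats {
    uint32_t PresentCount;        // last present confirmed displayed/synced
    uint32_t PresentRefreshCount; // refresh on which that present displayed
    uint32_t SyncRefreshCount;    // refresh on which it was submitted
};

// Number of presents still queued ahead of the display: the difference
// between the app's last issued present and the last one confirmed synced.
// Unsigned subtraction handles counter wraparound.
uint32_t QueuedPresents(uint32_t lastPresentCount, const FrameStats& s) {
    return lastPresentCount - s.PresentCount;
}

// Present latency of the last confirmed frame, in whole refreshes.
uint32_t PresentLatencyRefreshes(const FrameStats& s) {
    return s.PresentRefreshCount - s.SyncRefreshCount;
}
```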

Kaldaien
-2

The question here is why? It looks like you want to solve a symptom of your issue; maybe that's a distraction from your real issue. Waiting for vsync was a useful technique on the Amiga or under DOS. It is totally wrong on any compositing or multithreaded OS.

First, what do you want to achieve? Tearing-free rendering is done by setting a swap interval on either D3D or OpenGL. It is harmful to try to do better than the OS there. Just think about cases like multiple monitors or what happens if more than one app tries to sync.

If you are a client to some other process and want to run your timing on VSync, Windows unfortunately offers no object to wait on as far as I know. Your best bet is to still rely on the Present call and estimate what is happening.

There are two cases: you are either rendering (presenting) faster or slower than vsync. If you are faster, Present should already block for you. If Present never waits and your time between calls is more than 1/60 sec, you probably want to render less often.

The most common case why people care about VSync is video. You can render a lot faster than vsync but want to wait for just the right time to present. The only thing to do there is to run a few frames as fast as you can and from that estimate your frame timing. Use some jitter and feedback... or use built-in hardware video that is happy enough to be kernel friends with the video driver.
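A rough sketch of that estimation step, assuming you've timestamped the last several blocking Present() returns (units here are milliseconds, and the median interval is just one robust way to reject jitter):

```cpp
#include <algorithm>
#include <vector>

// Estimate the refresh period from timestamps of recent blocking Present()
// returns, as the median inter-frame interval. The median rejects occasional
// late/early outliers (scheduler jitter) better than a plain mean would.
double EstimateRefreshPeriod(const std::vector<double>& presentTimes) {
    std::vector<double> deltas;
    for (size_t i = 1; i < presentTimes.size(); ++i)
        deltas.push_back(presentTimes[i] - presentTimes[i - 1]);
    if (deltas.empty())
        return 0.0; // not enough samples yet
    std::sort(deltas.begin(), deltas.end());
    size_t n = deltas.size();
    return (n % 2) ? deltas[n / 2]
                   : 0.5 * (deltas[n / 2 - 1] + deltas[n / 2]);
}
```

Feeding the estimate back (e.g. nudging it toward each new interval) then gives the running phase/period tracker the answer alludes to.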

starmole
  • The question is perfectly legit, and answering with "why do you ask this question?" is rather unnerving. – Lorenzo Pistone Jul 16 '14 at 10:49
  • @LorenzoPistone: perhaps, did you read further than the first couple sentences? Despite the rocky intro and lack of paragraph breaking (I fixed the latter), this answer is a gem, providing a wealth of information on a thorny and obscure D3D issue that really does have no good solutions. – Glenn Slayden Feb 26 '17 at 07:03
  • Answers like Glenn's don't consider the possibility of beam-racing rendering mechanisms -- used to reduce input lag in some virtual reality renderers in super-latency-critical applications, where +20 ms can mean the difference between nirvana and puke/nausea. There are critical use cases for needing to know raster positions. Also, some emulators now use beam racing to reduce input lag (google "lagless VSYNC" for more info). – Mark Rejhon Apr 03 '18 at 23:36
  • @MarkRejhon Did you mean this answer, or did you post your comment in the wrong place? This answer is by 'starmole', I only edited it. – Glenn Slayden May 11 '18 at 23:46
  • No, I posted here correctly. Scan line APIs (scan lines are called rasters) are important for niche "beam racing" applications, via calls such as GetRasterStatus() or D3DKMTGetScanLine(). To understand, see the famous book about the Atari 2600, "Racing the Beam", where graphics are done in real time as the display is scanned out. It's essentially the most lagless possible way to go from API to photons, and it is now used in certain niche applications such as some VR apps. https://www.wired.com/2009/03/racing-the-beam/ So I was addressing your "The question is why?" when there's a legitimate need. – Mark Rejhon May 12 '18 at 22:55
  • Sometimes VSYNC ON + Present() isn't suitable. Niche techniques need VSYNC OFF or front-buffer rendering (or even NVIDIA Fast Sync / AMD Enhanced Sync) -- all of the above make Present non-blocking. Sometimes an alternate means of detecting VBlank or the scan line is needed, when doing ultra-low-lag synchronization or ultra-low-lag beam racing. ... Direct vblank-event APIs are needed for things like https://www.imgtec.com/blog/reducing-latency-in-vr-by-using-single-buffered-strip-rendering/ ... in ultra-latency-critical applications that also need to avoid stutter and tearing but can't use VSYNC ON. – Mark Rejhon May 12 '18 at 23:01
  • @MarkRejhon But what I'm saying is that this is not my answer, so *I am not the person* who asked, "The question is why?" – Glenn Slayden Jan 14 '20 at 03:47