12

We are running into issues with an old closed-source game engine failing to compile shaders when memory nears 2GB.

The issue is usually with D3DXCreateEffect. Usually it returns the HRESULT "out of memory", sometimes d3dx9_25.dll shows random errors in a popup, and sometimes it just outright segfaults.

I believe the issue is a lack of Large Address Awareness: one of the d3dx9_25.dll crashes was doing something that hints at this. It took a valid pointer that looked like 0x8xxxxxx3, checked that all bits of 0x80000003 were set and, if so, bit-inverted the pointer and dereferenced it. The resulting pointer pointed to unallocated memory. Forcing the engine to malloc 2GB before compilation makes the shaders fail to compile every time.
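To make the observed behaviour concrete, here is a minimal C sketch of the check as described above. It is a reconstruction from the description, not the actual d3dx9_25.dll code; the function names and the example value are hypothetical.

```c
#include <stdint.h>

/* Bit pattern the crash site tested for, as described above. */
#define TAG_MASK 0x80000003u

/* Nonzero when every bit of TAG_MASK is set in the 32-bit value. */
int has_tag_bits(uint32_t value)
{
    return (value & TAG_MASK) == TAG_MASK;
}

/* Mirrors the x86 `not` instruction: flip all 32 bits. */
uint32_t invert_bits(uint32_t value)
{
    return (uint32_t)~value;
}

/* Hypothetical helper: the address that would end up being dereferenced. */
uint32_t resolve(uint32_t value)
{
    return has_tag_bits(value) ? invert_bits(value) : value;
}
```

For a legitimate high pointer such as 0x8ABCDEF3, `resolve` yields 0x7543210C: a 4-byte-aligned address in the low 2GB that nothing in the process necessarily allocated, which matches the crash.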

Unfortunately our knowledge of DX9 is very limited. I've seen that DX9 has a flag D3DXCONSTTABLE_LARGEADDRESSAWARE, but I'm not sure where exactly it's supposed to go. The only API call the game uses that I can find relying on it is D3DXGetShaderConstantTable, but the issues happen before it is ever called. Injecting the flag (1 << 17) = 0x20000 into D3DXCreateEffect makes the shader fail compilation in another way.

  1. Is D3DXCreateEffect supposed to accept the Large Address Aware flag? I found a wine test using it, but digging into the DX9 assembly, the error it throws is caused by an internal function returning the HRESULT "invalid call" when any bit in 0xFFFFF800 of the flags is set, which leads me to believe CreateEffect is not supposed to accept this flag.

  2. Is there anywhere else I should be injecting the Large Address Aware flag before this? I understand that a call to D3DXGetShaderConstantTable will need to be changed to use D3DXGetShaderConstantTableEx, but it's not even reached yet.
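The flag rejection described in question 1 can be sketched as follows. This is only a model of what the disassembly appeared to do; `validate_flags` and the `HR_*` and `REJECTED_BITS` names are hypothetical, while 0x8876086C is the value of D3DERR_INVALIDCALL.

```c
#include <stdint.h>

#define HR_OK          0x00000000u  /* S_OK */
#define HR_INVALIDCALL 0x8876086Cu  /* D3DERR_INVALIDCALL */
#define REJECTED_BITS  0xFFFFF800u  /* mask seen in the disassembly */
#define LAA_FLAG       (1u << 17)   /* 0x20000, the injected flag */

/* Model of the internal check: any bit inside REJECTED_BITS is refused. */
uint32_t validate_flags(uint32_t flags)
{
    return (flags & REJECTED_BITS) ? HR_INVALIDCALL : HR_OK;
}
```

Since bit 17 falls inside 0xFFFFF800, `validate_flags(LAA_FLAG)` returns the invalid-call code, while any combination of the low 11 bits passes. That is consistent with this build simply not knowing the flag.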

user1108591
  • 130
  • 6
  • I believe this question is apt, the only issue is that your problem seems to be one that can be summarized in a few sentences, try to ask about DX9's large address aware practices before including so much unnecessary detail. – Matias Chara Oct 12 '20 at 13:45
  • Is migrating to x86-64 not plausible? You'll probably also need to update your DX9 version to the latest DX9.0 build. – Mgetz Oct 12 '20 at 14:13
  • The engine is closed-source, we rely on a lot of assembly/hooking to fix such "deep" bugs. We've been using it for over a decade and the lack of memory is starting to become an issue with the amount of content we added. – user1108591 Oct 12 '20 at 14:15
  • 3
    Have you considered migrating? It sounds like you're at the end of what the engine can reasonably support. – Mgetz Oct 12 '20 at 14:17
  • Not an option. It's a volunteer project with no actual funding. – user1108591 Oct 12 '20 at 14:19
  • If the game engine is programmed with future technologies in mind, it should be a relatively simple task to migrate to a newer API like DirectX 11 or 12. – AStopher Oct 12 '20 at 14:24
  • 1
    I think you've got it the wrong way around. Windows plays it safe-by-default and will not pass a pointer >2GB to old apps. That way, old apps which pull tricks like that pointer negation will continue to work. "Large Address Aware" is a flag to tell Windows "I'm not doing anything weird, I can handle >2GB". The fact that you can allocate 2GB means that your app claims it's LAA. – MSalters Oct 12 '20 at 14:28
  • 1
    Also, I think you may overlook the "3" in `0x80000003`. That hints at an unaligned pointer. Negating it won't make it aligned, but inverting all bits does. – MSalters Oct 12 '20 at 14:31
  • 4
    x86 `neg` is 2's complement negation (subtract from `0`), C unary `-`. x86 `not` is 1's complement negation, flip all bits, C `~`. We call that bitwise NOT, not negation, to distinguish from mathematical / 2's complement negation. – Peter Cordes Oct 12 '20 at 14:40
  • 1
    You're right, I remembered it wrongly. I checked again and the instruction is indeed ``not``. – user1108591 Oct 12 '20 at 14:47
  • If I remember correctly `ps_1_1` is shader in assembly language. Is that shader written in asm? If it's written in HLSL then you might want to use either `ps_2_0` or `ps_3_0` – Asesh Oct 12 '20 at 14:54
  • The target function is in HLSL. These shaders have not been working great for over 10 years. The special thing about this function is that it's the first function in the file to use some preprocessor definitions (``#if NVIDIA``). The source of this definition is the engine itself, likely via the ``CreateEffect`` parameter of [this](https://docs.microsoft.com/en-us/windows/win32/direct3d9/d3dxmacro) type. Nothing points to it not working with the LargeAddressAware flag, and this breaks without nearing even 500MB. – user1108591 Oct 12 '20 at 15:04

2 Answers

6

LargeAddressAware is a bit of a hack, so it may or may not help your case. It really only helps if your application needs a little more room close to 2GB of VA, not if it needs a lot more.

A key problem with the legacy DirectX SDK Direct3D 9 era effects system is that it assumed the high-bit of the effect "handle" was free so it could use it, and without the bit the handle was an address to a string. This assumption is not true for LargeAddressAware.
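As a rough illustration of why that assumption breaks, here is a small C model. It is not the real D3DX implementation; `classify`, `handle_kind`, and the sample addresses are hypothetical, and the only thing taken from the text is that the high bit decides between "handle" and "address of a string".

```c
#include <stdint.h>

#define HIGH_BIT 0x80000000u

typedef enum { KIND_STRING_NAME, KIND_INTERNAL_HANDLE } handle_kind;

/* Hypothetical classifier: a set high bit means "internal handle",
   a clear high bit means "address of a parameter-name string". */
handle_kind classify(uint32_t handle)
{
    return (handle & HIGH_BIT) ? KIND_INTERNAL_HANDLE : KIND_STRING_NAME;
}
```

A name string stored at 0x00402000 is classified as a string, but the same string stored at 0x80402000 in a LargeAddressAware process would be taken for an internal handle, so the two cases can no longer be told apart by the high bit alone.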

To enable this, you define D3DXFX_LARGEADDRESS_HANDLE before including d3dx9.h headers. You then must use the D3DXFX_LARGEADDRESSAWARE flag when creating all effects. You must also not use the alias trick where you can use a "string name" instead of a "handle" on all the effect methods. Instead you have to use GetParameterByName to get the handle and use that instead.

What I can't remember is when the LAA flag was added to Effects for Direct3D 9.

If you are using d3dx9_25.dll then that's the April 2005 release of the DirectX SDK. If you are using "Pixel Shader Model 1.x" then you can't use any version newer than d3dx9_31.dll (October 2006)--later versions of the DirectX SDK let you use D3DXSHADER_USE_LEGACY_D3DX9_31_DLL which just passed through shader compilation to the older version for this scenario.

A key reason that many 32-bit games would fail and then work with LAA enabled was virtual memory fragmentation. Improving your VA memory layout by making your allocations more uniform can help too.

Chuck Walbourn
  • 28,931
  • 1
  • 45
  • 72
  • 1
    LargeAddressAware will give you about 4GB on x64, which is pretty common these days. – MSalters Oct 12 '20 at 21:24
  • 1
    Yes, I know that LAA will let you allocate up to 4 GB of VA on Windows x64--FWIW I spent most of 2004-2008 helping gamedevs do that. That doesn't mean that most of those legacy 32-bit games run stably anywhere near that. LAA on Windows x64 helped a lot of games that were stuck in 32-bit and just needed to not crash when they peaked out over ~1.7 GB. – Chuck Walbourn Oct 12 '20 at 21:27
  • As far as I understood from the header, ``D3DXFX_LARGEADDRESS_HANDLE`` only helps you make sure in compile time that you're not using the alias trick, correct? – user1108591 Oct 12 '20 at 22:08
  • Yes, the ``D3DXFX_LARGEADDRESS_HANDLE`` is to enforce the behavior at compile-time. – Chuck Walbourn Oct 16 '20 at 19:09
1

The issue we were having with CreateEffect not accepting the LargeAddressAware flag is pretty obvious in hindsight: the d3dx9 version the engine is using (d3dx9_25.dll) simply did not have this feature yet.

Our options, other than optimizing our memory usage, are:

  1. Convert all our pixel shaders 1.x to 2.0 and force the engine to load a newer version of d3dx9, hope the engine is not relying on bugs of d3dx9_25.dll or the alias trick, then inject the LargeAddressAware flag bit there.

  2. Wrap malloc, either to avoid giving handles large addresses (I am unsure whether this is also required inside the DLL) or to stick enough other data in large addresses so that dx9-related mallocs don't reach them.
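For option 2, the core test a malloc wrapper would need is whether a block lies entirely below the 2GB line. A minimal sketch, assuming 32-bit addresses; `block_is_low` is a hypothetical helper, not part of our existing wrapper:

```c
#include <stdint.h>

#define TWO_GB 0x80000000u

/* Nonzero when the block [addr, addr + size) lies entirely below 2GB.
   Written to avoid 32-bit overflow when computing addr + size. */
int block_is_low(uint32_t addr, uint32_t size)
{
    if (addr >= TWO_GB) {
        return 0;
    }
    return size <= TWO_GB - addr;
}
```

A wrapper could use a check like this to decide whether a block is safe to hand to dx9-facing code or should be kept for other data. On Windows, VirtualAlloc accepts a MEM_TOP_DOWN flag that asks for the highest available address, which is one possible way to push unrelated large allocations upward; whether that suits this engine is untested here.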

user1108591
  • 130
  • 6
  • 1
    The third option would be to find a game engine that isn't obsolete and port everything over; which probably sounds like a lot of work initially; but could improve the game a lot (performance and features), and could save time in the long run (because really, once you find a quick work-around for this problem you're just going to run into another problem that you won't be able to fix so easily). – Brendan Oct 15 '20 at 17:01
  • This isn't a commercial project. Porting over 15 years worth of content, made by *literally hundreds* of volunteers by a handful that's left (out of them, maybe three software people) isn't an option for us. – user1108591 Oct 15 '20 at 22:22
  • Would it be possible to leave the low 2GiB free for memory dx9 needs to use, by hinting allocation of other stuff into the high 2GiB of virtual address space? That could help if you have any large allocations in your process that don't need DX9-compatible addresses. (If Windows allocators have any way to do that; on Linux you'd use `mmap(0x8000000, ... MAP_ANONYMOUS)` *without* MAP_FIXED. But the hint address would have to be managed manually for each allocation; there's no "anywhere in high half" hint.) – Peter Cordes Oct 16 '20 at 02:53
  • Yes, we already had a wrapper for malloc and I do have a working POC that solves the issue. At this moment the problem only affects a very small number of people (likely people with unusual drivers/anti-virus that take additional RAM), so we're checking whether we can trick the engine into loading things more dynamically first, as that's a less hacky solution. – user1108591 Oct 16 '20 at 10:34