Fixing asynchronous COM bug at application startup
My article “Asynchronous COM for Windows Vista and Win7 — memory overwrite bug” describes a bug present in these operating systems. Reliable operation of asynchronous COM was critical for the software we were developing, which was already on sale by that time. Repeated requests to Microsoft to fix the bug were fruitless, so the only option left was to correct the asynchronous COM error ourselves. Patching the ole32.dll code “on the fly” when our applications start turned out to be the most effective way to do this.
The techniques this requires are described in my articles “Self-modifying programs — applying patch” and “Functions call interception via replacement of header bytes by JMP or CALL instructions”. Patching CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces poses one problem: the function is not exported, so its address cannot be obtained with GetProcAddress. A “search by template” method was therefore used to locate the function within the ole32.dll image.
Code search by template
The idea of the method is simple. We build an array of code bytes (a signature) long enough to identify the desired fragment unambiguously. Signature bytes that encode addresses or offsets, and can therefore differ depending on the base address of the DLL image in memory (or from one build of the DLL to another), are marked as excluded from the comparison.
The following signature was built to locate CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces:
```cpp
WORD tmpltVistaIBeginMQI[] =
{
    0x008D, 0x004D, 0x00F8,                         // lea  ecx,[ebp-8]
    0x0051,                                         // push ecx
    0x0057,                                         // push edi
    0x006A, 0x000C,                                 // push 0Ch
    0x0033, 0x00C0,                                 // xor  eax,eax
    0x0050,                                         // push eax
    0x0053,                                         // push ebx
    0x00E8, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,         // call __allmul (766143D3h)
    0x0052,                                         // push edx
    0x0050,                                         // push eax
    0x00E8, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,         // call ULongLongToUInt (7661537Ah)
    0x003B, 0x00C7,                                 // cmp  eax,edi
    0x0050,                                         // push eax
    0x007D, 0x0006,                                 // jge  CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces+0A5h (7668340Ch)
    0x008B, 0x004E, 0x0028,                         // mov  ecx,dword ptr [esi+28h]
    0x0051,                                         // push ecx
    0x00EB, 0x00D6,                                 // jmp  CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces+7Bh (766833E2h)
    0x0057,                                         // push edi
    0x00FF, 0x0035, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, // push dword ptr [g_hHeap (766EE304h)]
    0x00FF, 0x0015, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, // call dword ptr [pfnHeapAlloc (766EE8B0h)]
};

#define SIZE_OF_BEGINMQI_TEMPLATE \
    (sizeof(tmpltVistaIBeginMQI) / sizeof(tmpltVistaIBeginMQI[0]))
```
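Each code byte to be matched occupies two bytes (one WORD) in the template. The first (low-order) byte holds the code byte itself. If the second (high-order) byte is 0xFF, the position is a wildcard and is skipped during the search. In the signature above, the wildcards (0xFFFF) cover the call displacements and the absolute addresses of g_hHeap and pfnHeapAlloc.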
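The matching rule just described can be sketched in portable C++. This is not the author's FindCodeByTemplate (that one is in the downloadable sources); FindCodeByTemplateSketch is a hypothetical name, and the sketch simply returns the first position where every non-wildcard template byte equals the corresponding image byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a template search, not the author's FindCodeByTemplate.
// Each 16-bit template entry encodes one code byte:
//   low byte  = the byte expected in the image,
//   high byte = 0xFF marks a wildcard position that is not compared.
const std::uint8_t* FindCodeByTemplateSketch(const std::uint8_t* image,
                                             std::size_t imageSize,
                                             const std::uint16_t* tmplt,
                                             std::size_t tmpltLen)
{
    assert(image != nullptr && tmplt != nullptr);
    if (tmpltLen == 0 || tmpltLen > imageSize)
        return nullptr;

    // Try every start position where the whole template still fits.
    for (std::size_t start = 0; start + tmpltLen <= imageSize; ++start) {
        std::size_t i = 0;
        for (; i < tmpltLen; ++i) {
            const bool wildcard = (tmplt[i] >> 8) == 0xFF;
            const auto expected = static_cast<std::uint8_t>(tmplt[i] & 0xFF);
            if (!wildcard && image[start + i] != expected)
                break;                  // mismatch: try the next start position
        }
        if (i == tmpltLen)
            return image + start;       // first position where the template matches
    }
    return nullptr;                     // no match
}
```

For example, the template { 0x0057, 0x00FF, 0x0035, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF } (push edi; push dword ptr [...]) matches the bytes 57 FF 35 followed by any four address bytes. A naive scan like this costs O(image size × template length), which is acceptable for a one-time search at startup.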
The code to be patched is located with a single call (mi is the MODULEINFO structure for ole32.dll, as filled in by GetModuleInformation):
```cpp
LPBYTE pFragment = FindCodeByTemplate(
      PBYTE(mi.lpBaseOfDll)        // base address of ole32.dll
    , mi.SizeOfImage               // ole32.dll image size
    , tmpltVistaIBeginMQI
    , SIZE_OF_BEGINMQI_TEMPLATE);
```
The implementation of FindCodeByTemplate is included in the source file available for download.
Fixing the bug
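The code of the function turned out to be well suited to patching. The fragment ends with call dword ptr [pfnHeapAlloc], a six-byte indirect near call, so the call can be redirected to our own correction code (a stub) by overwriting the bytes of that instruction. Judging by the disassembly in the template, the value passed as the block size is whatever ULongLongToUInt left in eax (its HRESULT), while the size that ULongLongToUInt computed is written to [ebp-8], the address passed to it via lea ecx,[ebp-8] and push ecx. The stub therefore allocates the block using the size from [ebp-8] and then executes a ret instruction, so CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces continues to run without noticing anything, but now with a block of the correct size.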
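For reference, the instruction being replaced, call dword ptr [pfnHeapAlloc], is encoded as FF 15 followed by the 32-bit absolute address of the pointer variable, which is why its last four bytes are wildcards in the template. The sketch below extracts that address from the six instruction bytes; DecodeIndirectCallSlot is a hypothetical helper and assumes a 32-bit image. The patching code performs the same step by dereferencing the last four bytes of the fragment.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper for a 32-bit image: if the 6 bytes at `insn` encode
// "call dword ptr [disp32]" (opcode FF, ModRM 15h), return disp32, i.e. the
// absolute address of the memory slot that holds the call target.
std::optional<std::uint32_t> DecodeIndirectCallSlot(const std::uint8_t* insn)
{
    assert(insn != nullptr);
    if (insn[0] != 0xFF || insn[1] != 0x15)
        return std::nullopt;
    // x86 stores disp32 little-endian; assemble it byte by byte so the
    // result does not depend on the host's byte order.
    return static_cast<std::uint32_t>(insn[2])
         | static_cast<std::uint32_t>(insn[3]) << 8
         | static_cast<std::uint32_t>(insn[4]) << 16
         | static_cast<std::uint32_t>(insn[5]) << 24;
}
```

With the address from the template's comment, the bytes FF 15 B0 E8 6E 76 decode to 766EE8B0h, the location of pfnHeapAlloc in that listing. By contrast, FF 35 (push dword ptr [...]) is rejected.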
The stub code is not complicated (the full version of the code can be downloaded via this link):
```asm
IBeginMQIStub PROC C
    push dword ptr [ebp - 8]        ; size of block to allocate
    push dword ptr [esp + 12]       ; flags
    push dword ptr [esp + 12]       ; hHeap
    call dword ptr [pfnHeapAlloc]   ; call allocator
    ret  12                         ; return to IBegin_QueryMultipleInterfaces
IBeginMQIStub ENDP
```
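Why [esp + 12] appears twice is easier to see with a model of the stack. The sketch below is a toy simulation, not code from the article: ToyStack and TraceStub are hypothetical names, stack slots hold symbolic labels instead of values, and it assumes the push order of the original fragment (size in eax, then flags in edi, then g_hHeap). It shows that HeapAlloc receives g_hHeap, the original flags and the size from [ebp-8], and that ret 12 leaves the caller's stack as the original call would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a 32-bit stack: slots.back() is [esp], and [esp + 4*n]
// is the slot n positions below the top. Slots hold symbolic labels.
struct ToyStack {
    std::vector<std::string> slots;

    void push(const std::string& v) { slots.push_back(v); }

    // Returns a copy of [esp + offset]. "push dword ptr [esp + imm]" reads its
    // operand using esp as it was before the push, so at() is called first.
    std::string at(std::size_t offset) const {
        assert(offset % 4 == 0 && offset / 4 < slots.size());
        return slots[slots.size() - 1 - offset / 4];
    }

    void pop(std::size_t bytes) {
        assert(bytes % 4 == 0 && bytes / 4 <= slots.size());
        slots.resize(slots.size() - bytes / 4);
    }
};

struct StubTrace {
    std::string hHeap, flags, size;     // arguments HeapAlloc receives
    std::vector<std::string> afterRet;  // caller's stack after the stub's "ret 12"
};

StubTrace TraceStub()
{
    ToyStack s;
    s.push("caller data");
    // Pushed by the original fragment before the (now patched) call:
    s.push("eax (original size)");      // push eax
    s.push("edi (flags)");              // push edi
    s.push("g_hHeap");                  // push dword ptr [g_hHeap]
    s.push("return address");           // call IBeginMQIStub

    // IBeginMQIStub:
    s.push("[ebp-8] (size)");           // push dword ptr [ebp - 8]
    s.push(s.at(12));                   // push dword ptr [esp + 12] -> flags
    s.push(s.at(12));                   // push dword ptr [esp + 12] -> hHeap

    // Arguments as HeapAlloc sees them (first argument at the lowest address),
    // read before "call" would push its own return address.
    StubTrace t;
    t.hHeap = s.at(0);
    t.flags = s.at(4);
    t.size  = s.at(8);

    s.pop(12);      // HeapAlloc is __stdcall and removes its 3 arguments
    s.pop(4 + 12);  // ret 12: pop the return address, then 12 bytes of arguments
    t.afterRet = s.slots;
    return t;
}
```

The stale value pushed by the original code is simply discarded by ret 12, so the caller never sees the difference.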
Finally, here is the code that modifies the bytes of IBegin_QueryMultipleInterfaces to transfer control to IBeginMQIStub:
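```cpp
// The last 4 bytes of the fragment are the operand of
// "call dword ptr [pfnHeapAlloc]": the address of ole32's pointer to the allocator.
PVOID *ppfnHeapAlloc = (PVOID *)(pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 4);
ppfnHeapAlloc = (PVOID *)*ppfnHeapAlloc;   // address of ole32's pfnHeapAlloc variable
pfnHeapAlloc  = *ppfnHeapAlloc;            // allocator address, called by IBeginMQIStub

// Replace the 6-byte indirect call with a 5-byte "call rel32" plus a NOP.
// rel32 is counted from the end of the 5-byte call:
// (pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6) + 5.
BYTE Code[6];
DWORD offs = DWORD(LPBYTE(&IBeginMQIStub) - (pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 1));
Code[0] = 0xE8;              // call near (rel32)
*(DWORD *)&Code[1] = offs;   // rel32 displacement
Code[5] = 0x90;              // nop

WriteProcessMemory(
      hProcess                                    // handle of the current process
    , pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6   // start of the original call instruction
    , Code, sizeof(Code)
    , NULL );
```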
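The displacement arithmetic can be checked in isolation. In the sketch below, BuildCallPatch and DecodeCallTarget are hypothetical helpers, and addresses are plain 32-bit integers rather than real pointers. The parameter site plays the role of pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6; the rel32 operand of E8 is counted from site + 5, which is the pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 1 used when computing offs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Patch = std::array<std::uint8_t, 6>;

// Hypothetical helper: builds "call rel32; nop" (E8 xx xx xx xx 90) to be
// written over a 6-byte instruction at 32-bit address `site`, so that the
// call goes to `target` and execution resumes after the 6 patched bytes.
Patch BuildCallPatch(std::uint32_t site, std::uint32_t target)
{
    // rel32 is relative to the next instruction (site + 5). Unsigned
    // wrap-around yields the two's-complement value for backward calls.
    const std::uint32_t rel = target - (site + 5);
    return {0xE8,
            static_cast<std::uint8_t>(rel),          // little-endian rel32
            static_cast<std::uint8_t>(rel >> 8),
            static_cast<std::uint8_t>(rel >> 16),
            static_cast<std::uint8_t>(rel >> 24),
            0x90};                                   // nop: the stub returns here
}

// Inverse operation, useful for verifying a patch before writing it.
std::uint32_t DecodeCallTarget(std::uint32_t site, const Patch& code)
{
    assert(code[0] == 0xE8);
    const std::uint32_t rel = static_cast<std::uint32_t>(code[1])
                            | static_cast<std::uint32_t>(code[2]) << 8
                            | static_cast<std::uint32_t>(code[3]) << 16
                            | static_cast<std::uint32_t>(code[4]) << 24;
    return site + 5 + rel;
}
```

Because the nop follows the new call, the return address pushed by that call points at the nop, and after the stub's ret execution continues with the instruction that originally followed the 6-byte indirect call.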
Conclusion
The methods described in this and previous articles let us solve the problem quite easily. Such techniques are sometimes justified, and sometimes there is simply no other choice. However, code patching should not be overused: errors in a stub are hard to debug and can compromise the stability of your applications.