Fixing asynchronous COM bug at application startup

My article “Asynchronous COM for Windows Vista and Win7 — memory overwrite bug” describes an error that appears in these operating systems. Reliable operation of asynchronous COM was critical for the software we were developing (which was already on sale by that time). Repeated requests to Microsoft to fix the bug were fruitless, so the only option left was to correct the asynchronous COM support error ourselves. Patching the ole32.dll code “on the fly” at the start of our applications turned out to be the most effective way to fix the asynchronous COM bug.

I described the techniques required for this in my articles “Self-modifying programs — applying patch” and “Functions call interception via replacement of header bytes by JMP or CALL instructions”. Patching the CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces function poses one problem: it is not exported, so its address cannot be taken from the ole32.dll export table. A “search by template” method was therefore used to locate the function within the ole32.dll image.

Code search by template

The idea of the method is simple: create an array of code bytes (a signature) long enough to identify the desired code fragment unambiguously. Signature bytes that hold addresses or offsets, which may differ depending on the base address of the DLL image in memory or between ole32.dll builds, are marked as excluded from the comparison.

The following signature was built to locate CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces:

WORD tmpltVistaIBeginMQI[] =
{
    0x008D, 0x004D, 0x00F8,                         // lea         ecx,[ebp-8]
    0x0051,                                         // push        ecx
    0x0057,                                         // push        edi
    0x006A, 0x000C,                                 // push        0Ch
    0x0033, 0x00C0,                                 // xor         eax,eax
    0x0050,                                         // push        eax
    0x0053,                                         // push        ebx
    0x00E8, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,         // call        __allmul (766143D3h)
    0x0052,                                         // push        edx
    0x0050,                                         // push        eax
    0x00E8, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,         // call        ULongLongToUInt (7661537Ah)
    0x003B, 0x00C7,                                 // cmp         eax,edi
    0x0050,                                         // push        eax
    0x007D, 0x0006,                                 // jge         CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces+0A5h (7668340Ch)
    0x008B, 0x004E, 0x0028,                         // mov         ecx,dword ptr [esi+28h]
    0x0051,                                         // push        ecx
    0x00EB, 0x00D6,                                 // jmp         CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces+7Bh (766833E2h)
    0x0057,                                         // push        edi
    0x00FF, 0x0035, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, // push        dword ptr [g_hHeap (766EE304h)]
    0x00FF, 0x0015, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, // call        dword ptr [pfnHeapAlloc (766EE8B0h)]
};

#define SIZE_OF_BEGINMQI_TEMPLATE \
 (sizeof(tmpltVistaIBeginMQI) / sizeof(tmpltVistaIBeginMQI[0]))

Each code byte to be found in the image is described by two bytes (one WORD). The low (first) byte holds the expected code byte. If the high (second) byte is 0xFF, that position is a wildcard and is skipped during the comparison.

The code fragment to be patched is located with the following call:

LPBYTE pFragment = 
  FindCodeByTemplate(
    PBYTE(mi.lpBaseOfDll) // base address of ole32.dll
    , mi.SizeOfImage // ole32.dll image size
    , tmpltVistaIBeginMQI
    , SIZE_OF_BEGINMQI_TEMPLATE);

The implementation of FindCodeByTemplate is included in the source files available for download.
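Since the downloadable implementation is not reproduced here, the following is a minimal, hypothetical sketch of how such a template search could work, not the author's FindCodeByTemplate. It assumes the encoding described above (low byte is the expected value, high byte 0xFF marks a wildcard), uses portable uint16_t/uint8_t in place of WORD/BYTE, and returns the first match or NULL. The names FindCodeByTemplateSketch and TemplateMatchesAt are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does the template match the bytes at p?
 * Entry layout (assumed): low byte = expected code byte,
 * high byte == 0xFF = wildcard position, skipped. */
static int TemplateMatchesAt(const uint8_t *p, const uint16_t *tmpl,
                             size_t tmplLen)
{
    for (size_t i = 0; i < tmplLen; ++i) {
        if ((tmpl[i] >> 8) == 0xFF)
            continue;                         /* wildcard: not compared */
        if (p[i] != (uint8_t)(tmpl[i] & 0xFF))
            return 0;                         /* mismatch */
    }
    return 1;
}

/* Hypothetical sketch: linear scan of [base, base + size) for the
 * first position where the whole template matches. */
static uint8_t *FindCodeByTemplateSketch(uint8_t *base, size_t size,
                                         const uint16_t *tmpl,
                                         size_t tmplLen)
{
    assert(tmpl != NULL && tmplLen > 0);
    if (base == NULL || size < tmplLen)
        return NULL;
    for (size_t off = 0; off + tmplLen <= size; ++off) {
        if (TemplateMatchesAt(base + off, tmpl, tmplLen))
            return base + off;                /* first match wins */
    }
    return NULL;                              /* signature not found */
}
```

A plain linear scan is usually adequate for a one-time search at startup. A more defensive version might also verify that the signature occurs only once in the image, so that an unexpected ole32.dll build is not patched at the wrong place.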

Fixing the bug

The function code turned out to be well suited for patching. The fragment ends with a 6-byte indirect call, call dword ptr [pfnHeapAlloc], so the call can be redirected to our own correction code (a stub) by overwriting that instruction with a 5-byte direct near call to the stub followed by a one-byte nop. The stub performs the memory allocation with the correct block size and then executes a ret instruction, so CAsyncUnknownMgr::IBegin_QueryMultipleInterfaces continues to work without noticing anything, except that the allocated block now has the correct size.
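The redirection relies on how a direct near call (opcode E8) is encoded: its 32-bit operand is the distance from the end of the 5-byte instruction to the target. The sketch below, with hypothetical helper names MakeCallPatch and DecodeCallTarget, builds the 6 replacement bytes for a call site and decodes them back. Addresses are modeled as uint32_t, as in a 32-bit process; `site` plays the role of pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6 in the patch code, so `site + 5` corresponds to pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the 6 bytes that overwrite "call dword ptr [disp32]"
 * (FF 15 disp32) located at address 'site': a direct near call
 * E8 rel32 to 'target', followed by a one-byte nop (90). */
static void MakeCallPatch(uint32_t site, uint32_t target, uint8_t out[6])
{
    /* rel32 is relative to the end of the 5-byte E8 instruction;
     * unsigned wrap-around yields the two's-complement offset. */
    uint32_t rel32 = target - (site + 5u);

    out[0] = 0xE8;                          /* call rel32 */
    for (int i = 0; i < 4; ++i)             /* little-endian operand */
        out[1 + i] = (uint8_t)(rel32 >> (8 * i));
    out[5] = 0x90;                          /* nop pads the old 6-byte slot */
}

/* Recovers the destination of the E8 call written at 'site'. */
static uint32_t DecodeCallTarget(uint32_t site, const uint8_t code[6])
{
    uint32_t rel32 = 0;

    assert(code[0] == 0xE8);
    for (int i = 0; i < 4; ++i)
        rel32 |= (uint32_t)code[1 + i] << (8 * i);
    return site + 5u + rel32;               /* wraps modulo 2^32, like EIP */
}
```

For example, a call at 0x1000 to 0x2000 gets rel32 = 0x2000 - 0x1005 = 0xFFB, encoded as E8 FB 0F 00 00 followed by 90. Because the new instruction is one byte shorter than the old one, the nop keeps the following instruction boundary intact.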

The stub code is simple (the full version is included in the downloadable sources):

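IBeginMQIStub PROC C
        ; On entry: [esp] = return address, [esp+4] = hHeap,
        ; [esp+8] = flags, [esp+12] = size pushed by the original code
        push    dword ptr [ebp - 8]   ; size computed earlier by ULongLongToUInt
        push    dword ptr [esp + 12]  ; flags
        push    dword ptr [esp + 12]  ; hHeap
        call    [pfnHeapAlloc]        ; HeapAlloc(hHeap, flags, size)
        ret     12                    ; return to IBegin_QueryMultipleInterfaces,
                                      ; popping the 3 original arguments
IBeginMQIStub ENDP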
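The two identical push dword ptr [esp + 12] instructions are easy to misread, because each push moves esp. The following toy model (hypothetical names Stack, Push, At, RunStubModel; it simulates only the stack slots, not the CPU) replays the stub and reports which arguments HeapAlloc(hHeap, dwFlags, dwBytes) would receive. The caller's pushes follow the template above: the value pushed by push eax ends up as dwBytes, push edi as dwFlags, and push [g_hHeap] as hHeap. The concrete values in the usage are made up.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 64                  /* toy stack of 64 DWORDs */

/* Toy model of a 32-bit x86 stack: esp is a byte offset into slot[],
 * and the stack grows toward lower offsets, as on x86. */
typedef struct {
    uint32_t slot[STACK_SLOTS];
    uint32_t esp;
} Stack;

static void Push(Stack *s, uint32_t v)            /* push v */
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->slot[s->esp / 4] = v;
}

static uint32_t At(const Stack *s, uint32_t disp) /* dword ptr [esp+disp] */
{
    assert((s->esp + disp) / 4 < STACK_SLOTS);
    return s->slot[(s->esp + disp) / 4];
}

typedef struct { uint32_t hHeap, flags, bytes; } HeapArgs;

/* Replays IBeginMQIStub on the model. 'sizeAtEbpMinus8' stands for
 * dword ptr [ebp-8]. Returns the arguments HeapAlloc would receive. */
static HeapArgs RunStubModel(Stack *s, uint32_t sizeAtEbpMinus8)
{
    HeapArgs a;
    Push(s, sizeAtEbpMinus8);   /* push dword ptr [ebp-8]             */
    Push(s, At(s, 12));         /* push dword ptr [esp+12] -> flags   */
    Push(s, At(s, 12));         /* push dword ptr [esp+12] -> hHeap   */
    Push(s, 0x5A5A5A5Au);       /* call [pfnHeapAlloc]: return addr   */
    a.hHeap = At(s, 4);         /* HeapAlloc(hHeap, dwFlags, dwBytes) */
    a.flags = At(s, 8);
    a.bytes = At(s, 12);
    s->esp += 4 + 12;           /* HeapAlloc (stdcall) returns, popping 3 args */
    s->esp += 4 + 12;           /* ret 12: return addr into caller + 3 original args */
    return a;
}
```

The model confirms that HeapAlloc gets the original heap handle and flags but the size from [ebp-8], and that after ret 12 the stack pointer is back where it was before the caller pushed its three arguments, exactly as if the original stdcall HeapAlloc had been called.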

And here is the code that modifies the bytes of the IBegin_QueryMultipleInterfaces function to transfer control to IBeginMQIStub:

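// The last 4 bytes of the fragment are the operand of
// "call dword ptr [pfnHeapAlloc]": the address of ole32's pfnHeapAlloc variable
PVOID *ppfnHeapAlloc = (PVOID *)(pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 4);
ppfnHeapAlloc = (PVOID *)*ppfnHeapAlloc;
pfnHeapAlloc = *ppfnHeapAlloc; // allocator address, used by IBeginMQIStub

BYTE  Code[6];
// The rel32 of a near call is counted from the end of the 5-byte instruction
DWORD offs = 
  DWORD(LPBYTE(&IBeginMQIStub) - 
  (pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 1));

Code[0] = 0xE8;            // call near (E8 rel32)
*(DWORD *)&Code[1] = offs; // rel32 offset to IBeginMQIStub
Code[5] = 0x90;            // nop pads the 6th byte of the old instruction

// Overwrite the 6-byte "call dword ptr [pfnHeapAlloc]" (FF 15 disp32)
WriteProcessMemory(
  hProcess // handle of the current process
  , pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6
  , Code, sizeof(Code)
  , NULL
);

// Discard any stale copy of the old instruction bytes
FlushInstructionCache(
  hProcess
  , pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 6
  , sizeof(Code)
);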
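The first three lines of the patch code perform a double dereference that is easy to get wrong: the operand of FF 15 is not the allocator's address but the address of a pointer variable that holds it. The sketch below models this on a byte buffer standing in for the 32-bit address space; the names ReadDword and ResolveIndirectCallTarget are hypothetical. In the real code, `callInsn + 2` corresponds to pFragment + SIZE_OF_BEGINMQI_TEMPLATE - 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian DWORD from a byte buffer that stands in for
 * the 32-bit address space; 'addr' is an offset into that buffer. */
static uint32_t ReadDword(const uint8_t *mem, size_t memSize, uint32_t addr)
{
    assert((size_t)addr + 4 <= memSize);
    return (uint32_t)mem[addr] |
           ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) |
           ((uint32_t)mem[addr + 3] << 24);
}

/* 'callInsn' is the address of "FF 15 disp32" (call dword ptr [disp32]).
 * disp32 is the absolute address of a pointer variable; reading that
 * variable yields the address of the function actually called. */
static uint32_t ResolveIndirectCallTarget(const uint8_t *mem, size_t memSize,
                                          uint32_t callInsn)
{
    assert(mem[callInsn] == 0xFF && mem[callInsn + 1] == 0x15);
    uint32_t ptrVarAddr = ReadDword(mem, memSize, callInsn + 2); /* like ppfnHeapAlloc */
    return ReadDword(mem, memSize, ptrVarAddr);                 /* like pfnHeapAlloc  */
}
```

Reading the pointer variable at patch time, rather than hard-coding HeapAlloc, means the stub calls whatever allocator ole32.dll itself would have called.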

Conclusion

We managed to solve the problem fairly easily with the methods described in this and previous articles. Sometimes applying them is justified, and sometimes there is simply no other choice. However, code modification should not be overused: errors in a stub implementation can be difficult to debug and can endanger the stability of your applications.
