Eilam, E. Reversing: Secrets of Reverse Engineering. 2005.
Breaking Protections 391
see that it points somewhere into NTDLL’s header (the specific value is likely to change with each new update of the operating system). Taking a quick look at the NTDLL headers using DUMPBIN shows you that the address in EAX is the beginning of NTDLL’s export directory. Going to the structure definition for IMAGE_EXPORT_DIRECTORY, you will find that offset +18 is the NumberOfNames member, the number of exported names. Here’s the final preparation of the block size:
00403649 MOV EAX,DWORD PTR [EBP-88]
0040364F MOV ECX,DWORD PTR [EBP-78]
00403652 LEA EAX,DWORD PTR [ECX+EAX*8+8]
The total block size is calculated according to the following formula: BlockSize = NTDLLCodeSize + (TotalExports + 1) * 8. You’re still not sure what Defender is doing here, but you know that it has something to do with NTDLL’s code section and with its export directory.
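As a quick check of that formula, here is a minimal Python sketch of the LEA computation. The input values are hypothetical, and the reading of the extra 8 bytes as room for one additional table entry (plausibly a zero terminator, since the search loops later in this section stop on a checksum of 0) is an assumption, not something the code confirms at this point.

```python
def block_size(code_size: int, total_exports: int) -> int:
    """Model LEA EAX,[ECX+EAX*8+8] with ECX = code size, EAX = export count."""
    return (code_size + total_exports * 8 + 8) & 0xFFFFFFFF  # 32-bit register


# Hypothetical inputs, chosen only to exercise the formula.
example = block_size(code_size=0x7A000, total_exports=1300)
print(hex(example))
```

The two forms, `code_size + total_exports * 8 + 8` and `code_size + (total_exports + 1) * 8`, are the same quantity; the second simply makes the "one extra entry" interpretation visible.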
The function proceeds into another iteration of the NTDLL export list, again computing that strange checksum for each function name. In this loop there are two interesting lines that write into the newly allocated memory block:
0040380F MOV DWORD PTR DS:[ECX+EAX*8],EDX
00403840 MOV DWORD PTR DS:[EDX+ECX*8+4],EAX
The preceding lines are executed for each exported function in NTDLL. They treat the allocated memory block as an array. The first writes the current function’s checksum, and the second writes the exported function’s RVA (Relative Virtual Address) into the same memory address plus 4. This indicates that the newly allocated memory block contains an array of data structures, each 8 bytes long. Offset +0 contains a function name’s checksum, and offset +4 contains its RVA.
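To make the layout concrete, the following Python sketch packs a list of exports into the same 8-byte entry format. The `toy_checksum` function is a stand-in invented for this example (Defender’s actual checksum algorithm is not reproduced here), the RVAs are made up, and the trailing zero entry is an assumption.

```python
import struct


def toy_checksum(name: str) -> int:
    # Placeholder only: NOT Defender's algorithm, just a deterministic 32-bit hash.
    value = 0
    for byte in name.encode("ascii"):
        value = (value * 31 + byte) & 0xFFFFFFFF
    return value


def build_table(exports):
    """Pack (name, rva) pairs as 8-byte entries: checksum at +0, RVA at +4."""
    table = bytearray()
    for name, rva in exports:
        table += struct.pack("<II", toy_checksum(name), rva)
    table += struct.pack("<II", 0, 0)  # assumed terminator entry
    return bytes(table)


# Made-up RVAs for two real NTDLL export names.
exports = [("NtAllocateVirtualMemory", 0x1000), ("NtCreateThread", 0x1010)]
table = build_table(exports)
```

Entry `i` starts at byte offset `i * 8`, which is exactly what the `[ECX+EAX*8]` and `[EDX+ECX*8+4]` addressing in the two MOV instructions expresses.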
The following is the next code sequence that seems to be of interest:
004038FD MOV EAX,DWORD PTR [EBP-C8]
00403903 MOV ESI,DWORD PTR [EBP+8]
00403906 ADD ESI,DWORD PTR [EAX+2C]
00403909 MOV EAX,DWORD PTR [EBP-D8]
0040390F MOV EDX,DWORD PTR [EBP-C]
00403912 LEA EDI,DWORD PTR [EDX+EAX*8+8]
00403916 MOV EAX,ECX
00403918 SHR ECX,2
0040391B REP MOVS DWORD PTR ES:[EDI],DWORD PTR [ESI]
0040391D MOV ECX,EAX
0040391F AND ECX,3
00403922 REP MOVS BYTE PTR ES:[EDI],BYTE PTR [ESI]
This sequence performs a memory copy, and is a commonly seen “sentence” in assembly language. The REP MOVS instruction repeatedly copies DWORDs from the address at ESI to the address at EDI until ECX is zero. For each DWORD that is copied ECX is decremented once, and ESI and EDI are both incremented by four (the sequence is copying 32 bits at a time). The second REP MOVS performs a byte-by-byte copy of the remaining bytes (at most 3), if any. This is needed only for blocks whose size isn’t a multiple of 4 bytes.
Let’s see what is being copied in this sequence. ESI is loaded with [EBP+8], which is NTDLL’s base address, and is incremented by the value at [EAX+2C]. Going back a bit you can see that EAX contains that same PE header address you were looking at earlier. If you go back to the PE headers you dumped earlier from WinDbg, you can see that offset +2C is BaseOfCode. EDI is loaded with an address within your newly allocated memory block, at the point right after the table you’ve just filled. Essentially, this sequence is copying all the code in NTDLL into this memory buffer.
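If the two-stage copy idiom is new to you, this Python sketch models it step by step. It is only an illustration of the SHR/REP MOVS DWORD/AND/REP MOVS BYTE pattern; the function name and the sample buffer are invented for the example.

```python
def rep_movs_copy(src: bytes, count: int) -> bytes:
    """Copy `count` bytes the way the listing does: DWORDs first, then the tail."""
    if count > len(src):
        raise ValueError("count exceeds source length")
    dst = bytearray(count)
    pos = 0                      # plays the role of both ESI and EDI offsets

    ecx = count >> 2             # SHR ECX,2: number of whole DWORDs
    while ecx:                   # REP MOVS DWORD PTR ES:[EDI],DWORD PTR [ESI]
        dst[pos:pos + 4] = src[pos:pos + 4]
        pos += 4
        ecx -= 1

    ecx = count & 3              # AND ECX,3: 0 to 3 leftover bytes
    while ecx:                   # REP MOVS BYTE PTR ES:[EDI],BYTE PTR [ESI]
        dst[pos] = src[pos]
        pos += 1
        ecx -= 1
    return bytes(dst)


sample = bytes(range(11))        # 11 bytes: two DWORDs plus a 3-byte tail
copied = rep_movs_copy(sample, len(sample))
```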
So here’s what you have so far. You have a memory block that is allocated at runtime, with a specific effort being made to put it at a random address. This block contains a table of checksums of the names of all exported functions from NTDLL alongside their RVAs. Right after this table (in the same block) you have a copy of the entire NTDLL code section. Figure 11.15 provides a graphic visualization of this interesting and highly unusual data structure.
Now, if I saw this kind of code in an average application I would probably think that I was witnessing the work of a mad scientist. In a serious copy protection this makes a lot of sense. This is a mechanism that allocates a memory block at a random virtual address and creates what is essentially an obfuscated interface into the operating system module. You’ll soon see just how effective this interface is at interfering with reversing efforts (which one can only assume is the only reason for its existence).
The huge function proceeds into calling another function, at 4030E5. This function starts out with two interesting loops, one of which is:
00403108 CMP ESI,190BC2
0040310E JE SHORT Defender.0040311E
00403110 ADD ECX,8
00403113 MOV ESI,DWORD PTR [ECX]
00403115 CMP ESI,EBX
00403117 JNZ SHORT Defender.00403108
This loop goes through the export table and compares each string checksum with 190BC2. It is fairly easy to see what is happening here. The code is looking for a specific API in NTDLL. Because it’s not searching by strings but by this checksum you have no idea which API the code is looking for—the API’s name is just not available. Here’s what happens when the entry is found:
0040311E MOV ECX,DWORD PTR [ECX+4]
00403121 |
ADD ECX,EDI |
00403123 MOV DWORD PTR [EBP-C],ECX
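A rough Python model of the search loop and of the three instructions above may help. It assumes that EBX holds 0 (so a zero checksum ends the search) and that EDI holds the base that RVAs are added to; the table contents and `copy_base` value are invented for the example.

```python
import struct


def find_entry(table: bytes, wanted: int):
    """Walk 8-byte (checksum, RVA) entries; return (offset, rva) or None."""
    offset = 0
    while True:
        checksum, rva = struct.unpack_from("<II", table, offset)
        if checksum == wanted:      # CMP ESI,190BC2 / JE
            return offset, rva
        if checksum == 0:           # assumed terminator (CMP ESI,EBX with EBX = 0)
            return None
        offset += 8                 # ADD ECX,8


def resolve(copy_base: int, rva: int) -> int:
    """MOV ECX,[ECX+4] / ADD ECX,EDI: RVA plus the base of the copied code."""
    return (copy_base + rva) & 0xFFFFFFFF


# Hypothetical table: one unrelated entry, the wanted entry, and a terminator.
table = struct.pack("<IIIIII", 0x11111111, 0x2000, 0x190BC2, 0x3000, 0, 0)
copy_base = 0x7D030000              # made-up address of the copied code
found = find_entry(table, 0x190BC2)
target = resolve(copy_base, found[1])
```

Nothing in this lookup involves a function name, which is why the name of the API being called stays hidden.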
[Figure: a table of entries, each holding a function name’s checksum and the function’s RVA, followed by the copy of the NTDLL code section.]
Figure 11.15 The layout of Defender’s memory copy of NTDLL.
The function is taking the +4 offset of the found entry (remember that offset +4 contains the function’s RVA) and adding to that the address where NTDLL’s code section was copied. Later in the function a call is made into the function at that address. No doubt this is a call into a copied version of an NTDLL API. Here’s what you see at that address:
7D03F0F2 MOV EAX,35
7D03F0F7 MOV EDX,7FFE0300
7D03F0FC CALL DWORD PTR [EDX]
7D03F0FE RET 20
The code at 7FFE0300 to which this function calls is essentially a call to the NTDLL API KiFastSystemCall, which is just a generic interface for calling into the kernel. Notice that you have this function’s name because even though Defender copied the entire code section, the code explicitly referenced this function by address. Here is the code for KiFastSystemCall—it’s just two lines.
7C90EB8B MOV EDX,ESP
7C90EB8D SYSENTER
Effectively, all KiFastSystemCall does is invoke the SYSENTER instruction. The SYSENTER instruction performs a kernel-mode switch, which means that the program executes a system call. It should be noted that this would all be slightly different under Windows 2000 or older systems, because Microsoft changed its system calling mechanism after Windows 2000 (Windows 2000 and older systems invoke system calls using an INT 2E instruction). Windows XP, Windows Server 2003, and certainly newer operating systems such as the system currently code-named Longhorn all employ the new system call mechanism. If you’re debugging under an older OS and you’re seeing something slightly different at this point, that’s to be expected.
You’re now running into somewhat of a problem. You obviously can’t step into SYSENTER because you’re using a user-mode debugger. This means that it would be very difficult to determine which system call the program is trying to make! You have several options.
■ Switch to a kernel debugger, if one is available, and step into the system call to find out what Defender is doing.
■ Go back to the checksum/RVA table from before and pick up the RVA for the current system call; this would hopefully be the same RVA as in the NTDLL.DLL export directory. You can then do a DUMPBIN on NTDLL and determine which API it is you’re looking at.
■ Find which system call this is by its order in the exports list. The checksum/RVA table has apparently maintained the same order for the exports as in the original NTDLL export directory. Knowing the index of the call being made, you could look at the NTDLL export directory and try to determine which system call this is.
In this case, I think it would be best to go for the kernel debugger option, and I will be using NuMega SoftICE because it is the easiest to install and doesn’t require two computers. If you don’t have a copy of SoftICE and are unable to install WinDbg due to hardware constraints, I’d recommend that you go through one of the other options I’ve suggested. It would probably be easiest to use the function’s RVA. In any case, I’d recommend that you get set up with a kernel debugger if you’re serious about reversing, because certain reversing scenarios are just undoable without one.
In this case, stepping into SYSENTER in SoftICE brings you into KiFastCallEntry in NTOSKRNL. This flows right into KiSystemService, which is the generic system call dispatcher in Windows; all system calls go through it. Quickly tracing over most of the function, you get to the CALL EBX instruction near the end. This CALL EBX is where control is transferred to the specific system service that was called. Here, stepping into the function reveals that the program has called NtAllocateVirtualMemory again! You can hit F12 several times to jump back up to user mode and run into the next call from Defender. This is another API call that goes through the bizarre copied NTDLL interface. This time Defender is calling NtCreateThread. You can ignore this new thread for now and keep on stepping through the same function. It immediately returns after creating the new thread.
The sequence that comes right after the call to the thread-creating function again iterates through the checksum table, but this time it’s looking for checksum 006DEF20. Immediately afterward another function is called from the copied NTDLL. You can step into this one as well and will find that it’s a call to NtDelayExecution. In case you’re not familiar with it, NtDelayExecution is the native API equivalent of the Win32 API SleepEx. SleepEx simply relinquishes the CPU for the time period requested. In this case, NtDelayExecution is being called immediately after a thread has been created. It would appear that Defender wants to let the newly created thread start running immediately.
Immediately after NtDelayExecution returns, Defender calls into another (internal) function at 403A41. This address is interesting because this function starts approximately 30 bytes after the place from which it’s called. Also, SoftICE isn’t recognizing any valid instructions after the CALL instruction until the beginning of the function itself. It almost looks like Defender is skipping a little chunk of data that’s sitting right in the middle of the function! Indeed, dumping 4039FA, the address that immediately follows the CALL instruction, reveals the following:
004039FA K.E.R.N.E.L.3.2...D.L.L.
So, it looks like the Unicode string KERNEL32.DLL is sitting right in the middle of this function. Apparently all the CALL instruction is doing is just skipping over this string to make sure the processor doesn’t try to “execute” it. The code after the string again searches through our table, looking for two values: 6DEF20 and 1974C. You may recall that 6DEF20 is the name checksum for NtDelayExecution. We’re not sure which API is represented by 1974C—we’ll soon find out.
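If the dotted dump looks odd, this small Python sketch shows why a UTF-16 (little-endian) string appears that way in an ASCII memory view. The `dump_ascii` helper is written for this example and only imitates how a debugger typically renders nonprintable bytes.

```python
def dump_ascii(data: bytes) -> str:
    """Render bytes the way a debugger's ASCII pane does: '.' for nonprintables."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


raw = "KERNEL32.DLL".encode("utf-16-le")   # 2 bytes per character, high byte 0
rendered = dump_ascii(raw)
decoded = raw.decode("utf-16-le")
print(rendered)
```

Each character is followed by a zero byte, which is shown as a dot; the three dots in a row come from the zero byte after "2", the literal period, and the zero byte after the period.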
SoftICE’s Disappearance
The first call being made in this sequence is again to NtDelayExecution, but here you run into a little problem. When we hit F10 to step over the call to NtDelayExecution SoftICE just disappears! When you look at the Command Prompt window, you see that Defender has just exited and that it hasn’t printed any of its messages. It looks like SoftICE’s presence has somehow altered Defender’s behavior.
Seeing how the program was calling into NtDelayExecution when it unexpectedly disappeared, you can only make one assumption. The thread that was created earlier must be doing something, and by relinquishing the CPU Defender is probably trying to get the other thread to run. It looks like you must shift your reversing efforts to this thread to see what it’s trying to do.
Reversing the Secondary Thread
Let’s go back to the thread creation code in the initialization routine to find out what code is being executed by this thread. Before attempting this, you must learn a bit about how NtCreateThread works. Unlike CreateThread, the equivalent Win32 API, NtCreateThread is a rather low-level function. Instead of just taking an lpStartAddress parameter as CreateThread does, NtCreateThread takes a CONTEXT data structure that accurately defines the thread’s state when it first starts running.
A CONTEXT data structure contains full-blown thread state information. This includes the contents of all CPU registers, including the instruction pointer. To tell a newly created thread what to do, Defender will need to initialize the CONTEXT data structure and set the Eip member to the thread’s entry point. Other than the instruction pointer, Defender must also manually allocate a stack space for the thread and set the Esp member in the CONTEXT structure to point to the beginning of the newly created thread’s stack space (this explains the NtAllocateVirtualMemory call that immediately preceded the call to NtCreateThread). This long sequence just gives you an idea of how much effort is saved by calling the Win32 CreateThread API.
In the case of this thread creation, you need to find the place in the code where Defender is setting the Eip member in the CONTEXT data structure. Taking a look at the prototype definition for NtCreateThread, you can see that the CONTEXT data structure is passed as the sixth parameter. The function is passing the address [EBP-310] as the sixth parameter, so one can only assume that this is the address where CONTEXT starts. From looking at the definition of CONTEXT in WinDbg, you can see that the Eip member is at offset +b8. So, you know that the address of the thread routine should be written to [EBP-258] (310 – b8 = 258). The following line seems to be what you’re looking for:
MOV DWORD PTR SS:[EBP-258],Defender.00402EEF
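The offset arithmetic is easy to get wrong with EBP-relative addresses, so here it is spelled out in Python. The helper name `ebp_relative` is invented for this example; the two constants come straight from the text.

```python
CONTEXT_START = -0x310   # CONTEXT begins at [EBP-310]
EIP_OFFSET = 0xB8        # offset of the Eip member within CONTEXT (x86)


def ebp_relative(displacement: int) -> str:
    """Format a signed displacement the way the disassembly prints it."""
    sign = "-" if displacement < 0 else "+"
    return f"[EBP{sign}{abs(displacement):X}]"


eip_slot = CONTEXT_START + EIP_OFFSET   # moving up in memory from the start
print(ebp_relative(eip_slot))
```

Because the structure starts at a negative displacement and its members sit at increasing addresses, adding the member offset makes the displacement less negative, which is the same as computing 310 – b8 on the magnitudes.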
Looking at the address 402EEF, you can see that it indeed contains code. This must be our thread routine. A quick glance shows that this function contains the exact same prologue as the previous function you studied in Listing 11.7, indicating that this function is also encrypted. Let’s restart the program and place a breakpoint on this function (there is no need for a kernel-mode debugger for this part). The best position for your breakpoint is at 402FF4, right before the decrypter starts executing the decrypted code. Once you get there, you can take a look at the decrypted thread procedure code. It is quite interesting, so I’ve included it in its entirety (see Listing 11.8).
00402FFE XOR EAX,EAX
00403000 INC EAX
00403001 JE Defender.004030C7
00403007 RDTSC
00403009 MOV DWORD PTR SS:[EBP-8],EAX
0040300C MOV DWORD PTR SS:[EBP-4],EDX
0040300F MOV EAX,DWORD PTR DS:[406000]
00403014 MOV DWORD PTR SS:[EBP-50],EAX
00403017 MOV EAX,DWORD PTR SS:[EBP-50]
0040301A CMP DWORD PTR DS:[EAX],0
0040301D JE SHORT Defender.00403046
0040301F MOV EAX,DWORD PTR SS:[EBP-50]
00403022 CMP DWORD PTR DS:[EAX],6DEF20
00403028 JNZ SHORT Defender.0040303B
0040302A MOV EAX,DWORD PTR SS:[EBP-50]
0040302D MOV ECX,DWORD PTR DS:[40601C]
00403033 ADD ECX,DWORD PTR DS:[EAX+4]
00403036 MOV DWORD PTR SS:[EBP-44],ECX
00403039 JMP SHORT Defender.0040304A
0040303B MOV EAX,DWORD PTR SS:[EBP-50]
0040303E ADD EAX,8
00403041 MOV DWORD PTR SS:[EBP-50],EAX
00403044 JMP SHORT Defender.00403017
00403046 AND DWORD PTR SS:[EBP-44],0
0040304A AND DWORD PTR SS:[EBP-4C],0
0040304E AND DWORD PTR SS:[EBP-48],0
00403052 LEA EAX,DWORD PTR SS:[EBP-4C]
00403055 PUSH EAX
00403056 PUSH 0
00403058 CALL DWORD PTR SS:[EBP-44]
0040305B RDTSC
0040305D MOV DWORD PTR SS:[EBP-18],EAX
00403060 MOV DWORD PTR SS:[EBP-14],EDX
00403063 MOV EAX,DWORD PTR SS:[EBP-18]
00403066 SUB EAX,DWORD PTR SS:[EBP-8]
00403069 MOV ECX,DWORD PTR SS:[EBP-14]
0040306C SBB ECX,DWORD PTR SS:[EBP-4]
0040306F MOV DWORD PTR SS:[EBP-60],EAX
00403072 MOV DWORD PTR SS:[EBP-5C],ECX
00403075 JNZ SHORT Defender.00403080
00403077 CMP DWORD PTR SS:[EBP-60],77359400
0040307E JBE SHORT Defender.004030C2
00403080 MOV EAX,DWORD PTR DS:[406000]
00403085 MOV DWORD PTR SS:[EBP-58],EAX
00403088 MOV EAX,DWORD PTR SS:[EBP-58]
0040308B CMP DWORD PTR DS:[EAX],0
0040308E JE SHORT Defender.004030B7
00403090 MOV EAX,DWORD PTR SS:[EBP-58]
00403093 CMP DWORD PTR DS:[EAX],1BF08AE
00403099 JNZ SHORT Defender.004030AC
0040309B MOV EAX,DWORD PTR SS:[EBP-58]
0040309E MOV ECX,DWORD PTR DS:[40601C]
004030A4 ADD ECX,DWORD PTR DS:[EAX+4]
004030A7 MOV DWORD PTR SS:[EBP-54],ECX
004030AA JMP SHORT Defender.004030BB
004030AC MOV EAX,DWORD PTR SS:[EBP-58]
004030AF ADD EAX,8
004030B2 MOV DWORD PTR SS:[EBP-58],EAX
004030B5 JMP SHORT Defender.00403088
004030B7 AND DWORD PTR SS:[EBP-54],0
004030BB PUSH 0
004030BD PUSH -1
004030BF CALL DWORD PTR SS:[EBP-54]
004030C2 JMP Defender.00402FFE
Listing 11.8 Disassembly of the function at address 00402FFE in Defender.
This is an interesting function that appears to run an infinite loop (notice the JMP at 4030C2 to 402FFE, and how the code at 00402FFE sets EAX to 1 before the JE at 00403001 checks whether it’s zero). The function starts with an RDTSC and stores the timestamp counter at [EBP-8]. It then proceeds to search through your good old copied NTDLL table, again for the highly popular 6DEF20, which you already know is NtDelayExecution. The function calls NtDelayExecution with the second parameter pointing to 8 bytes that are all filled with zeros. This is important because the second parameter in NtDelayExecution is the delay interval (it’s a 64-bit value). Setting it to zero means that all the function does is relinquish the CPU. The thread will continue running as soon as all the other threads have relinquished the CPU or have used up the CPU time allocated to them.
As soon as NtDelayExecution returns, the function invokes RDTSC again. This time the output from RDTSC is stored in [EBP-18]. The code then enters a 64-bit subtraction sequence at 00403063. First, the low 32-bit words are subtracted from one another, and then the high 32-bit words are subtracted from one another using SBB (subtract with borrow). SBB subtracts the two integers and treats the carry flag (CF) as a borrow indicator in case the first subtraction generated a borrow. For more information on 64-bit arithmetic refer to the section on 64-bit arithmetic in Appendix B.
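The SUB/SBB pair can be modeled in a few lines of Python. This is a sketch of the arithmetic only, with invented names and sample values; it treats each register as an unsigned 32-bit quantity.

```python
MASK32 = 0xFFFFFFFF


def sub_sbb(lo_a: int, hi_a: int, lo_b: int, hi_b: int):
    """Compute (hi_a:lo_a) - (hi_b:lo_b) using two 32-bit steps."""
    lo = (lo_a - lo_b) & MASK32          # SUB EAX,...
    borrow = 1 if lo_a < lo_b else 0     # CF is set when the SUB had to borrow
    hi = (hi_a - hi_b - borrow) & MASK32 # SBB ECX,... subtracts CF as well
    return lo, hi


# Sample values: 0x1_0000_0005 minus 0x0_FFFF_FFFF, which forces a borrow.
low, high = sub_sbb(0x00000005, 0x1, 0xFFFFFFFF, 0x0)
```

Recombining the halves as `(high << 32) | low` gives the same result as a single 64-bit subtraction, which is the whole point of propagating the borrow.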
The result of the subtraction is compared to 77359400. If it is below or equal, the function just loops back to the beginning. If not (or if the SBB instruction produces a nonzero result, indicating that the high part has changed), the function goes through another exported function search, this time looking for a function whose string checksum is 1BF08AE, and then calls this API. You’re not sure which API this is at this point, but stepping over this code is very insightful. It turns out that when you step through this code in a debugger the check almost always fails (whether it does depends on how fast your CPU is and how quickly you step through the code). Once you get to that API call and step into it in SoftICE, you see that the program is calling NtTerminateProcess.
At this point, you’re starting to get a clear picture of what our thread is all about. It is essentially a timing monitor that is meant to detect whether the process is being “paused” and simply terminate it on the spot if it is. For this, Defender is utilizing the RDTSC instruction and is just checking for a reasonable number of ticks. If between the two invocations of RDTSC too much time has passed (in this case too much time means 77359400 clock ticks or 2 billion clock ticks in decimal), the process is terminated using a direct call to the kernel.
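The decision logic of the thread can be summarized in a short Python sketch. The callables passed to `watchdog_iteration` are stand-ins invented for this example (a fake tick source, a no-op yield, and a flag instead of a real process kill); only the threshold and the two checks come from the listing.

```python
THRESHOLD = 0x77359400          # 2,000,000,000 ticks
MASK64 = 0xFFFFFFFFFFFFFFFF


def too_slow(start: int, end: int) -> bool:
    """Mirror JNZ 00403080 / CMP ...,77359400 / JBE 004030C2."""
    delta = (end - start) & MASK64
    high, low = delta >> 32, delta & 0xFFFFFFFF
    return high != 0 or low > THRESHOLD


def watchdog_iteration(read_tsc, yield_cpu, terminate) -> None:
    """One pass of the loop: sample, yield, sample again, then decide."""
    start = read_tsc()
    yield_cpu()                 # stands in for NtDelayExecution with a zero interval
    end = read_tsc()
    if too_slow(start, end):
        terminate()             # stands in for NtTerminateProcess


# Simulate a debugger stall: far more than the threshold passes during the yield.
ticks = iter([1_000, 1_000 + 5_000_000_000])
killed = []
watchdog_iteration(lambda: next(ticks), lambda: None, lambda: killed.append(True))
```

On a CPU running at a few gigahertz, 2 billion ticks is on the order of a second, far longer than a zero-length yield should take but far shorter than a human pausing at a breakpoint.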
Defeating the “Killer” Thread
It is going to be effectively impossible to debug Defender while this thread is running, because the thread will terminate the process whenever it senses that a debugger has stalled the process. To continue with the cracking process, you must neutralize this thread. One way to do this is to just avoid calling the thread creation function, but a simpler way is to just patch the function in memory (after it is decoded) so that it never calls NtTerminateProcess. You do this by making two changes in the code. First, you replace the JNZ at 00403075 with NOPs (this check confirms that the result of the subtraction is 0 in the high-order word). Then you replace the JBE at address 0040307E with a JMP, so that the final code looks like the following:
00403075 NOP
00403076 NOP
00403077 CMP DWORD PTR SS:[EBP-60],77359400
0040307E JMP SHORT Defender.004030C2
This means that the function never calls NtTerminateProcess, regardless of the time that passes between the two invocations of RDTSC. Note that applying this patch to the executable so that you don’t have to reapply it every time you launch the program is somewhat more difficult because this function is encrypted: you must either modify the encrypted data or eliminate the encryption altogether. Neither of these options is particularly easy, so for now you’ll just reapply the patch in memory each time you launch the program.
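At the byte level the patch is tiny. The sketch below applies it to a Python bytearray that stands in for the decrypted code at 00403075. The original bytes are reassembled from the listing using the standard short-form encodings (75 for JNZ, 76 for JBE, EB for JMP, 90 for NOP) and were not read from the actual binary, so treat them as an assumption.

```python
BASE = 0x00403075

# JNZ SHORT 00403080 | CMP DWORD PTR [EBP-60],77359400 | JBE SHORT 004030C2
code = bytearray.fromhex("7509" "817DA000943577" "7642")


def apply_patch(buf: bytearray, base: int) -> None:
    """NOP out the JNZ and turn the conditional JBE into an unconditional JMP."""
    jnz = 0x00403075 - base
    jbe = 0x0040307E - base
    if buf[jnz] != 0x75 or buf[jbe] != 0x76:
        raise ValueError("unexpected bytes; refusing to patch")
    buf[jnz:jnz + 2] = b"\x90\x90"   # two NOPs replace the 2-byte JNZ
    buf[jbe] = 0xEB                  # JMP SHORT keeps the same displacement byte


apply_patch(code, BASE)
print(code.hex())
```

Because the short JBE and the short JMP are both two bytes long and share the same displacement, only the opcode byte changes and nothing after it needs to move.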
Loading KERNEL32.DLL
You might remember that before taking this little detour to deal with that RDTSC thread you were looking at a KERNEL32.DLL string right in the middle of the code. Let’s find out what is done with this string.
Immediately after the string appears in the code the program is retrieving pointers for two NTDLL functions, one with a checksum of 1974C, and another with the familiar 6DEF20 (the checksum for NtDelayExecution). The code first calls NtDelayExecution and then the other function. In stepping into the second function in SoftICE, you see a somewhat more confusing picture. This API isn’t just another direct call down into the kernel, but instead it looks like this API is actually implemented in NTDLL, which means that it’s now implemented inside your copied code. This makes it much more difficult to determine which API this is.
The approach you’re going to take is one that I’ve already proposed earlier in this discussion as a way to determine which API is being called through the obfuscated interface. The idea is that when the checksum/RVA table was initialized, APIs were copied into the table in the order in which they were read from NTDLL’s export directory. What you can do now is determine the entry number in the checksum/RVA table once an API is found using its checksum. This number should also be a valid index into NTDLL’s export directory and will hopefully reveal exactly which API you’re dealing with.
To do this, you must put a breakpoint right after Defender finds this API (remember, it’s looking for 1974C in the table). Once your breakpoint hits you subtract the pointer to the beginning of the table from the pointer to the current entry, and divide the result by 8 (the size of each entry). This gives you the API’s index in the table. You can now use DUMPBIN or a similar tool to dump NTDLL’s export table and look for an API that has your index. In this case, the index you get is 0x3E (for example, when I was doing this the table started at 53830000 and the entry was at 538301F0, but you already know that these are randomly chosen addresses). A quick look at the export list for NTDLL.DLL from DUMPBIN provides you with your answer.
ordinal hint RVA      name
      .
      .
     70   3E 000161CA LdrLoadDll
The API being called is LdrLoadDll, which is the native API equivalent of LoadLibrary. You already know which DLL is being loaded because you saw the string earlier: KERNEL32.DLL.
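For reference, the index calculation used to identify this API is a one-liner. The Python sketch below uses the two addresses quoted in the text; the function name and the sanity checks are additions made for this example.

```python
ENTRY_SIZE = 8   # each table entry is a 4-byte checksum plus a 4-byte RVA


def table_index(table_start: int, entry_address: int) -> int:
    """Convert an entry pointer into its position in the checksum/RVA table."""
    offset = entry_address - table_start
    if offset < 0 or offset % ENTRY_SIZE != 0:
        raise ValueError("address is not on an entry boundary of this table")
    return offset // ENTRY_SIZE


index = table_index(0x53830000, 0x538301F0)
print(hex(index))
```

The resulting value is what you match against the hint column of the DUMPBIN output, which is printed in hexadecimal.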
