Eilam, E. Reversing: Secrets of Reverse Engineering. 2005.
Antireversing Techniques 351
This transformation can be improved upon in several different ways, depending on how much performance and code size you’re willing to give up. In a native code environment such as IA-32 assembly language, it might be beneficial to add some kind of disassembler-confusion macros such as the ones described earlier in this chapter. If made reasonably polymorphic, such macros would not be trivial to remove, and would really complicate the reversing process for this kind of function. That’s because these macros would prevent reversers from being able to generate a full listing of the obfuscated function at any given moment. Reversing a table interpretation function such as the one in Listing 10.3 without having a full view of the entire function is undoubtedly an unpleasant reversing task.
Other than the confusion macros, another powerful enhancement for the obfuscation of the preceding function would be to add an additional lookup table, as is demonstrated in Listing 10.4.
00401040  sub   esp,28h
00401043  mov   edx,dword ptr [esp+2Ch]
00401047  push  ebx
00401048  push  ebp
00401049  mov   ebp,dword ptr [esp+38h]
0040104D  push  esi
0040104E  push  edi
0040104F  mov   edi,dword ptr [esp+10h]
00401053  xor   eax,eax
00401055  xor   ebx,ebx
00401057  mov   dword ptr [esp+14h],1
0040105F  mov   dword ptr [esp+18h],8
00401067  mov   dword ptr [esp+1Ch],4
0040106F  mov   dword ptr [esp+20h],6
00401077  mov   dword ptr [esp+24h],2
0040107F  mov   dword ptr [esp+28h],9
00401087  mov   dword ptr [esp+2Ch],3
0040108F  mov   dword ptr [esp+30h],7
00401097  mov   dword ptr [esp+34h],5
0040109F  lea   ecx,[esp+14h]
004010A3  mov   esi,dword ptr [ecx]
004010A5  add   esi,0FFFFFFFFh
004010A8  cmp   esi,8
004010AB  ja    004010A3
004010AD  jmp   dword ptr [esi*4+401100h]
004010B4  xor   dword ptr [edx],ebx
004010B6  add   ecx,18h
004010B9  jmp   004010A3
004010BB  mov   edi,dword ptr [edx]
004010BD  add   ecx,8
004010C0  jmp   004010A3
004010C2  cmp   ebp,3
004010C5  ja    004010E8
004010C7  add   ecx,14h
004010CA  jmp   004010A3
004010CC  mov   ebx,edi
004010CE  sub   ecx,14h
004010D1  jmp   004010A3
004010D3  sub   ebp,4
004010D6  sub   ecx,4
004010D9  jmp   004010A3
004010DB  mov   esi,dword ptr [esp+44h]
004010DF  xor   dword ptr [edx],esi
004010E1  sub   ecx,10h
004010E4  jmp   004010A3
004010E6  xor   eax,edi
004010E8  add   ecx,10h
004010EB  jmp   004010A3
004010ED  add   edx,4
004010F0  sub   ecx,18h
004010F3  jmp   004010A3
004010F5  pop   edi
004010F6  pop   esi
004010F7  pop   ebp
004010F8  pop   ebx
004010F9  add   esp,28h
004010FC  ret

The function’s jump table:

0x00401100  004010c2 004010bb 004010db 004010b4
0x00401110  004010cc 004010e6 004010ed 004010d3
0x00401120  004010f5

Listing 10.4 The data-processing function from Listing 10.2 transformed using an array-based version of the table interpretation obfuscation method.
The function in Listing 10.4 is an enhanced version of the function from Listing 10.3. Instead of using direct indexes into the jump table, this implementation uses an additional table that is filled in at runtime. This table contains the actual jump table indexes, and the program maintains a running index into that table in order to obtain the correct flow of the code. This enhancement makes the function significantly less readable to human reversers, and would also seriously complicate matters for a deobfuscator, because it would take some serious data-flow analysis to determine the current value of the index into the array.
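To make the control flow of Listing 10.4 easier to follow, here is a minimal C sketch of the same idea. It is my own reading of the disassembly, not the source the listing was compiled from: the names process_plain, process_obfuscated, order, and pos are hypothetical, and the volatile qualifier is only there to discourage a compiler from folding the index array away. The array holds the same nine values the listing stores on the stack, and each case moves the position by the same number of entries that the listing adds to or subtracts from ECX (divided by four).

```c
#include <stdint.h>

/* Plain loop that the obfuscated version is assumed to be equivalent to. */
uint32_t process_plain(uint32_t *data, uint32_t size_bytes, uint32_t key)
{
    uint32_t prev = 0, checksum = 0;
    while (size_bytes > 3) {
        uint32_t cur = *data;   /* remember the original DWORD     */
        *data ^= key;           /* mix in the key                  */
        *data ^= prev;          /* mix in the previous DWORD       */
        prev = cur;
        checksum ^= cur;
        data++;
        size_bytes -= 4;
    }
    return checksum;
}

/* Array-based table interpretation: the switch plays the role of the
 * jump table, and order[] is the extra lookup table filled at run time. */
uint32_t process_obfuscated(uint32_t *data, uint32_t size_bytes, uint32_t key)
{
    /* Same values the listing writes to [esp+14h]..[esp+34h]. */
    volatile int order[9] = {1, 8, 4, 6, 2, 9, 3, 7, 5};
    int pos = 0;                /* index into order[], not into the jump table */
    uint32_t prev = 0, cur = 0, checksum = 0;

    for (;;) {
        switch (order[pos]) {
        case 1: pos += (size_bytes > 3) ? 4 : 5; break; /* loop test      */
        case 2: cur = *data;      pos += 2; break;
        case 3: *data ^= key;     pos -= 4; break;
        case 4: *data ^= prev;    pos += 6; break;
        case 5: prev = cur;       pos -= 5; break;
        case 6: checksum ^= cur;  pos += 4; break;
        case 7: data++;           pos -= 6; break;
        case 8: size_bytes -= 4;  pos -= 1; break;      /* back to case 1 */
        default: return checksum;                       /* case 9: exit   */
        }
    }
}
```

Reading process_obfuscated top to bottom tells you nothing about the order in which the cases run; you have to track pos through order[] one step at a time, which is exactly the extra work this transformation is meant to impose.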
The original implementation in [Wang] is more focused on preventing static analysis of the code by deobfuscators. The approach chosen in that study is to use pointer aliases as a means of confusing automated deobfuscators. Pointer aliases are simply multiple pointers that point to the same memory location. Aliases significantly complicate any kind of data-flow analysis process because the analyzer must determine how memory modifications performed through one pointer would affect the data accessed using other pointers that point to the same memory location. In this case, the idea is to create several pointers that point to the array of indexes and to have the program write to several locations within it at several stages. It would be borderline impossible for an automated deobfuscator to predict in advance the state of the array, and without knowing the exact contents of the array it would not be possible to properly analyze the code.
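The following C fragment is a small sketch of the aliasing idea, not the scheme from [Wang] itself. The function aliased_dispatch and the pointers p and q are hypothetical names. Two pointers overlap the same index array, and the handlers rewrite entries through one alias shortly before the dispatcher reads them, so the array's initial contents do not describe the order in which the cases actually run. A real implementation would have to hide the aliasing far better than this; an optimizing compiler can see through a function this small.

```c
/* Computes (x + 1) * 2 - 3 through an index array that is modified
 * through two overlapping aliases while the dispatcher is running. */
int aliased_dispatch(int x)
{
    int order[4];
    int *p = order;        /* alias 1: p[i] is order[i]     */
    int *q = &order[1];    /* alias 2: q[i] is order[i + 1] */
    int pos = 0;

    /* Initial contents: {1, 3, 2, 4}. Three of these are decoys. */
    p[0] = 1;
    q[0] = 3;
    p[2] = 2;
    q[2] = 4;

    for (;;) {
        switch (order[pos]) {
        case 1: x += 1; q[0] = 2; pos = 1; break;  /* rewrites order[1] */
        case 2: x *= 2; p[3] = 3; pos = 3; break;  /* rewrites order[3] */
        case 3: x -= 3; p[2] = 4; pos = 2; break;  /* rewrites order[2] */
        case 4:
        default: return x;
        }
    }
}
```

An analyzer that does not realize that q[0] and p[1] are the same location would conclude that case 3 runs second, when in fact case 2 does.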
In a brief performance comparison I conducted, I measured a huge runtime difference between the original function and the function from Listing 10.4: The obfuscated function from Listing 10.4 was about 3.8 times slower than the original unobfuscated function in Listing 10.2. Scattering 11 copies of the OBFUSCATE macro increased this number to about 12, which means that the heavily obfuscated version runs about 12 times slower than its unobfuscated counterpart! Whether this kind of extreme obfuscation is worth it depends on how concerned you are about your program being reversed, and how concerned you are with the runtime performance of the particular function being obfuscated. Remember that there’s usually no reason to obfuscate the entire program, only the parts that are particularly sensitive or important. In this particular situation, I think I would stick to the array-based approach from Listing 10.4—the OBFUSCATE macros wouldn’t be worth the huge performance penalty they incur.
Inlining and Outlining
Inlining is a well-known compiler optimization technique where a function is duplicated at every place in the program that calls it. Instead of having all callers call into a single copy of the function, the compiler replaces every call into the function with an actual in-place copy of it. This improves runtime performance because the overhead of calling a function is completely eliminated, at the cost of significantly bloating the size of the program (because functions are duplicated). In the context of obfuscating transformations, inlining is a powerful tool because it eliminates the internal abstractions created by the software developer. Reversers have no information on which parts of a certain function are actually just inlined functions that might be called from numerous places throughout the program.
One interesting enhancement suggested in [Collberg3] is to combine inlining with outlining in order to create a highly potent transformation. Outlining means that you take a certain code sequence that belongs in one function and create a new function that contains just that sequence. In other words, it is the exact opposite of inlining. As an obfuscation tool, outlining becomes effective when you take a random piece of code and create a dedicated function for it. When done repeatedly, such a process can really add to the confusion factor experienced by a human reverser.
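Here is a small hand-made illustration in C of combining the two transformations. All names (rotl8, mix_original, helper, mix_transformed) and the constant are hypothetical, and in practice an obfuscation tool would do this rather than the programmer. The helper rotl8 is inlined into its caller, and then an arbitrary sequence that straddles the old function boundary is outlined into a new function that corresponds to nothing in the original design.

```c
#include <stdint.h>

/* Original structure: a meaningful helper and a caller. */
static uint32_t rotl8(uint32_t v)
{
    return (v << 8) | (v >> 24);
}

uint32_t mix_original(uint32_t a, uint32_t b)
{
    uint32_t t = rotl8(a) ^ b;
    t += 0x9E3779B9u;
    t ^= t >> 15;
    return t;
}

/* Outlined piece: the second half of the rotation plus the first two
 * steps of the caller. It has no meaning of its own. */
static uint32_t helper(uint32_t partial, uint32_t a, uint32_t b)
{
    return ((partial | (a >> 24)) ^ b) + 0x9E3779B9u;
}

/* Transformed structure: rotl8 no longer exists as a function. */
uint32_t mix_transformed(uint32_t a, uint32_t b)
{
    uint32_t t = a << 8;      /* first half of the inlined rotation */
    t = helper(t, a, b);
    return t ^ (t >> 15);
}
```

A reverser who finds helper in the binary will naturally try to assign it a purpose, and that effort is wasted because the function boundary was chosen arbitrarily.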
Interleaving Code
Code interleaving is a reasonably effective obfuscation technique that is highly potent, yet can be quite costly in terms of execution speed and code size. The basic concept is quite simple: You take two or more functions and interleave their implementations so that they become exceedingly difficult to read.
Function1()
{
    Function1_Segment1;
    Function1_Segment2;
    Function1_Segment3;
}

Function2()
{
    Function2_Segment1;
    Function2_Segment2;
    Function2_Segment3;
}

Function3()
{
    Function3_Segment1;
    Function3_Segment2;
    Function3_Segment3;
}
Here is what these three functions would look like in memory after they are interleaved.
Function1_Segment3;
    End of Function1
Function1_Segment1; (This is the Function1 entry-point)
    Opaque Predicate -> Always jumps to Function1_Segment2
Function3_Segment2;
    Opaque Predicate -> Always jumps to Function3_Segment3
Function3_Segment1; (This is the Function3 entry-point)
    Opaque Predicate -> Always jumps to Function3_Segment2
Function2_Segment2;
    Opaque Predicate -> Always jumps to Function2_Segment3
Function1_Segment2;
    Opaque Predicate -> Always jumps to Function1_Segment3
Function2_Segment3;
    End of Function2
Function3_Segment3;
    End of Function3
Function2_Segment1; (This is the Function2 entry-point)
    Opaque Predicate -> Always jumps to Function2_Segment2
Notice how each function segment is followed by an opaque predicate that jumps to the next segment. You could theoretically use an unconditional jump in that position, but that would make automated deobfuscation quite trivial. As for fooling a human reverser, it all depends on how convincing your opaque predicates are. If a human reverser can quickly distinguish the opaque predicates from the real program logic, it won’t take long before these functions are reversed. On the other hand, if the opaque predicates are very confusing and look as if they are an actual part of the program’s logic, the preceding example might be quite difficult to reverse. Additional obfuscation can be achieved by having all three functions share the same entry point and adding a parameter that tells the new function which of the three code paths should be taken. The beauty of this is that it can be highly confusing if the three functions are functionally unrelated.
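The layout above, including the shared entry point, can be sketched in C using labels and goto. This is only an illustration under my own assumptions: the three computations, the name interleaved, and the predicate opaque_true are all hypothetical, and a C compiler is free to rearrange the blocks, so a real tool would apply this at the binary level. The opaque predicate relies on the fact that v * (v + 1) is always even, so its false branch (which jumps into an unrelated function's segment) is never taken.

```c
/* Opaque predicate: the product of two consecutive integers is even,
 * so this always returns 1, although that is not obvious at a glance. */
static int opaque_true(unsigned v)
{
    return ((v * (v + 1u)) & 1u) == 0u;
}

/* which == 1: (x + 1) * 2 - 3
 * which == 2: x * 3 + 10 - 1
 * otherwise:  ((x - 2) ^ 0x0F) + 5                                    */
unsigned interleaved(int which, unsigned x)
{
    unsigned v = x;                 /* fed to the opaque predicates */

    if (which == 1) goto f1_s1;
    if (which == 2) goto f2_s1;
    goto f3_s1;

f1_s3: return x - 3u;                                      /* end of function 1 */
f1_s1: x += 1u;   if (opaque_true(v))      goto f1_s2; goto f3_s2;
f3_s2: x ^= 0x0Fu; if (opaque_true(v + 1u)) goto f3_s3; goto f2_s2;
f3_s1: x -= 2u;   if (opaque_true(v + 2u)) goto f3_s2; goto f1_s3;
f2_s2: x += 10u;  if (opaque_true(v + 3u)) goto f2_s3; goto f1_s2;
f1_s2: x *= 2u;   if (opaque_true(v + 4u)) goto f1_s3; goto f2_s3;
f2_s3: return x - 1u;                                      /* end of function 2 */
f3_s3: return x + 5u;                                      /* end of function 3 */
f2_s1: x *= 3u;   if (opaque_true(v + 5u)) goto f2_s2; goto f3_s3;
}
```

Calling interleaved(1, 5), interleaved(2, 5), and interleaved(3, 5) exercises the three original code paths; the bogus branches make it look as if the paths can cross over into one another.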
Ordering Transformations
Shuffling the order of operations in a program is a free yet decently effective method for confusing reversers. The idea is to simply randomize the order of operations in a function as much as possible. This is beneficial because as reversers we count on the locality of the code we’re reversing—we assume that there’s a logical order to the operations performed by the program.
It is obviously not always possible to change the order of operations performed in a program; many program operations are codependent. The idea is to find operations that are not codependent and completely randomize their order. Ordering transformations are more relevant for automated obfuscation tools, because it wouldn’t be advisable to change the order of operations in the program source code. The confusion caused to the software developers would probably outweigh the minor influence this transformation has on reversers.
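As a tiny illustration of what codependent means here, consider the following C sketch. The structure and function names are hypothetical. The first three assignments do not depend on one another and can be shuffled freely; the last one reads all three results and must stay at the end. Keep in mind that a compiler may reorder source-level statements on its own, which is one more reason this transformation belongs in a binary-level tool.

```c
struct packet {
    unsigned id;
    unsigned len;
    unsigned flags;
    unsigned crc;
};

/* Natural order, as a developer would write it. */
void fill_ordered(struct packet *p, unsigned id, unsigned len)
{
    p->id = id;                           /* A                      */
    p->len = len;                         /* B                      */
    p->flags = (len > 64u) ? 1u : 0u;     /* C: uses the parameter  */
    p->crc = p->id ^ p->len ^ p->flags;   /* D: depends on A, B, C  */
}

/* Shuffled order: A, B, and C are independent, D still comes last. */
void fill_shuffled(struct packet *p, unsigned id, unsigned len)
{
    p->flags = (len > 64u) ? 1u : 0u;     /* C */
    p->len = len;                         /* B */
    p->id = id;                           /* A */
    p->crc = p->id ^ p->len ^ p->flags;   /* D */
}
```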
Data Transformations
Data transformations are obfuscation transformations that focus on obfuscating the program’s data rather than the program’s structure. This makes sense because, as you already know, figuring out the layout of important data structures in a program is a key step in gaining an understanding of the program and how it works. Of course, data transformations also boil down to code modifications, but the focus is to make the program’s data as difficult to understand as possible.
Modifying Variable Encoding
One interesting data-obfuscation idea is to modify the encoding of some or all program variables. This can greatly confuse reversers because the intuitive meanings of variable values will not be immediately clear. Changing the encoding of a variable can mean all kinds of different things, but a good example would be to simply shift it by one bit to the left. In a counter, this would mean that on each iteration the counter would be incremented by 2 instead of 1, and the limiting value would have to be doubled, so that instead of:
for (int i=1; i < 100; i++)
you would have:
for (int i=2; i < 200; i += 2)
which is of course functionally equivalent. This example is trivial and would do very little to deter reversers, but you could create far more complex encodings that would cause significant confusion with regard to the variable’s meaning and purpose. It should be noted that this type of transformation is better applied at the binary level, because it might actually be eliminated (or somewhat modified) by a compiler during the optimization process.
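A slightly less trivial encoding is an affine one, where the stored value is a multiple of the real value plus a constant. The sketch below is an assumption of mine rather than an encoding taken from any particular tool: the counter is stored as 3 * i + 7, and the names sum_plain and sum_encoded are hypothetical. The example assumes n is small enough that 3 * n + 7 does not overflow an unsigned int.

```c
/* Straightforward loop: i runs from 0 to n - 1. */
unsigned sum_plain(const unsigned *a, unsigned n)
{
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop with the counter stored as e = 3 * i + 7.
 * The start value, the limit, and the step are all encoded, and the
 * real index is decoded only at the point of use. */
unsigned sum_encoded(const unsigned *a, unsigned n)
{
    unsigned s = 0;
    for (unsigned e = 7u; e < 3u * n + 7u; e += 3u)
        s += a[(e - 7u) / 3u];
    return s;
}
```

In the encoded version, none of the constants a reverser sees (7, 3, and the scaled limit) match anything meaningful in the program's data, which is the whole point.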
Restructuring Arrays
Restructuring arrays means that you modify the layout of some arrays in a way that preserves their original functionality but confuses reversers with regard to their purpose. There are many different forms of this transformation, such as merging more than one array into one large array (by either interleaving the elements from the arrays into one long array or by sequentially connecting the two arrays). It is also possible to break one array down into several smaller arrays or to change the number of dimensions in an array. These transformations are not incredibly potent, but could somewhat increase the confusion factor experienced by reversers. Keep in mind that it would usually be possible for an automated deobfuscator to reconstruct the original layout of the array.
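The interleaving form of array merging can be sketched in a few lines of C. The arrays and function names here are hypothetical and exist only for illustration. Two logically separate arrays are merged into one, with the elements of the first at even indexes and the elements of the second at odd indexes.

```c
/* Original layout: two arrays with obvious, separate purposes. */
static const int widths[4]  = {10, 20, 30, 40};
static const int heights[4] = { 1,  2,  3,  4};

int area_plain(int i)
{
    return widths[i] * heights[i];
}

/* Restructured layout: both arrays interleaved into one.
 * merged[2 * i] holds widths[i], merged[2 * i + 1] holds heights[i]. */
static const int merged[8] = {10, 1, 20, 2, 30, 3, 40, 4};

int area_merged(int i)
{
    return merged[2 * i] * merged[2 * i + 1];
}
```

A reverser looking at merged sees a single array of eight integers and has to work out from the access pattern that it really contains two unrelated sets of values.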
Conclusion
There are quite a few options available to software developers interested in blocking (or rather slowing down) reversers from digging into their programs. In this chapter, I’ve demonstrated the two most commonly used approaches for dealing with this problem: antidebugger tricks and code obfuscation. The bottom line is that it is certainly possible to create code that is extremely difficult to reverse, but there is always a cost. The most significant penalty incurred by most antireversing techniques is in runtime performance: they simply slow the program down. The magnitude of investment in antireversing measures will eventually boil down to simple economics: How performance-sensitive is the program versus how concerned are you about piracy and reverse engineering?
Chapter 11
Breaking Protections
Cracking is the “dark art” of defeating, bypassing, or eliminating any kind of copy protection scheme. In its original form, cracking is aimed at software copy protection schemes such as serial-number-based registrations, hardware keys (dongles), and so on. More recently, cracking has also been applied to digital rights management (DRM) technologies, which attempt to protect the flow of copyrighted materials such as movies, music recordings, and books. Unsurprisingly, cracking is closely related to reversing, because in order to defeat any kind of software-based protection mechanism crackers must first determine exactly how that protection mechanism works.
This chapter provides some live cracking examples. I’ll be going over several programs and we’ll attempt to crack them. I’ll be demonstrating a wide variety of interesting cracking techniques, and the level of difficulty will increase as we go along.
Why should you learn and understand cracking? Well, certainly not for stealing software! I think the whole concept of copy protections and cracking is quite interesting, and I personally love the mind-game element of it. Also, if you’re interested in protecting your own program from cracking, you must be able to crack programs yourself. This is an important point: Copy protection technologies developed by people who have never attempted cracking are never effective!
Actual cracking of real copy protection technologies is considered an illegal activity in most countries. Yes, this chapter essentially demonstrates cracking, but you won’t be cracking real copy protections. That would not only be illegal, but also immoral. Instead, I will be demonstrating cracking techniques on special programs called crackmes. A crackme is a program whose sole purpose is to provide an intellectual challenge to crackers, and to teach cracking basics to “newbies”. There are many hundreds of crackmes available online on several different reversing Web sites.
Patching
Let’s take the first steps in practical cracking. I’ll start with a very simple crackme called KeygenMe-3 by Bengaly. When you first run KeygenMe-3 you get a nice (albeit somewhat intimidating) screen asking for two values, with absolutely no information on what these two values are. Figure 11.1 shows the KeygenMe-3 dialog.
Typing random values into the two text boxes and clicking the “OK” button produces the message box in Figure 11.2. It takes a trained eye to notice that the message box is probably a “stock” Windows message box, probably generated by one of the standard Windows message box APIs. This is important because if this is indeed a conventional Windows message box, you could use a debugger to set a breakpoint on the message box APIs. From there, you could try to reach the code in the program that’s telling you that you have a bad serial number. This is a fundamental cracking technique—find the part in the program that’s telling you you’re unauthorized to run it. Once you’re there it becomes much easier to find the actual logic that determines whether you’re authorized or not.
Figure 11.1 KeygenMe-3’s main screen.
Figure 11.2 KeygenMe-3’s invalid serial number message.
Unfortunately for crackers, sophisticated protection schemes typically avoid such easy-to-find messages. For instance, it is possible for a developer to create a visually identical message box that doesn’t use the built-in Windows message box facilities and that would therefore be far more difficult to track. In such a case, you could let the program run until the message box is displayed, then attach a debugger to the process and examine the call stack for clues on where the program made the decision to display this particular message box.
Let’s now find out how KeygenMe-3 displays its message box. As usual, you’ll try to use OllyDbg as your reversing tool. Considering that this is supposed to be a relatively simple program to crack, Olly should be more than enough.
As soon as you open the program in OllyDbg, you go to the Executable Modules view to see which modules (DLLs) are statically linked to it. Figure 11.3 shows the Executable Modules view for KeygenMe-3.
Figure 11.3 OllyDbg’s Executable Modules window showing the modules loaded in the key4.exe program.
This view immediately tells you that Key4.exe is a “lone gunner,” apparently with no extra DLLs other than the system DLLs. You know this because other than the Key4.exe module, the rest of the modules are all operating system components. This is easy to tell because they are all in the C:\WINDOWS\SYSTEM32 directory, and also because at some point you just learn to recognize the names of the popular operating system components. Of course, if you’re not sure, it’s always possible to just look up a binary executable’s properties in Windows and obtain some details on it, such as who created it and the like. For example, if you’re not sure what lpk.dll is, just go to C:\WINDOWS\SYSTEM32 and look up its properties. In the Version tab you can see its version resource information, which gives you some basic details on the executable (assuming such details were put in place by the module’s author). Figure 11.4 shows the Version tab for lpk.dll from Windows XP Service Pack 2, and it is quite clearly an operating system component.
You can proceed to examine which APIs are directly called by Key4.exe by clicking View Names on Key4.exe in the Executable Modules window. This brings you to the list of functions imported and exported from Key4.exe. This screen is shown in Figure 11.5.
Figure 11.4 Version information for lpk.dll.
