
Eilam E. Reversing: Secrets of Reverse Engineering. 2005


Breaking Protections 411

In this sequence, the first value pushed into the stack is the starting address of the encrypted data and the second value pushed is the ending address. You go to Olly’s dump window and dump data starting at 401E32. Now, you need to create a brute-forcer program and copy that encrypted data into it.

Before you actually write the program, you need to get a better understanding of the encryption algorithm used by Defender. A quick glance at a decryption sequence shows that it’s not just XORing the key against each DWORD in the code. It’s also XORing each 32-bit block with the previous block as it appears in its encrypted form. This is important because it means the decryption process must begin at the same position in the data where encryption started; otherwise the decryption process will generate corrupted data. You now have enough information to write a little decryption loop for the brute-forcer program.

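// dwPrevBlock must be reset to its starting value before each key is tried.
// The loop bound assumes dwBlockCount is the number of DWORDs in dwEncryptedData.
for (DWORD dwCurrentBlock = 0; dwCurrentBlock < dwBlockCount; dwCurrentBlock++)
{
    // Undo the key XOR, then undo the chaining with the previous encrypted block.
    dwDecryptedData[dwCurrentBlock] = dwEncryptedData[dwCurrentBlock] ^ dwCurrentKey;
    dwDecryptedData[dwCurrentBlock] ^= dwPrevBlock;
    dwPrevBlock = dwEncryptedData[dwCurrentBlock];
}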

This loop must be executed for each key! After decryption is completed you search for your token in the decrypted block. If you find it, you’ve apparently hit the correct key. If not, you increment your key by one and try to decrypt and search for the token again. Here’s the token searching logic.

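PBYTE pbCurrent = (PBYTE) memchr(dwDecryptedData, Sequence[0], sizeof(dwEncryptedData));
while (pbCurrent)
{
    // Bytes left in the buffer, counting from the current candidate position.
    size_t cbLeft = sizeof(dwEncryptedData) - (pbCurrent - (PBYTE) dwDecryptedData);

    // Compare only when the whole sequence still fits inside the buffer.
    if (cbLeft >= sizeof(Sequence) && memcmp(pbCurrent, Sequence, sizeof(Sequence)) == 0)
    {
        printf("Found our sequence! Key is 0x%08x.\n", dwCurrentKey);
        _exit(1);
    }

    // Resume the search one byte past the current candidate.
    pbCurrent++;
    pbCurrent = (PBYTE) memchr(pbCurrent, Sequence[0], cbLeft - 1);
}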
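The two fragments above can be tied together with a driver loop that walks the key space. What follows is only a minimal sketch, not the program used in this chapter: it uses the portable types from stdint.h instead of DWORD, the names try_key and brute_force are hypothetical, and it assumes that the chain starts with a previous-block value of zero and that blocks are laid out in little-endian order, as they are on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte sequence to look for in the decrypted code (taken from the text). */
static const unsigned char kToken[] = {0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00};

/*
 * Decrypts `count` blocks with `key` into `out` (4 * count bytes, little-endian)
 * and returns the byte offset of the token, or -1 if it is absent.
 * Assumption: the chain starts with a previous block of zero.
 */
static long try_key(const uint32_t *enc, size_t count, uint32_t key,
                    unsigned char *out)
{
    uint32_t prev = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t plain = enc[i] ^ key ^ prev;
        prev = enc[i];                      /* chain on the ciphertext block */
        out[4 * i + 0] = (unsigned char)(plain);
        out[4 * i + 1] = (unsigned char)(plain >> 8);
        out[4 * i + 2] = (unsigned char)(plain >> 16);
        out[4 * i + 3] = (unsigned char)(plain >> 24);
    }
    size_t total = 4 * count;
    for (size_t off = 0; off + sizeof kToken <= total; off++) {
        if (memcmp(out + off, kToken, sizeof kToken) == 0)
            return (long)off;
    }
    return -1;
}

/*
 * Tries every key in [first, last] and stores the first hit in *found.
 * Returns 1 on success, 0 if no key in the range produces the token.
 * The do/while form terminates even when last is 0xFFFFFFFF, where a
 * `for (key = first; key <= last; key++)` loop would wrap and never end.
 */
static int brute_force(const uint32_t *enc, size_t count,
                       uint32_t first, uint32_t last,
                       unsigned char *scratch, uint32_t *found)
{
    assert(first <= last);
    uint32_t key = first;
    do {
        if (try_key(enc, count, key, scratch) >= 0) {
            *found = key;
            return 1;
        }
    } while (key++ != last);
    return 0;
}
```

Calling brute_force(enc, count, 0, 0xFFFFFFFF, scratch, &key) with a scratch buffer of 4 * count bytes covers the entire 32-bit key space. Splitting the range between several threads is a natural next step, because each trial is independent of the others.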

Realizing that all of this must be executed 4,294,967,296 times, you can start to see why this is going to take a little while to complete. Now, consider that this is merely a 32-bit key! A 64-bit key would have taken 4,294,967,296 × 2^32 iterations to complete. At 4,294,967,296 iterations per minute, it would still take about 8,000 years to go over all possible keys.
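The estimate is easy to reproduce. The helper below is a hypothetical sketch (the name years_to_exhaust and the 365.25-day year are my assumptions) that divides the size of the key space by the trial rate.

```c
#include <assert.h>

/* Years needed to try every key of `key_bits` bits at `keys_per_minute`. */
static double years_to_exhaust(int key_bits, double keys_per_minute)
{
    assert(key_bits > 0 && keys_per_minute > 0.0);

    double keys = 1.0;
    for (int i = 0; i < key_bits; i++)
        keys *= 2.0;                               /* 2^key_bits */

    double minutes = keys / keys_per_minute;
    return minutes / (60.0 * 24.0 * 365.25);       /* minutes in a year */
}
```

With a 64-bit key and 4,294,967,296 trials per minute, the function returns a little under 8,200 years, which is where the rough figure of 8,000 years comes from. The same call with a 32-bit key gives the equivalent of one minute.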

412 Chapter 11

Now, all that’s missing is the encrypted data and the token sequence. Here are the two arrays you’re dealing with:

DWORD dwEncryptedData[] = {
    0x5AA37BEB, 0xD7321D42, 0x2618DDF9, 0x2F1794E3, 0x1DE51172,
    0x8BDBD150, 0xBB2954C1, 0x678CB4E3, 0x5DD701F9, 0xE11679A6,
    0x501CD9A0, 0x685251B9, 0xD6F355EE, 0xE401D07F, 0x10C218A5,
    0x22593307, 0x10133778, 0x22594B07, 0x1E134B78, 0xC5093727,
    0xB016083D, 0x8A4C8DAC, 0x1BB759E3, 0x550A5611, 0x140D1DF4,
    0xE8CE15C5, 0x47326D27, 0xF3F1AD7D, 0x42FB734C, 0xF34DF691,
    0xAB07368B, 0xE5B2080F, 0xCDC6C492, 0x5BF8458B, 0x8B55C3C9 };

unsigned char Sequence[] = {0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00 };

At this point you’re ready to build this program and run it (preferably with all compiler optimizations enabled, to quicken the process as much as possible). After a few minutes, you get the following output.

Found our sequence! Key is 0xb14ac01a.

Very nice! It looks like you found what you were looking for. B14AC01A is our key. This means that the correct serial can be calculated using Serial = LOWPART(NameSerial) * VolumeSerial - B14AC01A. The question now is why is the serial 64 bits long? Is it possible that the upper 32 bits are unused?
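If you want to confirm the result without rerunning the entire search, you can decrypt the listed data with this one key and look for the token. The following is a small portable sketch rather than part of the brute-forcer itself. The function name is hypothetical, the chain is assumed to start from a previous block of zero (which only influences the first block), and the bytes are written out in little-endian order as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encrypted blocks exactly as listed in the text. */
static const uint32_t kEncrypted[] = {
    0x5AA37BEB, 0xD7321D42, 0x2618DDF9, 0x2F1794E3, 0x1DE51172,
    0x8BDBD150, 0xBB2954C1, 0x678CB4E3, 0x5DD701F9, 0xE11679A6,
    0x501CD9A0, 0x685251B9, 0xD6F355EE, 0xE401D07F, 0x10C218A5,
    0x22593307, 0x10133778, 0x22594B07, 0x1E134B78, 0xC5093727,
    0xB016083D, 0x8A4C8DAC, 0x1BB759E3, 0x550A5611, 0x140D1DF4,
    0xE8CE15C5, 0x47326D27, 0xF3F1AD7D, 0x42FB734C, 0xF34DF691,
    0xAB07368B, 0xE5B2080F, 0xCDC6C492, 0x5BF8458B, 0x8B55C3C9 };
enum { kBlockCount = sizeof kEncrypted / sizeof kEncrypted[0] };

static const unsigned char kSequence[] = {0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00};

/* Returns the byte offset of kSequence in the data decrypted with `key`, or -1. */
static long find_sequence_with_key(uint32_t key)
{
    unsigned char plain[4 * kBlockCount];
    uint32_t prev = 0;                  /* assumed starting value of the chain */

    assert(sizeof plain >= sizeof kSequence);
    for (size_t i = 0; i < kBlockCount; i++) {
        uint32_t v = kEncrypted[i] ^ key ^ prev;
        prev = kEncrypted[i];
        plain[4 * i + 0] = (unsigned char)(v);
        plain[4 * i + 1] = (unsigned char)(v >> 8);
        plain[4 * i + 2] = (unsigned char)(v >> 16);
        plain[4 * i + 3] = (unsigned char)(v >> 24);
    }
    for (size_t off = 0; off + sizeof kSequence <= sizeof plain; off++) {
        if (memcmp(plain + off, kSequence, sizeof kSequence) == 0)
            return (long)off;
    }
    return -1;
}
```

Calling find_sequence_with_key(0xB14AC01A) returns 116: the token starts in the thirtieth block and spills into the next one. The token bytes C7 45 FC 00 00 00 00 are the encoding of MOV DWORD PTR [EBP-4], 0.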

Let’s worry about that later. For now, you can create a little keygen program that will calculate a NameSerial, apply this algorithm, and give you a (hopefully) valid serial number that you can feed into Defender. The algorithm is quite trivial. Converting a name string to a 64-bit number is done using the algorithm described in Figure 11.16. Here’s a C implementation of that algorithm.

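__int64 NameToInt64(LPWSTR pwszName)
{
    __int64 Result = 0;
    int iPosition = 0;      // counts characters; does not affect the result
    while (*pwszName)
    {
        // Each character is shifted left by (character value % 48) bits.
        Result += (__int64) *pwszName << (__int64) (*pwszName % 48);
        pwszName++;
        iPosition++;
    }
    return Result;
}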


The return value from this function can be fed into the following code:

char name[256];
char fsname[256];
DWORD complength;
DWORD VolumeSerialNumber;

GetVolumeInformation("C:\\", name, sizeof(name), &VolumeSerialNumber,
    &complength, 0, fsname, sizeof(fsname));

printf("Volume serial number is: 0x%08x\n", VolumeSerialNumber);
printf("Computing serial for name: %s\n", argv[1]);

WCHAR wszName[256];
mbstowcs(wszName, argv[1], 256);

unsigned __int64 Name = NameToInt64(wszName);
ULONG FirstNum = (ULONG) Name * VolumeSerialNumber;
unsigned __int64 Result = FirstNum - (ULONG) 0xb14ac01a;

printf("Name number is: %08x%08x\n", (ULONG) (Name >> 32), (ULONG) Name);
printf("Name * VolumeSerialNumber is: %08x\n", FirstNum);
printf("Serial number is: %08x%08x\n", (ULONG) (Result >> 32), (ULONG) Result);

This is the code for the keygen program. When you run it with the name John Doe, you get the following output.

Volume serial number is: 0x6c69e863
Computing serial for name: John Doe
Name number is: 000000212ccaf4a0
Name * VolumeSerialNumber is: 15cd99e0
Serial number is: 000000006482d9c6

Naturally, you’ll see different values because your volume serial number is different. The final number is what you have to feed into Defender. Let’s see if it works! You type “John Doe” and 000000006482D9C6 (or whatever your serial number is) as the command-line parameters and launch Defender. No luck. You’re still getting the “Sorry” message. Looks like you’re going to have to step into that encrypted function and see what it does.

The encrypted function starts with a call to NtDelayExecution and proceeds to call the inverse twin of that 64-bit left-shifter function you ran into earlier. This one does the same thing only with right shifts (32 of them to be exact). Defender is doing something you’ve seen it do before: It’s computing LOWPART(NameSerial) * VolumeSerial - HIGHPART(TypedSerial). It then does something that signals some more bad news: It returns the result from the preceding calculation to the caller.

This is bad news because, as you probably remember, this function’s return value is used for decrypting the function that called it. It looks like the high part of the typed serial is also somehow taking part in the decryption process.


You’re going to have to brute-force the calling function as well—it’s the only way to find this key.

In this function, the encrypted code starts at 401FED and ends at 40207F. In looking at the encryption/decryption local variable, you can see that it’s at the same offset [EBP-4] as in the previous function. This is good because it means that you’ll be looking for the same byte sequence:

unsigned char Sequence[] = {0xC7, 0x45, 0xFC, 0x00, 0x00, 0x00, 0x00 };

Of course, the data is different because it’s a different function, so you copy the new function’s data over into the brute-forcer program and let it run. Sure enough, after about 10 minutes or so you get the answer:

Found our sequence! Key is 0x8ed105c2.

Let’s immediately fix the keygen to correctly compute the high-order word of the serial number and try it out. Here’s the corrected keygen code.

unsigned __int64 Name = NameToInt64(wszName);
ULONG FirstNum = (ULONG) Name * VolumeSerialNumber;
unsigned __int64 Result = FirstNum - (ULONG) 0xb14ac01a;
Result |= (unsigned __int64) (FirstNum - 0x8ed105c2) << 32;

printf("Name number is: %08x%08x\n", (ULONG) (Name >> 32), (ULONG) Name);
printf("Name * VolumeSerialNumber is: %08x\n", FirstNum);
printf("Serial number is: %08x%08x\n", (ULONG) (Result >> 32), (ULONG) Result);

Running this corrected keygen with “John Doe” as the username, you get the following output:

Volume serial number is: 0x6c69e863
Computing serial for name: John Doe
Name number is: 000000212ccaf4a0
Name * VolumeSerialNumber is: 15cd99e0
Serial number is: 86fc941e6482d9c6

As expected, the low-order word of the serial number is identical, but you now have a full result, including the high-order word. You immediately try and run this data by Defender: Defender “John Doe” 86fc941e6482d9c6 (again, this number will vary depending on the volume serial number). Here’s Defender’s output:

Defender Version 1.0 - Written by Eldad Eilam

That is correct! Way to go!


Congratulations! You’ve just cracked Defender! This is quite impressive, considering that Defender is quite a complex protection technology, even compared to top-dollar commercial protection systems. If you don’t fully understand every step of the process you just undertook, fear not. You should probably practice reversing Defender a little bit and quickly go over this chapter again. You can take comfort in the fact that once you get to the point where you can easily crack Defender, you are a world-class cracker. Again, I urge you to only use this knowledge in good ways, not for stealing. Be a good cracker, not a greedy cracker.
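The serial arithmetic itself does not depend on Windows, so it can be checked in isolation. The following is a portable sketch of the keygen calculation, under a few assumptions of mine: the name is plain ASCII (the chapter’s keygen converts it to wide characters first, which yields the same values for ASCII input), the volume serial number is passed in rather than queried from the system, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keys recovered by the two brute-force runs described in the text. */
#define KEY_LOW  0xB14AC01Au
#define KEY_HIGH 0x8ED105C2u

/* Each character is shifted left by (character value % 48) and accumulated. */
static uint64_t name_to_int64(const char *name)
{
    assert(name != NULL);
    uint64_t result = 0;
    for (; *name; name++) {
        unsigned char c = (unsigned char)*name;
        result += (uint64_t)c << (c % 48);
    }
    return result;
}

/* Builds the 64-bit serial from the name and the volume serial number. */
static uint64_t compute_serial(const char *name, uint32_t volume_serial)
{
    /* Only the low 32 bits of the name number are used; the product wraps. */
    uint32_t first = (uint32_t)name_to_int64(name) * volume_serial;
    uint32_t low   = first - KEY_LOW;    /* unlocks the first encrypted function  */
    uint32_t high  = first - KEY_HIGH;   /* unlocks the function that called it   */
    return ((uint64_t)high << 32) | low;
}
```

With the name John Doe and the volume serial number 0x6c69e863 used in this session, name_to_int64 gives 0x000000212ccaf4a0 and compute_serial gives 0x86fc941e6482d9c6, matching the keygen output shown earlier.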

Protection Technologies in Defender

Let’s try and summarize the protection technologies you’ve encountered in Defender and attempt to evaluate their effectiveness. This can also be seen as a good “executive summary” of Defender for those who aren’t in the mood for 50 pages of disassembled code.

First of all, it’s important to understand that Defender is a relatively powerful protection compared to many commercial protection technologies, but it could definitely be improved. In fact, I intentionally limited its level of protection to make it practical to crack within the confines of this book. Were it not for these constraints, cracking would have taken a lot longer.

Localized Function-Level Encryption

Like many copy protection and executable packing technologies, Defender stores most of its key code in an encrypted form. This is a good design because it at least prevents crackers from elegantly loading the program in a disassembler such as IDA Pro and easily analyzing the entire program. From a live-debugging perspective, encryption is good because it prevents or makes it more difficult to set breakpoints on the code.

Of course, most protection schemes just encrypt the entire program using a single key that is readily available somewhere in the program. This makes it exceedingly easy to write an “unpacker” program that automatically decrypts the entire program and creates a new, decrypted version of the program.

The beauty of Defender’s encryption approach is that it makes it much more difficult to create automatic unpackers because the decryption key for each encrypted code block is obtained at runtime.

Relatively Strong Cipher Block Chaining

Defender uses a fairly solid, yet simple encryption algorithm called Cipher Block Chaining (CBC) (see Applied Cryptography, Second Edition by Bruce Schneier [Schneier2]). The idea is to simply XOR each plaintext block with the previous, encrypted block, and then to XOR the result with the key. This algorithm is quite secure and should not be compared to a simple XOR algorithm, which is highly vulnerable. In a simple XOR algorithm, the key is fairly easily retrievable as soon as you determine its length. All you have to do is find bytes within the encrypted block whose original, unencrypted values you know, and XOR those known values with the encrypted data. The result is the key (assuming that you have at least as many bytes as the length of the key).
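To make the weakness of the simple XOR algorithm concrete, here is a short sketch of the key recovery just described. It is an illustration only: the function names are hypothetical, and it assumes you already know the key length and the offset at which the known plaintext sits.

```c
#include <assert.h>
#include <stddef.h>

/* Encrypts or decrypts in place with a repeating key (plain XOR, no chaining). */
static void xor_repeating(unsigned char *data, size_t len,
                          const unsigned char *key, size_t key_len)
{
    assert(key_len > 0);
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % key_len];
}

/*
 * Recovers the key from the ciphertext and from `key_len` bytes of plaintext
 * known to sit at byte offset `offset`. Each ciphertext byte XORed with its
 * plaintext byte yields the key byte that was applied at that position.
 */
static void recover_key(const unsigned char *cipher, const unsigned char *known,
                        size_t offset, size_t key_len, unsigned char *key_out)
{
    assert(key_len > 0);
    for (size_t i = 0; i < key_len; i++)
        key_out[(offset + i) % key_len] = cipher[offset + i] ^ known[i];
}
```

A single guessed fragment of plaintext as long as the key is enough to recover every key byte, after which xor_repeating decrypts the rest of the data.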

Of course, as I’ve demonstrated, CBC with a short key is vulnerable to brute-force attacks, but to counter this it would be enough to just increase the key length to 64 bits or above. The real problem in copy protection technologies is that eventually the key must be available to the program, and without special hardware it is impossible to hide the key from a cracker’s eyes.

Reencrypting

Defender reencrypts each function before that function returns to the caller. This creates an (admittedly minor) inconvenience to crackers because they never get to the point where they have the entire program decrypted in memory (which is a perfect time to dump the entire decrypted program to a file and then conveniently reverse it from there).

Obfuscated Application/Operating System Interface

One of the key protection features in Defender is its obfuscated interface with the operating system, which is actually quite unusual. The idea is to make it very difficult to identify calls from the program into the operating system, and almost impossible to set breakpoints on operating system APIs. This greatly complicates cracking because most crackers rely on operating system calls for finding important code areas in the target program (think of the MessageBoxA call you caught in our KeygenMe3 session).

The interface attempts to attach to the operating system without making a single direct API call. This is done by manually finding the first system component (NTDLL.DLL) using the TEB, and then manually searching through its export table for APIs.

Except for a single call that takes place during initialization, APIs are never called through the user-mode component. All user-mode OS components are copied to a random memory address when the program starts, and the OS is accessed through this copied code instead of using the original module. Any breakpoints placed on any user-mode API would never be hit. Needless to say, this has a significant memory consumption impact on the program and a certain performance impact (because the program must copy significant amounts of code every time it is started).


To make it very difficult to determine which API the program is trying to call, APIs are searched using a checksum value computed from their names, instead of storing their actual names. Retrieving the API name from its checksum is not possible.

There are several weaknesses in this technique. First of all, the implementation in Defender maintained the APIs’ order from the export table, which simplified the process of determining which API was being called. Randomly reorganizing the table during initialization would prevent crackers from using this approach. Also, for some APIs, it is possible to just directly step into the kernel in a kernel debugger and find out which API is being called. There doesn’t seem to be a simple way to work around this problem, but keep in mind that this is primarily true for native NTDLL APIs, and is less true for Win32 APIs.
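The lookup-by-checksum idea can be sketched in a few lines. Keep in mind that this is only an illustration: the checksum below is a hypothetical rotate-and-add function, not the one Defender uses, and the export table is modeled as a plain array of names rather than a real PE export directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name checksum for illustration; not Defender's actual one. */
static uint32_t name_checksum(const char *name)
{
    uint32_t sum = 0;
    for (; *name; name++)
        sum = ((sum << 5) | (sum >> 27)) + (unsigned char)*name; /* rotate, add */
    return sum;
}

/* Returns the index of the first export whose name checksum matches, or -1. */
static int find_export(const char *const *names, size_t count, uint32_t wanted)
{
    assert(names != NULL);
    for (size_t i = 0; i < count; i++) {
        if (name_checksum(names[i]) == wanted)
            return (int)i;
    }
    return -1;
}
```

The protected program only ever stores the 32-bit value passed as wanted, so no API name appears in its data. The checksum cannot be inverted arithmetically, but the set of candidate names is just the module's export table, so running the same loop over every exported name is one more way to map a checksum back to its API.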

One more thing—remember how you saw that Defender was statically linked to KERNEL32.DLL and had an import entry for IsDebuggerPresent? The call to that API was obviously irrelevant—it was actually in unreachable code. The reason I added that call was that older versions of Windows (Windows NT 4.0 and Windows 2000) just wouldn’t let Defender load without it. It looks like Windows expects all programs to make at least one system call.

Processor Time-Stamp Verification Thread

Defender includes what is, in my opinion, a fairly solid mechanism for making the process of live debugging on the protected application very difficult. The idea is to create a dedicated thread that constantly monitors the hardware time-stamp counter and kills the process if it looks like the process has been stopped in some way (as in by a debugger). It is important to directly access the counter using a low-level instruction such as RDTSC and not using some system API, so that crackers can’t just hook or replace the function that obtains this value.

Combined with good encryption on each key function, a verification thread makes reversing the program a lot more annoying than it would have been otherwise. Keep in mind that without encryption this technique wouldn’t be very effective because crackers can just load the program in a disassembler and read the code.

Why was it so easy for us to remove the time-stamp verification thread in our cracking session? As I’ve already mentioned, I’ve intentionally made Defender somewhat easier to break to make it feasible to crack in the confines of this chapter. The following are several modifications that would make a time-stamp verification thread far more difficult to remove (of course it would always remain possible to remove, but the question is how long it would take):

■ Add periodic checksum calculations from the main thread that verify the verification thread. If there’s a checksum mismatch, someone has patched the verification thread, so the program should terminate immediately.

■ Store the checksums within the code, rather than in some centralized location. The same goes for the actual checksum verifications: they must be inlined and not implemented in one single function. This would make it very difficult to eliminate the checks or modify the checksum.

■ Store a global handle to the verification thread. With each checksum verification, ensure the thread is still running. If it’s not, terminate the program immediately.

One thing that should be noted is that in its current implementation the verification thread is slightly dangerous. It is reliable enough for a cracking exercise, but not for anything beyond that. The relatively short period and the fact that it’s running in normal priority means that it’s possible that it will terminate the process unjustly, without a debugger.

In a commercial product environment the counter constant should probably be significantly higher and should probably be calculated at runtime based on the counter’s update speed. In addition, the thread should be set to a higher priority in order to make sure higher priority threads don’t prevent it from receiving CPU time and generate false positives.
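The decision logic of such a thread, including a threshold that is derived at runtime, might look like the sketch below. It is not Defender’s implementation: the function names and the margin parameter are assumptions, and the counter samples are passed in as plain integers so that the code is not tied to x86. A real verification thread would obtain them with the RDTSC instruction, for example through the __rdtsc() compiler intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decides whether the gap between two counter samples looks like a stall. */
static int gap_is_suspicious(uint64_t previous, uint64_t current, uint64_t limit)
{
    /* Unsigned subtraction stays correct even if the counter wraps around. */
    return (current - previous) > limit;
}

/*
 * Derives the limit at runtime: `margin` times the largest gap observed in
 * samples taken while the program was running normally.
 */
static uint64_t calibrate_limit(const uint64_t *samples, int count, uint64_t margin)
{
    assert(samples != NULL && count >= 2 && margin >= 1);
    uint64_t worst = 0;
    for (int i = 1; i < count; i++) {
        uint64_t gap = samples[i] - samples[i - 1];
        if (gap > worst)
            worst = gap;
    }
    return worst * margin;
}
```

A generous margin trades a slower reaction to a debugger for fewer false positives on a busy system, which is the balance this section argues a commercial product would have to strike.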

Runtime Generation of Decryption Keys

Generating decryption keys at runtime is important because it means that the program could never be automatically unpacked. There are many ways to obtain keys at runtime, and Defender employs two methods.

Interdependent Keys

Some of the individual functions in Defender are encrypted using interdependent keys, which are keys that are calculated at runtime from some other program data. In Defender’s case I’ve calculated a checksum during the reencryption process and used that checksum as the decryption key for the next function. This means that any change (such as a patch or a breakpoint) to the encrypted function would prevent the next function (in the runtime execution order) from properly decrypting. It would probably be worthwhile to use a cryptographic hash algorithm for this purpose, in order to prevent attackers from modifying the code and simply adding a couple of bytes that would keep the original checksum value. Such a modification would not be practical with cryptographic hash algorithms, because any change in the code would result in a new hash value.
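The difference between a plain checksum and a cryptographic hash is easy to demonstrate. The sketch below uses a simple additive checksum, which is an assumption made for illustration and not necessarily the checksum Defender computes. The hypothetical rebalance function shows how a single spare byte can absorb the effect of a patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simple additive checksum over a code region (illustrative only). */
static uint32_t additive_checksum(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += code[i];
    return sum;
}

/*
 * Rewrites the byte at index `spare` so that the region's checksum equals
 * `target` again. Returns 1 on success, or 0 if one byte cannot absorb the
 * difference.
 */
static int rebalance(unsigned char *code, size_t len, size_t spare, uint32_t target)
{
    assert(spare < len);
    uint32_t without = additive_checksum(code, len) - code[spare];
    uint32_t needed = target - without;
    if (needed > 0xFF)
        return 0;
    code[spare] = (unsigned char)needed;
    return 1;
}
```

Patching one byte of the region changes the checksum, so the next function would be decrypted with the wrong key. Calling rebalance on a padding byte restores the original value and with it the key. With a cryptographic hash there is no comparably cheap way to find the compensating bytes.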


User-Input-Based Decryption Keys

The two most important functions in Defender are simply inaccessible unless you have a valid serial number. This is similar to dongle protection where the program code is encrypted using a key that is only available on the dongle. The idea is that a user without the dongle (or a valid serial in Defender’s case) is simply not going to be able to crack the program. You were able to crack Defender only because I purposely used short 32-bit keys in the Cipher Block Chaining algorithm. Were I to use longer, 64-bit or 128-bit keys, cracking wouldn’t have been possible without a valid serial number.

Unfortunately, when you think about it, this is not really that impressive. Supposing that Defender were a commercial software product, yes, it would have taken a long time for the first cracker to crack it, but once the algorithm for computing the key was found, it would only take a single valid serial number to find out the key that was used for encrypting the important code chunks. It would then take hours until a keygen that includes the secret keys within it would be made available online. Remember: Secrecy is only a temporary state!

Heavy Inlining

Finally, one thing that really contributes to the low readability of Defender’s assembly language code is the fact that it was compiled with very heavy inlining. Inlining refers to the process of inserting a function’s code into the body of each function that calls it. This means that instead of having one copy of the function that everyone can call, you will have a copy of the function inside every function that calls it. This is a standard C++ feature: marking the function with the inline keyword requests it, although the compiler makes the final decision on whether to expand each call.

Inlining significantly complicates reversing in general and cracking in particular because it’s difficult to tell where you are in the target program; clearly defined function calls really make it easier for reversers. From a cracking standpoint, it is more difficult to patch an inlined function because you must find every instance of the code, instead of just patching the function and having all calls go to the patched version.

Conclusion

In this chapter, you uncovered the fascinating world of cracking and saw just how closely related it is to reversing. Of course, cracking has no practical value other than the educational value of learning about copy protection technologies. Still, cracking is a serious reversing challenge, and many people find it very challenging and enjoyable. If you enjoyed the reversing sessions presented in this chapter, you might enjoy cracking some of the many crackmes available online. One recommended Web site that offers crackmes at a variety of different levels (and for a variety of platforms) is www.crackmes.de. Enjoy!

As a final reminder, I would like to reiterate the obvious: Cracking commercial copy protection mechanisms is considered illegal in most countries. Please honor the legal and moral right of software developers and other copyright owners to reap the fruit of their efforts!