Eilam E.Reversing.Secrets of reverse engineering.2005


Antireversing Techniques 341

0040103F	. 50		PUSH	EAX
00401040	E8	BBFFFFFF	CALL	compiler.main

Olly is clearly ignoring the junk byte and using the conditional jump as a marker to the real code starting position, which is why it is providing an accurate listing. It is possible that Olly contains specific code for dealing with these kinds of tricks. Regardless, at this point it becomes clear that you can take advantage of Olly’s use of the jump’s target address to confuse it; if OllyDbg uses conditional jumps to mark the beginning of valid code sequences, you can just create a conditional jump that points to the beginning of the invalid sequence. The following code snippet demonstrates this idea:

_asm
{
    mov eax, 2
    cmp eax, 3
    je Junk
    jne After
Junk:
    _emit 0xf
After:
    mov eax, [SomeVariable]
    push eax
    call AFunction
}

This sequence is an improved implementation of the same approach. It is more likely to confuse recursive traversal disassemblers because they have to choose, more or less arbitrarily, which of the two jump targets to treat as the start of valid code. The choice is not trivial because both candidate code sequences are “valid” from the disassembler’s perspective. This is a theoretical problem: the disassembler has no idea what constitutes valid code. The only measurement it has is whether it finds invalid opcodes, in which case a clever disassembler should probably consider the current starting address as invalid and look for an alternative one.

Let’s look at the listing Olly produces from the above code.

00401031   . B8 02000000      MOV EAX,2
00401036   . 83F8 03          CMP EAX,3
00401039   . 74 02            JE SHORT compiler.0040103D
0040103B   . 75 01            JNZ SHORT compiler.0040103E
0040103D   > 0F8B 45F850E8    JPO E8910888
00401043   ? B9 FFFFFF68      MOV ECX,68FFFFFF
00401048   ? DC60 40          FSUB QWORD PTR DS:[EAX+40]
0040104B   ? 00E8             ADD AL,CH
0040104D   ? 0300             ADD EAX,DWORD PTR DS:[EAX]
0040104F   ? 0000             ADD BYTE PTR DS:[EAX],AL

342 Chapter 10

This time OllyDbg swallows the bait and uses the invalid 0040103D as the starting address from which to disassemble, which produces a meaningless assembly language listing. What’s more, IDA Pro produces an equally unreadable output—both major recursive traversers fall for this trick. Needless to say, linear sweepers such as SoftICE react in the exact same manner.

One recursive traversal disassembler that does not fall for this trick is PEBrowse Professional. Here is the listing produced by PEBrowse:

0x401031: B802000000      mov   eax,0x2
0x401036: 83F803          cmp   eax,0x3
0x401039: 7402            jz    0x40103d      ; (*+0x4)
0x40103B: 7501            jnz   0x40103e      ; (*+0x3)
0x40103D: 0F8B45F850E8    jpo   0xe8910888    ; <==0x00401039(*-0x4)

;***********************************************************************

0x40103E: 8B45F8          mov   eax,dword ptr [ebp-0x8] ; VAR:0x8
0x401041: 50              push  eax
0x401042: E8B9FFFFFF      call  0x401000

;***********************************************************************

Apparently, PEBrowse Professional is trying to disassemble the code from both 40103D and 40103E, and is showing both options. It is difficult to tell whether this is caused by special heuristics designed to withstand such code sequences or just by a fluke. Either way, it looks like you’ll need to improve on your technique a little bit: there must not be a direct jump to the valid code address if you’re to fool every disassembler. The solution is to simply perform an indirect jump using a value loaded in a register. The following code confuses every disassembler I’ve tested, including both linear-sweep-based tools and recursive-traversal-based tools.

_asm
{
    mov eax, 2
    cmp eax, 3
    je Junk
    mov eax, After
    jmp eax
Junk:
    _emit 0xf
After:
    mov eax, [SomeVariable]
    push eax
    call AFunction
}

The reason this trick works is quite trivial—because the disassembler has no idea that the sequence mov eax, After, jmp eax is equivalent to jmp After, the disassembler is not even trying to begin disassembling from the After address.


The disadvantage of all of these tricks is that they count on the disassembler being relatively dumb. Luckily, most Windows disassemblers are dumb enough that you can fool them. What would happen if you ran into a clever disassembler that actually analyzes each line of code and traces the flow of data? Such a disassembler would not fall for any of these tricks, because it would detect your opaque predicate; how difficult is it to figure out that a conditional jump that is taken when 2 equals 3 is never actually going to be taken? Moreover, a simple data-flow analysis would expose the fact that the final JMP sequence is essentially equivalent to a JMP After, which would probably be enough to correct the disassembly anyhow.
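To make the data-flow argument concrete, here is a minimal sketch in C of the kind of constant tracking a cleverer disassembler could perform. The instruction model (`Insn`, `OP_MOV_IMM`, and so on) and the function `je_outcome` are hypothetical names invented for this illustration and do not correspond to any real disassembler. The sketch tracks a single register and understands only the three instruction kinds used in the snippets above.

```c
#include <stddef.h>

/* Hypothetical, highly simplified instruction model (illustration only). */
typedef enum { OP_MOV_IMM, OP_CMP_IMM, OP_JE, OP_OTHER } Opcode;

typedef struct {
    Opcode op;
    unsigned int imm;   /* immediate operand for OP_MOV_IMM and OP_CMP_IMM */
} Insn;

typedef enum { JUMP_NEVER, JUMP_ALWAYS, JUMP_UNKNOWN } JumpOutcome;

/* Walk a straight-line sequence that operates on one register (EAX) and
   decide whether the first JE found is provably taken or provably not. */
JumpOutcome je_outcome(const Insn *code, size_t count)
{
    int value_known = 0, flags_known = 0, equal = 0;
    unsigned int value = 0;

    for (size_t i = 0; i < count; i++) {
        switch (code[i].op) {
        case OP_MOV_IMM:            /* register now holds a known constant */
            value = code[i].imm;
            value_known = 1;
            break;
        case OP_CMP_IMM:            /* flags are known only if the value is */
            flags_known = value_known;
            equal = (value == code[i].imm);
            break;
        case OP_JE:
            if (!flags_known)
                return JUMP_UNKNOWN;
            return equal ? JUMP_ALWAYS : JUMP_NEVER;
        default:                    /* anything else: forget what was known */
            value_known = 0;
            flags_known = 0;
            break;
        }
    }
    return JUMP_UNKNOWN;
}
```

Feeding it the modeled sequence `mov eax, 2`, `cmp eax, 3`, `je Junk` yields `JUMP_NEVER`, which is all a disassembler needs to discard the junk target as a starting address.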

Still, even a cleverer disassembler could be easily fooled by exporting the real jump addresses into a central, runtime-generated data structure. It would be borderline impossible to perform a global data-flow analysis so comprehensive that it would be able to find the real addresses without actually running the program.
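The idea of a central, runtime-generated structure can be sketched in portable C using function pointers instead of raw jump addresses. Everything here (`init_dispatch`, `dispatch`, the three handlers, and the `seed` parameter) is a hypothetical illustration rather than a technique taken from a specific product. The point is only that the slot a target occupies is decided while the program runs.

```c
#include <stddef.h>

typedef int (*Handler)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

#define SLOT_COUNT 3
static Handler g_slots[SLOT_COUNT];   /* central table, empty until runtime */
static unsigned int g_shift;          /* runtime value that permutes the slots */

/* Fill the table at startup. The slot each handler lands in depends on
   `seed`, which is assumed to come from a source known only at runtime. */
void init_dispatch(unsigned int seed)
{
    static const Handler handlers[SLOT_COUNT] = { add_one, twice, negate };
    g_shift = seed % SLOT_COUNT;
    for (unsigned int i = 0; i < SLOT_COUNT; i++)
        g_slots[(i + g_shift) % SLOT_COUNT] = handlers[i];
}

/* Every call goes through the table, so the call site names no target. */
int dispatch(unsigned int logical_index, int arg)
{
    return g_slots[(logical_index + g_shift) % SLOT_COUNT](arg);
}
```

After `init_dispatch(7)`, the call `dispatch(1, 5)` reaches `twice`, but nothing at the call site says so. A real implementation would also have to hide the initial `handlers` array, which this sketch leaves in plain view.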

Applications

Let’s see how one would use the previous techniques in a real program. I’ve created a simple macro called OBFUSCATE, which adds a little assembly language sequence to a C program (see Listing 10.1). This sequence would temporarily confuse most disassemblers until they resynchronized. The number of instructions it will take to resynchronize depends not only on the specific disassembler used, but also on the specific code that comes after the macro.

#define paste(a, b)        a##b
#define pastesymbols(a, b) paste(a, b)
#define OBFUSCATE() \
    _asm { mov eax, __LINE__ * 0x635186f1          };\
    _asm { cmp eax, __LINE__ * 0x9cb16d48          };\
    _asm { je  pastesymbols(Junk, __LINE__)        };\
    _asm { mov eax, pastesymbols(After, __LINE__)  };\
    _asm { jmp eax                                 };\
    _asm { pastesymbols(Junk, __LINE__):           };\
    _asm { _emit (0xd8 + __LINE__ % 8)             };\
    _asm { pastesymbols(After, __LINE__):          };

Listing 10.1 A simple code obfuscation macro that aims at confusing disassemblers.

This macro was tested on the Microsoft C/C++ compiler (version 13), and contains pseudorandom values to make it slightly more difficult to search and replace (the constants in the MOV and CMP instructions and the junk byte itself all vary, because they are calculated from the current code line number). Notice that the junk byte ranges from D8 to DF; these are good opcodes to use because they are all multibyte opcodes. I’m using the __LINE__ macro in order to create unique symbol names in case the macro is used repeatedly in the same function. Each occurrence of the macro will define symbols with different names. The paste and pastesymbols macros are required because otherwise the compiler just won’t properly resolve the __LINE__ constant and will use the string __LINE__ instead.
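The need for the two-level paste/pastesymbols pair can be seen in isolation with a small sketch. The helper macros `str_` and `str` and the two functions below are hypothetical additions used only to make the pasted symbol names visible as strings; they are not part of Listing 10.1.

```c
#define paste(a, b)        a##b
#define pastesymbols(a, b) paste(a, b)

/* Hypothetical helpers that turn the resulting token into a string. */
#define str_(x) #x
#define str(x)  str_(x)

/* One level: the operands of ## are not macro-expanded before pasting,
   so the result is the literal identifier Junk__LINE__. */
const char *direct_name(void)
{
    return str(paste(Junk, __LINE__));
}

/* Two levels: pastesymbols expands __LINE__ to a number first and only
   then hands it to paste, giving Junk followed by the line number. */
const char *indirect_name(void)
{
    return str(pastesymbols(Junk, __LINE__));
}
```

Calling `direct_name()` returns the string "Junk__LINE__", while `indirect_name()` returns "Junk" followed by whatever line number its return statement happens to sit on.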

If distributed throughout the code, this macro (and you could very easily create dozens of similar variations) would make the reversing process slightly more tedious. The problem is that too many copies of this code would make the program run significantly slower (especially if the macro is placed inside key loops in the program that run many times). Overusing this technique would also make the program significantly larger in terms of both memory consumption and disk space usage.

It’s important to realize that all of these techniques are limited in their effectiveness. They most certainly won’t deter an experienced and determined reverser from reversing or cracking your application, but they might complicate the process somewhat. The manual approach for dealing with this kind of obfuscated code is to tell the disassembler where the code really starts. Advanced disassemblers such as IDA Pro or even OllyDbg’s built-in disassembler allow users to add disassembly hints, which enable the program to properly interpret the code.

The biggest problem with these macros is that they are repetitive, which makes them exceedingly vulnerable to automated tools that just search and destroy them. A dedicated attacker can usually write a program or script that would eliminate them in 20 minutes. Additionally, specific disassemblers have been created that overcome most of these obfuscation techniques (see “Static Disassembly of Obfuscated Binaries” by Christopher Kruegel, et al. [Kruegel]). Is it worth it? In some cases it might be, but if you are looking for powerful antireversing techniques, you should probably stick to the control flow and data-flow obfuscating transformations discussed next.
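As a rough illustration of how little effort such a search-and-destroy tool requires, the following C sketch scans a code buffer for the byte pattern that one OBFUSCATE() expansion would plausibly produce and overwrites it with NOPs. The exact encoding is an assumption: it presumes the compiler emits the short forms B8 (mov eax, imm32), 3D (cmp eax, imm32), 74 (je rel8), and FF E0 (jmp eax). The function name `strip_obfuscate` is hypothetical.

```c
#include <stddef.h>

/* Assumed encoding of one OBFUSCATE() expansion (20 bytes):
     B8 imm32   mov eax, <const>
     3D imm32   cmp eax, <const>
     74 07      je  +7 (over the next two instructions, to the junk byte)
     B8 imm32   mov eax, <address of After>
     FF E0      jmp eax
     D8..DF     junk byte                                              */
#define PATTERN_LEN 20

static int matches_at(const unsigned char *p)
{
    return p[0] == 0xB8 && p[5] == 0x3D &&
           p[10] == 0x74 && p[11] == 0x07 &&
           p[12] == 0xB8 &&
           p[17] == 0xFF && p[18] == 0xE0 &&
           (p[19] & 0xF8) == 0xD8;
}

/* Overwrite every match with NOPs (0x90) and return how many were patched. */
size_t strip_obfuscate(unsigned char *code, size_t len)
{
    size_t patched = 0;
    size_t i = 0;

    while (i + PATTERN_LEN <= len) {
        if (matches_at(code + i)) {
            for (size_t j = 0; j < PATTERN_LEN; j++)
                code[i + j] = 0x90;
            patched++;
            i += PATTERN_LEN;
        } else {
            i++;
        }
    }
    return patched;
}
```

Replacing the whole sequence with NOPs is only safe if the code that follows does not depend on EAX or the flags, which is a further assumption of this sketch. A byte-level scan like this can also produce false positives inside data, so a careful tool would confirm each match before patching.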

Code Obfuscation

You probably noticed that the antireversing techniques described so far are all platform-specific “tricks” that in my opinion do nothing more than increase the attacker’s “annoyance factor”. Real code obfuscation involves transforming the code in such a way that makes it significantly less human-readable, while still retaining its functionality. These are typically non-platform-specific transformations that modify the code to hide its original purpose and drown the reverser in a sea of irrelevant information. The level of complexity added by an obfuscating transformation is typically called potency, and can be measured using conventional software complexity metrics such as how many predicates the program contains and the depth of nesting in a particular code sequence.


OBFUSCATION TOOLS

Let’s take a quick look at the existing obfuscation tools that can be used to obfuscate programs on the fly. There are quite a few bytecode obfuscators for Java and .NET, and I will be discussing and evaluating some of them in Chapter 12. As for obfuscation of native IA-32 code, there aren’t that many generic tools that process entire executables and effectively obfuscate them. One notable product that is quite powerful is EXECryptor by StrongBit Technology (www.strongbit.com). EXECryptor processes PE executables and applies a variety of obfuscating transformations on the machine code. Code obfuscated by EXECryptor really becomes significantly more difficult to reverse compared to plain IA-32 code. Another powerful technology is the StarForce suite of copy protection products, developed by StarForce Technologies (www.star-force.com). The StarForce products are more than just powerful obfuscation products: they are full-blown copy protection products that provide either hardware-based or pure software-based copy protection functionality.

Beyond the mere additional complexity introduced by adding additional logic and arithmetic to a program, an obfuscating transformation must be resilient (meaning that it cannot be easily undone). Because many of these transformations add irrelevant instructions that don’t really produce valuable data, it is possible to create deobfuscators. A deobfuscator is a program that implements various data-flow analysis algorithms on an obfuscated program, which sometimes enables it to separate the wheat from the chaff, automatically remove all irrelevant instructions, and restore the code’s original structure. Creating resilient obfuscation transformations that are resistant to deobfuscation is a major challenge and is the primary goal of many obfuscators.

Finally, an obfuscating transformation will typically have an associated cost. This can be in the form of larger code, slower execution times, or increased runtime memory consumption. It is important to realize that some transformations do not incur any kind of runtime costs, because they involve a simple reorganization of the program that is transparent to the machine, but makes the program less human-readable.

In the following sections, I will be going over the common obfuscating transformations. Most of these transformations were meant to be applied programmatically by running an obfuscator on an existing program, either at the source code or the binary level. Still, many of these transformations can be applied manually, while the program is being written or afterward, before it is shipped to end users. Automatic obfuscation is obviously far more effective because it can obfuscate the entire program and not just small parts of it. Additionally, automatic obfuscation is typically performed after the program is compiled, which means that the original source code is not made any less readable (as is the case when obfuscation is performed manually).


Control Flow Transformations

Control flow transformations are transformations that alter the order and flow of a program in a way that reduces its human readability. In “Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs” by Christian Collberg, Clark Thomborson, and Douglas Low [Collberg1], control flow transformations are categorized as computation transformations, aggregation transformations, and ordering transformations.

Computation transformations are aimed at reducing the readability of the code by modifying the program’s original control flow structure in ways that make for a functionally equivalent program that is far more difficult to translate back into a high-level language. This can be done either by removing control flow information from the program or by adding new control flow statements that complicate the program and cannot be easily translated into a high-level language.

Aggregation transformations destroy the high-level structure of the program by breaking the high-level abstractions created by the programmer while the program was being written. The basic idea is to break such abstractions so that the high-level organization of the code becomes senseless.

Ordering transformations are somewhat less powerful transformations that randomize (as much as possible) the order of operations in a program so that its readability is reduced.

Opaque Predicates

Opaque predicates are a fundamental building block for control flow transformations. I’ve already introduced some trivial opaque predicates in the previous section on antidisassembling techniques. The idea is to create a logical statement whose outcome is constant and is known in advance. Consider, for example, the statement if (x + 1 == x). This statement will obviously never be satisfied and can be used to confuse reversers and automated decompilation tools into thinking that the statement is actually a valid part of the program.
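A slightly less obvious predicate than x + 1 == x can be built from a simple arithmetic identity. The sketch below is a hypothetical example, not one taken from the cited papers: it relies on the fact that the product of two consecutive integers is always even, and that unsigned wraparound preserves parity. The function names are made up for illustration.

```c
/* Opaquely true: x * (x + 1) is always even, and unsigned arithmetic
   wraps modulo a power of two, which does not change the low bit. */
int opaque_true(unsigned int x)
{
    return ((x * (x + 1u)) & 1u) == 0u;
}

/* The predicate guards the real computation; the else path is bogus
   code that exists only to mislead whoever reads the disassembly. */
int guarded_increment(int value, unsigned int x)
{
    if (opaque_true(x))
        return value + 1;      /* real path, always taken */
    return value * 31 - 7;     /* bogus path, never executed */
}
```

For any `x`, `guarded_increment(41, x)` returns 42. An optimizing compiler or a deobfuscator that knows this identity may well fold the predicate away, which is exactly the resilience problem discussed in this section.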

With such a simple statement, it is going to be quite easy for both humans and machines to figure out that this is a false statement. The objective is to create opaque predicates that would be difficult to distinguish from the actual program code and whose behavior would be difficult to predict without actually stepping into the code. The interesting thing about opaque predicates (and about several other aspects of code obfuscation as well) is that confusing an automated deobfuscator is often an entirely different problem from confusing a human reverser.

Consider, for example, the concurrency-based opaque predicates suggested in [Collberg1]. The idea is to create one or more threads that are responsible for constantly generating new random values and storing them in a globally accessible data structure. The values stored in those data structures consistently adhere to simple rules (such as being lower or higher than a certain constant). The threads that contain the actual program code can access this global data structure and check that those values are within the expected range. It would make quite a challenge for an automated deobfuscator to figure this structure out and pinpoint such fake control flow statements. The concurrent access to the data would hugely complicate the matter for an automated deobfuscator (though an obfuscator would probably only be aware of such concurrency in a bytecode language such as Java). In contrast, a person would probably immediately suspect a thread that constantly generates random numbers and stores them in a global data structure. It would probably seem very fishy to a human reverser.

Now consider a far simpler arrangement where several bogus data members are added into an existing program data structure. These members are constantly accessed and modified by code that’s embedded right into the program. Those members adhere to some simple numeric rules, and the opaque predicates in the program rely on these rules. Such an implementation might be relatively easy to detect for a powerful deobfuscator (depending on the specific platform), but could be quite a challenge for a human reverser.
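A minimal sketch of this arrangement in C might look as follows. The `Account` structure, its field names, and the parity rules are all hypothetical choices made for illustration: one bogus member is kept even, the other odd, and the predicate depends on that invariant.

```c
typedef struct {
    int id;               /* genuine field */
    double balance;       /* genuine field */
    unsigned int pad_a;   /* bogus: kept even at all times */
    unsigned int pad_b;   /* bogus: kept odd at all times */
} Account;

void account_init(Account *a, int id)
{
    a->id = id;
    a->balance = 0.0;
    a->pad_a = 2u * (unsigned int)id;        /* even */
    a->pad_b = 2u * (unsigned int)id + 1u;   /* odd */
}

/* Looks like ordinary bookkeeping but preserves both invariants. */
static void touch_bogus(Account *a, unsigned int noise)
{
    a->pad_a += 2u * noise;   /* even + even stays even */
    a->pad_b += a->pad_a;     /* odd + even stays odd */
}

double account_deposit(Account *a, double amount)
{
    touch_bogus(a, (unsigned int)a->id + 17u);
    if ((a->pad_a + a->pad_b) & 1u)    /* even + odd is odd: always true */
        a->balance += amount;
    else
        a->balance -= amount * 0.5;    /* bogus branch, never executed */
    return a->balance;
}
```

Because the bogus members live next to genuine fields and are updated by code that resembles normal bookkeeping, a human reader has no immediate reason to treat the branch in `account_deposit` differently from any other.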

Generally speaking, opaque predicates are more effective when implemented in lower-level machine-code programs than in higher-level bytecode programs, because they are far more difficult to detect in low-level machine code. The process of automatically identifying individual data structures in a native machine-code program is quite difficult, which means that in most cases opaque predicates cannot be automatically detected or removed. That’s because performing global data-flow analysis on low-level machine code is not always simple or even possible. For reversers, the only way to deal with opaque predicates implemented on low-level native machine-code programs is to try and manually locate them by looking at the code. This is possible, but not very easy.

In contrast, higher-level bytecode executables typically contain far more details regarding the specific data structures used in the program. That makes it much easier to implement data-flow analysis and write automated code that detects opaque predicates.

The bottom line is that you should probably focus most of your antireversing efforts on confusing the human reversers when developing in lower-level languages and on automated decompilers/deobfuscators when working with bytecode languages such as Java.

For a detailed study of opaque constructs and various implementation ideas see [Collberg1] and General Method of Program Code Obfuscation by Gregory Wroblewski [Wroblewski].


Confusing Decompilers

Because bytecode-based languages are highly detailed, there are numerous decompilers that are highly effective for decompiling bytecode executables. One of the primary design goals of most bytecode obfuscators is to confuse decompilers, so that the code cannot be easily restored to a highly detailed source code. One trick that does wonders is to modify the program binary so that the bytecode contains statements that cannot be translated back into the original high-level language. The example given in A Taxonomy of Obfuscating Transformations by Christian Collberg, Clark Thomborson, and Douglas Low [Collberg2] is the Java programming language, where the high-level language does not have the goto statement, but the Java bytecode does. This means that it’s possible to add goto statements into the bytecode in order to completely break the program’s flow graph, so that a decompiler cannot later reconstruct it (because it contains instructions that cannot be translated back to Java).

In native processor languages such as IA-32 machine code, decompilation is such a complex and fragile process that any kind of obfuscating transformation could easily cause a decompiler to fail or produce meaningless code. Consider, for example, what would happen if a decompiler ran into the OBFUSCATE macro from the previous section.

Table Interpretation

Converting a program or a function into a table interpretation layout is a highly powerful obfuscation approach that, if done right, can repel both deobfuscators and human reversers. The idea is to break a code sequence into multiple short chunks and have the code loop through a conditional code sequence that decides to which of the code sequences to jump at any given moment. This dramatically reduces the readability of the code because it completely hides any kind of structure within it. Any code structures, such as logical statements or loops, are buried inside this unintuitive structure.

As an example, consider the simple data processing function in Listing 10.2.

00401000	push	esi
00401001	push	edi
00401002	mov	edi,dword ptr [esp+10h]
00401006	xor	eax,eax
00401008	xor	esi,esi
0040100A	cmp	edi,3
0040100D	jbe	0040103A
0040100F	mov	edx,dword ptr [esp+0Ch]
00401013	add	edi,0FFFFFFFCh
00401016	push	ebx
00401017	mov	ebx,dword ptr [esp+18h]
0040101B	shr	edi,2
0040101E	push	ebp
0040101F	add	edi,1
00401022	mov	ecx,dword ptr [edx]
00401024	mov	ebp,ecx
00401026	xor	ebp,esi
00401028	xor	ebp,ebx
0040102A	mov	dword ptr [edx],ebp
0040102C	xor	eax,ecx
0040102E	add	edx,4
00401031	sub	edi,1
00401034	mov	esi,ecx
00401036	jne	00401022
00401038	pop	ebp
00401039	pop	ebx
0040103A	pop	edi
0040103B	pop	esi
0040103C	ret

Listing 10.2 A simple data processing function that XORs a data block with a parameter passed to it and writes the result back into the data block.

Let us now take this function and transform it using a table interpretation transformation.

00401040	push	ecx
00401041	mov	edx,dword ptr [esp+8]
00401045	push	ebx
00401046	push	ebp
00401047	mov	ebp,dword ptr [esp+14h]
0040104B	push	esi
0040104C	push	edi
0040104D	mov	edi,dword ptr [esp+10h]
00401051	xor	eax,eax
00401053	xor	ebx,ebx
00401055	mov	ecx,1
0040105A	lea	ebx,[ebx]
00401060	lea	esi,[ecx-1]
00401063	cmp	esi,8
00401066	ja	00401060
00401068	jmp	dword ptr [esi*4+4010B8h]
0040106F	xor	dword ptr [edx],ebx
00401071	add	ecx,1
00401074	jmp	00401060
00401076	mov	edi,dword ptr [edx]
00401078	add	ecx,1
0040107B	jmp	00401060
0040107D	cmp	ebp,3
00401080	ja	00401071
00401082	mov	ecx,9
00401087	jmp	00401060
00401089	mov	ebx,edi
0040108B	add	ecx,1
0040108E	jmp	00401060
00401090	sub	ebp,4
00401093	jmp	00401055
00401095	mov	esi,dword ptr [esp+20h]
00401099	xor	dword ptr [edx],esi
0040109B	add	ecx,1
0040109E	jmp	00401060
004010A0	xor	eax,edi
004010A2	add	ecx,1
004010A5	jmp	00401060
004010A7	add	edx,4
004010AA	add	ecx,1
004010AD	jmp	00401060
004010AF	pop	edi
004010B0	pop	esi
004010B1	pop	ebp
004010B2	pop	ebx
004010B3	pop	ecx
004010B4	ret

The function’s jump table:
0x004010B8	0040107d	00401076	00401095	0040106f
0x004010C8	00401089	004010a0	004010a7	00401090
0x004010D8	004010af

Listing 10.3 The data-processing function from Listing 10.2 transformed using a table interpretation transformation.

The function in Listing 10.3 is functionally equivalent to the one in Listing 10.2, but it was obfuscated using a table interpretation transformation. The function was broken down into nine segments that represent the different stages in the original function. The implementation constantly loops through a junction that decides where to go next, depending on the value of ECX. Each code segment sets the value of ECX so that the correct code segment follows. The specific code address that is executed is determined using the jump table, which is included at the end of the listing. Internally, this is implemented using a simple switch statement, but when you think of it logically, this is similar to a little virtual machine that was built just for this particular function. Each “instruction” advances the “instruction pointer”, which is stored in ECX. The actual “code” is the jump table, because that’s where the sequence of operations is stored.
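The same transformation can be sketched at the source level. The C code below is a reconstruction based on reading the two listings, not the original source, so the function names, parameter order, and types are assumptions. `process_plain` corresponds roughly to Listing 10.2, and `process_table` mirrors the nine segments of Listing 10.3, with `state` playing the role of ECX.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain version: XOR each dword with the key and the previous original
   dword, and return the XOR of all original dwords. `size` is in bytes. */
uint32_t process_plain(uint32_t *data, size_t size, uint32_t key)
{
    uint32_t checksum = 0, prev = 0;

    while (size > 3) {
        uint32_t cur = *data;
        *data = cur ^ key ^ prev;
        checksum ^= cur;
        prev = cur;
        data++;
        size -= 4;
    }
    return checksum;
}

/* Table-interpreted version: one tiny step per state, with a central
   junction (the switch) deciding which step runs next. */
uint32_t process_table(uint32_t *data, size_t size, uint32_t key)
{
    uint32_t checksum = 0, prev = 0, cur = 0;
    int state = 1;

    for (;;) {
        switch (state) {
        case 1: state = (size > 3) ? 2 : 9; break;   /* loop test */
        case 2: cur = *data;      state++;   break;
        case 3: *data ^= key;     state++;   break;
        case 4: *data ^= prev;    state++;   break;
        case 5: prev = cur;       state++;   break;
        case 6: checksum ^= cur;  state++;   break;
        case 7: data++;           state++;   break;
        case 8: size -= 4;        state = 1; break;  /* back to the test */
        case 9:
        default:
            return checksum;
        }
    }
}
```

Both functions produce the same buffer contents and the same return value for the same input, yet the loop in `process_table` is no longer visible as a loop: it exists only as the transition from state 8 back to state 1.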
