Understanding Compiled Arithmetic 521
The above addition will produce different results, depending on whether the destination operand is treated as signed or unsigned. When presented in hexadecimal form, the result is 0x8326, which is equivalent to 33574, assuming that AX is considered to be an unsigned operand. If you're treating AX as a signed operand, you will see that an overflow has occurred. Because any signed number that has the most significant bit set is considered negative, 0x8326 becomes –31962. It is obvious that because a signed 16-bit operand can only represent values up to 32767, adding 4390 and 29184 would produce an overflow, and AX would wrap around to a negative number. Therefore, from an unsigned perspective no overflow has occurred, but if you consider the destination operand to be signed, an overflow has occurred. Because of this, the preceding code would result in OF (representing overflows in signed operands) being set and in CF (representing overflows in unsigned operands) being cleared.
The Zero Flag (ZF)
The zero flag is set when the result of an arithmetic operation is zero, and it is cleared if the result is nonzero. ZF is used in quite a few different situations in IA-32 code, but probably one of its most common uses is comparing two operands and testing whether they are equal. The CMP instruction subtracts one operand from the other and sets ZF if the pseudoresult of the subtraction operation is zero, which indicates that the operands are equal. If the operands are unequal, ZF is set to zero.
The Sign Flag (SF)
The sign flag receives the value of the most significant bit of the result (regardless of whether the result is signed or unsigned). In signed integers this is equivalent to the integer’s sign. A value of 1 denotes a negative number in the result, while a value of 0 denotes a positive number (or zero) in the result.
The Parity Flag (PF)
The parity flag is a (rarely used) flag that reports the binary parity of the lower 8 bits of certain arithmetic results. Binary parity means that the flag reports the parity of the number of bits set, as opposed to the actual numeric parity of the result. A value of 1 denotes an even number of set bits in the lower 8 bits of the result, while a value of 0 denotes an odd number of set bits.
522 Appendix B
Basic Integer Arithmetic
The following section discusses the basic arithmetic operations and how they are implemented by compilers on IA-32 machines. I will cover optimized addition, subtraction, multiplication, division, and modulo.
Note that with any sane compiler, any arithmetic operation involving two constant operands will be eliminated completely and replaced with the result in the assembly code. The following discussions of arithmetic optimizations only apply to cases where at least one of the operands is variable and is not known in advance.
Addition and Subtraction
Integers are generally added and subtracted using the ADD and SUB instructions, which can take different types of operands: register names, immediate hard-coded operands, or memory addresses. The specific combination of operands depends on the compiler and doesn’t always reflect anything specific about the source code, but one obvious point is that adding or subtracting an immediate operand usually reflects a constant that was hard-coded into the source code (still, in some cases compilers will add or subtract a constant from a register for other purposes, without being instructed to do so at the source code level). Note that both instructions store the result in the left-hand operand.
Subtraction and addition are very simple operations that are performed very efficiently in modern IA-32 processors and are usually implemented in straightforward methods by compilers. On older implementations of IA-32 the LEA instruction was considered to be faster than ADD and SUB, which brought many compilers to use LEA for quick additions and shifts. Here is how the LEA instruction can be used to perform an arithmetic operation.
lea   ecx, DWORD PTR [edx+edx]
Notice that even though most disassemblers add the words DWORD PTR before the operands, LEA really can’t distinguish between a pointer and an integer. LEA never performs any actual memory accesses.
Starting with Pentium 4 the situation has reversed and most compilers will use ADD and SUB when generating code. However, when surrounded by several other ADD or SUB instructions, the Intel compiler still seems to use LEA. This is probably because the execution unit employed by LEA is separate from the ones used by ADD and SUB. Using LEA makes sense when the main ALUs are busy—it improves the chances of achieving parallelism in runtime.
Understanding Compiled Arithmetic 523
Multiplication and Division
Before beginning the discussion on multiplication and division, I will discuss a few of the basics. First of all, keep in mind that multiplication and division are both considered fairly complex operations in computers, far more so than addition and subtraction. The IA-32 processors provide instructions for several different kinds of multiplication and division, but they are both relatively slow. Because of this, both of these operations are quite often implemented in other ways by compilers.
Dividing or multiplying a number by powers of 2 is a very natural operation for a computer, because it sits very well with the binary representation of the integers. This is just like the way that people can very easily divide and multiply by powers of 10. All it takes is shifting a few zeros around. It is interesting how computers deal with division and multiplication in much the same way as we do. The general strategy is to try to bring the divisor or multiplier as close as possible to a convenient number that is easily represented by the number system. You then perform that relatively simple calculation, and figure out how to apply the rest of the divisor or multiplier to the calculation. For IA-32 processors, the equivalent of shifting zeros around is to perform binary shifts using the SHL and SHR instructions. The SHL instruction shifts values to the left, which is the equivalent of multiplying by powers of 2. The SHR instruction shifts values to the right, which is the equivalent of dividing by powers of 2. After shifting, compilers usually use addition and subtraction to compensate the result as needed.
Multiplication
When you are multiplying a variable by another variable, the MUL/IMUL instruction is generally the most efficient tool you have at your disposal. Still, most compilers will completely avoid using these instructions when the multiplier is a constant. For example, multiplying a variable by 3 is usually implemented by shifting the number by 1 bit to the left and then adding the original value to the result. This can be done either by using SHL and ADD or by using LEA, as follows:
lea   eax, DWORD PTR [eax+eax*2]
In more complicated cases, compilers use a combination of LEA and ADD. For example, take a look at the following code, which is essentially a multiplication by 32:
lea   eax, DWORD PTR [edx+edx]
add   eax, eax
add   eax, eax
add   eax, eax
add   eax, eax
Multiplication is significantly faster than division on IA-32 processors, and in some cases it is possible to avoid the use of division instructions by using multiplication instructions. The idea is to multiply the dividend by a fraction that is the reciprocal of the divisor. For example, if you wanted to divide 30 by 3, you would simply compute the reciprocal of 3, which is 1 ÷ 3. The result of such an operation is approximately 0.3333333, so if you multiply 30 by 0.3333333, you end up with the correct result, which is 10.
Implementing reciprocal multiplication in integer arithmetic is slightly more complicated because the data type you’re using can only represent integers. To overcome this problem, the compiler uses fixed-point arithmetic.
Fixed-point arithmetic enables the representation of fractions and real numbers without using a "floating" movable decimal point. With fixed-point arithmetic, the exponent component (which determines the position of the decimal dot in floating-point data types) is not used, and the position of the decimal dot remains fixed. This is in contrast to hardware floating-point mechanisms, in which the hardware is responsible for allocating the available bits between the integral value and the fractional value. Because of this mechanism, floating-point data types can represent a huge range of values, from extremely small (between 0 and 1) to extremely large (with dozens of zeros before the decimal point).
To represent an approximation of a real number in an integer, you define an imaginary dot within the integer that determines which portion of it represents the number's integral value and which portion represents the fractional value. The integral value is represented as a regular integer, using the number of bits available to it based on this division. The fractional value represents an approximation of the number's distance from the current integral value (for example, 1) to the next one up (to follow this example, 2), as accurately as possible with the available number of bits. Needless to say, this is always an approximation; many real numbers can never be accurately represented. For example, in order to represent .5, the fractional value would contain 0x80000000 (assuming a 32-bit fractional value). To represent .1, the fractional value would contain approximately 0x1999999A.
To go back to the original problem: in order to divide using a reciprocal, the compiler multiplies the 32-bit dividend by a 32-bit reciprocal. This produces a 64-bit result. The lower 32 bits contain the remainder (also represented as a fractional value), and the upper 32 bits contain the desired result.
Table B.1 presents several examples of 32-bit reciprocals used by compilers. Every reciprocal is used together with a divisor that is always a power of two (essentially a right shift; the whole point is to avoid an actual division). Compilers combine right shifts with the reciprocals in order to achieve greater accuracy, because reciprocals alone are not accurate enough when working with large dividends.
Table B.1 Examples of Reciprocal Multiplications in Division
DIVISOR IN      32-BIT          RECIPROCAL VALUE    COMBINED
SOURCE CODE     RECIPROCAL      (AS A FRACTION)     WITH DIVISOR

3               0xAAAAAAAB      2/3                 2
5               0xCCCCCCCD      4/5                 4
6               0xAAAAAAAB      2/3                 4
Notice that the last digit in each reciprocal is incremented by one. This is because the fractional values can never be accurately represented, so the compiler is rounding the fraction upward to obtain an accurate integer result (within the given bits).
Of course, keep in mind that multiplication is also not a trivial operation, and multiplication instructions in IA-32 processors can be quite slow (though significantly faster than division). Because of this, compilers only use reciprocal multiplication when the divisor is not a power of 2. When it is, compilers simply shift the dividend to the right as many times as needed.
Deciphering Reciprocal-Multiplications
Reciprocal multiplications are quite easy to detect when you know what to look for. The following is a typical reciprocal multiplication sequence:
mov   ecx, eax
mov   eax, 0xaaaaaaab
mul   ecx
shr   edx, 2
mov   eax, edx
DIVIDING BY VARIABLE DIVISORS USING RECIPROCAL MULTIPLICATION?
There are also optimized division algorithms that can be used for variable divisors, where the reciprocal is computed at runtime, but modern IA-32 implementations provide a relatively high-performance implementation of the DIV and IDIV instructions. Because of this, compilers rarely use reciprocal multiplication for variable divisors when generating IA-32 code; they simply use the DIV or IDIV instructions. The time it would take to compute the reciprocal at runtime plus the actual reciprocal multiplication time would be longer than simply using a straightforward division.
mov   eax, DWORD PTR [Divisor]
cdq
mov   edi, 100
idiv  edi
This code divides Divisor by 100 and leaves the remainder, which is the modulo we're after, in EDX. This is the most trivial implementation because the modulo is obtained by simply dividing the two values using IDIV, the processor's signed division instruction. IDIV's normal behavior is that it places the result of the division in EAX and the remainder in EDX, so that code running after this snippet can simply grab the remainder from EDX. Note that because IDIV is being passed a 32-bit divisor (EDI), it will use a 64-bit dividend in EDX:EAX, which is why the CDQ instruction is used. It simply converts the value in EAX into a 64-bit value in EDX:EAX. For more information on CDQ refer to the type conversions section later in this appendix.
This approach is good for reversers because it is highly readable, but it isn't quite the fastest in terms of runtime performance. IDIV is a fairly slow instruction, one of the slowest in the entire instruction set. This code was generated by the Microsoft compiler.
Some compilers actually use a multiplication by a reciprocal in order to determine the modulo (see the section on division).
64-Bit Arithmetic
Modern 32-bit software frequently uses larger-than-32-bit integer data types for various purposes such as high-precision timers, high-precision signal processing, and many others. For general-purpose code that is not specifically compiled to run on advanced processor enhancements such as SSE, SSE2, and SSE3, the compiler combines two 32-bit integers and uses specialized sequences to perform arithmetic operations on them. The following sections describe how the most common arithmetic operations are performed on such 64-bit data types.
When working with integers larger than 32-bits (without the advanced SIMD data types), the compiler employs several 32-bit integers to represent the full operands. In these cases arithmetic can be performed in different ways, depending on the specific compiler. Compilers that support these larger data types will include built-in mechanisms for dealing with these data types. Other compilers might treat these data types as data structures containing several integers, requiring the program or a library to provide specific code that performs arithmetic operations on these data types.
For 64-bit multiplications, the Microsoft compiler uses a function called allmul that is called whenever two 64-bit values are multiplied. This function, along with its assembly language source code, is included in the Microsoft C run-time library (CRT), and is presented in Listing B.1.
_allmul PROC NEAR

        mov     eax,HIWORD(A)
        mov     ecx,HIWORD(B)
        or      ecx,eax         ;test for both hiwords zero.
        mov     ecx,LOWORD(B)
        jnz     short hard      ;both are zero, just mult ALO and BLO
        mov     eax,LOWORD(A)
        mul     ecx
        ret     16              ; callee restores the stack

hard:
        push    ebx
        mul     ecx             ;eax has AHI, ecx has BLO, so AHI * BLO
        mov     ebx,eax         ;save result
        mov     eax,LOWORD(A2)
        mul     dword ptr HIWORD(B2) ;ALO * BHI
        add     ebx,eax         ;ebx = ((ALO * BHI) + (AHI * BLO))
        mov     eax,LOWORD(A2)  ;ecx = BLO
        mul     ecx             ;so edx:eax = ALO*BLO
        add     edx,ebx         ;now edx has all the LO*HI stuff
        pop     ebx
        ret     16
Listing B.1 The allmul function used for performing 64-bit multiplications in code generated by the Microsoft compilers.
Unfortunately, in most reversing scenarios you might run into this function without knowing its name (because it will be an internal symbol inside the program). That’s why it makes sense for you to take a quick look at Listing B.1 to try to get a general idea of how this function works—it might help you identify it later on when you run into this function while reversing.
Division
Dividing 64-bit integers is significantly more complex than multiplying them, and again the compiler uses an external function to implement this functionality. The Microsoft compiler uses the alldiv CRT function to implement 64-bit divisions. Again, alldiv is listed in full in Listing B.2 in order to simplify its identification when reversing a program that includes 64-bit arithmetic.
