
Advanced C 1992


Part IV • Documenting the Differences

Table 16.3. continued

Columns: Function, Microsoft, Borland, Watcom C/386

Functions listed: strcmp(), strcpy(), strlen(), strncat(), strncmp(), strncpy(), strnset(), strrchr(), tan(), tanh()

[Per-compiler check marks not recoverable from the extracted table.]

The only disadvantage to using intrinsic functions is that they increase the size of the compiled code. For example, if strlen() is called 50 times in a program, an intrinsic function will generate 50 copies of the code that computes the string length, whereas the library function would exist as a single copy.

Common Subexpression Optimization

When performing common subexpression optimization, the compiler replaces redundant expressions with a single common subexpression. This process is similar to loop optimization. An example of a common subexpression follows:

k = 20;
i = k + 10;
j = k + 10;

The compiler optimizes this code as

k = 20;
int __temp = k + 10;
i = __temp;
j = __temp;


Debugging and Efficiency


As in loop optimization, the compiler creates a temporary variable that holds an interim value. Often, the compiler can store the result of the computation in a register, so no memory is used. However, the statement’s location in the code (including the presence of intervening statements) dictates how the compiler processes the source code.

Common subexpression optimization is subject to the same problems as loop optimization (see the “Loop Optimization” section).
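The transformation can also be written out by hand. The following is a minimal sketch, not compiler output; the function names are hypothetical and chosen only for illustration. The third function shows the intervening-statement case: once k is assigned between the two uses, k + 10 is no longer a common subexpression and must be recomputed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */

/* As written in the source: k + 10 is computed twice. */
void without_cse(int *i, int *j)
{
    int k = 20;
    assert(i != NULL && j != NULL);
    *i = k + 10;
    *j = k + 10;
}

/* What the optimizer effectively produces: one computation, reused. */
void with_cse(int *i, int *j)
{
    int k = 20;
    int temp = k + 10;   /* a compiler would likely keep this in a register */
    assert(i != NULL && j != NULL);
    *i = temp;
    *j = temp;
}

/* An intervening assignment to k: the two expressions no longer
   have the same value, so the temporary cannot be reused. */
void intervening_write(int *i, int *j)
{
    int k = 20;
    assert(i != NULL && j != NULL);
    *i = k + 10;
    k = 5;               /* k changes between the two uses */
    *j = k + 10;
}
```

Both without_cse() and with_cse() store the same values, which is what makes the substitution safe; intervening_write() stores two different values.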

Direct Video I/O

At one time, almost all programs did direct video I/O because the original PC video systems (such as the CGA) were not designed for speed. To update the video quickly, programs had to write directly to the video’s memory.

You should consider several factors when deciding whether your program should support direct video. First, the display adapters in use today (some are EGA, but most are VGA) are better performers than the old CGA standard of 10 years ago.

Second, to support direct video I/O, your program must support a number of standards. Each system (CGA, MDA, EGA, and VGA) manages its display memory differently, with different buffer locations and arrangements in memory.

Should you access video memory directly? For most applications, there are better ways. For example, if you write a Windows or OS/2 application, accessing video memory directly is unacceptable because Windows manages the video. In addition, the efficiency of VGA video systems makes direct I/O unnecessary.

Some programmers assume that you must use direct video I/O to perform random writes to the screen (perhaps to place a message box in the center of the screen). This is untrue. On the PC, you can use the ANSI.SYS device driver for many screen operations, such as color changes and direct cursor addressing.
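As a rough sketch of the ANSI.SYS approach, the functions below build the standard ANSI escape sequences for cursor addressing (ESC[row;colH) and color selection (ESC[fg;bgm). The function names are hypothetical, and the code assumes a compiler that provides snprintf() and a console that interprets ANSI sequences (on DOS, ANSI.SYS must be loaded from CONFIG.SYS).

```c
#include <assert.h>
#include <stdio.h>

#define ESC "\033"   /* the ASCII escape character */

/* Format the cursor-positioning sequence ESC[row;colH into buf.
   Rows and columns are 1-based. Returns the sequence length. */
int ansi_goto(char *buf, size_t size, int row, int col)
{
    assert(buf != NULL && row >= 1 && col >= 1);
    return snprintf(buf, size, ESC "[%d;%dH", row, col);
}

/* Format the color sequence ESC[fg;bgm into buf.
   Foreground codes are 30..37, background codes are 40..47. */
int ansi_color(char *buf, size_t size, int fg, int bg)
{
    assert(buf != NULL);
    assert(fg >= 30 && fg <= 37 && bg >= 40 && bg <= 47);
    return snprintf(buf, size, ESC "[%d;%dm", fg, bg);
}

/* Write text at the given screen position, then reset attributes. */
void ansi_print_at(FILE *out, int row, int col, const char *text)
{
    char seq[32];
    assert(out != NULL && text != NULL);
    ansi_goto(seq, sizeof seq, row, col);
    fputs(seq, out);
    fputs(text, out);
    fputs(ESC "[0m", out);   /* ESC[0m restores the default attributes */
}
```

A call such as ansi_print_at(stdout, 12, 30, "Press any key") places a message near the center of an 80 x 25 screen without touching video memory.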

Floating-Point Optimization

You can significantly improve a program that relies on floating-point math (that is, a program that has data types of float, double, or long double) by using the proper floating-point compile and link options.


Microsoft C has five options (and various suboptions) for floating-point support. You can choose from three libraries when linking floating-point programs. These libraries control calls to floating-point routines:

The Alternate Math Package Library (mLIBCA.LIB) generates calls to floating-point math routines. There are both the true floating-point functions and routines to emulate the math coprocessor. If a coprocessor is installed, this package simply ignores it. This library produces the smallest executable program, but it does not support long double data types.

The Emulator Library (mLIBCE.LIB) generates calls to a library of functions that emulate the 80x87 math coprocessors. Floating-point functions are contained in the library; these functions also call the emulation routines rather than use a floating-point coprocessor.

The Coprocessor Library (mLIBC7.LIB) does not itself implement floating-point operations: all floating-point operations are coded inline and are performed by the math coprocessor. The floating-point functions contained in the library also use the coprocessor directly, so a math coprocessor is required to execute a program linked with this library.

These libraries are used with the floating-point compiler options, which are as follows:

/FPa
    Your code generates calls to a library. You can select which library is linked: mLIBCA (the default), mLIBCE, or mLIBC7. (mLIBC7 does not take advantage of inline instructions to the math coprocessor.)

/FPc
    Your code is similar to that generated with /FPa, except the default library supports long double. You can select which library is linked: mLIBCA, mLIBCE (the default), or mLIBC7. (mLIBC7 does not take advantage of inline instructions to the math coprocessor.)

/FPc87
    Your code requires a math coprocessor. This option is a good choice if you know that the target computer has a math coprocessor. The code is similar to that generated using /FPc. You can select which library is linked: mLIBCA, mLIBCE, or mLIBC7 (the default).


/FPi
    Your code is generated using interrupts for floating-point operations rather than inline floating-point instructions. The software interrupt handler then checks whether a coprocessor is installed. If a coprocessor is installed, the handler patches the code to support the coprocessor. A problem with some compilers is that this technique creates self-modifying code, something that does not work well under protected mode, in which code segments can be read-only. You can select which library is linked: mLIBCA, mLIBCE (the default), or mLIBC7.

/FPi87
    Your code is generated using interrupts for floating-point operations. The software interrupt handler then checks whether a coprocessor is installed. If one is installed, the handler patches the code to support the coprocessor. One problem with some compiler versions is that this technique creates self-modifying code, something that does not work well under protected mode, in which code segments can be read-only. You can select which library is linked: mLIBCA, mLIBCE, or mLIBC7 (the default).

The default option is /FPi, which works for most applications.

Inline Assembly

There is no argument about it—well-written assembly creates the fastest and smallest programs. However, many programmers are uncomfortable writing even small programs in assembly, let alone a major project. Writing an assembly program can take five times as long as writing a C program, and there may be five times as many lines in an assembly program, and five times as many chances to make a mistake.

One quick and not too difficult solution is to use inline assembly. In this technique, you use C to develop the underlying foundation for a function, then write the critical code in assembly. One useful feature of this technique is that you could write the entire function in C, then after determining that the program is functioning correctly, rewrite the function’s critical parts using inline assembly.

Inline assembly comes with a price, however. One problem is that the compiler cannot perform many of the optimizations it can do for a normal C function. This is not too critical if most of the function is written using inline assembly. If the function is written primarily in C with only a small part using inline assembly, the lack of full optimization may be a problem.


Another problem is that any function that relies on inline assembly is not very portable. If you plan to run your application on different computer systems, you may want to avoid inline assembly.
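The workflow described above (write and verify the function in C, then replace only the critical statement) might look like the following sketch. The inline assembly here uses GCC-style extended asm for x86, which is an assumption made for illustration; the Microsoft and Borland compilers discussed in this chapter use their own inline assembly syntax. The function name and the USE_INLINE_ASM macro are hypothetical. Unless that macro is defined, the portable C statement is compiled.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array of ints. The loop body is the "critical part" that
   can be switched between portable C and inline assembly. */
int sum_array(const int *values, size_t count)
{
    int sum = 0;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++) {
#if defined(USE_INLINE_ASM) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
        /* GCC extended asm, AT&T operand order: sum = sum + values[i] */
        __asm__ ("addl %1, %0"
                 : "+r" (sum)          /* read-write register operand */
                 : "r" (values[i])     /* input register operand */
                 : "cc");              /* the flags are modified */
#else
        sum += values[i];              /* portable C version */
#endif
    }
    return sum;
}
```

An addition this simple gains nothing from assembly, and the asm statement prevents the compiler from optimizing the loop as a whole, which is exactly the cost described above. The conditional compilation also shows one way to contain the portability problem: the C version remains available on every other system.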

Linking for Performance

Some linker options affect the application’s performance. These options can create problems if you do not understand what they do.

One important factor in creating an efficient executable program is to be sure that the linker is not including any debugging information (such as line-number tables) in the executable program. We often remember to compile the final versions of our programs with the correct compiler options, then forget to change the link options.

Many link programs have options that pack the executable program. These options are categorized as follows:

Packing redundant bytes. Using a simple compression technique, the link program can pack multiple occurrences of bytes with the same value. Most linkers pack only bytes that are zero (because multiple bytes of a nonzero value are rare). When the program is loaded, the loader expands these bytes to their original count. This process reduces the size of the executable file and may shorten the load time.

Packing CODE segments. When a program is created on the PC using the Large or Medium compiler option, each source file has its own code segment. Often, several of these segments can be combined, and then calls to functions in the combined segment can be converted to NEAR calls (which are faster than FAR calls).

Packing DATA segments. When a program is created on the PC using the Large or Compact compiler option, each source file has its own data segment (assuming that the segment has sufficient data). Often, several of these segments can be combined into one.
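The first technique, packing redundant bytes, can be sketched as a simple run-length scheme. The format below is hypothetical and is not the format used by any particular linker: each run of zero bytes is stored as the pair {0x00, run length}, and nonzero bytes are copied unchanged. The function names are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Pack src[0..n) into dst. Each run of 1..255 zero bytes becomes the
   pair {0x00, run_length}; nonzero bytes are copied as-is.
   Returns the packed length, or 0 if dst is too small (or n is 0). */
size_t pack_zeros(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < n) {
        if (src[in] != 0) {
            if (out + 1 > cap) return 0;
            dst[out++] = src[in++];
        } else {
            size_t run = 0;
            while (in < n && src[in] == 0 && run < 255) {
                in++;
                run++;
            }
            if (out + 2 > cap) return 0;
            dst[out++] = 0;
            dst[out++] = (unsigned char)run;
        }
    }
    return out;
}

/* Expand packed data back to its original form, as a loader would.
   Returns the unpacked length, or 0 on malformed input or overflow. */
size_t unpack_zeros(const unsigned char *src, size_t n,
                    unsigned char *dst, size_t cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < n) {
        if (src[in] != 0) {
            if (out + 1 > cap) return 0;
            dst[out++] = src[in++];
        } else {
            size_t run;
            if (in + 1 >= n) return 0;       /* count byte is missing */
            run = src[in + 1];
            if (out + run > cap) return 0;
            while (run-- > 0) dst[out++] = 0;
            in += 2;
        }
    }
    return out;
}
```

With this scheme an isolated zero byte grows from one byte to two, so the saving comes only from long runs, such as large zero-initialized arrays.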

One way to make an application more efficient is to write it as an overlay program. The main advantage is that RAM does not have to be permanently allocated for infrequently called functions. This leaves more memory for data storage (perhaps eliminating the creation of temporary work files). A disadvantage of overlays is that many linkers require you to determine which functions are part of which overlays. In addition, overhead is incurred when functions not currently in memory must be loaded.

Pascal and cdecl Calling Conventions

Calling conventions are the rules on how parameters are passed to a function being called, and whether the caller or the function being called is responsible for removing the parameters from the stack when the function is finished.

For many functions, using the Pascal calling method is slightly more efficient than the native C calling conventions. The degree of performance improvement depends on the number of parameters and how often the function is called. Note that the Pascal calling conventions cannot be used with a function that has a variable number of arguments.
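The restriction on variable argument lists follows from who cleans up the stack. Under the Pascal convention the called function removes its own parameters, so it must know their total size when it is compiled; under the C convention the caller removes them, and the caller always knows how many it pushed. The sketch below (the function name is hypothetical) is an ordinary C variadic function: nothing in its compiled code fixes the number of arguments, which is why it depends on caller cleanup.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments. The function learns the argument count
   only at run time, from its first parameter, so it could not remove
   a fixed number of bytes from the stack on return. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    while (count-- > 0)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

The calls sum_ints(2, 10, 20) and sum_ints(4, 1, 2, 3, 4) pass different amounts of data to the same function, and each call site is responsible for discarding what it passed.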

Precompiled Headers

Using a precompiled header increases the performance of the compiler when it is compiling the program but does not affect the performance of the application while it is executing. If you are spending too much time compiling your programs, look into the benefits of precompiled headers. Borland compilers and Microsoft compilers support precompiled headers.

Using 80286/80386/80486 Instruction Sets

The use of the 80286 (or 80386 or 80486) instruction set is an overlooked but important way to improve program performance. An 80486 executes all instructions that an 80286 executes, but the reverse is not true. After a program is compiled using a specific CPU’s instruction set, it will not run on an earlier CPU than the target (for example, a program compiled for the 80386 will not run on an 80286).

Windows supports only the 80286 instruction set and above, so you should always compile Windows applications using the 80286 options.


Most of the power of the 80386/80486 cannot be utilized unless your application supports the CPU’s 32-bit mode. Because this requires a 32-bit compiler, you must plan ahead for 32-bit programs, ensuring that you have access to the necessary compiler.

Using a Source Profiler

One way to make an application more efficient is to determine which functions take the most time, then optimize them. Guesswork will not work—you cannot look at a function and determine that it is using a lot of CPU resources. You must use a source code profiler to determine where the most CPU time is being spent in your program.

Most source code profilers work by setting a very fast clock interrupt. Each time an interrupt occurs, the profiler records the name and address of the function that is executing. A second program then correlates the function and address information, and creates a source file/function table that shows where most of the time was used.
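The correlation step performed by the second program can be sketched as follows. This is an illustrative sketch only, not the implementation of any real profiler: the structure, the function names, and the assumption that each sample is a raw instruction address are all hypothetical. Each sampled address is attributed to the function with the greatest start address that does not exceed it.

```c
#include <assert.h>
#include <stddef.h>

struct func_entry {
    const char   *name;
    unsigned long start;   /* address of the function's first instruction */
    unsigned long hits;    /* samples that landed inside this function */
};

/* Attribute every sample to a function. `funcs` must be sorted by
   ascending start address. Returns the number of samples that fell
   below the first function and could not be attributed. */
size_t tally_samples(struct func_entry *funcs, size_t nfuncs,
                     const unsigned long *samples, size_t nsamples)
{
    size_t unmatched = 0, s;

    assert(funcs != NULL && samples != NULL);
    for (s = 0; s < nsamples; s++) {
        size_t lo = 0, hi = nfuncs;          /* binary search in [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (funcs[mid].start <= samples[s])
                lo = mid + 1;
            else
                hi = mid;
        }
        /* lo = number of functions starting at or before the sample */
        if (lo == 0)
            unmatched++;
        else
            funcs[lo - 1].hits++;
    }
    return unmatched;
}

/* Index of the function with the most hits: the first candidate
   for optimization. */
size_t hottest(const struct func_entry *funcs, size_t nfuncs)
{
    size_t best = 0, i;

    assert(funcs != NULL && nfuncs > 0);
    for (i = 1; i < nfuncs; i++)
        if (funcs[i].hits > funcs[best].hits)
            best = i;
    return best;
}
```

Because the samples are taken at a fixed clock rate, the hit counts are proportional to the time spent in each function, so the table built this way ranks functions by CPU time rather than by call count.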

Using Intrinsic Functions

When your application calls a library function (such as strlen()), overhead is incurred: the function’s arguments are placed on the stack, registers are saved, and the function is called. When the function returns, the arguments must be removed from the stack and the registers must be restored.

Many modern C compilers enable you to substitute inline code for common C library functions. This eliminates much of the overhead for a function call, but at the expense of having more than one copy (usually many more copies) of the code that performs the function.

When you use an intrinsic function in a loop and the function is called many times, you can boost the loop’s performance substantially. (See Table 16.3 for a list of which functions are available as intrinsic functions.)
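The trade-off can be illustrated by expanding the function by hand. In the sketch below (the function names are hypothetical), the first version pays the call overhead once per string; the second contains the scanning loop at the point of use, which is roughly what a compiler generates when strlen() is treated as an intrinsic. Note that many current compilers make this substitution on their own, so the two versions may compile to similar code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Library version: one call to strlen() per string. */
size_t total_length_call(const char *const *strings, size_t count)
{
    size_t total = 0, i;

    assert(strings != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += strlen(strings[i]);
    return total;
}

/* Hand-expanded version: the length computation is written inline,
   so there is no call, no argument passing, and no register save. */
size_t total_length_inline(const char *const *strings, size_t count)
{
    size_t total = 0, i;

    assert(strings != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const char *p = strings[i];
        while (*p != '\0')
            p++;
        total += (size_t)(p - strings[i]);
    }
    return total;
}
```

Every place that uses the inline form carries its own copy of the scanning loop, which is the code-size cost of intrinsic functions.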


Using Memory Models

When programming for a computer that uses segmented memory architecture, such as the PC, you can choose which memory model the compiler uses. For small programs, any memory model usually works. The issue is to use the most efficient memory model for the task at hand. Each memory model has both benefits and drawbacks, as shown in Table 16.4.

Table 16.4. Memory models.

Model: Tiny
Description: One segment for both data and code.
Attributes: Fast and small, usable only with .COM files. The total size of the data and the code cannot exceed 64K.

Model: Small
Description: One segment for data, and one for code.
Attributes: Fast and small, usable only with .EXE files. Neither the code nor the data can exceed 64K.

Model: Medium
Description: One segment for data, and separate code segments for each source module.
Attributes: Calls are slower, but data can be accessed quickly because it is always in the default data segment. The code may be as large as necessary (to the limits imposed by the system RAM), but the data cannot exceed 64K.

Model: Compact
Description: Separate data segments for each module, and one segment for code.
Attributes: Calls are faster, but data accesses are generally performed using FAR pointers, which slow data access. The program’s code cannot exceed 64K. The data is limited only by the amount of available RAM. Individual data objects cannot exceed 64K in size.

Model: Large
Description: Separate data segments for each module, and separate code segments for each source module.
Attributes: Calls are slow and data accesses are generally performed using FAR pointers, which slow data access. The program’s code and data are limited only by the amount of available RAM. Individual data objects cannot exceed 64K in size.

Model: Huge
Description: Separate data segments for each module, and separate code segments for each source module.
Attributes: Calls are slow and data accesses are generally performed using FAR pointers, which slow data access. The program’s code and data are limited only by the amount of available RAM. Individual data objects can be larger than 64K.

Most programs can be written using the Small model. Programs with a large amount of code or data often need the Large (or Huge) model.
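The 64K limits in Table 16.4 come from the 16-bit offset used within a segment. The sketch below models this with ordinary integers, assuming real-mode 8086 addressing, in which a physical address is formed as segment * 16 + offset. The function names are hypothetical, and the code only illustrates the arithmetic; it does not use actual NEAR or FAR pointers, which current compilers generally do not provide.

```c
#include <assert.h>

/* Physical address of segment:offset in real mode (assumed model):
   the 16-bit segment is shifted left 4 bits and the offset is added. */
unsigned long linear_address(unsigned int segment, unsigned int offset)
{
    assert(segment <= 0xFFFFu && offset <= 0xFFFFu);
    return ((unsigned long)segment << 4) + offset;
}

/* Advancing a NEAR pointer changes only the 16-bit offset, so it
   wraps around at 64K instead of moving into the next segment. */
unsigned int near_advance(unsigned int offset, unsigned int bytes)
{
    assert(offset <= 0xFFFFu);
    return (offset + bytes) & 0xFFFFu;
}
```

For example, near_advance(0xFFF0, 0x20) yields 0x0010: the pointer has wrapped to the start of the same segment. This is why a single segment holds at most 64K and why larger programs need the FAR pointers of the Compact, Large, and Huge models.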


Summary

In this chapter, you learned how to debug a program and improve a program’s performance.

There are a number of common bugs. Checking your code for these common bugs first can save debugging time.

The C assert() macro assists in debugging. It enables you to test a condition. If the condition fails, the program ends with a diagnostic message.

If a debugger is unavailable, writing a debug output function can save time in determining the flow of a program.

Most compilers come with a debugger.

Most debuggers require a substantial setup and learning curve.

A debugger is the only effective way to find some problems.

When programming for efficiency, you can program in the 80386/80486 32-bit mode if your compiler supports such a mode.

Using the compiler’s optimization can make your program run faster. Generally, you should develop the application with optimization turned off. When the application is finished, turn on optimization and retest the application.

Sometimes, the compiler’s optimization causes a program that worked with optimization turned off to fail.

Inline assembly can be useful for creating fast functions without resorting to full assembly code.

A source profiler can determine which parts of your program use the most CPU time. By concentrating on these routines, you get the maximum benefit from hand optimization.

Intrinsic functions allow the function’s code to be placed inline rather than being called. Although intrinsic functions increase the program’s size, they can greatly enhance the program’s efficiency.

The memory model you choose for a program affects its performance. Choose the Small model for small, simple programs.
