
Advanced C 1992


Part II • Managing Data in C

Listing 10.1. continued

for (i = 0; i < nCurrentBlock; i++)

{

free(pBlocks[i]);

}

return (0);

}

int compare( char **arg1, char **arg2)

{

return strcmp(*(char**)arg1, *(char**)arg2);

}

Note the declaration of the compare() function:

int compare(char **arg1, char **arg2);

The function has two parameters, and it receives them as pointers to pointers to strings. Got that? You pass qsort() an array of pointers to strings, then qsort() passes pointers to two elements of that array to compare(). The compare() function compares the strings these two pointers address and returns a value based on this comparison: zero if the two strings are equal, less than zero if the first is less than the second, and greater than zero if the first is greater than the second.
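The prototype above matches the listing, but the C standard declares the qsort() callback with two const void * parameters, so many compilers warn about the mismatch. The following is a minimal sketch of a comparator in the standard form; the names compare_strings() and sort_lines() are hypothetical and do not appear in the listing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator in the form qsort() expects. Each argument points to an
 * element of the array, and each element is itself a char *. */
static int compare_strings(const void *arg1, const void *arg2)
{
    const char *s1 = *(const char * const *)arg1;
    const char *s2 = *(const char * const *)arg2;

    return strcmp(s1, s2);
}

/* Hypothetical helper: sorts an array of nLines string pointers in place. */
static void sort_lines(char **ppLines, size_t nLines)
{
    assert(ppLines != NULL);
    qsort(ppLines, nLines, sizeof(char *), compare_strings);
}
```

With this form, no cast is needed on the function pointer passed to qsort(); the casts move inside the comparator, where the element type is known.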

Next are some defined identifiers:

#define MAX_CHARACTERS 32767 /* Total maximum characters */
#define MAX_LINES      16383 /* Total maximum lines */
#define BIGEST_LINE    512   /* Largest line readable from keyboard */
#define MAX_BLOCKS     128   /* Allow 128 * MAX_CHARACTERS of memory */

Memory is allocated in blocks of 32K, the size given by the MAX_CHARACTERS identifier. A maximum of 16K lines can be sorted (with a 4-byte pointer, about 16K pointers fit in 64K). The largest line allowed is 512 bytes, and up to 128 calls can be made to the memory allocation functions (which would allocate more memory than you’ll find on a PC).
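The arithmetic behind these limits can be checked with a short sketch. The function names are hypothetical, and the 4-byte pointer size is the assumption stated in the text, not something the code can detect.

```c
#include <assert.h>

#define MAX_CHARACTERS 32767
#define MAX_LINES      16383
#define MAX_BLOCKS     128

/* Total bytes the program could request: 128 blocks of 32,767 bytes.
 * The cast keeps the multiplication in long, even with a 16-bit int. */
static long max_buffer_bytes(void)
{
    return (long)MAX_BLOCKS * MAX_CHARACTERS;   /* 4,194,176 bytes */
}

/* Size of the pointer array, assuming 4-byte pointers as in the text. */
static long pointer_array_bytes(void)
{
    return (long)MAX_LINES * 4L;                /* 65,532 bytes */
}
```

The pointer array comes to 65,532 bytes, just under the 65,536-byte (64K) limit, and the block limit comes to roughly 4MB.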


Data Management: Sorts, Lists, and Indexes


The external variables (they could instead be declared as internal static variables) define an input buffer (szInput[]), a generic character pointer (szBuffer), an array of pointers to each block of memory (pBlocks[], kept so that the blocks can be freed later), and an array of character pointers (pBuffer[]) that point to each line that will be sorted. Three counters track the number of blocks allocated, the offset into the current block, and the number of lines read:

char szInput[BIGEST_LINE];
char *szBuffer;
char *pBlocks[MAX_BLOCKS];
char *pBuffer[MAX_LINES];
int  nCurrentBlock = 0;
int  nBufferPointer = {MAX_CHARACTERS};
int  nLine = 0;

The program receives its input from the keyboard (stdin) and writes to the terminal (stdout). If the program is used in a pipe or with I/O redirection, you must therefore be sure that error messages do not get redirected along with the data. In Chapter 9, “Disk Files and Other I/O,” you learned that the standard stream stderr does not get redirected, but stdout does. If the program’s output is written to stdout and messages to the user are written to stderr, you can be sure that messages to the user are not mixed with the program’s output.

To access stderr, you use the fprintf(stderr,...); statement, as shown in the following code fragment:

fprintf(stderr,
    "\n"
    "Peter's SORTFILE: Sorts large files at the speed of light!\n"
    "\n"
    "   syntax: \n"
    "      sortfile <inputfile >outputfile \n"
    "\n"
    "   where: \n"
    "      the program's I/O is redirected\n\n");

fprintf(stderr, "Reading input...\n");
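The split between data and messages can be sketched as a small helper that takes both streams as parameters. The name write_lines() and its parameters are hypothetical; the listing itself calls printf() and fprintf(stderr, ...) directly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes nLines strings, one per line, to pOut and
 * reports progress on pMsg. Returns the number of lines written. */
static int write_lines(FILE *pOut, FILE *pMsg,
                       const char *const *ppLines, int nLines)
{
    int i;

    assert(pOut != NULL && pMsg != NULL);

    fprintf(pMsg, "Writing output...\n");

    for (i = 0; i < nLines; i++)
    {
        if (fprintf(pOut, "%s\n", ppLines[i]) < 0)
        {
            break;      /* Stop at the first write error */
        }
    }

    return i;
}
```

Calling write_lines(stdout, stderr, ppLines, nLines) sends the lines to stdout, where they can be redirected, while the progress message goes to stderr.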

After providing the opening messages to the user, the program reads the input from stdin. The C function gets() does fine in this context, although it cannot limit how many characters it reads, so an input line longer than BIGEST_LINE - 1 characters overflows szInput. After reading a line, the program checks whether there is enough room in the current buffer for the string. If there is not enough room, the program allocates a new buffer and displays a message that the buffer has been allocated:


while (gets(szInput))
{
    /* >= rather than >: the string also needs room for its terminating '\0' */
    if ((nBufferPointer + strlen(szInput)) >= MAX_CHARACTERS)
    {   // The line won't fit! Allocate new memory:
        szBuffer = (char *)malloc(MAX_CHARACTERS);

        fprintf(stderr, " Allocating buffer (32K).\n");

        nBufferPointer = 0;
        pBlocks[nCurrentBlock] = szBuffer;
        ++nCurrentBlock;

        if (szBuffer == NULL)
        {
            fprintf(stderr, "System sort memory exceeded--cannot sort.\n");
            exit(16);
        }
    }

Now that there is enough room in the buffer for the string, the program sets the next entry in the pointer array (pBuffer[]) to the string’s eventual location, then copies the string to the buffer. Reading into the intermediate buffer (szInput) first helps prevent overflowing a block (otherwise the program would have to stop filling a block of memory at least 512 bytes before the end of the block). The extra call to strcpy() adds little overhead. The program also advances the offset into the block of memory, in preparation for the next string.

    pBuffer[nLine] = &szBuffer[nBufferPointer];

    strcpy(pBuffer[nLine], szInput);

    // The + 1 skips over the terminating null character in each string.
    nBufferPointer += strlen(szInput) + 1;
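The bookkeeping for packing strings end to end in a block can be sketched in isolation. The BLOCK structure and block_store() function are hypothetical; the listing keeps the same information in the external variables szBuffer and nBufferPointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping for one block of memory. */
typedef struct
{
    char   *pData;   /* Start of the block */
    size_t  nSize;   /* Total bytes in the block */
    size_t  nUsed;   /* Bytes already occupied (never more than nSize) */
} BLOCK;

/* Copies szLine, including its terminator, to the next free position in
 * the block. Returns a pointer to the stored copy, or NULL if the string
 * does not fit and the caller must start a new block. */
static char *block_store(BLOCK *pBlock, const char *szLine)
{
    size_t  nNeeded = strlen(szLine) + 1;   /* + 1 for the terminating '\0' */
    char   *pDest;

    assert(pBlock->nUsed <= pBlock->nSize);

    if (nNeeded > pBlock->nSize - pBlock->nUsed)
    {
        return NULL;
    }

    pDest = pBlock->pData + pBlock->nUsed;
    memcpy(pDest, szLine, nNeeded);
    pBlock->nUsed += nNeeded;

    return pDest;
}
```

The returned pointer plays the role of pBuffer[nLine]: it is what the sort later rearranges, while the string itself stays where it was stored.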

A bit of error checking comes next, to be sure that the program does not read in too many lines:

    if (++nLine >= MAX_LINES)
    {   // Too many lines! End the program.
        fprintf(stderr, "Too many lines--cannot sort.\n");
        exit(16);
    }
}
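Because gets() has no way to limit how much it reads, a bounded alternative is worth sketching. The following assumes fgets() is an acceptable substitute; read_line() is a hypothetical name, not part of the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from pIn into szLine (at most nSize - 1 characters) and
 * strips the trailing newline, as gets() would. Returns 1 on success,
 * 0 at end of file or on a read error. */
static int read_line(FILE *pIn, char *szLine, size_t nSize)
{
    assert(pIn != NULL && szLine != NULL && nSize > 1);

    if (fgets(szLine, (int)nSize, pIn) == NULL)
    {
        return 0;
    }

    /* fgets() keeps the newline; cut the string at the first one */
    szLine[strcspn(szLine, "\n")] = '\0';

    return 1;
}
```

The loop condition would then read while (read_line(stdin, szInput, sizeof(szInput))). One difference in behavior: a line longer than the buffer is returned in pieces over several calls instead of overflowing the buffer.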

After the input file has been read, the program calls qsort() to sort the lines, using the compare() function (described previously):


qsort((void *)pBuffer, (size_t)nLine, sizeof(char *), compare);

When qsort() returns, the program uses printf() to write the final sorted output:

fprintf(stderr, "Writing output...\n");

for (i = 0; i < nLine; i++)
{
    printf("%s\n", pBuffer[i]);
}

Because the printf() output goes to stdout, the output could be redirected to a file. Finally, the blocks of memory are freed and the program ends:

for (i = 0; i < nCurrentBlock; i++)

{

free(pBlocks[i]);

}

The compare() function, which is called by qsort() in the main program, is simple: it calls strcmp(). If you want the program to ignore case, you could call stricmp() instead (stricmp() is a compiler library extension rather than part of ANSI C). You could also create your own function to compare the strings, but C’s functions work well enough.

int compare( char **arg1, char **arg2)

{

return strcmp(*arg1, *arg2);

}
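Where stricmp() is not available, a case-insensitive comparator can be built from the standard tolower() function. This is a minimal sketch; compare_ignore_case() and sort_lines_ignore_case() are hypothetical names, and the comparator uses the const void * parameter form that the C standard specifies for qsort().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Compares two strings without regard to case. The arguments point to
 * array elements, each of which is a char *. */
static int compare_ignore_case(const void *arg1, const void *arg2)
{
    const unsigned char *s1 =
        (const unsigned char *)*(const char * const *)arg1;
    const unsigned char *s2 =
        (const unsigned char *)*(const char * const *)arg2;

    /* Advance while the characters match and the first string continues */
    while (*s1 != '\0' && tolower(*s1) == tolower(*s2))
    {
        ++s1;
        ++s2;
    }

    return tolower(*s1) - tolower(*s2);
}

/* Hypothetical helper: sorts nLines string pointers, ignoring case. */
static void sort_lines_ignore_case(char **ppLines, size_t nLines)
{
    assert(ppLines != NULL);
    qsort(ppLines, nLines, sizeof(char *), compare_ignore_case);
}
```

The characters are read as unsigned char because tolower() is only defined for values in that range (and EOF).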

The SORTFILE program can sort files up to about 500K (depending on the DOS version). You can also use SORTFILE with I/O redirection or as a filter with DOS’s pipe operator, |.

Merging

No matter how much memory you have available, eventually you will want to sort a file that is too large to fit. You could sort the file on the disk. Another method is to break the file into smaller parts that will fit in memory, sort these parts, then combine the sorted parts into a final sorted file that contains all the parts. This process of breaking a file into smaller, more manageable parts, sorting them, and merging the results is called a sort/merge; it is a common technique on mainframes and minicomputers.

To keep the programs in this chapter as simple as possible (but wait until you see the BTREE program later in the chapter), I created separate merge and sort programs. Listing 10.2, MERGFILE.C, does not use stdin for its input because you must have two files to perform a merge.

Listing 10.2. MERGFILE.C.

/* MERGFILE, written 1992 by Peter D. Hipson

*This program merges two sorted files into one large

*sorted file. If your PC has memory models, you must

*compile with the LARGE model.

*/

 

#include <stdlib.h>  // For standard functions
#include <stdio.h>   // Make includes first part of file
#include <string.h>  // For string functions
#include <process.h> // For exit(), etc.
#include <malloc.h>  // For malloc(), calloc(), realloc(), free()
#include <search.h>  // For qsort()...

int main(int argc, char *argv[], char *envp[]);
int compare(char **arg1, char **arg2);

#define BIGEST_LINE 512 /* The largest readable line */
#define NEED_RECORD 1   /* A record is needed from the file */
#define END_OF_FILE 2   /* This file is finished */
#define ALL_OK      3   /* No record needed; not EOF */

/* Although these variables are defined as external,
 * they could be defined inside the function or
 * allocated dynamically, depending on the program’s
 * needs and available memory.
 */


char szInput1[BIGEST_LINE];
char szInput2[BIGEST_LINE];

int main(
    int  argc,
    char *argv[],
    char *envp[]
    )
{
    FILE *InFile1;
    FILE *InFile2;
    FILE *OutFile;

    char szProgram[30];

    /* Strings for _splitpath() (which parses a filename) */
    char szDrive[_MAX_DRIVE];
    char szDir[_MAX_DIR];
    char szFname[_MAX_FNAME];
    char szExt[_MAX_EXT];

    int  i;
    int  j;
    int  nCompare = 0;
    int  nFileOneStatus = NEED_RECORD;
    int  nFileTwoStatus = NEED_RECORD;

    /* Use fprintf(stderr...) to force prompts and error messages
     * to be displayed on the user’s screen regardless of whether
     * the output has been redirected.
     */

    _splitpath(argv[0], szDrive, szDir, szFname, szExt);


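    strncpy(szProgram, szFname, sizeof(szProgram) - 1);
    szProgram[sizeof(szProgram) - 1] = '\0'; /* strncpy() may not terminate */

    if (argc <= 3)
    {
        fprintf(stderr,
            "\n"
            "%s -\n"
            "\n"
            "Peter's MERGEFILE: Merges two sorted files into one!\n"
            "\n"
            "   syntax: \n"
            "      %s inputfile1 inputfile2 outputfile \n"
            "\n",
            szProgram,
            szProgram);

        return(16);
    }

    fprintf(stderr, "Reading input...\n");

    InFile1 = fopen(argv[1], "rt");
    InFile2 = fopen(argv[2], "rt");
    OutFile = fopen(argv[3], "wt");

    /* Stop here if any of the three files could not be opened */
    if (InFile1 == NULL || InFile2 == NULL || OutFile == NULL)
    {
        fprintf(stderr, "%s: cannot open one of the files.\n", szProgram);
        return(16);
    }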

while (

nFileOneStatus != END_OF_FILE || nFileTwoStatus != END_OF_FILE)

{

switch(nFileOneStatus)

{

case NEED_RECORD: /* Read a record */

if (fgets(szInput1, sizeof(szInput1), InFile1) == NULL)

{

nFileOneStatus = END_OF_FILE;

}

else

{

nFileOneStatus = ALL_OK;

}

 

break;

 

case ALL_OK: /* Nothing needed */

 

break;

 

case END_OF_FILE: /* Can’t do anything */ break;

}

switch(nFileTwoStatus)

{

case NEED_RECORD: /* Read a record */

if (fgets(szInput2, sizeof(szInput2), InFile2) == NULL)

{

nFileTwoStatus = END_OF_FILE;

}

else

{

nFileTwoStatus = ALL_OK;

}

break;

case ALL_OK:

/* Nothing needed */

break;

 

case END_OF_FILE: /* Can’t do anything */ break;

}

if (nFileOneStatus == END_OF_FILE)

{

if (nFileTwoStatus != END_OF_FILE)

{

fputs(szInput2, OutFile); nFileTwoStatus = NEED_RECORD;

}

}

else

{

if (nFileTwoStatus == END_OF_FILE)

{


if (nFileOneStatus != END_OF_FILE)

{

fputs(szInput1, OutFile); nFileOneStatus = NEED_RECORD;

}

}

else

{

nCompare = strcmp(szInput1, szInput2);

if (nCompare < 0)
{   /* File one is written */
    fputs(szInput1, OutFile);
    nFileOneStatus = NEED_RECORD;
}
else
{
    if (nCompare > 0)
    {   /* File two is written */
        fputs(szInput2, OutFile);
        nFileTwoStatus = NEED_RECORD;
    }
    else
    {   /* They are the same; write both */
        fputs(szInput1, OutFile);
        fputs(szInput2, OutFile);
        nFileOneStatus = NEED_RECORD;
        nFileTwoStatus = NEED_RECORD;

}

}

}

}

}

fclose(InFile1);

fclose(InFile2);

fclose(OutFile);

return (0);

}


Merging files is a simple process. Because this program does not use advanced techniques, I will dispense with the line-by-line analysis of the program’s code and refer instead to the program’s flowchart, shown in Figure 10.1.

Figure 10.1. The flowchart for MERGFILE.C.
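The core of a two-way merge can also be written more compactly than in Listing 10.2. The following is a sketch under stated assumptions, not the listing's method: merge_sorted() and MERGE_LINE_MAX are hypothetical names, both inputs are assumed to be sorted already, and two "have a line" flags replace the three status codes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MERGE_LINE_MAX 512  /* Same limit as BIGEST_LINE in the listing */

/* Merges two sorted text streams into pOut. Returns the number of lines
 * written, assuming no line exceeds MERGE_LINE_MAX - 1 characters. */
static long merge_sorted(FILE *pIn1, FILE *pIn2, FILE *pOut)
{
    char szLine1[MERGE_LINE_MAX];
    char szLine2[MERGE_LINE_MAX];
    int  fHave1;
    int  fHave2;
    long nWritten = 0;

    assert(pIn1 != NULL && pIn2 != NULL && pOut != NULL);

    /* Prime both inputs with their first line */
    fHave1 = fgets(szLine1, sizeof(szLine1), pIn1) != NULL;
    fHave2 = fgets(szLine2, sizeof(szLine2), pIn2) != NULL;

    while (fHave1 || fHave2)
    {
        /* Take from file one when file two is exhausted, or when file
         * one's line sorts first (ties also go to file one). */
        if (fHave1 && (!fHave2 || strcmp(szLine1, szLine2) <= 0))
        {
            fputs(szLine1, pOut);
            fHave1 = fgets(szLine1, sizeof(szLine1), pIn1) != NULL;
        }
        else
        {
            fputs(szLine2, pOut);
            fHave2 = fgets(szLine2, sizeof(szLine2), pIn2) != NULL;
        }
        ++nWritten;
    }

    return nWritten;
}
```

The caller remains responsible for opening the three files, checking that each fopen() succeeded, and closing them afterward.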

First, the program opens the two input files and the output file. If errors do not occur in this stage, the program reads a record from both input files. After the records
