Part II • Managing Data in C

Data Management: Sorts, Lists, and Indexes

are read, the program does its comparisons (taking into consideration possible end-of-file conditions), and writes the correct record. When the program reaches the end of both input files, it closes all the files and ends. It is a simple program that works quickly.

When writing a merge function, remember that one file may be (and usually is) shorter than the other. After the shorter file ends, the merge program must still write the remaining records of the longer file to the output.
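A minimal sketch of such a merge in C follows. It assumes both inputs are sorted text files whose lines fit in a fixed buffer; merge_files and MAX_LINE are illustrative names, not part of MERGFILE. Once one input is exhausted, the loop keeps taking lines from the other, which covers the longer-file case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 512 /* hypothetical limit on line length */

/* Merge two sorted text streams into out.
 * Returns the number of lines written. */
long merge_files(FILE *in1, FILE *in2, FILE *out)
{
    char line1[MAX_LINE], line2[MAX_LINE];
    int have1, have2;
    long written = 0;

    assert(in1 != NULL && in2 != NULL && out != NULL);
    have1 = fgets(line1, sizeof line1, in1) != NULL;
    have2 = fgets(line2, sizeof line2, in2) != NULL;

    while (have1 || have2) {
        /* Take from file one if file two is exhausted or line1 sorts first. */
        if (have1 && (!have2 || strcmp(line1, line2) <= 0)) {
            fputs(line1, out);
            have1 = fgets(line1, sizeof line1, in1) != NULL;
        } else {
            fputs(line2, out);
            have2 = fgets(line2, sizeof line2, in2) != NULL;
        }
        written++;
    }
    return written;
}
```

On ties the line from the first file is written first, so records from file one keep their place ahead of equal records from file two.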

Purging

One often-needed (and hard-to-find) program is a purge program, which deletes duplicate records from a file (a process sometimes called de-duping). You might want to delete duplicates, for example, from a mailing list or a word list.

The PURGFILE.C program in Listing 10.3 performs two functions: it merges two sorted files, as MERGFILE (Listing 10.2) does, and it purges duplicates. Unlike MERGFILE, however, PURGFILE does not write duplicate records to the output file.

Listing 10.3. PURGFILE.C.

/* PURGFILE, written 1992 by Peter D. Hipson
 * This program merges and purges in one step. If your
 * PC has memory models, you must compile with the
 * LARGE model.
 *
 * _splitpath() and the _MAX_xxx constants are Microsoft C
 * library extensions declared in <stdlib.h>.
 */

#include <stdlib.h>  // For standard functions
#include <stdio.h>   // Make includes first part of file
#include <string.h>  // For string functions
#include <process.h> // For exit(), etc
#include <malloc.h>  // For malloc(), calloc(), realloc(), free()
#include <search.h>  // For qsort()...

int main(int argc, char *argv[], char *envp[]);

int compare(char **arg1, char **arg2);

#define BIGEST_LINE 512 /* The largest readable line */
#define NEED_RECORD 1   /* A record is needed from the file */
#define END_OF_FILE 2   /* This file is finished */
#define ALL_OK      3   /* No record needed, not EOF */

/* Although these variables are defined as external, they could
 * be defined inside the function or allocated dynamically,
 * depending on the program's needs and available memory.
 */

char szInput[BIGEST_LINE];    /* Line just read */
char szInput1[BIGEST_LINE];   /* Current (last distinct) record of file one */
char szInput2[BIGEST_LINE];   /* Current (last distinct) record of file two */

int main(
    int   argc,
    char *argv[],
    char *envp[]
    )
{
    FILE *InFile1;
    FILE *InFile2;
    FILE *OutFile;
    char  szProgram[30];

    /* Strings for _splitpath(), which parses a file name */
    char  szDrive[_MAX_DRIVE];
    char  szDir[_MAX_DIR];
    char  szFname[_MAX_FNAME];
    char  szExt[_MAX_EXT];

    int   i;
    int   j;
    int   nCompare = 0;
    int   nFileOneStatus = NEED_RECORD;
    int   nFileTwoStatus = NEED_RECORD;

    /* Use fprintf(stderr...) to force prompts and error messages to be
     * displayed on the user's screen regardless of whether the output
     * has been redirected. */

    _splitpath(argv[0], szDrive, szDir, szFname, szExt);

    strncpy(szProgram, szFname, sizeof(szProgram) - 1);
    szProgram[sizeof(szProgram) - 1] = '\0'; /* strncpy() may not terminate */

    if (argc <= 3)
    {
        fprintf(stderr,
            "\n"
            "%s -\n"
            "\n"
            "Peter's PURGEFILE: Merges two sorted files, \n"
            "purging all duplicate lines!\n"
            "\n"
            "inputfile1 and inputfile2 can be the same file,\n"
            "if you want to de-dup only one file.\n"
            "\n"
            "syntax: \n"
            "\n"
            "%s inputfile1 inputfile2 outputfile \n"
            "\n",
            szProgram,
            szProgram);

        return(16);
    }

    InFile1 = fopen(argv[1], "rt");
    InFile2 = fopen(argv[2], "rt");
    OutFile = fopen(argv[3], "wt");

    /* Stop if any of the three files could not be opened */
    if (InFile1 == NULL || InFile2 == NULL || OutFile == NULL)
    {
        fprintf(stderr, "%s: cannot open the input or output files\n",
            szProgram);
        return(16);
    }

    while (
        nFileOneStatus != END_OF_FILE || nFileTwoStatus != END_OF_FILE)
    {
        while(
            nFileOneStatus == NEED_RECORD || nFileTwoStatus == NEED_RECORD)
        {
            switch(nFileOneStatus)
            {
                case NEED_RECORD: /* Read a record */
                    if (fgets(szInput, sizeof(szInput), InFile1) == NULL)
                    {
                        nFileOneStatus = END_OF_FILE;
                    }
                    else
                    {
                        /* A line equal to the current record is a duplicate:
                         * the status stays NEED_RECORD, so it is read past. */
                        if (strcmp(szInput, szInput1) != 0)
                        {
                            strcpy(szInput1, szInput);
                            nFileOneStatus = ALL_OK;
                        }
                    }
                    break;

                case ALL_OK:      /* Nothing needed */
                    break;

                case END_OF_FILE: /* Can't do anything */
                    break;
            }

            switch(nFileTwoStatus)
            {
                case NEED_RECORD: /* Read a record */
                    if (fgets(szInput, sizeof(szInput), InFile2) == NULL)
                    {
                        nFileTwoStatus = END_OF_FILE;
                    }
                    else
                    {
                        if (strcmp(szInput, szInput2) != 0)
                        {
                            strcpy(szInput2, szInput);
                            nFileTwoStatus = ALL_OK;
                        }
                    }
                    break;

                case ALL_OK:      /* Nothing needed */
                    break;

                case END_OF_FILE: /* Can't do anything */
                    break;
            }
        }

        if (nFileOneStatus == END_OF_FILE)
        {
            if (nFileTwoStatus != END_OF_FILE)
            {
                fputs(szInput2, OutFile);
                nFileTwoStatus = NEED_RECORD;
            }
        }
        else
        {
            if (nFileTwoStatus == END_OF_FILE)
            {
                if (nFileOneStatus != END_OF_FILE)
                {
                    fputs(szInput1, OutFile);
                    nFileOneStatus = NEED_RECORD;
                }
            }
            else
            {
                nCompare = strcmp(szInput1, szInput2);

                if (nCompare < 0)
                {   /* File one is written */
                    fputs(szInput1, OutFile);
                    nFileOneStatus = NEED_RECORD;
                }
                else
                {
                    if (nCompare > 0)
                    {   /* File two is written */
                        fputs(szInput2, OutFile);
                        nFileTwoStatus = NEED_RECORD;
                    }
                    else
                    {   /* They are the same; write one and discard the other. */
                        fputs(szInput1, OutFile);
                        nFileOneStatus = NEED_RECORD;
                        nFileTwoStatus = NEED_RECORD;
                    }
                }
            }
        }
    }

    fclose(InFile1);
    fclose(InFile2);
    fclose(OutFile);

    return (0);
}

Purging duplicate records from a single sorted file is not difficult. The program reads a line, discards it if it is the same as the previous line, and writes it to the output file if it is different. Because PURGFILE performs a merge and a purge at the same time, however, it is a bit more complex.
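The single-file case can be sketched as follows, assuming the input is sorted so that duplicates are adjacent and that lines fit in a fixed buffer; purge_file and MAX_LINE are hypothetical names. Comparing each line with the last line written discards a whole run of copies, not just the second one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 512 /* hypothetical limit on line length */

/* Copy a sorted text stream to out, dropping every line that equals
 * the line written just before it. Returns the number of lines written. */
long purge_file(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    char last[MAX_LINE];     /* last line written to out */
    int have_last = 0;
    long written = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (have_last && strcmp(line, last) == 0)
            continue;        /* duplicate: discard it */
        fputs(line, out);
        strcpy(last, line);  /* remember what was kept */
        have_last = 1;
        written++;
    }
    return written;
}
```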

To use PURGFILE to purge a single file, specify the same filename twice, or specify NUL: as the second filename. (The second input filename is required because PURGFILE expects the output filename in the third position on the command line.)

The flowchart in Figure 10.2 shows how the PURGFILE program works. The program does not use advanced techniques, so this section looks only at the flowchart, rather than each line of code.


Figure 10.2. The flowchart for PURGFILE.C.


As you can see in Figure 10.2, the program begins by opening the two input files and the output file. If there are no errors in the file-open stage, the program reads a record from each file that needs a new record and has not yet reached its end.

After the records are read, the program makes its comparisons (taking into consideration possible end-of-file conditions), then writes the correct record. When the program has the same record from both files, it writes the first file’s record to the output, discards the second file’s record, and sets both files’ flags to indicate that a new record is needed from each.

When the program reaches the end of both input files, it closes all the files and ends. It is a simple program that works quickly.

When you write a purge function, remember that a record might be repeated many times. When your program finds a duplicate and therefore reads a new record, it still must test to be sure that it has read a unique record. The program might be reading a third duplicate, for example, that must also be discarded.
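The same rule holds when two sorted inputs are merged and purged together. The sketch below works on in-memory arrays of strings rather than files (merge_purge is a hypothetical name); because each candidate is compared with the last string written, a third or fourth copy from either input is discarded just like the second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge sorted arrays a[0..na) and b[0..nb) into out, writing each
 * distinct string once. out must have room for na + nb pointers.
 * Returns the number of strings written. */
size_t merge_purge(const char *a[], size_t na,
                   const char *b[], size_t nb,
                   const char *out[])
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na || j < nb) {
        const char *next;

        /* Take the smaller head; on a tie take a's, and b's copy is
         * discarded on a later pass as a duplicate. */
        if (j >= nb || (i < na && strcmp(a[i], b[j]) <= 0))
            next = a[i++];
        else
            next = b[j++];

        /* Keep it only if it differs from the last string written. */
        if (n == 0 || strcmp(next, out[n - 1]) != 0)
            out[n++] = next;
    }
    return n;
}
```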

Sorting, Merging, and Purging All in One

Usually, a single utility offers sort, merge, and purge functions. Such a utility accepts one or two input filenames, sorts the files, purges the duplicates, and produces a single output file.
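When the data fits in memory, the three steps reduce to a short routine: sort with the standard qsort(), then compact adjacent duplicates in place. This is a sketch only; sort_purge and cmp_str are illustrative names, and merging two inputs this way amounts to loading both into the same array before the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparison function for an array of char pointers. */
static int cmp_str(const void *p1, const void *p2)
{
    const char *s1 = *(const char *const *)p1;
    const char *s2 = *(const char *const *)p2;
    return strcmp(s1, s2);
}

/* Sort lines[0..n) and remove duplicates in place. Returns the number
 * of distinct lines, which are left at the front of the array. */
size_t sort_purge(const char *lines[], size_t n)
{
    size_t kept = 0;
    size_t i;

    assert(lines != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(lines, n, sizeof lines[0], cmp_str);
    for (i = 1; i < n; i++)
        if (strcmp(lines[i], lines[kept]) != 0)
            lines[++kept] = lines[i];   /* first copy of a new value */
    return kept + 1;
}
```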

A variation of a sort program is a sort that works on a file of any size. The process for creating this ultimate sort is as follows:

1. Read the file, stopping at the end of the file or when there is no more free memory.

2. Sort this part of the file. Write the result of the sort to a temporary work file.

3. If the program has reached the end of the file and there are no more records to read, rename step 2’s work file to the output file’s name and end the program.

4. Continue reading the file from where the previous read stopped, stopping when there is no more free memory or when the end of the file is reached.

5. Sort this part of the file. Write the result of the sort to a second temporary work file.

6. Merge the file created in step 2 with the file from step 5 into a new work file. Delete both of the files created by steps 2 and 5, and give the new file the name used in step 2.

7. Go to step 3.
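These steps can be sketched in standard C under a few stated assumptions: max_lines stands in for "no more free memory", the work files come from the standard tmpfile() function, and because those files are anonymous, the deletes and renames of steps 3 and 6 become fclose() calls and pointer swaps. For brevity, the first part is merged with an empty work file instead of being handled separately. The names external_sort, merge_streams, cmp_line, and MAX_LINE are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 512 /* hypothetical limit on line length */

/* qsort() comparison function for an array of char pointers. */
static int cmp_line(const void *p1, const void *p2)
{
    return strcmp(*(char *const *)p1, *(char *const *)p2);
}

/* Merge two sorted streams, read from their current positions, into out. */
static void merge_streams(FILE *a, FILE *b, FILE *out)
{
    char la[MAX_LINE], lb[MAX_LINE];
    int ha = fgets(la, sizeof la, a) != NULL;
    int hb = fgets(lb, sizeof lb, b) != NULL;

    while (ha || hb) {
        if (ha && (!hb || strcmp(la, lb) <= 0)) {
            fputs(la, out);
            ha = fgets(la, sizeof la, a) != NULL;
        } else {
            fputs(lb, out);
            hb = fgets(lb, sizeof lb, b) != NULL;
        }
    }
}

/* Sort the lines of in into out while holding at most max_lines lines
 * in memory. Returns 0 on success, -1 if memory or a work file fails. */
int external_sort(FILE *in, FILE *out, size_t max_lines)
{
    char buf[MAX_LINE];
    char **lines;
    FILE *sorted, *chunk = NULL, *merged = NULL;
    size_t n = 0, i;
    int status = -1;

    assert(max_lines > 0);
    lines = malloc(max_lines * sizeof *lines);
    sorted = tmpfile();               /* accumulated result (step 2's file) */
    if (lines == NULL || sorted == NULL)
        goto done;

    for (;;) {
        /* Steps 1 and 4: read until "memory" or the input runs out. */
        while (n < max_lines && fgets(buf, sizeof buf, in) != NULL) {
            if ((lines[n] = malloc(strlen(buf) + 1)) == NULL)
                goto done;
            strcpy(lines[n++], buf);
        }
        if (n == 0)                   /* step 3: nothing left to read */
            break;

        /* Steps 2 and 5: sort this part and write it to a work file. */
        qsort(lines, n, sizeof *lines, cmp_line);
        if ((chunk = tmpfile()) == NULL || (merged = tmpfile()) == NULL)
            goto done;
        for (i = 0; i < n; i++)
            fputs(lines[i], chunk);

        /* Step 6: merge the work files; the pointer swap stands in for
         * deleting the old files and renaming the new one. */
        rewind(sorted);
        rewind(chunk);
        merge_streams(sorted, chunk, merged);
        fclose(sorted);
        fclose(chunk);
        sorted = merged;
        chunk = merged = NULL;
        for (i = 0; i < n; i++)
            free(lines[i]);
        n = 0;
    }

    /* In place of step 3's rename: copy the finished file to out. */
    rewind(sorted);
    while (fgets(buf, sizeof buf, sorted) != NULL)
        fputs(buf, out);
    status = 0;

done:
    for (i = 0; i < n; i++)
        free(lines[i]);
    free(lines);
    if (sorted != NULL)
        fclose(sorted);
    if (chunk != NULL)
        fclose(chunk);
    if (merged != NULL)
        fclose(merged);
    return status;
}
```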

Linked Lists

A linked list is a group of data objects in which each object has a pointer to the next object in the group. Everything that you do with linked lists can be performed in memory or as part of a disk file.

Sometimes, sorting the data externally to the program (using the DOS SORT program) is not enough. When a user is entering data, it is never acceptable to stop the program, exit the program, run a sort, create a sorted file, then start the program again.

We have become accustomed to having the computer do the work for us, and rightly so. A program should not require the user to do anything that the program can perform without the user’s intervention.

There are alternatives when data must be sorted. For example, when the user enters an item, the program can pause, add the new item to the current database, and call the qsort() function to re-sort it. If the database is large, however, the pause could be so long that you could go get lunch! Even inserting a single item at the beginning of a sorted array can be time consuming, because every record in the database must be moved to make room. The size and number of these records can be the critical factor.
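The cost of keeping an array sorted is easy to see in code. In this sketch (Record, NAME_LEN, and insert_sorted are hypothetical names), the new record is placed by shifting every later record one slot with memmove(), so an insert at the front moves all n records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 32 /* hypothetical fixed record size */

typedef struct {
    char name[NAME_LEN];
} Record;

/* Insert rec into the sorted array recs[0..*count), which must have room
 * for at least one more element. Returns the number of records moved. */
size_t insert_sorted(Record recs[], size_t *count, const Record *rec)
{
    size_t pos = 0;
    size_t moved;

    assert(recs != NULL && count != NULL && rec != NULL);
    /* Find the first record that sorts after the new one. */
    while (pos < *count && strcmp(recs[pos].name, rec->name) <= 0)
        pos++;

    /* Shift the tail right by one slot: this is the expensive part. */
    moved = *count - pos;
    memmove(&recs[pos + 1], &recs[pos], moved * sizeof recs[0]);
    recs[pos] = *rec;
    (*count)++;
    return moved;
}
```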

Many programs must present the user’s data in a sorted format. Because speed is critical, sorting each time the data is displayed is usually unacceptable; the data must be stored in sorted order.

Many programs work to keep as much of the user’s current data as possible in memory. Searching a large quantity of data in memory should be not only quick, but instantaneous! If the data is not well organized, the search must be linear (record after record). On average, the program must look at half the records to find a matching record, assuming that the records are stored randomly.
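A linear search can be sketched with a comparison counter to check the half-the-records estimate; linear_search is an illustrative name. If the target is equally likely to be any of the n records, the average number of comparisons is (1 + 2 + ... + n) / n = (n + 1) / 2, and a failed search always costs all n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search keys[0..n) front to back for target. Returns the index of the
 * match or -1, and stores the number of comparisons made in *compares. */
long linear_search(const char *keys[], size_t n, const char *target,
                   size_t *compares)
{
    size_t i;

    assert(compares != NULL);
    *compares = 0;
    for (i = 0; i < n; i++) {
        ++*compares;
        if (strcmp(keys[i], target) == 0)
            return (long)i;
    }
    return -1;
}
```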


In general, a linear search of a block of data or sorting after a data item has been added or edited is too slow and therefore inadequate.

The program’s data must be organized better than the order in which it was entered. One way to organize it is to use a linked list. In a linked list, you start with a pointer that points to, or identifies, the first member of the list. Each member (except the last) has a pointer that points to the next member in the list. The last member’s pointer is a NULL pointer to indicate the end of the list. Often there is also a separate pointer to the last member in the list, which enables the program to add to the end of the list without following the whole chain. A singly linked list is shown in Figure 10.3.

Figure 10.3. A single linked list.
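A list like the one in Figure 10.3 might be declared as follows. The names Node, List, list_append, and list_free are hypothetical. Each node holds a string and a next pointer that is NULL for the last member, and the separate tail pointer lets list_append add to the end without walking the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char *data;          /* the member's value */
    struct Node *next;   /* next member, or NULL for the last one */
} Node;

typedef struct {
    Node *head;          /* first member, or NULL if the list is empty */
    Node *tail;          /* last member, kept for fast appends */
} List;

/* Append a copy of text to the end of the list.
 * Returns 0, or -1 if memory runs out. */
int list_append(List *list, const char *text)
{
    Node *node;

    assert(list != NULL && text != NULL);
    if ((node = malloc(sizeof *node)) == NULL)
        return -1;
    if ((node->data = malloc(strlen(text) + 1)) == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->data, text);
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;   /* link after the old last member */
    else
        list->head = node;         /* first member of an empty list */
    list->tail = node;
    return 0;
}

/* Free every node and reset the list to empty. */
void list_free(List *list)
{
    Node *node = list->head;

    while (node != NULL) {
        Node *next = node->next;
        free(node->data);
        free(node);
        node = next;
    }
    list->head = list->tail = NULL;
}
```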

When the program adds a new member to a sorted linked list, it simply follows the list until it finds the member that will precede the new member and links the new member in at that point; no existing members are moved. When the program must display sorted data to the user, it follows the linked list pointers to find the necessary data. Because the list is already kept in order, no sorting is needed at display time, and the program’s performance is fast.
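Ordered insertion can be sketched as below, using a head-only list for brevity (SNode and list_insert_sorted are hypothetical names). Walking a pointer to the link, rather than to the node, removes the special case of inserting before the first member. Finding the insertion point is still a walk along the list, but unlike a sorted array, no existing records are moved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SNode {
    char *data;
    struct SNode *next;
} SNode;

/* Insert a copy of text into the sorted list at *head, after any equal
 * members. Returns 0, or -1 if memory runs out. */
int list_insert_sorted(SNode **head, const char *text)
{
    SNode **link = head;   /* the pointer that will point to the new node */
    SNode *node;

    assert(head != NULL && text != NULL);
    /* Follow the list until the next member sorts after the new one. */
    while (*link != NULL && strcmp((*link)->data, text) <= 0)
        link = &(*link)->next;

    if ((node = malloc(sizeof *node)) == NULL)
        return -1;
    if ((node->data = malloc(strlen(text) + 1)) == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->data, text);
    node->next = *link;    /* new node points to its successor */
    *link = node;          /* predecessor (or head) points to the new node */
    return 0;
}
```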

Using Dynamic Memory

Often you must rely on dynamic memory allocation (memory allocated using one of the memory allocation functions) because you cannot tell in advance how much data the user will provide. When allocating memory, the program must track each block of
