
Advanced C 1992


Part II • Managing Data in C

CustIndex[nRecord].Customer = ftell(DataFile);
strcpy(CustIndex[nRecord].szName, Customer->szName);

After the index has been set up, the program writes the record to the database and closes the file:

fwrite(Customer, sizeof(CUSTNAME), 1, DataFile);

fclose(DataFile);

When the user requests a record, the program searches for the record using both a linear search and the bsearch() function. I used both search techniques in Listing 10.5 simply to show how they are implemented; your program should use one or the other (probably bsearch() because it is easy to implement and fast).

To use a binary search, the index must be sorted. When the user wants the names displayed, the program sorts the index list. The programmer can choose to sort the index either as names are added (which slows the process of adding names) or when the sorted index list is used. This program would have been better if it included a flag to indicate when the list was already sorted.
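The flag suggested above might look something like the following. This is a minimal sketch, not code from Listing 10.5: the names INDEXENTRY, AddEntry(), EnsureSorted(), nIndexSorted, and nSortCalls are assumptions made for illustration, and strcmp() stands in for stricmp() to keep the sketch within standard C.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 100

/* Hypothetical, simplified stand-in for the CUSTINDEX entries. */
typedef struct {
    char szName[32];
    long lOffset;
} INDEXENTRY;

static INDEXENTRY Index[MAX_ENTRIES];
static int nEntries = 0;
static int nIndexSorted = 1;   /* An empty index is trivially sorted. */
static int nSortCalls = 0;     /* Counts real sorts, for demonstration only. */

static int CompareEntries(const void *p1, const void *p2)
{
    return strcmp(((const INDEXENTRY *)p1)->szName,
                  ((const INDEXENTRY *)p2)->szName);
}

/* Appends an entry and marks the index as needing a sort.
 * Returns 1 on success, 0 if the index is full.
 */
int AddEntry(const char *pszName, long lOffset)
{
    if (nEntries >= MAX_ENTRIES)
        return 0;
    strncpy(Index[nEntries].szName, pszName, sizeof(Index[0].szName) - 1);
    Index[nEntries].szName[sizeof(Index[0].szName) - 1] = '\0';
    Index[nEntries].lOffset = lOffset;
    nEntries++;
    nIndexSorted = 0;
    return 1;
}

/* Sorts only when something has changed since the last sort. */
void EnsureSorted(void)
{
    if (!nIndexSorted) {
        qsort(Index, (size_t)nEntries, sizeof(INDEXENTRY), CompareEntries);
        nIndexSorted = 1;
        nSortCalls++;
    }
}
```

Calling EnsureSorted() before every display or bsearch() then costs nothing when no names have been added since the previous sort.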

The following code shows how a record is retrieved and displayed:

case 'D': /* Display a record */
case 'd':
    printf("Display customer (total %d).\n", nRecord + 1);

    qsort(CustIndex, nRecord + 1, sizeof(CUSTINDEX), compare);

    for (i = 0; nDebug && i <= nRecord; i++)
    {   /* In debug mode, display the sorted index list. */
        printf("Record %2d szName '%s'\n",
            i,
            CustIndex[i].szName);
    }

In the debug mode, the program first shows the programmer the index list. This display is useful when you want to see the results of the sort.


Data Management: Sorts, Lists, and Indexes


Following the display of the index list, the user is prompted to provide a name to search for:

memset(Customer, 0, sizeof(CUSTNAME));
memset(&TempCustIndex, 0, sizeof(CUSTINDEX));

printf("Enter name");
gets(TempCustIndex.szName);

printf("Searching with a linear search\n");
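The gets() call in this fragment cannot limit how much input it stores, and the function was removed from the language in C11. A bounded replacement can be built on fgets(). The sketch below is one possible shape; ReadLine() is a hypothetical helper name, not part of INDEX.C.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from fp into pszBuffer (at most nSize - 1 characters)
 * and strips the trailing newline. Returns 1 on success, 0 on end of
 * file or error. ReadLine is a hypothetical helper, not part of INDEX.C.
 */
int ReadLine(char *pszBuffer, size_t nSize, FILE *fp)
{
    if (nSize == 0 || fgets(pszBuffer, (int)nSize, fp) == NULL)
        return 0;
    pszBuffer[strcspn(pszBuffer, "\n")] = '\0';
    return 1;
}
```

In the fragment above, the call would become ReadLine(TempCustIndex.szName, sizeof(TempCustIndex.szName), stdin). Note that when a line is longer than the buffer, the remainder stays in the stream and is returned by the next read.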

After the user enters a name, the program does a linear search. This search starts at the first name, then searches each name in order, until either the list ends or the name is found:

nDesiredRecord = -1;

for (i = 0; i <= nRecord; i++)
{   /* Linear search; could be bsearch() */
    if (stricmp(TempCustIndex.szName,
        CustIndex[i].szName) == 0)
    {
        nDesiredRecord = i;
        break;
    }
}

If the supplied key name is found, the program opens the database file (read mode) and uses fseek() to find the correct record. After finding the record, the program reads it in and displays the information for the user. If the supplied key name is not found, the program simply gives the user a message that the name wasn’t found.

if (nDesiredRecord >= 0)
{
    DataFile = fopen(szDataFile, "rb");

    if (DataFile == NULL)
    {
        printf(
            "ERROR: Data file '%s' couldn't be opened.\n",
            szDataFile);

        exit(4);
    }

    fseek(DataFile, CustIndex[nDesiredRecord].Customer, SEEK_SET);

    fread(Customer, sizeof(CUSTNAME), 1, DataFile);

    printf(
        "Name '%10s' City '%10s' State '%2s' "
        "ZIP '%5.5d'\n",
        Customer->szName,
        Customer->szCity,
        Customer->szState,
        Customer->nZip);

    fclose(DataFile);
}
else
{
    printf("LINEAR SEARCH: Sorry, the name '%s' couldn't be found\n",
        TempCustIndex.szName);
}
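The round trip described here, saving each record's position with ftell() when it is written and returning to it later with fseek(), can be sketched in isolation. The RECORD structure and the AppendRecord() and FetchRecord() helpers below are assumptions made for illustration; unlike the listing, they check the return values of fseek(), fwrite(), and fread().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-length record, standing in for CUSTNAME. */
typedef struct {
    char szName[32];
    int  nZip;
} RECORD;

/* Appends a record to the file and returns its byte offset,
 * or -1L if the position can't be determined or the write fails.
 */
long AppendRecord(FILE *fp, const RECORD *pRecord)
{
    long lOffset;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    lOffset = ftell(fp);
    if (lOffset < 0 || fwrite(pRecord, sizeof(RECORD), 1, fp) != 1)
        return -1L;
    return lOffset;
}

/* Reads the record stored at lOffset. Returns 1 on success, 0 on failure. */
int FetchRecord(FILE *fp, long lOffset, RECORD *pRecord)
{
    if (fseek(fp, lOffset, SEEK_SET) != 0)
        return 0;
    return fread(pRecord, sizeof(RECORD), 1, fp) == 1;
}
```

The offset returned by AppendRecord() is what an index entry would store, in the same way that CustIndex[nRecord].Customer holds the result of ftell().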

After the linear search is finished, the program does a binary search. This search is performed with one statement:

if ((pTempCustIndex = (PCUSTINDEX)bsearch(&TempCustIndex, CustIndex,
    nRecord + 1, sizeof(CUSTINDEX), compare)) != NULL)
{

If the supplied key name is found, the program opens the database file (read mode) and uses fseek() to find the correct record. After seeking to the record, the program reads it in and displays the information for the user. If the supplied key name is not found, the program displays a message that the name wasn’t found.

    DataFile = fopen(szDataFile, "rb");

    if (DataFile == NULL)
    {
        printf(
            "ERROR: Data file '%s' couldn't be opened.\n",
            szDataFile);

        exit(4);
    }

    fseek(DataFile, pTempCustIndex->Customer, SEEK_SET);

    fread(Customer, sizeof(CUSTNAME), 1, DataFile);

    printf(
        "Name '%10s' City '%10s' State '%2s' "
        "ZIP '%5.5d'\n",
        Customer->szName,
        Customer->szCity,
        Customer->szState,
        Customer->nZip);

    fclose(DataFile);
}
else
{
    printf("BSEARCH: Sorry, the name '%s' couldn't be found\n",
        TempCustIndex.szName);
}

break;

When the program ends (or when the user requests a save), the index array is saved to a file. The saved index can be read back into the index array later, when the user reuses the data file. To conserve disk space, the program writes only the index entries that have been used, not the entire index array.

case 'S': /* Save all records */
case 's':
    printf("Saving customer index file.\n");

    IndexFile = fopen(szIndexFile, "wb");

    if (IndexFile == NULL)
    {   /* Test for file open. If file can't be opened, exit with message. */
        printf("ERROR: Index file '%s' couldn't be opened.\n",
            szIndexFile);
    }
    else
    {
        fwrite(CustIndex,
            sizeof(CUSTINDEX) * (nRecord + 1), 1,
            IndexFile);

        fclose(IndexFile);

        nNeedSaving = FALSE;
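The save step, and the later reload that the text mentions, could be written as a pair of helpers along the following lines. This is a sketch under assumptions: INDEXENTRY, SaveIndex(), and LoadIndex() are hypothetical names, not functions from INDEX.C.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for CUSTINDEX. */
typedef struct {
    char szName[32];
    long lOffset;
} INDEXENTRY;

/* Writes the first nCount entries of the index to pszFile.
 * Returns 1 if every entry was written, 0 otherwise.
 */
int SaveIndex(const char *pszFile, const INDEXENTRY *pIndex, size_t nCount)
{
    FILE *fp = fopen(pszFile, "wb");
    size_t nWritten;

    if (fp == NULL)
        return 0;
    nWritten = fwrite(pIndex, sizeof(INDEXENTRY), nCount, fp);
    if (fclose(fp) != 0)
        return 0;
    return nWritten == nCount;
}

/* Reads up to nMax entries from pszFile into pIndex.
 * Returns the number of entries actually read (0 if the file is missing).
 */
size_t LoadIndex(const char *pszFile, INDEXENTRY *pIndex, size_t nMax)
{
    FILE *fp = fopen(pszFile, "rb");
    size_t nRead;

    if (fp == NULL)
        return 0;
    nRead = fread(pIndex, sizeof(INDEXENTRY), nMax, fp);
    fclose(fp);
    return nRead;
}
```

Passing the entry size and the entry count to fwrite() and fread() as separate arguments makes the return value a count of whole entries, so a short write or the number of entries in an existing file can be detected directly.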

A quick look at the compare function shows that the szName members of the index array are being compared using stricmp(). I have included a (commented out) printf() that shows how the sort and the search use the compare function.

int compare(PCUSTINDEX CustIndex1, PCUSTINDEX CustIndex2)
{
    // Uncomment the following printf() to see how qsort() and bsearch() work.
    // printf("Comparing %s and %s\n",
    //     CustIndex1->szName,
    //     CustIndex2->szName);

    return(stricmp(CustIndex1->szName, CustIndex2->szName));
}
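Two portability points are worth noting about this function. Standard C declares qsort() and bsearch() to take a comparison function of type int (*)(const void *, const void *), and stricmp() is a compiler extension rather than part of the standard library. The sketch below shows one way to satisfy both constraints; INDEXENTRY, CompareNoCase(), and CompareNames() are hypothetical names used only for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for CUSTINDEX. */
typedef struct {
    char szName[32];
    long lOffset;
} INDEXENTRY;

/* Case-insensitive string compare built only on standard C. */
static int CompareNoCase(const char *psz1, const char *psz2)
{
    while (*psz1 != '\0' &&
           tolower((unsigned char)*psz1) == tolower((unsigned char)*psz2)) {
        psz1++;
        psz2++;
    }
    return tolower((unsigned char)*psz1) - tolower((unsigned char)*psz2);
}

/* Comparison function with the signature qsort() and bsearch() expect. */
int CompareNames(const void *p1, const void *p2)
{
    const INDEXENTRY *pEntry1 = (const INDEXENTRY *)p1;
    const INDEXENTRY *pEntry2 = (const INDEXENTRY *)p2;

    return CompareNoCase(pEntry1->szName, pEntry2->szName);
}
```

Because the same function is passed to both qsort() and bsearch(), the search sees the list in exactly the order the sort produced.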

Indexes can reside permanently in a disk file. The index for large databases can be much too large to fit into memory. To search a disk-based index, you must write a binary search function. Typically, such a function would know—by a global variable or a passed parameter—the number of records in the index, the size of the index records, and the index file’s name or file handle.


Your disk-based bsearch function would then read the middle record, using fseek() to move to the record’s computed position. For example:

/* The code searches records lFirstRecord through lLastRecord - 1
 * of the index file.
 */
long lFirstRecord = 0;
long lLastRecord = lTotalRecords;   /* One past the last record */
long lCurrentRecord;

while (lFirstRecord < lLastRecord)
{
    lCurrentRecord = lFirstRecord + ((lLastRecord - lFirstRecord) / 2);

    fseek(IndexFile, lCurrentRecord * (long)sizeof(CUSTINDEX), SEEK_SET);

    // Read the record into Index (not shown)

    if (Key < Index) /* This compare depends on Key's data type */
    {
        lLastRecord = lCurrentRecord;
    }
    else if (Key > Index) /* This compare depends on Key's data type */
    {
        lFirstRecord = lCurrentRecord + 1;
    }
    else /* Index == Key */
    {
        return(lCurrentRecord);
    }
}

/* The record was not found! */
return (-1);

This binary search function is simplified. I did not show the reading of the index file, nor are the compares accurate because they assume that Index and Key are numeric, which may not be true.
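A fuller version of the same idea, with the record read included and string keys compared with strcmp(), might look like this. It is a sketch under assumptions rather than a definitive implementation: INDEXENTRY and SearchIndexFile() are hypothetical names, and the index file is assumed to hold fixed-length entries already sorted by name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical index record: a name key plus the data-file offset. */
typedef struct {
    char szName[32];
    long lOffset;
} INDEXENTRY;

/* Binary search of a sorted index file holding lTotalRecords entries.
 * Returns the record number of the match, or -1 if the key is absent
 * or a seek or read fails.
 */
long SearchIndexFile(FILE *IndexFile, long lTotalRecords, const char *pszKey)
{
    long lFirstRecord = 0;
    long lLastRecord = lTotalRecords;   /* One past the last record */
    INDEXENTRY Entry;

    while (lFirstRecord < lLastRecord) {
        long lCurrentRecord = lFirstRecord + (lLastRecord - lFirstRecord) / 2;
        int nCompare;

        if (fseek(IndexFile, lCurrentRecord * (long)sizeof(INDEXENTRY),
                  SEEK_SET) != 0 ||
            fread(&Entry, sizeof(INDEXENTRY), 1, IndexFile) != 1)
            return -1L;

        nCompare = strcmp(pszKey, Entry.szName);
        if (nCompare == 0)
            return lCurrentRecord;
        if (nCompare < 0)
            lLastRecord = lCurrentRecord;        /* Key is in the lower half */
        else
            lFirstRecord = lCurrentRecord + 1;   /* Key is in the upper half */
    }
    return -1L;
}
```

Each pass halves the range between lFirstRecord and lLastRecord, so the number of disk reads grows with the logarithm of the record count rather than with the count itself.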

Indexing a file can greatly enhance the access to specific records, especially when a record must be accessed using more than one key (or index) value.

Fixed-field Disk Files

The best examples of fixed-field disk files are files created using a structure. Because the structure’s length is fixed and each member’s location is known, you can always determine the location of any structure and its members in the file.

I recommend reading a file written with a structure into an identical structure. After the data is placed in the structure, you can work on it using the individual structure members. A possible exception to the reading of individual records is when a large block of the file is read into a structure array, and the array is searched for the correct key or another data object.

Many of the example programs write fixed-field files. For example, the INDEX.C program (Listing 10.5) creates two fixed-field files.

B-trees

None of the data management techniques in this chapter have addressed the problem of a data list that changes frequently, must be searched quickly, and is too large to constantly re-sort. Some problems with the techniques covered so far include:

A linked list presents data that appears to be sorted, but the list can be searched only with a linear search.

An indexed list is easy to search, but it must be re-sorted when an index value is added, deleted, or changed.

The solution is to use the B-tree technique, a different method of storing data. The B-tree technique arranges data in a structured format. Figure 10.10 shows some sample data (also used in the “Linear Search Versus Binary Search” sidebar) and its organization in a B-tree.

Figure 10.10. A B-tree’s organization.

Data organization in a B-tree resembles an upside down tree. Usually, the first data object has a key that half of the remaining data keys are less than (called the left side) and the other half of the data keys are greater than (called the right side). The tree continues in the same manner for all remaining data objects.

The following terms are used when discussing B-trees:

Node         A data item in a B-tree.
Root node    The first node in a B-tree.
Left side    Data items on the left side are less than the current data item.
Right side   Data items on the right side are greater than the current data item.
Balance      How well the tree is organized. (Most B-trees exhibit some imbalance.)

Figure 10.11 shows these terms and their relationships.

[Figure 10.11: a tree diagram labeling the root node, the root node’s left side and right side, node 12’s left side and right side, and node 65 as the right child of node 50 and the parent of node 76.]

Figure 10.11. B-tree terms and relationships.

B-trees present some problems to the programmer, such as the following:

As records are added to the tree, it must be reorganized to ensure that each node has a balanced number of data objects on its right and left sides.

When a B-tree member is changed or deleted, the tree must be reorganized to eliminate the hole that is created. This reorganization can be complete, which rebalances the tree, or partial, which may create a dummy member to take the place of the missing member.

When sorted data objects are added to the B-tree, the tree’s balance suffers unless the tree is reorganized.
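The effect of adding already-sorted data can be seen with a very small tree. The sketch below is a plain two-way search tree that follows the left side and right side description used in this section, with no reorganization at all; it is not the BTREE.C listing, and TREENODE, TreeInsert(), TreeSearch(), TreeHeight(), and TreeFree() are hypothetical names.

```c
#include <stdlib.h>

/* One node of a simple two-way search tree (hypothetical names). */
typedef struct TreeNode {
    int nKey;
    struct TreeNode *pLeft;    /* Keys less than nKey */
    struct TreeNode *pRight;   /* Keys greater than nKey */
} TREENODE;

/* Inserts nKey and returns the (possibly new) root of the subtree.
 * Duplicates and allocation failures leave the tree unchanged.
 */
TREENODE *TreeInsert(TREENODE *pNode, int nKey)
{
    if (pNode == NULL) {
        pNode = (TREENODE *)malloc(sizeof(TREENODE));
        if (pNode != NULL) {
            pNode->nKey = nKey;
            pNode->pLeft = pNode->pRight = NULL;
        }
    } else if (nKey < pNode->nKey) {
        pNode->pLeft = TreeInsert(pNode->pLeft, nKey);
    } else if (nKey > pNode->nKey) {
        pNode->pRight = TreeInsert(pNode->pRight, nKey);
    }
    return pNode;
}

/* Returns the node holding nKey, or NULL if it isn't in the tree. */
TREENODE *TreeSearch(TREENODE *pNode, int nKey)
{
    while (pNode != NULL && pNode->nKey != nKey)
        pNode = (nKey < pNode->nKey) ? pNode->pLeft : pNode->pRight;
    return pNode;
}

/* Returns the number of levels in the tree (0 for an empty tree). */
int TreeHeight(const TREENODE *pNode)
{
    int nLeft, nRight;

    if (pNode == NULL)
        return 0;
    nLeft = TreeHeight(pNode->pLeft);
    nRight = TreeHeight(pNode->pRight);
    return 1 + (nLeft > nRight ? nLeft : nRight);
}

/* Frees every node in the tree. */
void TreeFree(TREENODE *pNode)
{
    if (pNode != NULL) {
        TreeFree(pNode->pLeft);
        TreeFree(pNode->pRight);
        free(pNode);
    }
}
```

Inserting the keys 1 through 7 in ascending order produces a tree seven levels deep, in effect a linked list, so a search degrades to a linear scan. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a tree only three levels deep. Reorganizing the tree as records are added exists to prevent the first case.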

When programming a B-tree implementation from scratch, you must have the following functionality:

AddRecord()      Adds a record to the B-tree. If a record with the key being added already exists, you must decide what action to take: add the record as a duplicate record, increment an occurrence counter, or do not add the duplicate record.

DeleteRecord()   Deletes a record in the B-tree. The B-tree must be reorganized, or a dummy record must be inserted to replace the deleted record. Using a dummy record usually implies that there is a deleted flag field.

SearchRecord()   Searches for a key value and returns the information necessary to access the record. This function could return the record structure if desired.

PrintTree()      Debugging tool. This function is needed if you are creating your own B-tree functions, but is normally not used in a final program.
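Two of the choices described above, the occurrence counter for duplicate keys and the deleted flag for dummy records, can be sketched on a plain two-way tree. This is one possible shape under stated assumptions, not the approach taken in BTREE.C: the TREERECORD structure and its nCount and nDeleted fields are hypothetical, and the tree is never reorganized.

```c
#include <stdlib.h>

/* Hypothetical tree record with a duplicate counter and a deleted flag. */
typedef struct TreeRecord {
    int nKey;
    int nCount;      /* Occurrence counter for duplicate keys */
    int nDeleted;    /* Nonzero marks a dummy (deleted) record */
    struct TreeRecord *pLeft;
    struct TreeRecord *pRight;
} TREERECORD;

/* Adds nKey below *ppRoot. A repeated key bumps the counter, and
 * re-adding a deleted key revives the dummy record.
 * Returns the record, or NULL if memory runs out.
 */
TREERECORD *AddRecord(TREERECORD **ppRoot, int nKey)
{
    while (*ppRoot != NULL) {
        if (nKey == (*ppRoot)->nKey) {
            if ((*ppRoot)->nDeleted) {
                (*ppRoot)->nDeleted = 0;
                (*ppRoot)->nCount = 1;
            } else {
                (*ppRoot)->nCount++;
            }
            return *ppRoot;
        }
        ppRoot = (nKey < (*ppRoot)->nKey) ? &(*ppRoot)->pLeft
                                          : &(*ppRoot)->pRight;
    }
    *ppRoot = (TREERECORD *)malloc(sizeof(TREERECORD));
    if (*ppRoot != NULL) {
        (*ppRoot)->nKey = nKey;
        (*ppRoot)->nCount = 1;
        (*ppRoot)->nDeleted = 0;
        (*ppRoot)->pLeft = (*ppRoot)->pRight = NULL;
    }
    return *ppRoot;
}

/* Returns the live record for nKey, or NULL if absent or deleted. */
TREERECORD *SearchRecord(TREERECORD *pRoot, int nKey)
{
    while (pRoot != NULL && pRoot->nKey != nKey)
        pRoot = (nKey < pRoot->nKey) ? pRoot->pLeft : pRoot->pRight;
    return (pRoot != NULL && !pRoot->nDeleted) ? pRoot : NULL;
}

/* Marks the record as deleted without restructuring the tree.
 * Returns 1 if a live record was found, 0 otherwise.
 */
int DeleteRecord(TREERECORD *pRoot, int nKey)
{
    TREERECORD *pRecord = SearchRecord(pRoot, nKey);

    if (pRecord == NULL)
        return 0;
    pRecord->nDeleted = 1;
    return 1;
}
```

The dummy record keeps the tree's shape intact, so deletion is cheap, at the cost of nodes that occupy space until the tree is rebuilt.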

There are a number of supporting functions as well, although not every B-tree implementation provides all of them.

Listing 10.6, the BTREE.C program, implements a basic B-tree structure. The program contains the following functions:

Search()         Finds a record in the B-tree.
SearchAndAdd()   Finds a record in the B-tree; if the key does not exist, the record is added.
Insert()         Inserts a record into the B-tree.
CopyItem()       Copies a node to another node.
NewItem()        Creates a new node.
TreePrint()      Prints the current tree.
DeleteItem()     Deletes a node from the current B-tree.
UnderFlow()      Used by DeleteItem() to adjust the B-tree when an item has been deleted.
Delete()         Used by DeleteItem() to delete items from the B-tree.
PrintHelp()      Prints a help screen.

Listing 10.6. BTREE.C.

/* BTREE.C
 * This is a simple B-tree program. It should be compiled
 * under ANSI C.
 * [BTREE.C of JUGPDS Vol.19]
 */