Part II • Managing Data in C
The DeleteItem() function deletes a node. DeleteItem() first checks that a node and an item to be deleted have been passed.
int DeleteItem(int nKeyValue, PAGE *a)
{
    int     i;
    int     k;
    int     l;
    int     r;
    PAGE    *q;

    //printf("DeleteItem()...\n");
    if (a == NULL)
    {
        printf("Key is not in tree! Cannot delete this key.\n");
        return(FALSE);
    }
    else
Remember binary searches from earlier in the chapter? The following binary search uses the same technique: halving the list you are searching, depending on the result of the comparison of a given node and the user’s key:
    {/* Binary array search */
        for (l = 0, r = a->nItemCount - 1; l <= r; )
        {
            k = (l + r) / 2;
            if (nKeyValue <= a->Item[k].nKeyValue)
            {
                r = k - 1;
            }
            if (nKeyValue >= a->Item[k].nKeyValue)
            {
                l = k + 1;
            }
        }
Data Management: Sorts, Lists, and Indexes
        q = (r == -1) ? a->LeftReference : a->Item[r].RightReference;
        if (l - r > 1)
        {/* Found; now delete Item[k] */
            if (q == NULL)
            {/* a is a terminal page */
                --(a->nItemCount);
                for (i = k; i < a->nItemCount; i++)
                {
                    CopyItem(&a->Item[i], &a->Item[i + 1]);
                }
                return (a->nItemCount < HALF_PAGE_SIZE);
            }
            else
            {
                if (Delete(q, a, k))
                {
                    return(UnderFlow(a, q, r));
                }
            }
        }
        else
        {
            if (DeleteItem(nKeyValue, q))
            {
                return(UnderFlow(a, q, r));
            }
        }
    }
    return(FALSE);  /* No underflow below this page; nothing to readjust */
}
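The way DeleteItem() reads the result of its search is easy to miss, because the loop uses two overlapping tests instead of the usual three-way comparison. The following is a minimal sketch of the same loop over a plain array of int keys; TwoTestSearch() is a hypothetical helper written for this illustration and is not part of the BTREE program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of BTREE: runs the same two-test
 * binary search as DeleteItem() over a sorted array of int keys.
 * On return, *pl and *pr hold the final bounds. The key was found
 * exactly when (*pl - *pr > 1), and its index is then *pr + 1.
 * When the key is absent, *pr is the index of the last key smaller
 * than the target, or -1 if every key is larger. */
static void TwoTestSearch(const int *keys, int count, int target,
                          int *pl, int *pr)
{
    int l = 0;
    int r = count - 1;
    int k;

    assert(keys != NULL && pl != NULL && pr != NULL);

    while (l <= r)
    {
        k = (l + r) / 2;
        if (target <= keys[k])
        {
            r = k - 1;
        }
        if (target >= keys[k])
        {
            l = k + 1;
        }
    }
    *pl = l;
    *pr = r;
}
```

When the key matches, both tests succeed in the same pass, so the bounds step past each other by two. When the key is missing, the bounds differ by only one, and r identifies which child reference to follow; this is why DeleteItem() uses LeftReference when r is -1 and Item[r].RightReference otherwise.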
The UnderFlow() function readjusts the B-tree. It shifts the remaining nodes, attempting to keep the B-tree as balanced as possible:
int UnderFlow(
    PAGE    *c,
    PAGE    *a,
    int     s)
{...}
DeleteItem() calls the Delete() function to delete the node. This function takes care of some of the housekeeping because there can be more than one key per block.
int Delete(
    PAGE    *p,
    PAGE    *a,
    int     k)
{...}
The rest of this section describes ways to make some of the routines in BTREE more generic (or more specific, depending on how you look at things).
First, the key is changed to a character field 16 bytes long, which holds a key of up to 15 characters plus the terminating null. Then a new field called lFileIndex is added to each node; this field is an index to a file’s record. To use this new field, Search() should pass it back through one of its parameters. The lFileIndex field should be set when calling SearchAndAdd().
The B-tree functions need the following changes:
• Change every reference to the nKeyValue variable to reflect both the variable’s new type and its new name, szKeyValue.
• Add a parameter to the SearchAndAdd() function and the Search() function so that they pass back the lFileIndex variable.
• Change TreePrint() so that it prints the lFileIndex variable in the debug mode.
• Change the main function so that it can handle a character-based key.
Make the following changes to the program’s source code. The Item structure should have the following variables:
struct _item
{
    char    szKeyValue[16];
    long    lFileIndex;
    PAGE    *RightReference;
    int     nCount;
};
Change all references to nKeyValue to reflect the new data type of szKeyValue. This means changing references such as the following:
for (i = 0; i < a->nItemCount && nKeyValue > a->Item[i].nKeyValue; i++)
to
for (i = 0; i < a->nItemCount &&
strcmp(szKeyValue, a->Item[i].szKeyValue) > 0; i++)
and changing the reference
if (nKeyValue == a->Item[i].nKeyValue && i < a->nItemCount)
to
if (strcmp(szKeyValue, a->Item[i].szKeyValue) == 0 && i < a->nItemCount)
Both of these references are in the Search() function.
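To see the two changed lines working together with the new lFileIndex parameter, consider the following minimal sketch of a lookup within a single page. It is not the BTREE Search() function: SKETCH_ITEM, SKETCH_PAGE, SKETCH_PAGE_SIZE, SearchPage(), and plFileIndex are hypothetical names, the structures are cut down to the fields the sketch needs, and no child pages are followed. The sketch also tests the bound before comparing the key, an ordering chosen here so that Item[i] is never read past the last item.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4   /* Hypothetical capacity, for illustration */

/* Cut-down stand-ins for the BTREE types: only the fields this
 * sketch needs, with no child references. */
typedef struct
{
    char szKeyValue[16];
    long lFileIndex;
} SKETCH_ITEM;

typedef struct
{
    int         nItemCount;
    SKETCH_ITEM Item[SKETCH_PAGE_SIZE];
} SKETCH_PAGE;

/* Hypothetical single-page lookup. Returns 1 and stores the record
 * index in *plFileIndex when szKeyValue is present; returns 0 and
 * leaves *plFileIndex untouched otherwise. */
static int SearchPage(const SKETCH_PAGE *a, const char *szKeyValue,
                      long *plFileIndex)
{
    int i;

    assert(a != NULL && szKeyValue != NULL && plFileIndex != NULL);

    /* Skip every key that sorts before the one being sought. */
    for (i = 0; i < a->nItemCount &&
         strcmp(szKeyValue, a->Item[i].szKeyValue) > 0; i++)
    {
    }

    /* Test the bound first so Item[i] is never read past the end. */
    if (i < a->nItemCount &&
        strcmp(szKeyValue, a->Item[i].szKeyValue) == 0)
    {
        *plFileIndex = a->Item[i].lFileIndex;
        return 1;
    }
    return 0;
}
```

A caller would declare a long, pass its address as the last argument, and use the value only when the function returns 1.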
In the SearchAndAdd() function, change
v->nKeyValue = nKeyValue;
to
strcpy(v->szKeyValue, szKeyValue);
and change
for (i = 0; i < a->nItemCount && nKeyValue > a->Item[i].nKeyValue; i++)
to (as in Search())
for (i = 0; i < a->nItemCount &&
strcmp(szKeyValue, a->Item[i].szKeyValue) > 0; i++)
and the reference
if (nKeyValue == a->Item[i].nKeyValue && i < a->nItemCount)
to
if (strcmp(szKeyValue, a->Item[i].szKeyValue) == 0 && i < a->nItemCount)
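Keep in mind that strcpy() copies until it reaches a terminating null, so a key longer than 15 characters would overrun the 16-byte szKeyValue field. If your keys come from user input, you may want to bound the copy. The following is a minimal sketch of one way to do that; SetKey() and KEY_SIZE are hypothetical names introduced for this example and are not part of the BTREE program.

```c
#include <assert.h>
#include <string.h>

#define KEY_SIZE 16   /* Assumed to match szKeyValue[16] in the text */

/* Hypothetical helper, not part of BTREE: copies a caller-supplied
 * key into a fixed KEY_SIZE buffer, truncating keys that are too
 * long and always leaving the result null-terminated. Returns 1 if
 * the key fit unchanged, or 0 if it had to be truncated. */
static int SetKey(char *szDest, const char *szSource)
{
    size_t nLength;

    assert(szDest != NULL && szSource != NULL);

    nLength = strlen(szSource);
    if (nLength < KEY_SIZE)
    {
        memcpy(szDest, szSource, nLength + 1);  /* Includes the '\0' */
        return 1;
    }

    memcpy(szDest, szSource, KEY_SIZE - 1);
    szDest[KEY_SIZE - 1] = '\0';
    return 0;
}
```

With a helper like this, the assignment in SearchAndAdd() could become SetKey(v->szKeyValue, szKeyValue), and the return value tells you whether the stored key differs from the one the user entered. Truncated keys can collide with each other, so a program might instead reject any key for which the helper returns 0.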
In CopyItem(), change
DestinationItem->nKeyValue = SourceItem->nKeyValue;
to the following (with an added line for the new lFileIndex structure member):
strcpy(DestinationItem->szKeyValue, SourceItem->szKeyValue);
DestinationItem->lFileIndex = SourceItem->lFileIndex;
In the DeleteItem() function, change
if (nKeyValue <= a->Item[k].nKeyValue)
to
if (strcmp(szKeyValue, a->Item[k].szKeyValue) <= 0)
and change
if (nKeyValue >= a->Item[k].nKeyValue)
to
if (strcmp(szKeyValue, a->Item[k].szKeyValue) >= 0)
These changes are simple to make. Should the key be some other data type, similar changes would have to be made.
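With a string key, each of the two tests in DeleteItem() calls strcmp() on the same pair of strings. One possible refinement is to call strcmp() once per pass and reuse the result. The following minimal sketch shows the idea over a plain array of string pointers; FindKey() is a hypothetical helper written for this illustration, not a BTREE function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of BTREE: DeleteItem()'s two-test
 * binary search over a sorted array of string keys. Returns the
 * index of szKeyValue, or -1 if it is absent. strcmp() is called
 * once per pass and its result is reused for both tests. */
static int FindKey(const char *keys[], int count, const char *szKeyValue)
{
    int l = 0;
    int r = count - 1;
    int k = 0;
    int nCompare;

    assert(keys != NULL && szKeyValue != NULL);

    while (l <= r)
    {
        k = (l + r) / 2;
        nCompare = strcmp(szKeyValue, keys[k]);
        if (nCompare <= 0)
        {
            r = k - 1;
        }
        if (nCompare >= 0)
        {
            l = k + 1;
        }
    }

    /* A match moves both bounds in the same pass, so they end up
     * two apart and k still holds the matching index. */
    return (l - r > 1) ? k : -1;
}
```

The same pattern applies to any key type that has a three-way comparison function: store the comparison result in a local variable, and then test that variable against zero.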
Summary
In this chapter, you learned about data management.
• Data often must be sorted. You can perform a sort externally by using files and calling DOS’s sort program or by calling another commercial sort routine.
• If data can be sorted in memory, you can use the C qsort() function.
• There is no provision for merging sorted data files under DOS, but this chapter presented a merge utility you can use.
• Most operating systems do not have a command for purging a data file of duplicates. This chapter presented a purge utility for this purpose.
• Linked lists organize data so that it may be retrieved in a specified order (usually sorted). Each member in a linked list has a pointer to the next member in the list. Usually, this pointer is the only way to find the members of a linked list.
• Linked lists can be used to group data based on a specific attribute.
• In a double linked list, the current data object is linked both with its successor and its predecessor.
• Indexed files enable a programmer to sort a much smaller set of data, which is then used to access specific data objects. Each record in the index file needs to contain only the key value and a pointer to its corresponding data object.
• A single data file may have more than one index file, each one indexing a different data file field.
• A B-tree organizes data so that specific data items are accessed easily.
• B-tree programs work with in-memory and file-based data structures. For acceptable performance, however, the tree must be in memory.
C and Other Languages
Many discussions in previous chapters did not pertain to a particular compiler. In this chapter, almost everything depends on which compiler you use. As in other chapters, I assume that you are using a Microsoft compiler. However, I have noted when something applies to a Borland or a Watcom compiler.
Nothing in the ANSI standard restricts C programs from containing functions written in another language. If your compiler is not covered, do not assume that you cannot mix languages. Most compilers include at least a provision for writing functions in assembly.
You may be asking why anyone would mix languages when C can do almost anything. Here are some good reasons:
You have a library of application-specific functions written in another computer language, such as Pascal or FORTRAN. For mathematical applications, FORTRAN still has many advantages.
You must create a function that is faster than what an optimizing compiler can produce. With assembly, you can directly control the computer’s CPU