Data Management: Sorts, Lists, and Indexes
With a linked list, it is easy to retrieve records in sorted order. Using multiple links, a program can retrieve records based on different criteria. A doubly linked list enables you to access the list in either forward order or backward order.
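The following is a minimal sketch of a doubly linked list kept in sorted order, so that it can be read from either end. The names struct member, struct list, list_insert(), list_forward(), and list_backward() are hypothetical and are used only for illustration; a real record would carry more than a single integer key.

```c
#include <stddef.h>

/* Hypothetical list member: one key plus a link in each direction. */
struct member {
    int key;
    struct member *next;   /* toward larger keys  */
    struct member *prev;   /* toward smaller keys */
};

struct list {
    struct member *head;   /* member with the smallest key */
    struct member *tail;   /* member with the largest key  */
};

/* Link m into the list so that the keys stay in ascending order. */
void list_insert(struct list *l, struct member *m)
{
    struct member *after = l->head;

    /* Find the first member whose key is greater than the new key. */
    while (after != NULL && after->key <= m->key)
        after = after->next;

    m->next = after;
    m->prev = (after != NULL) ? after->prev : l->tail;

    if (m->prev != NULL)
        m->prev->next = m;
    else
        l->head = m;       /* new member is now first */

    if (after != NULL)
        after->prev = m;
    else
        l->tail = m;       /* new member is now last */
}

/* Copy up to max keys into out[] in ascending order; returns the count. */
size_t list_forward(const struct list *l, int *out, size_t max)
{
    size_t n = 0;
    const struct member *m;

    for (m = l->head; m != NULL && n < max; m = m->next)
        out[n++] = m->key;
    return n;
}

/* Copy up to max keys into out[] in descending order; returns the count. */
size_t list_backward(const struct list *l, int *out, size_t max)
{
    size_t n = 0;
    const struct member *m;

    for (m = l->tail; m != NULL && n < max; m = m->prev)
        out[n++] = m->key;
    return n;
}
```

To retrieve records by a second criterion, each member could carry a second pair of links that is maintained the same way with a different comparison.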
Linked lists do create a problem, however. The only way to access a specific member in the list is with a linear search. Because the list’s members may be located randomly in memory, the only access you usually have to the list’s members is to follow the chain of links. Therefore, finding a member in the middle of the list is not much more efficient than finding a specific member in an unsorted list. Because the list is sorted, your program can stop as soon as it reaches a member whose key is greater than the key being sought, without searching the entire list. Even so, you typically will be looking at approximately n/2 members (where n is the number of members in the list) to retrieve a specific member.
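The search just described might be sketched as follows. The names struct node and sorted_find() are hypothetical, and the list is assumed to be in ascending key order. The function counts the members it examines so that you can see the cost of the search.

```c
#include <stddef.h>

/* Hypothetical member of a list kept in ascending key order. */
struct node {
    int key;
    struct node *next;
};

/*
 * Linear search of a sorted list. The search ends at the first member
 * whose key is equal to or greater than the key being sought. If visited
 * is not NULL, it receives the number of members examined.
 * Returns the matching member, or NULL if the key is not in the list.
 */
struct node *sorted_find(struct node *head, int key, size_t *visited)
{
    size_t count = 0;
    struct node *found = NULL;
    struct node *p;

    for (p = head; p != NULL; p = p->next) {
        count++;
        if (p->key == key) {
            found = p;
            break;
        }
        if (p->key > key)
            break;          /* past the key's position: it is absent */
    }

    if (visited != NULL)
        *visited = count;
    return found;
}
```

With five members holding the keys 10, 20, 30, 40, and 50, a search for 30 examines three members, and a search for the missing key 35 stops after four. Neither search can skip ahead; each must follow the links from the head of the list.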
Indexing
Using an index to access data in a file is one way of gaining fast access to a large file of large data objects. Rarely can all of a user’s data fit in memory at one time, so you must use a file as temporary or permanent storage.
With an index, the program’s data is separated into two objects: the data and the index. The data is usually not arranged in a specific order; new records are added to the end of the block or the file. The index (there may be more than one index) is always sorted. It contains the minimum necessary to allow the program to access the data, typically a key value that the index is sorted on and a pointer to the corresponding data.
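As an illustration of this separation, the sketch below appends fixed-length records to a data file in arrival order and keeps a sorted index of key and file offset. The record layout and the names struct record, struct index_entry, append_record(), sort_index(), and fetch_record() are assumptions made for this example; they are not taken from the chapter's listings. The caller is assumed to supply an index array with room for every record and a file opened for binary update.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 16

/* Hypothetical fixed-length data record. */
struct record {
    char key[KEY_LEN];
    char payload[48];
};

/* One index entry: the key and the record's position in the data file. */
struct index_entry {
    char key[KEY_LEN];
    long offset;
};

/* Comparison function shared by qsort() and bsearch(). */
int cmp_entry(const void *a, const void *b)
{
    const struct index_entry *x = a;
    const struct index_entry *y = b;
    return strcmp(x->key, y->key);
}

/* Append a record to the end of the data file and note it in the index.
   Returns 0 on success, -1 on an I/O error. */
int append_record(FILE *data, struct index_entry *index, size_t *count,
                  const char *key, const char *payload)
{
    struct record r;
    long pos;

    memset(&r, 0, sizeof r);
    strncpy(r.key, key, KEY_LEN - 1);
    strncpy(r.payload, payload, sizeof r.payload - 1);

    if (fseek(data, 0L, SEEK_END) != 0)
        return -1;
    pos = ftell(data);
    if (pos < 0 || fwrite(&r, sizeof r, 1, data) != 1)
        return -1;

    memcpy(index[*count].key, r.key, KEY_LEN);
    index[*count].offset = pos;
    (*count)++;
    return 0;
}

/* The data file stays unsorted; only the index is put in key order. */
void sort_index(struct index_entry *index, size_t count)
{
    qsort(index, count, sizeof index[0], cmp_entry);
}

/* Find key in the sorted index, then read its record from the data file.
   Returns 0 on success, -1 if the key is absent or the read fails. */
int fetch_record(FILE *data, const struct index_entry *index, size_t count,
                 const char *key, struct record *out)
{
    struct index_entry probe;
    const struct index_entry *hit;

    memset(&probe, 0, sizeof probe);
    strncpy(probe.key, key, KEY_LEN - 1);

    hit = bsearch(&probe, index, count, sizeof index[0], cmp_entry);
    if (hit == NULL)
        return -1;
    if (fseek(data, hit->offset, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, data) == 1 ? 0 : -1;
}
```

The binary search touches only the small index entries; the larger record is read from the file once, after its offset is known. In this sketch, the index must be sorted again after new records are appended and before the next lookup.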
Figure 10.9 shows an indexed data file system that consists of a data file and two index files used to access the data. The records in this example are simple; many applications have thousands of bytes per record.
Each record in the data file is 183 bytes long. Each record contains a name, a company name, and an address that consists of the street, city, state, and ZIP code. The two index files are an index for the name field and an index for ZIP codes. Note that you cannot predict the order of records that do not have unique ZIP codes. In this example, either record with the ZIP code of 03468 could have been first.
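The arrangement of one data set with two indexes might be sketched as follows, with the records held in an array rather than a file to keep the example short. The structure struct person, its field widths, and the functions by_name(), by_zip(), build_index(), and find_zip() are hypothetical; the structure does not reproduce the 183-byte record described above. Because the C standard does not require qsort() to be stable, records with equal ZIP codes end up adjacent in the ZIP index, but their order relative to each other is not defined.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; the field widths are illustrative only. */
struct person {
    char name[32];
    char zip[6];
};

/* Each index is an array of pointers into the unsorted data array. */
int by_name(const void *a, const void *b)
{
    const struct person *const *x = a;
    const struct person *const *y = b;
    return strcmp((*x)->name, (*y)->name);
}

int by_zip(const void *a, const void *b)
{
    const struct person *const *x = a;
    const struct person *const *y = b;
    return strcmp((*x)->zip, (*y)->zip);
}

/* Fill index[] with one pointer per record, then sort it with cmp. */
void build_index(const struct person *data, size_t n,
                 const struct person **index,
                 int (*cmp)(const void *, const void *))
{
    size_t i;

    for (i = 0; i < n; i++)
        index[i] = &data[i];
    qsort(index, n, sizeof index[0], cmp);
}

/* Find every entry with the given ZIP code in a ZIP-sorted index.
   Stores the position of the first match in *first and returns the
   number of matches (0 if there are none). */
size_t find_zip(const struct person *const *index, size_t n,
                const char *zip, size_t *first)
{
    size_t lo = 0, hi = n, count = 0;

    /* Binary search for the first entry whose ZIP is not less than zip. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(index[mid]->zip, zip) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *first = lo;

    /* Entries with duplicate ZIP codes are adjacent; count them. */
    while (lo + count < n && strcmp(index[lo + count]->zip, zip) == 0)
        count++;
    return count;
}
```

A program would call build_index() once with by_name and once with by_zip to get two views of the same records. A lookup on a ZIP code that is not unique returns a run of entries, and the program should not depend on which record in the run comes first.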
The main factors for choosing an indexed data access system follow:
The main data file does not need to be sorted.
There can be more than one index, resulting in fast access to a given record by any of the indexed fields.
Indexes can be created “on the fly,” as the need arises.