Silberschatz A., Galvin P. B., Gagne G. - Operating System Concepts, 9th Edition - 2012.pdf



space. However, most programs also need to run functions from the system libraries, and these library functions must also be loaded. In the simplest case, the necessary library functions are embedded directly in the program’s executable binary ﬁle. Such a program is statically linked to its libraries, and statically linked executables can commence running as soon as they are loaded.

The main disadvantage of static linking is that every program generated must contain copies of exactly the same common system library functions. It is much more efﬁcient, in terms of both physical memory and disk-space usage, to load the system libraries into memory only once. Dynamic linking allows that to happen.

Linux implements dynamic linking in user mode through a special linker library. Every dynamically linked program contains a small, statically linked function that is called when the program starts. This static function just maps the link library into memory and runs the code that the function contains. The link library determines the dynamic libraries required by the program and the names of the variables and functions needed from those libraries by reading the information contained in sections of the ELF binary. It then maps the libraries into the middle of virtual memory and resolves the references to the symbols contained in those libraries. It does not matter exactly where in memory these shared libraries are mapped: they are compiled into position-independent code (PIC), which can run at any address in memory.

18.7 File Systems

Linux retains UNIX’s standard file-system model. In UNIX, a file does not have to be an object stored on disk or fetched over a network from a remote file server. Rather, UNIX files can be anything capable of handling the input or output of a stream of data. Device drivers can appear as files, and interprocess-communication channels or network connections also look like files to the user.

The Linux kernel handles all these types of files by hiding the implementation details of any single file type behind a layer of software, the virtual file system (VFS). Here, we first cover the virtual file system and then discuss the standard Linux file system, ext3.

18.7.1 The Virtual File System

The Linux VFS is designed around object-oriented principles. It has two components: a set of deﬁnitions that specify what ﬁle-system objects are allowed to look like and a layer of software to manipulate the objects. The VFS deﬁnes four main object types:

• An inode object represents an individual file.

• A file object represents an open file.

• A superblock object represents an entire file system.

• A dentry object represents an individual directory entry.

810 Chapter 18 The Linux System

For each of these four object types, the VFS deﬁnes a set of operations. Every object of one of these types contains a pointer to a function table. The function table lists the addresses of the actual functions that implement the deﬁned operations for that object. For example, an abbreviated API for some of the ﬁle object’s operations includes:

• int open(...): Open a file.

• ssize_t read(...): Read from a file.

• ssize_t write(...): Write to a file.

• int mmap(...): Memory-map a file.

The complete definition of the file object is specified in struct file_operations, which is located in the file /usr/include/linux/fs.h. An implementation of the file object (for a specific file type) is required to implement each function specified in the definition of the file object.

The VFS software layer can perform an operation on one of the ﬁle-system objects by calling the appropriate function from the object’s function table, without having to know in advance exactly what kind of object it is dealing with. The VFS does not know, or care, whether an inode represents a networked ﬁle, a disk ﬁle, a network socket, or a directory ﬁle. The appropriate function for that ﬁle’s read() operation will always be at the same place in its function table, and the VFS software layer will call that function without caring how the data are actually read.
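The dispatch described above can be sketched in a few lines. This is only an illustration in Python, not kernel code: the names FileObject, DISK_OPS, SOCKET_OPS, and vfs_read are hypothetical, and the "socket" read simply uppercases its data so that the two code paths are distinguishable.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileObject:
    """Toy stand-in for a VFS file object: private data plus an operations table."""
    data: bytes
    ops: Dict[str, Callable]


def disk_read(f: FileObject, count: int) -> bytes:
    # Pretend to fetch bytes from disk blocks.
    return f.data[:count]


def socket_read(f: FileObject, count: int) -> bytes:
    # Pretend to drain a network buffer; uppercasing marks the different path.
    return f.data[:count].upper()


# One function table per file type; every table uses the same slot names.
DISK_OPS = {"read": disk_read}
SOCKET_OPS = {"read": socket_read}


def vfs_read(f: FileObject, count: int) -> bytes:
    # The generic layer looks up "read" in the object's table and calls it,
    # without checking what kind of file it has been handed.
    return f.ops["read"](f, count)


disk_file = FileObject(b"hello from disk", DISK_OPS)
sock_file = FileObject(b"hello from net", SOCKET_OPS)
```

Calling vfs_read(disk_file, 5) and vfs_read(sock_file, 5) goes through the same generic function, yet each call ends up in a different implementation.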

The inode and ﬁle objects are the mechanisms used to access ﬁles. An inode object is a data structure containing pointers to the disk blocks that contain the actual ﬁle contents, and a ﬁle object represents a point of access to the data in an open ﬁle. A process cannot access an inode’s contents without ﬁrst obtaining a ﬁle object pointing to the inode. The ﬁle object keeps track of where in the ﬁle the process is currently reading or writing, to keep track of sequential ﬁle I/O. It also remembers the permissions (for example, read or write) requested when the ﬁle was opened and tracks the process’s activity if necessary to perform adaptive read-ahead, fetching ﬁle data into memory before the process requests the data, to improve performance.

File objects typically belong to a single process, but inode objects do not. There is one ﬁle object for every instance of an open ﬁle, but always only a single inode object. Even when a ﬁle is no longer in use by any process, its inode object may still be cached by the VFS to improve performance if the ﬁle is used again in the near future. All cached ﬁle data are linked onto a list in the ﬁle’s inode object. The inode also maintains standard information about each ﬁle, such as the owner, size, and time most recently modiﬁed.
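The split between file objects and inodes can be observed from user space. The following Python sketch assumes a POSIX-like system: two separate open() calls on the same file each get their own position, while a descriptor duplicated with dup() shares the position of the original. The function name demo_offsets is hypothetical.

```python
import os
import tempfile


def demo_offsets():
    # Create a small scratch file that can be opened more than once.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abcdef")
        path = tmp.name
    try:
        fd1 = os.open(path, os.O_RDONLY)  # first open: its own file object
        fd2 = os.open(path, os.O_RDONLY)  # second open: a separate file object
        fd3 = os.dup(fd1)                 # duplicate: shares fd1's file object
        first = os.read(fd1, 3)   # advances the offset shared by fd1 and fd3
        second = os.read(fd2, 3)  # fd2 still starts at offset 0
        third = os.read(fd3, 3)   # continues where fd1 stopped
        # Both opens refer to the same inode.
        same_inode = os.fstat(fd1).st_ino == os.fstat(fd2).st_ino
        for fd in (fd1, fd2, fd3):
            os.close(fd)
        return first, second, third, same_inode
    finally:
        os.unlink(path)
```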

Directory ﬁles are dealt with slightly differently from other ﬁles. The UNIX programming interface deﬁnes a number of operations on directories, such as creating, deleting, and renaming a ﬁle in a directory. The system calls for these directory operations do not require that the user open the ﬁles concerned, unlike the case for reading or writing data. The VFS therefore deﬁnes these directory operations in the inode object, rather than in the ﬁle object.

The superblock object represents a connected set of files that form a self-contained file system. The operating-system kernel maintains a single superblock object for each disk device mounted as a file system and for each networked file system currently connected. The main responsibility of the superblock object is to provide access to inodes. The VFS identifies every inode by a unique file-system/inode number pair, and it finds the inode corresponding to a particular inode number by asking the superblock object to return the inode with that number.
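From user space, the closest visible analogue of the file-system/inode number pair is the (st_dev, st_ino) pair returned by stat(). The sketch below assumes a file system that supports hard links; the helper names inode_key and demo_hard_link are hypothetical.

```python
import os
import tempfile


def inode_key(path):
    # (st_dev, st_ino) plays the role of the file-system/inode-number pair.
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_hard_link():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "a.txt")
        alias = os.path.join(d, "b.txt")
        other = os.path.join(d, "c.txt")
        for p in (original, other):
            with open(p, "w") as fh:
                fh.write("x")
        os.link(original, alias)  # a second name for the same inode
        return (inode_key(original) == inode_key(alias),
                inode_key(original) == inode_key(other))
```

Two names for one file yield the same pair, while an unrelated file in the same directory yields a different one.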

Finally, a dentry object represents a directory entry, which may include the name of a directory in the path name of a file (such as /usr) or the actual file (such as stdio.h). For example, the path name /usr/include/stdio.h contains the directory entries (1) /, (2) usr, (3) include, and (4) stdio.h. Each of these values is represented by a separate dentry object.

As an example of how dentry objects are used, consider the situation in which a process wishes to open the file with the path name /usr/include/stdio.h using an editor. Because Linux treats directory names as files, translating this path requires first obtaining the inode for the root, /. The operating system must then read through this file to obtain the inode for the file usr, and read through usr in turn to obtain the inode for the file include. It must continue this process until it obtains the inode for the file stdio.h. Because path-name translation can be a time-consuming task, Linux maintains a cache of dentry objects, which is consulted during path-name translation. Obtaining the inode from the dentry cache is considerably faster than having to read the on-disk file.

18.7.2 The Linux ext3 File System
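A toy model of this lookup, with a cache keyed by (parent inode, name), might look as follows. The inode numbers, the DIRECTORIES table, and the stats counter are all invented for illustration; the real dentry cache is considerably more elaborate.

```python
# Toy "on-disk" directories: inode number -> {entry name: inode number}.
DIRECTORIES = {
    2: {"usr": 11},          # inode 2 is the root directory in this toy layout
    11: {"include": 12},
    12: {"stdio.h": 13},
}
ROOT_INODE = 2

dentry_cache = {}            # (parent inode, name) -> child inode
stats = {"disk_reads": 0}    # counts simulated directory reads


def lookup(parent, name):
    key = (parent, name)
    if key not in dentry_cache:
        stats["disk_reads"] += 1             # cache miss: read the directory file
        dentry_cache[key] = DIRECTORIES[parent][name]
    return dentry_cache[key]


def resolve(path):
    # Walk the path one component at a time, starting at the root.
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        inode = lookup(inode, name)
    return inode
```

The first call to resolve("/usr/include/stdio.h") performs three simulated directory reads; repeating the call is served entirely from the cache.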

The standard on-disk ﬁle system used by Linux is called ext3, for historical reasons. Linux was originally programmed with a Minix-compatible ﬁle system, to ease exchanging data with the Minix development system, but that ﬁle system was severely restricted by 14-character ﬁle-name limits and a maximum ﬁle-system size of 64 MB. The Minix ﬁle system was superseded by a new ﬁle system, which was christened the extended ﬁle system (extfs). A later redesign to improve performance and scalability and to add a few missing features led to the second extended ﬁle system (ext2). Further development added journaling capabilities, and the system was renamed the third extended ﬁle system (ext3). Linux kernel developers are working on augmenting ext3 with modern ﬁle-system features such as extents. This new ﬁle system is called the fourth extended ﬁle system (ext4). The rest of this section discusses ext3, however, since it remains the most-deployed Linux ﬁle system. Most of the discussion applies equally to ext4.

Linux’s ext3 has much in common with the BSD Fast File System (FFS) (Section A.7.7). It uses a similar mechanism for locating the data blocks belonging to a speciﬁc ﬁle, storing data-block pointers in indirect blocks throughout the ﬁle system with up to three levels of indirection. As in FFS, directory ﬁles are stored on disk just like normal ﬁles, although their contents are interpreted differently. Each block in a directory ﬁle consists of a linked list of entries. In turn, each entry contains the length of the entry, the name of a ﬁle, and the inode number of the inode to which that entry refers.
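The "linked list" inside a directory block is implicit: each entry's length field gives the offset of the next entry. The Python sketch below assumes an ext2-style little-endian record layout (32-bit inode number, 16-bit record length, 8-bit name length, 8-bit file type, then the name); the helpers pack_entry and walk_block are hypothetical and skip details such as the last entry padding out to the end of the block.

```python
import struct

# Assumed entry header: inode, rec_len, name_len, file_type (little-endian).
HEADER = struct.Struct("<IHBB")


def pack_entry(inode, name):
    raw = name.encode()
    rec_len = (HEADER.size + len(raw) + 3) & ~3   # round up to a 4-byte boundary
    body = HEADER.pack(inode, rec_len, len(raw), 0) + raw
    return body.ljust(rec_len, b"\0")


def walk_block(block):
    entries, offset = [], 0
    while offset < len(block):
        inode, rec_len, name_len, _ = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break                                 # corrupt entry: stop walking
        if inode != 0:                            # inode 0 marks an unused slot
            start = offset + HEADER.size
            entries.append((block[start:start + name_len].decode(), inode))
        offset += rec_len                         # rec_len acts as the "next" link
    return entries


block = pack_entry(2, ".") + pack_entry(2, "..") + pack_entry(13, "stdio.h")
```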

The main differences between ext3 and FFS lie in their disk-allocation policies. In FFS, the disk is allocated to files in blocks of 8 KB. These blocks are subdivided into fragments of 1 KB for storage of small files or partially filled blocks at the ends of files. In contrast, ext3 does not use fragments at all but performs all its allocations in smaller units. The default block size on ext3 varies as a function of the total size of the file system. Supported block sizes are 1, 2, 4, and 8 KB.

To maintain high performance, the operating system must try to perform I/O operations in large chunks whenever possible by clustering physically adjacent I/O requests. Clustering reduces the per-request overhead incurred by device drivers, disks, and disk-controller hardware. A block-sized I/O request size is too small to maintain good performance, so ext3 uses allocation policies designed to place logically adjacent blocks of a file into physically adjacent blocks on disk, so that it can submit an I/O request for several disk blocks as a single operation.

The ext3 allocation policy works as follows: As in FFS, an ext3 ﬁle system is partitioned into multiple segments. In ext3, these are called block groups. FFS uses the similar concept of cylinder groups, where each group corresponds to a single cylinder of a physical disk. (Note that modern disk-drive technology packs sectors onto the disk at different densities, and thus with different cylinder sizes, depending on how far the disk head is from the center of the disk. Therefore, ﬁxed-sized cylinder groups do not necessarily correspond to the disk’s geometry.)

When allocating a ﬁle, ext3 must ﬁrst select the block group for that ﬁle. For data blocks, it attempts to allocate the ﬁle to the block group to which the ﬁle’s inode has been allocated. For inode allocations, it selects the block group in which the ﬁle’s parent directory resides for nondirectory ﬁles. Directory ﬁles are not kept together but rather are dispersed throughout the available block groups. These policies are designed not only to keep related information within the same block group but also to spread out the disk load among the disk’s block groups to reduce the fragmentation of any one area of the disk.

Within a block group, ext3 tries to keep allocations physically contiguous if possible, reducing fragmentation if it can. It maintains a bitmap of all free blocks in a block group. When allocating the ﬁrst blocks for a new ﬁle, it starts searching for a free block from the beginning of the block group. When extending a ﬁle, it continues the search from the block most recently allocated to the ﬁle. The search is performed in two stages. First, ext3 searches for an entire free byte in the bitmap; if it fails to ﬁnd one, it looks for any free bit. The search for free bytes aims to allocate disk space in chunks of at least eight blocks where possible.

Once a free block has been identiﬁed, the search is extended backward until an allocated block is encountered. When a free byte is found in the bitmap, this backward extension prevents ext3 from leaving a hole between the most recently allocated block in the previous nonzero byte and the zero byte found. Once the next block to be allocated has been found by either bit or byte search, ext3 extends the allocation forward for up to eight blocks and preallocates these extra blocks to the ﬁle. This preallocation helps to reduce fragmentation during interleaved writes to separate ﬁles and also reduces the CPU cost of disk allocation by allocating multiple blocks simultaneously. The preallocated blocks are returned to the free-space bitmap when the ﬁle is closed.
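The search described in the last two paragraphs can be approximated as follows. This is a simplified sketch, not the ext3 allocator: the bitmap is a Python list of booleans (True means in use), the window size near is an assumed parameter, and the preallocated blocks are simply marked as used rather than tracked separately.

```python
def find_start(bitmap, goal, near=8):
    """Return the index of the first block to allocate, or None if none is free."""
    n = len(bitmap)
    # Take any free block close to the goal, however fragmented the area is.
    for i in range(goal, min(goal + near, n)):
        if not bitmap[i]:
            return i
    # Otherwise look for a whole free byte (eight consecutive free blocks).
    for b in range((goal + 7) // 8, n // 8):
        if not any(bitmap[b * 8:(b + 1) * 8]):
            start = b * 8
            # Back up so the allocation sits flush against the previous one.
            while start > goal and not bitmap[start - 1]:
                start -= 1
            return start
    # Fall back to any single free bit at or after the goal.
    for i in range(goal, n):
        if not bitmap[i]:
            return i
    return None


def allocate(bitmap, goal, prealloc=8):
    start = find_start(bitmap, goal)
    if start is None:
        return []
    blocks = []
    i = start
    # Extend forward over free blocks, up to the preallocation limit.
    while i < len(bitmap) and not bitmap[i] and len(blocks) < prealloc:
        bitmap[i] = True          # mark the block as in use
        blocks.append(i)
        i += 1
    return blocks
```

With 13 used blocks followed by 11 free ones and a goal of 0, the free byte is found at block 16, the search backs up to block 13, and eight blocks (13 through 20) are handed out.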

Figure 18.7 illustrates the allocation policies. Each row represents a sequence of set and unset bits in an allocation bitmap, indicating used and free blocks on disk. In the first case, if we can find any free blocks sufficiently near the start of the search, then we allocate them no matter how fragmented they may be. The fragmentation is partially compensated for by the fact that the blocks are close together and can probably all be read without any disk seeks. Furthermore, allocating them all to one file is better in the long run than allocating isolated blocks to separate files once large free areas become scarce on disk. In the second case, we have not immediately found a free block close by, so we search forward for an entire free byte in the bitmap. If we allocated that byte as a whole, we would end up creating a fragmented area of free space between it and the allocation preceding it. Thus, before allocating, we back up to make this allocation flush with the allocation preceding it, and then we allocate forward to satisfy the default allocation of eight blocks.

[Figure 18.7: ext3 block-allocation policies. Panels: allocating scattered free blocks; allocating continuous free blocks. Legend: block in use, block selected by allocator, free block, bit boundary, byte boundary, bitmap search.]

18.7.3 Journaling

The ext3 file system supports a popular feature called journaling, whereby modifications to the file system are written sequentially to a journal. A set of operations that performs a specific task is a transaction. Once a transaction is written to the journal, it is considered to be committed. Meanwhile, the journal entries relating to the transaction are replayed across the actual file-system structures. As the changes are made, a pointer is updated to indicate which actions have completed and which are still incomplete. When an entire committed transaction is completed, it is removed from the journal. The journal, which is actually a circular buffer, may be in a separate section of the file system, or it may even be on a separate disk spindle. It is more efficient, but more complex, to have it under separate read-write heads, thereby decreasing head contention and seek times.

If the system crashes, some transactions may remain in the journal. Those transactions were never completed to the file system even though they were committed by the operating system, so they must be completed once the system recovers. The transactions can be executed from the pointer until the work is complete, and the file-system structures remain consistent. The only problem occurs when a transaction has been aborted, that is, it was not committed before the system crashed. Any changes from those transactions that were applied to the file system must be undone, again preserving the consistency of the file system. This recovery is all that is needed after a crash, eliminating all problems with consistency checking.
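The commit-then-replay cycle can be illustrated with a toy key-value store. This is a sketch under simplifying assumptions, not the ext3 journal: the class JournaledStore is hypothetical, the journal is an ordinary list rather than a circular buffer, and uncommitted changes are never applied to the main structures, so recovery only has to discard them instead of undoing them.

```python
class JournaledStore:
    """Toy key-value 'file system' with a write-ahead journal."""

    def __init__(self):
        self.disk = {}        # stands in for the actual file-system structures
        self.journal = []     # each entry: {"ops": [...], "committed": bool}

    def begin(self):
        txn = {"ops": [], "committed": False}
        self.journal.append(txn)
        return txn

    def log(self, txn, key, value):
        txn["ops"].append((key, value))   # recorded in the journal only

    def commit(self, txn):
        txn["committed"] = True           # from here on the change must survive

    def checkpoint(self):
        # Replay committed transactions onto the main structures, then drop them.
        for txn in self.journal:
            if txn["committed"]:
                for key, value in txn["ops"]:
                    self.disk[key] = value
        self.journal = [t for t in self.journal if not t["committed"]]

    def recover(self):
        # After a crash: finish committed work, discard uncommitted transactions.
        self.checkpoint()
        self.journal.clear()


store = JournaledStore()
t1 = store.begin()
store.log(t1, "a", 1)
store.commit(t1)
t2 = store.begin()
store.log(t2, "b", 2)     # simulated crash happens before t2 is committed
store.recover()
```

After recovery the store contains the committed update to "a" but not the uncommitted update to "b", and the journal is empty.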

Journaling ﬁle systems may perform some operations faster than nonjournaling systems, as updates proceed much faster when they are applied to the in-memory journal rather than directly to the on-disk data structures. The reason for this improvement is found in the performance advantage of sequential I/O over random I/O. Costly synchronous random writes to the ﬁle system are turned into much less costly synchronous sequential writes to the ﬁle system’s journal. Those changes, in turn, are replayed asynchronously via random writes to the appropriate structures. The overall result is a signiﬁcant gain in performance of ﬁle-system metadata-oriented operations, such as ﬁle creation and deletion. Due to this performance improvement, ext3 can be conﬁgured to journal only metadata and not ﬁle data.

18.7.4 The Linux Process File System

The ﬂexibility of the Linux VFS enables us to implement a ﬁle system that does not store data persistently at all but rather provides an interface to some other functionality. The Linux process ﬁle system, known as the /proc ﬁle system, is an example of a ﬁle system whose contents are not actually stored anywhere but are computed on demand according to user ﬁle I/O requests.

A /proc ﬁle system is not unique to Linux. SVR4 UNIX introduced a /proc ﬁle system as an efﬁcient interface to the kernel’s process debugging support. Each subdirectory of the ﬁle system corresponded not to a directory on any disk but rather to an active process on the current system. A listing of the ﬁle system reveals one directory per process, with the directory name being the ASCII decimal representation of the process’s unique process identiﬁer (PID).

Linux implements such a /proc ﬁle system but extends it greatly by adding a number of extra directories and text ﬁles under the ﬁle system’s root directory. These new entries correspond to various statistics about the kernel and the associated loaded drivers. The /proc ﬁle system provides a way for programs to access this information as plain text ﬁles; the standard UNIX user environment provides powerful tools to process such ﬁles. For example, in the past, the traditional UNIX ps command for listing the states of all running processes has been implemented as a privileged process that reads the process state directly from the kernel’s virtual memory. Under Linux, this command is implemented as an entirely unprivileged program that simply parses and formats the information from /proc.
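As a rough illustration of how an unprivileged program can gather process information this way, the Python sketch below lists the numeric directories under /proc and reads the Name: field from a process's status file. It assumes a Linux system with /proc mounted; the helper names list_pids and process_name are hypothetical, and both return empty results elsewhere.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the PIDs that appear as numeric directories under /proc."""
    if not os.path.isdir(proc_root):
        return []                      # no /proc here (not a Linux-style system)
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


def process_name(pid, proc_root="/proc"):
    """Read the 'Name:' field from /proc/<pid>/status, or None if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as fh:
            for line in fh:
                if line.startswith("Name:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        return None                    # process exited or file not readable
    return None
```

A ps-like listing then amounts to calling process_name(pid) for each pid in list_pids().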

The /proc ﬁle system must implement two things: a directory structure and the ﬁle contents within. Because a UNIX ﬁle system is deﬁned as a set of ﬁle and directory inodes identiﬁed by their inode numbers, the /proc ﬁle system must deﬁne a unique and persistent inode number for each directory and the associated ﬁles. Once such a mapping exists, the ﬁle system can use this inode number to identify just what operation is required when a user tries to read from a particular ﬁle inode or to perform a lookup in a particular directory
