An overview of the system

Files and filesystems

At the most fundamental level, a file in the UNIX system is a collection of zero or more bytes of information that can be referred to by name. Files impose a partitioning strategy on the information stored by the computer; in general, the contents of a file relate to a single program, a single database, or a single document. It is possible to impose an arbitrary structure on the contents of a file, but the file itself is the basic, atomic unit of information in the filesystem.

Files are in turn grouped into directories. A directory is simply a file that contains a list of other files, together with information that indicates where each of those files is stored.

At the physical level, a hard disk drive bears little resemblance to a UNIX system file hierarchy. Data is stored as magnetic field patterns on the surfaces of a set of spinning platters; read/write heads (similar to those of a tape recorder) move across the platters and read the magnetic field patterns, converting them into a stream of bytes which is then fed to the system. The surfaces of the disks are divided into concentric tracks and radial sectors within each track; the same track on each platter of a multi-platter hard disk is called a cylinder.

(The same terms are applied to floppy disks, although they have only two sides. Tapes, on the other hand, are divided up into blocks, running lengthwise along the tape, which correspond to sectors on a disk.)

The smallest unit of data that a hard disk can read is a single sector of a given track. Each sector stores a fixed number of bytes, usually in the range 512 to 8192. Therefore, at some stage, the UNIX system must be able to work out, from a given filename, the tracks and sectors on which the file's data is stored, and then retrieve that data. There is no direct mapping between the filename and the physical location of its data on a disk.

Because files can be stored on a variety of media, it is necessary for the system to provide a uniform method for referring to files, and this is the purpose of the filesystem.

The term ``filesystem'' is used in two contexts. In the first it indicates a hierarchy of directories and files on a disk, which is ``mounted'' on (connected to) another filesystem so that it appears as a subdirectory of the first filesystem. (The directory where the filesystem is mounted is called the mount point.) In the second context, it is a more abstract term; a filesystem is a system for mapping from the name of a file to the physical location of its data on a mountable medium.
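As a rough illustration of the first sense of the term, the sketch below shows how a pathname might be matched against a table of mount points to decide which mounted filesystem holds it. The function name `resolve`, the table `mounts`, and the filesystem labels are all hypothetical, and the table is assumed to contain an entry for `/`; this is not how the kernel is implemented.

```python
def resolve(path, mounts):
    """Return (filesystem, path inside that filesystem) for an absolute path.

    `mounts` maps mount points to filesystem labels and must include "/".
    The longest mount point that is a prefix of `path` wins.
    """
    best = "/"
    for point in mounts:
        prefix = point.rstrip("/") + "/"
        if (path == point or path.startswith(prefix)) and len(point) > len(best):
            best = point
    # Strip the mount point so the remainder is relative to that filesystem's root.
    inner = path[len(best.rstrip("/")):] or "/"
    return mounts[best], inner


# Hypothetical mount table: a root filesystem with a second one mounted on /usr.
mounts = {"/": "rootfs", "/usr": "usrfs"}
print(resolve("/usr/bin/ls", mounts))
print(resolve("/etc/passwd", mounts))
```

Choosing the longest matching mount point is what makes a mounted filesystem appear as an ordinary subdirectory of the one it is mounted on.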

In general, a filesystem consists of three components: a superblock, an inode table, and a series of uniquely numbered blocks (corresponding to the sectors on the hard disk). The superblock is the first component of a filesystem. It contains information about the type of the filesystem, its structure, and its size, including where the inode table is, and how many data blocks there are.

Inodes

The inode table starts immediately after the superblock. It contains a fixed number of inodes, which are data records identified only by number; each inode contains information about permissions, ownership, type, the number of bytes in the file, and a number of slots which point to data blocks in the filesystem. Every file on the system has an inode which is associated with it: given an inode number it is possible to retrieve all the data blocks associated with that inode, simply by looking at the data block entries in the inode's record. If the file the inode defines is very big, the inode may contain pointers to an extension block, which contains more slots identifying the components of the file. If the file is huge, the extension block may point to further extension blocks that identify its data blocks.
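The way an inode leads to its data blocks can be sketched as follows. This is a simplified model, not the on-disk format of any real filesystem: the class names `Inode` and `ExtensionBlock`, the field names, and the 4-byte block size used in the example are assumptions made for illustration.

```python
class ExtensionBlock:
    """Extra slots for a large file; next_ext chains to a further extension block."""

    def __init__(self, slots, next_ext=None):
        self.slots = list(slots)   # data block numbers
        self.next_ext = next_ext   # another ExtensionBlock, or None


class Inode:
    def __init__(self, size, direct, extension=None):
        self.size = size             # number of bytes in the file
        self.direct = list(direct)   # slots pointing straight at data blocks
        self.extension = extension   # first ExtensionBlock, or None


def data_blocks(inode):
    """Return every data block number belonging to the inode, in file order."""
    blocks = list(inode.direct)
    ext = inode.extension
    while ext is not None:           # follow the chain of extension blocks
        blocks.extend(ext.slots)
        ext = ext.next_ext
    return blocks


def read_file(inode, disk):
    """Concatenate the file's blocks and trim to the size recorded in the inode."""
    raw = b"".join(disk[n] for n in data_blocks(inode))
    return raw[:inode.size]


# Toy disk with 4-byte blocks, keyed by block number.
disk = {10: b"UNIX", 11: b" fil", 12: b"esys", 13: b"tem\0"}
big = Inode(size=15, direct=[10, 11],
            extension=ExtensionBlock([12], ExtensionBlock([13])))
print(data_blocks(big))
print(read_file(big, disk))
```

The byte count stored in the inode matters because the last block of a file is usually only partly filled.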

The first inode in the inode table corresponds to the root directory. A directory is simply a file that contains a list of filenames and their associated inodes. Thus, when you give the system the name of a file you want to access, it looks in the directory to identify its corresponding inode, then reads the inode to identify the data blocks it needs to retrieve. If the file is large, it looks in the extension blocks to find the other blocks it needs.

To locate a file in another directory, the system looks up each directory in the path in turn: it identifies the inode of the subdirectory, then searches that subdirectory, and so on until it finds the file. If a directory file is so large that some of its data blocks are referenced indirectly through an extension block, it may take longer to retrieve the inode number of a file; it is therefore desirable to keep the number of entries in a directory to fewer than 640 (if filenames are shorter than 12 characters).

(Note that when you delete a file, all that happens is that the entry associating its name with an inode number is erased from the directory that contains it. The contents of the inode are not cleared, because the inode may be referred to by another name, or link, in another directory. The inode contains a count of the number of links to it. When the last link is destroyed, the inode is added to a list of free inodes and can be reused.)
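The directory lookup and link counting described above can be modelled in a few lines. This is a toy, in-memory sketch under stated assumptions: the names `ToyFS`, `Inode`, and `ROOT_INO` are hypothetical, the root inode number is chosen arbitrarily, and directories are plain dictionaries rather than on-disk files.

```python
ROOT_INO = 1  # assumed inode number of the root directory in this sketch


class Inode:
    def __init__(self, kind, content):
        self.kind = kind         # "dir" or "file"
        self.content = content   # dir: {name: inode number}; file: bytes
        self.links = 0           # number of directory entries naming this inode


class ToyFS:
    def __init__(self):
        self.inodes = {ROOT_INO: Inode("dir", {})}
        self.inodes[ROOT_INO].links = 1
        self.free = []                 # inode numbers available for reuse
        self.next_ino = ROOT_INO + 1

    def create(self, dir_ino, name, kind, content):
        """Allocate an inode (reusing a free one if possible) and name it."""
        if self.free:
            ino = self.free.pop()
        else:
            ino = self.next_ino
            self.next_ino += 1
        self.inodes[ino] = Inode(kind, content)
        self.link(dir_ino, name, ino)
        return ino

    def link(self, dir_ino, name, ino):
        """Add another name for an existing inode."""
        self.inodes[dir_ino].content[name] = ino
        self.inodes[ino].links += 1

    def unlink(self, dir_ino, name):
        """Erase one name; free the inode only when no links remain."""
        ino = self.inodes[dir_ino].content.pop(name)
        node = self.inodes[ino]
        node.links -= 1
        if node.links == 0:
            self.free.append(ino)

    def lookup(self, path):
        """Walk the path one directory at a time, starting at the root."""
        ino = ROOT_INO
        for part in path.strip("/").split("/"):
            if not part:
                continue
            node = self.inodes[ino]
            if node.kind != "dir":
                raise NotADirectoryError(part)
            ino = node.content[part]
        return ino


fs = ToyFS()
docs = fs.create(ROOT_INO, "docs", "dir", {})
memo = fs.create(docs, "memo", "file", b"hello")
fs.link(ROOT_INO, "memo2", memo)          # second name for the same inode
print(fs.lookup("/docs/memo") == fs.lookup("/memo2"))
fs.unlink(docs, "memo")                   # one link remains; inode still in use
fs.unlink(ROOT_INO, "memo2")              # last link gone; inode goes on the free list
print(fs.free)
```

Keeping the link count in the inode, rather than in any one directory, is what allows the same file to be removed from one directory while remaining reachable from another.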

Caching

To speed access to the filesystem, the kernel maintains a buffer cache in memory. (The computer can read its memory on the order of a million times faster than it can retrieve data from a disk.)

The cache contains the contents of the disk blocks that have been read from or written to most recently. Whenever a block is read, it is stored in the cache, because the most recently read blocks are also those which are most likely to be read or written to next. Every few minutes, the system purges the cache, writing any recently changed buffers to the disk; alternatively, if a large number of writes accumulate, filling the cache, it may force a purge.

In general the buffer cache significantly improves the performance of the UNIX system, but there is a cost. Because some of the recent writes to the filesystem are held in memory rather than written straight out to the disk, a power failure or crash can leave the filesystem corrupted: that is, the inodes may not point to the correct data blocks for their files, the list of free inodes may be incorrect, and the data stored in the most recently written files may also be incorrect. This is why it is vital to follow a careful shutdown procedure and not simply switch the computer off.

Because the operating system has evolved over time, it is capable of supporting a number of different types of filesystem. The main differences between them are their speed, efficiency, size of data blocks, capacity, and history; in general it is sufficient to stick to the standard vxfs filesystem. Earlier filesystem types are provided to maintain compatibility with older software installations. The system can also support the DOS filesystem structure, which is less efficient than the standard ones. (DOS does not support multiple names for files, long filenames, or a buffer cache.)

It is not uncommon for a UNIX system to have several filesystems mounted on it at once. To understand how this works, you need to understand device files.