An introduction to Linux filesystems – Opensource.com

This article is intended to be a very high-level discussion of Linux file system concepts. It is not intended to be a low-level description of how a particular file system type works, such as EXT4, nor is it intended to be a tutorial on file system commands.

Every general-purpose computer needs to store data of various types on a hard disk drive (HDD) or some equivalent, such as a USB stick. There are a couple of reasons for this. First, RAM loses its contents when the computer is turned off. There are non-volatile types of RAM that can keep data stored there after power is removed (such as flash RAM used in USB sticks and solid-state drives), but flash RAM is much more expensive than standard volatile RAM like DDR3 and other similar types.

The second reason why data should be stored on hard drives is that even standard RAM is still more expensive than disk space. Both RAM and disk costs have declined rapidly, but RAM still leads the way in terms of cost per byte. A quick cost-per-byte calculation, based on the costs of 16GB of RAM versus a 2TB hard drive, shows that RAM is about 71 times more expensive per byte than the hard drive. A typical cost for RAM is around $0.0000000043743750 per byte today.
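The arithmetic behind that comparison can be sketched in a few lines of Python. The prices here are assumptions: the RAM price is chosen so that it reproduces the per-byte figure quoted above using decimal gigabytes, and the hard drive price is simply one that is consistent with a roughly 71-fold difference.

```python
# Rough cost-per-byte comparison of RAM and hard drive storage.
# Both prices are illustrative assumptions, not quoted market prices.
RAM_PRICE_USD = 69.99        # assumed price of 16GB of RAM
RAM_BYTES = 16 * 10**9       # 16GB, using decimal gigabytes
HDD_PRICE_USD = 122.99       # assumed price of a 2TB hard drive
HDD_BYTES = 2 * 10**12       # 2TB, using decimal terabytes


def cost_per_byte(price_usd, capacity_bytes):
    """Return the cost of a single byte of storage in US dollars."""
    return price_usd / capacity_bytes


ram_cost = cost_per_byte(RAM_PRICE_USD, RAM_BYTES)
hdd_cost = cost_per_byte(HDD_PRICE_USD, HDD_BYTES)
ratio = ram_cost / hdd_cost

print(f"RAM: ${ram_cost:.16f} per byte")
print(f"HDD: ${hdd_cost:.16f} per byte")
print(f"RAM is about {ratio:.0f} times more expensive per byte")
```

Plugging in current prices for your own hardware will change the ratio, but the method stays the same.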

For a quick historical note to put current RAM costs into perspective, in the early days of computing, one type of memory was based on points on a CRT display. This was very expensive at around $1.00 per bit!

You may hear people talk about file systems in a number of different and confusing ways. The word itself can have multiple meanings, and you may need to discern the correct meaning from the context of a discussion or document.

I will attempt to define the various meanings of the term “file system” based on how I have observed it used in different circumstances. Note that, while I try to conform to the standard “official” meanings, my intention is to define the term based on its various uses. These meanings will be explored in greater detail in the following sections of this article.

  1. The entire Linux directory structure starting at the top root directory (/).
  2. A specific type of data storage format, such as EXT3, EXT4, BTRFS, XFS, etc. Linux supports nearly 100 types of file systems, including some very old ones as well as some newer ones. Each of these types of file systems uses its own metadata structures to define how data is stored and accessed.
  3. A partition or logical volume formatted with a specific type of file system that can be mounted to a specified mount point on a Linux file system.

Basic functions of the file system

Disk storage is a necessity that brings with it some interesting and inescapable details. Obviously, a file system is designed to provide space for non-volatile data storage; that is its ultimate function. However, there are many other important functions that stem from that requirement.

All file systems must provide a namespace, that is, a naming and organization methodology. This defines how a file can be named, specifically the length of a file name and the subset of characters that can be used for file names from the total set of available characters. It also defines the logical structure of data on a disk, such as using directories to organize files rather than simply grouping them all together into a single, huge cluster of files.
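As a minimal sketch of what these naming rules look like in practice, the following Python function checks a single file name against the limits reported by the file system. It assumes a Linux or other POSIX host, where the only bytes that cannot appear in a file name are the slash and NUL. The function name `is_valid_name` is my own invention for this example.

```python
import os


def is_valid_name(name, directory="/"):
    """Check one file name against the limits of the file system
    that holds `directory`."""
    # NAME_MAX is the longest file name, in bytes, this file system accepts.
    name_max = os.pathconf(directory, "PC_NAME_MAX")
    if name in ("", ".", ".."):
        return False          # empty, or reserved directory entries
    encoded = os.fsencode(name)
    if b"/" in encoded or b"\0" in encoded:
        return False          # "/" separates path components; NUL ends a C string
    return len(encoded) <= name_max


print(is_valid_name("notes.txt"))
print(is_valid_name("var/log"))
```

The length limit is measured in bytes rather than characters, which is why the name is encoded before it is measured.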

After the namespace is defined, a metadata structure is required to provide the logical basis for that namespace. This includes the data structures required to support a hierarchical directory structure; structures for determining which blocks of disk space are used and which are available; structures that maintain the names of files and directories; information about the files, such as their size and the times they were created, modified, or last accessed; and the location or locations of the data belonging to each file on the disk. Other metadata is used to store high-level information about subdivisions of the disk, such as logical volumes and partitions. This higher-level metadata and the structures it represents contain the information that describes the file system stored on the drive or partition, but it is separate from and independent of the file system metadata.
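Some of this per-file metadata is visible from user space. Here is a small sketch that reads it with Python's `os.stat()` on a temporary file; the helper name `describe` and the choice of fields are mine. Note that on Linux the `st_ctime` value is the time of the last metadata change, not the creation time.

```python
import os
import tempfile
import time


def describe(path):
    """Return a few of the metadata fields the file system keeps for `path`."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,                  # index of the metadata record
        "size_bytes": info.st_size,
        "blocks_512": info.st_blocks,          # allocated space in 512-byte units
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
        "changed": time.ctime(info.st_ctime),  # last metadata change on Linux
    }


with tempfile.NamedTemporaryFile() as handle:
    handle.write(b"hello")
    handle.flush()
    metadata = describe(handle.name)

for field, value in metadata.items():
    print(f"{field}: {value}")
```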

File systems also require an application programming interface (API) that provides access to system function calls that manipulate file system objects such as files and directories. The API provides for tasks such as creating, moving, and deleting files. It also provides algorithms that determine things like where a file is placed in a file system. Such algorithms can take into account goals such as speed or minimizing disk fragmentation.
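To make the create, move, and delete operations concrete, here is a short sketch that uses Python's wrappers around the corresponding system calls. The file names are arbitrary, and everything happens inside a throwaway temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "draft.txt")
    renamed = os.path.join(workdir, "final.txt")

    # Create: opening a new file for writing adds a directory entry for it.
    with open(original, "w") as handle:
        handle.write("some data\n")
    created = os.path.exists(original)

    # Move: within one file system, rename() changes the name
    # without copying the data itself.
    os.rename(original, renamed)
    moved = os.path.exists(renamed) and not os.path.exists(original)

    # Delete: remove() unlinks the name from the directory.
    os.remove(renamed)
    deleted = not os.path.exists(renamed)

print(created, moved, deleted)
```

The program never needs to know which type of file system the temporary directory lives on.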

Modern file systems also provide a security model, which is a scheme for defining access rights to files and directories. The Linux file system security model helps ensure that users only have access to their own files and not those of others or the operating system itself.
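A brief sketch of how those access rights look from a program: the example below sets the traditional owner, group, and other permission bits on a temporary file and reads them back. The mode 0o640 is just an example value.

```python
import os
import stat
import tempfile

with tempfile.NamedTemporaryFile() as handle:
    # Owner may read and write, group may read, everyone else gets nothing.
    os.chmod(handle.name, 0o640)
    info = os.stat(handle.name)
    permissions = stat.filemode(info.st_mode)   # same notation as "ls -l"
    owner_uid = info.st_uid

print(permissions, "owned by UID", owner_uid)
```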

The final building block is the software required to implement all these functions. Linux uses a two-part software implementation as a way to improve both system and programmer efficiency.

Figure 1: The Linux two-part file system software implementation.

The first part of this two-part implementation is the Linux virtual file system. This virtual file system provides a single set of commands for the kernel and developers to access all types of file systems. The virtual file system software calls the specific device driver required to interact with the various types of file systems. File system-specific device drivers are the second part of the implementation. The device driver interprets the standard set of file system commands to those specific to the file system type on the partition or logical volume.
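One way to see which file system drivers a running kernel has registered is the /proc/filesystems file. The sketch below parses text in that format; the sample text is hypothetical, and the function name `parse_filesystems` is my own.

```python
def parse_filesystems(text):
    """Parse text in the format of /proc/filesystems.

    Each line holds an optional "nodev" flag, a tab, and a file system type.
    "nodev" marks types that are not backed by a block device.
    """
    types = []
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, _, name = line.partition("\t")
        types.append({
            "name": name.strip(),
            "needs_device": flag.strip() != "nodev",
        })
    return types


# Hypothetical sample; the real list depends on the kernel and loaded modules.
SAMPLE = "nodev\tsysfs\nnodev\tproc\n\text4\n\txfs\n"
drivers = parse_filesystems(SAMPLE)
print([d["name"] for d in drivers if d["needs_device"]])
```

On a Linux host, passing the contents of /proc/filesystems to the same function should list the drivers currently available on that system.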

As a usually very organized Virgo, I like things stored in smaller, organized groups rather than in one big bucket. Using directories helps me store and then locate the files I want when I’m looking for them. Directories are also known as folders because they can be thought of as folders in which files are kept, in a sort of physical desktop analogy.

In Linux and many other operating systems, directories can be structured in a tree-like hierarchy. The Linux directory structure is well defined and documented in the Linux Filesystem Hierarchy Standard (FHS). Directories are referenced by listing the sequentially deeper directory names connected by forward slashes (/), such as /var/log and /var/spool/mail. These are called paths.
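As a quick illustration of how such a slash-separated name breaks down into a sequence of directories, here is a sketch using Python's `pathlib`, which handles the string manipulation without touching the disk.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/var/spool/mail")

components = path.parts                      # each name along the way
parent = path.parent                         # one level up the tree
ancestors = [str(p) for p in path.parents]   # every level up to the root

print(components)
print(parent)
print(ancestors)
```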

The following table provides a very brief list of the standard, well-known, and defined top-level Linux directories and their purposes.

Directory             Description
/ (root file system)  The root file system is the top-level directory of the file system. It must contain all of the files required to boot the Linux system before other file systems are mounted. It must include all of the executables and libraries required to boot the remaining file systems. After the system is booted, all other file systems are mounted on standard, well-defined mount points as subdirectories of the root file system.
/bin                  The /bin directory contains user executable files.
/boot                 Contains the static boot loader and kernel executable and configuration files required to boot a Linux computer.
/dev                  This directory contains the device files for every hardware device attached to the system. These are not device drivers, but files that represent each device on the computer and facilitate access to those devices.
/etc                  Contains the local system configuration files for the host computer.
/home                 Home directory storage for user files. Each user has a subdirectory in /home.
/lib                  Contains shared library files that are required to boot the system.
/media                A place to mount external removable media devices, such as USB drives that may be connected to the host.
/mnt                  A temporary mount point for regular file systems (that is, not removable media) that can be used while the administrator is repairing or working on a file system.
/opt                  Optional files, such as vendor-supplied application programs, should be located here.
/root                 This is not the root (/) file system. It is the home directory for the root user.
/sbin                 System binary files. These are executables used for system administration.
/tmp                  Temporary directory. Used by the operating system and many programs to store temporary files. Users may also store files here temporarily. Note that files stored here may be deleted at any time without prior notice.
/usr                  These are shareable, read-only files, including executable binaries and libraries, man files, and other types of documentation.
/var                  Variable data files are stored here. This can include things like log files, MySQL and other database files, web server data files, email inboxes, and much more.

Table 1: The top level of the Linux file system hierarchy.

The directories shown in Table 1 that have a teal background, along with their subdirectories, are considered an integral part of the root file system. That is, they cannot be created as a separate file system and mounted at startup time. This is because they (specifically, their contents) must be present at boot time for the system to boot properly.

The /media and /mnt directories are part of the root file system, but should never contain data. Rather, they are simply temporary mounting points.

The remaining directories, those that have no background color in Table 1, do not need to be present during the boot sequence, but will be mounted later, during the startup sequence that prepares the host to perform useful work.

Be sure to check the official Linux Filesystem Hierarchy Standard (FHS) webpage for details on each of these directories and their many subdirectories. Wikipedia also has a good description of the FHS. This standard should be followed as closely as possible to ensure operational and functional consistency. Regardless of the file system types used on a host, this hierarchical directory structure is the same.

Linux unified directory structure

In some non-Linux PC operating systems, if there are multiple physical hard drives or multiple partitions, each disk or partition is assigned a drive letter. You need to know which drive, such as C: or D:, a file or program is on. You then issue the drive letter as a command, D: for example, to switch to drive D:, and use the cd command to change to the correct directory to locate the desired file. Each hard drive has its own separate and complete directory tree.

The Linux file system unifies all physical hard drives and partitions into a single directory structure. It all starts at the top: the root (/) directory. All other directories and their subdirectories are located under the single Linux root directory. This means that there is only a single directory tree in which to search for files and programs.

This can only work because a file system, such as /home, /tmp, /var, /opt, or /usr, can be created on a separate physical hard drive, a different partition, or a different logical volume from the / (root) file system and then mounted on a mount point (directory) as part of the root file system tree. Even removable drives, such as a USB thumb drive or an external USB or eSATA hard drive, will be mounted onto the root file system and become an integral part of that directory tree.
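A rough way to see which directories on a host are separate file systems is to compare device numbers and ask whether a directory is a mount point. The sketch below does both with Python's standard library; the list of directories is just an example, and the helper name `same_file_system` is mine.

```python
import os


def same_file_system(path_a, path_b):
    """Return True if both paths live on the same mounted file system."""
    # st_dev identifies the device that holds the path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


for directory in ("/home", "/tmp", "/var", "/usr"):
    if not os.path.exists(directory):
        continue
    print(
        directory,
        "mount point:", os.path.ismount(directory),
        "same device as /:", same_file_system("/", directory),
    )
```

The output differs from host to host, depending on how the disks were partitioned at installation time.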

A good reason to do this is evident during an upgrade from one version of a Linux distribution to another, or when switching from one distribution to another. In general, and apart from any upgrade utilities such as dnf-upgrade in Fedora, it is advisable to occasionally reformat the hard drives containing the operating system during an upgrade to positively remove any cruft that has accumulated over time. If /home is part of the root file system, it will be reformatted as well and will then have to be restored from a backup. With /home as a separate file system, the installation program recognizes it as such and formatting of it can be skipped. This also applies to /var, where database, email inbox, website, and other variable user and system data are stored.

There are other reasons to keep certain parts of the Linux directory tree as separate file systems. For example, a long time ago, when I was not yet aware of the possible problems related to having all the required Linux directories as part of the / (root) file system, I managed to fill my home directory with a large number of very large files. Since neither the /home directory nor the /tmp directory were separate file systems, but simply subdirectories of the root file system, the entire root file system was filled up. There was no room left for the operating system to create temporary files or extend existing data files. At first, application programs began to complain that there was no space to save files, and then the operating system itself began to act very strangely. Booting in single-user mode and deleting the offending files in my home directory allowed me to start over. I then reinstalled Linux using a fairly standard multi-file system configuration and was able to prevent full system crashes from happening again.

I once had a situation where a Linux host continued to run, but prevented the user from logging in using the desktop GUI. I was able to log in using the command line interface (CLI) locally using one of the virtual consoles and remotely using SSH. The problem was that the /tmp file system had become full and some temporary files required by the GUI desktop could not be created at logon time. Because CLI login did not require files to be created in /tmp, the lack of space there did not prevent me from logging in with the CLI. In this case, the /tmp directory was a separate file system, and there was a lot of space available in the volume group of which the /tmp logical volume was a part. I simply expanded the /tmp logical volume to a size that accommodated my new understanding of the amount of temporary file space needed on that host and the problem was resolved. Note that this solution did not require a restart, and as soon as the /tmp file system was extended, the user was able to log on to the desktop.

Another situation occurred while I was working as a lab administrator at a large technology company. One of our developers had installed an application in the wrong location (/var). The application crashed because the /var file system was full and the log files, which are stored in /var/log on that file system, could not be appended with new messages due to the lack of space. However, the system remained up and running because the critical / (root) and /tmp file systems did not fill up. Removing the offending application and reinstalling it in the /opt file system resolved that problem.
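Problems like these are easier to catch early if you keep an eye on how full each file system is. Here is a minimal sketch using Python's `shutil.disk_usage()`; the 90 percent threshold and the list of mount points are arbitrary choices for this example.

```python
import os
import shutil


def percent_used(path):
    """Return how full the file system holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0                    # some pseudo file systems report no size
    return 100.0 * usage.used / usage.total


WARN_AT = 90.0   # arbitrary warning threshold for this sketch
for mount_point in ("/", "/tmp", "/var", "/home"):
    if os.path.isdir(mount_point):
        used = percent_used(mount_point)
        flag = "  <-- nearly full" if used >= WARN_AT else ""
        print(f"{mount_point}: {used:.1f}% used{flag}")
```

If two of the listed directories report identical numbers, they are most likely on the same file system.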

File System Types

Linux supports reading about 100 partition types; you can create and write to only some of these. But it is possible, and very common, to mount file systems of different types on the same root file system. In this context, we are talking about file systems in terms of the structures and metadata needed to store and manage user data on a partition of a hard disk or a logical volume. The full list of file system partition types recognized by the Linux fdisk command is provided here, so you can get an idea of the high degree of compatibility Linux has with many types of systems.

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden or  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi ea  Rufus alignment
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ee  GPT
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fb  VMware VMFS
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fc  VMware VMKCORE
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fd  Linux raid auto
1c  Hidden W95 FAT3 75  PC/IX           bc  Acronis FAT32 L fe  LANstep
1e  Hidden W95 FAT1 80  Old Minix       be  Solaris boot    ff  BBT

The primary purpose of supporting the ability to read so many types of partitions is to enable compatibility and at least some interoperability with other computer system file systems. The options available when creating a new file system with Fedora are shown in the following list.

  • btrfs
  • cramfs
  • ext2
  • ext3
  • ext4
  • fat
  • gfs2
  • hfsplus
  • minix
  • msdos
  • ntfs
  • reiserfs
  • vfat
  • xfs

Other distributions support creating different types of file systems. For example, CentOS 6 supports creating only those file systems highlighted in bold in the list above.

The term “mounting” a file system in Linux refers back to the early days of computing, when a tape or removable disk pack would need to be physically mounted on an appropriate drive device. After being physically placed on the drive, the file system on the disk pack would be logically mounted by the operating system to make the contents available for access by the operating system, application programs, and users.

A mount point is simply a directory, like any other, that is created as part of the root file system. So, for example, the home file system is mounted in the /home directory. File systems can be mounted on mount points on other non-root file systems, but this is less common.
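To show how a path relates to mount points, here is a small sketch that picks the file system a path belongs to by finding the deepest mount point that is an ancestor of that path. The list of mount points is a hypothetical layout, and `owning_mount_point` is a name I made up for this example.

```python
from pathlib import PurePosixPath


def owning_mount_point(path, mount_points):
    """Return the mount point whose file system holds `path`.

    The deepest mount point that is an ancestor of the path wins.
    """
    target = PurePosixPath(path).parts
    best = None
    best_depth = 0
    for mount_point in mount_points:
        parts = PurePosixPath(mount_point).parts
        # Compare whole components so that /home does not match /homework.
        if target[:len(parts)] == parts and len(parts) > best_depth:
            best = mount_point
            best_depth = len(parts)
    return best


# Hypothetical layout with /home and /var on their own file systems.
MOUNTS = ["/", "/home", "/var"]
print(owning_mount_point("/home/student/notes.txt", MOUNTS))
print(owning_mount_point("/homework/essay.txt", MOUNTS))
print(owning_mount_point("/var/log/messages", MOUNTS))
```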

The Linux root file system is mounted on the root directory (/) very early in the boot sequence. Other file systems are mounted later by the Linux startup programs, either the rc scripts under SystemV or systemd in newer Linux releases. Mounting of file systems during the startup process is managed by the /etc/fstab configuration file. An easy way to remember this is that fstab stands for “file system table,” and it is a list of the file systems to be mounted, their designated mount points, and any options that may be required for specific file systems.
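The layout of that table is simple enough to parse by hand. The following sketch splits text in /etc/fstab format into its six whitespace-separated fields. The sample entries, including the device names, are hypothetical, and the field labels are my own rather than official names.

```python
FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_order")


def parse_fstab(text):
    """Parse text in /etc/fstab format into a list of dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        values = line.split()
        # The last two fields are optional and default to 0.
        values += ["0"] * (len(FIELDS) - len(values))
        entry = dict(zip(FIELDS, values))
        entry["options"] = entry["options"].split(",")
        entries.append(entry)
    return entries


# Hypothetical entries; device names and options vary from host to host.
SAMPLE = """
# device                 mount point  type  options   dump  pass
/dev/mapper/vg01-root    /            ext4  defaults  1     1
/dev/mapper/vg01-home    /home        ext4  defaults  1     2
/dev/mapper/vg01-tmp     /tmp         ext4  defaults,noatime
"""

entries = parse_fstab(SAMPLE)
for entry in entries:
    print(entry["mount_point"], entry["fs_type"], entry["options"])
```

Running the same function on the contents of a real /etc/fstab gives a quick overview of what a host mounts at startup.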

File systems are mounted to an existing directory/mount point using the mount command. In general, any directory that is used as a mount point should be empty and have no other files contained in it. Linux will not prevent users from mounting a file system over one that is already there or on a directory that contains files. If you mount a file system to an existing directory or file system, the original content is hidden and only the contents of the newly mounted file system are visible.

Conclusion

I hope some of the potential confusion surrounding the term file system has been cleared up by this article. It took me a lot of time and a very helpful mentor for me to truly understand and appreciate the complexity, elegance and functionality of the Linux file system in all its meanings.

If you have questions, please add them to the comments below and I’ll try to answer them.

Next month

Another important concept is that for Linux, everything is a file. This concept has some interesting and important practical applications for users and system administrators. The reason I mention this is that you might want to read my article “Everything is a file” before the article I’m planning for next month on the /dev directory.