Linux – How does /proc/* work

linuxproc

There are many files in /proc, like /proc/cpuinfo, /proc/meminfo, /proc/devices and so on, which, when opened, return system information.

These files do not seem to have any existence in reality, as running file on them only tells that they're empty.

$ file /proc/cpuinfo
/proc/cpuinfo: empty

How do these files work exactly?

Best Answer

It's actually pretty simple, at least if you don't need the implementation details.

First off, on Linux all file systems (ext2, ext3, btrfs, reiserfs, tmpfs, zfs, ...) are implemented in the kernel. Some may offload work to userland code through FUSE, and some come only in the form of a kernel module (native ZFS is a notable example of the latter due to licensing restrictions), but either way there remains a kernel component. This is an important basic.

When a program wants to read from a file, it will issue various system library calls which ultimately end up in the kernel in the form of an open(), read(), close() sequence (possibly with seek() thrown in for good measure). The kernel takes the provided path and filename, and through the file system and device I/O layer translates these to physical read requests (and in many cases also write requests -- think for example atime updates) to some underlying storage.

However, it doesn't have to translate those requests specifically to physical, persistent storage. The kernel's contract is that issuing that particular set of system calls will provide the contents of the file in question. Where exactly in our physical realm the "file" exists is secondary to this.

On /proc is usually mounted what is known as procfs. That is a special file system type, but since it is a file system, it really is no different from e.g. an ext3 file system mounted somewhere. So the request gets passed to the procfs file system driver code, which knows about all these files and directories and returns particular pieces of information from the kernel data structures.

The "storage layer" in this case is the kernel data structures, and procfs provides a clean, convenient interface to accessing those. Do keep in mind that mounting procfs at /proc is simply convention; you could just as easily mount it elsewhere. In fact, that is sometimes done, for example in chroot jails when the process running there needs access to /proc for some reason.

It works the same way if you write a value to some file; at the kernel level, that translates to a series of open(), seek(), write(), close() calls which again get passed to the file system driver; again, in this particular case, the procfs code.

The particular reason why you see file returning empty is that many of the files exposed by procfs are exposed with a size of 0 bytes. The 0 byte size is likely an optimization on the kernel side (many of the files in /proc are dynamic and can easily vary in length, possibly even from one read to the next, and calculating the length of each file on every directory read would potentially be very expensive). Going by the comments to this answer, which you can verify on your own system by running through strace or a similar tool, file first issues a stat() call to detect any special files, and then takes the opportunity to, if the file size is reported as 0, abort and report the file as being empty.

This behavior is actually documented and can be overridden by specifying -s or --special-files on the file invocation, although as stated in the manual page that may have side effects. The quote below is from the BSD file 5.11 man page, dated Oct 17 2011.

Normally, file only attempts to read and determine the type of argument files which stat(2) reports are ordinary files. This prevents problems, because reading special files may have peculiar consequences. Specifying the -s option causes file to also read argument files which are block or character special files. This is useful for determining the filesystem types of the data in raw disk partitions, which are block special files. This option also causes file to disregard the file size as reported by stat(2) since on some systems it reports a zero size for raw disk partitions.