[FreeBSD] I/O System, Descriptor and Socket IPC

2024. 6. 25. 19:03

 

 

 

2.7 I/O System Overview

  • The basic model of the UNIX I/O system is a sequence of bytes that can be accessed either randomly or sequentially. There are no access methods and no control blocks in a typical UNIX user process. Different programs expect various levels of structure, but the kernel does not impose structure on I/O. For instance, the convention for text files is lines of ASCII characters separated by a single newline character (the ASCII line-feed character), but the kernel knows nothing about this convention.
  • For the purposes of most programs, the model is further simplified to just a stream of data bytes, or an I/O stream. It is this single common data form that makes the characteristic UNIX tool-based approach work [Kernighan & Pike, 1984]. An I/O stream from one program can be fed as input to almost any other program.
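As a minimal sketch of the byte-stream model, the loop below copies whatever arrives on one descriptor to another without interpreting it. The names copy_stream and copy_stream_demo are made up for this note; the demo pushes two lines of text through a pair of pipes and checks that the bytes arrive unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Copy bytes from descriptor `in` to descriptor `out` until end of stream.
 * Returns the number of bytes copied, or -1 on error. The bytes are never
 * interpreted: newlines are just bytes as far as this loop is concerned. */
long copy_stream(int in, int out)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of stream */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* write may accept fewer bytes than requested, so loop */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}

/* Hypothetical demo: returns 1 if text sent through two pipes joined by
 * copy_stream comes out byte-for-byte identical. */
int copy_stream_demo(void)
{
    const char msg[] = "line one\nline two\n";
    size_t len = strlen(msg);
    int a[2], b[2];
    char got[64];

    if (pipe(a) < 0)
        return 0;
    if (pipe(b) < 0) {
        close(a[0]); close(a[1]);
        return 0;
    }
    if (write(a[1], msg, len) != (ssize_t)len)
        return 0;
    close(a[1]);                        /* reader will now see end of stream */

    long n = copy_stream(a[0], b[1]);
    close(a[0]);
    close(b[1]);

    ssize_t r = read(b[0], got, sizeof got);
    close(b[0]);
    return n == (long)len && r == n && memcmp(got, msg, len) == 0;
}
```

The same function works unchanged if either descriptor refers to a file, a terminal, or a socket, which is the point of the single common data form.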

 

 

 

Descriptors and I/O

  • UNIX processes use descriptors to reference I/O streams. Descriptors are small unsigned integers obtained from the open and socket system calls.
  • The open system call takes as arguments the name of a file and a permission mode to specify whether the file should be open for reading or for writing, or for both. This system call also can be used to create a new, empty file.
  • A read or write system call can be applied to a descriptor to transfer data. The close system call can be used to deallocate any descriptor.
  • Descriptors represent underlying objects supported by the kernel and are created by system calls specific to the type of object. In FreeBSD, seven kinds of objects can be represented by descriptors—files, pipes, fifos, sockets, POSIX IPC, event queues, and processes:
    • 1. A file is a linear array of bytes with at least one name. A file exists until all of its names are deleted explicitly and no process holds a descriptor for it. A process acquires a descriptor for a file by opening that file’s name with the open system call. Most I/O devices are accessed as files.
    • 2. A pipe is a linear array of bytes, as is a file, but it is used solely as an I/O stream, and it is unidirectional. It also has no name and thus cannot be opened with open. Instead, it is created by the pipe system call, which returns two descriptors, one of which accepts input that is sent to the other descriptor reliably, without duplication, and in order.
    • 3. A fifo is often referred to as a named pipe. A fifo has properties identical to a pipe, except that it appears in the filesystem; thus, it can be opened using the open system call. Two processes that wish to communicate each open the fifo: one opens it for reading, the other for writing.
    • 4. A socket is a transient object that is used for interprocess communication; it exists only as long as some process holds a descriptor referring to it. A socket is created by the socket system call, which returns a descriptor for it. There are different kinds of sockets that support various communication semantics, such as reliable delivery of data, preservation of message ordering, and preservation of message boundaries.
    • 5. POSIX IPC includes message queues, shared memory, and semaphores. Each type of IPC has its own set of system calls that are described in Section 7.2.
    • 6. An event queue is a descriptor for which an application registers notification requests for a wide set of events. The events include arrival of data for a descriptor, availability of space for output on a descriptor, completion of asynchronous I/O, various timer-based events, and change in status of a set of its processes. An event queue is created by the kqueue system call, which returns a descriptor for it. (Related to asynchronous processing.)
    • 7. A process descriptor is used by the Capsicum capability model to control the set of processes to which a sandboxed process can have access. A process descriptor is created by specifying the RFPROCDESC flag to the rfork system call. Capsicum and its use of process descriptors are described in Section 5.8.

 

  • In systems before 4.2BSD, pipes were implemented using the filesystem; when sockets were introduced in 4.2BSD, pipes were reimplemented as sockets. For performance reasons, FreeBSD no longer uses sockets to implement pipes and fifos. Rather, it uses a separate implementation optimized for local communication.
  • The kernel keeps a descriptor table for each process, which is a table that the kernel uses to translate the external representation of a descriptor into an internal representation. (The descriptor is merely an index into this table.) The descriptor table of a process is inherited from that process’s parent, and thus access to the objects to which the descriptors refer also is inherited. The main ways that a process can obtain a descriptor are
    • 1. by opening or creating an object, or
    • 2. by inheriting from the parent process.
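Inheritance of the descriptor table can be sketched with fork: the parent creates a pipe, and the child uses the very same descriptor numbers without opening anything itself. The function name inherit_demo and the message text are illustrative only.

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child writes through a descriptor it inherited,
 * the parent reads the result into `out`. Returns the number of bytes
 * read, or -1 on error. */
int inherit_demo(char *out, size_t outlen)
{
    const char msg[] = "from child";
    int p[2];

    if (outlen == 0 || pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]); close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: p[1] is valid here only because the table was inherited */
        close(p[0]);
        if (write(p[1], msg, strlen(msg)) != (ssize_t)strlen(msg))
            _exit(1);
        _exit(0);
    }

    /* parent: close the write end so read sees end of stream afterwards */
    close(p[1]);
    ssize_t n = read(p[0], out, outlen - 1);
    close(p[0]);
    waitpid(pid, NULL, 0);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```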

 

  • In addition, socket IPC allows passing descriptors in messages between unrelated processes on the same machine.
  • Every valid descriptor has an associated file offset in bytes from the beginning of the object. Read and write operations start at this offset, which is updated after each data transfer. For objects that permit random access, the file offset also may be set with the lseek system call. Ordinary files permit random access, and some devices do, too. The remaining descriptor types, including pipes, fifos, and sockets, do not.
  • When a process terminates, the kernel reclaims all the descriptors that were in use by that process. If the process was holding the final reference to an object, the object’s manager is notified so that it can do any necessary cleanup actions, such as final deletion of a file or deallocation of a socket.
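The file offset behaviour can be checked with a short sketch. The first function writes to a file, observes that the offset has advanced, and repositions it with lseek; the second shows lseek being refused on a pipe. Function names and the /tmp path are assumptions made for this note.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: returns the byte found at offset 6 of a file that
 * contains "hello world", or -1 on error. */
int seek_file_demo(void)
{
    char path[64], c;

    snprintf(path, sizeof path, "/tmp/lseek_demo.%ld", (long)getpid());
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    unlink(path);       /* name removed; the file exists until fd is closed */

    if (write(fd, "hello world", 11) != 11) {
        close(fd);
        return -1;
    }
    /* the write advanced the offset to 11; SEEK_CUR with 0 just reports it */
    if (lseek(fd, 0, SEEK_CUR) != 11
        || lseek(fd, 6, SEEK_SET) != 6      /* random access: jump back */
        || read(fd, &c, 1) != 1) {
        close(fd);
        return -1;
    }
    close(fd);
    return c;
}

/* Returns the errno produced by lseek on a pipe (0 if lseek succeeded,
 * -1 if the pipe could not be created). */
int seek_pipe_errno(void)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    errno = 0;
    off_t r = lseek(p[0], 0, SEEK_SET);
    int e = (r == (off_t)-1) ? errno : 0;
    close(p[0]);
    close(p[1]);
    return e;
}
```

On a pipe the call fails with ESPIPE, which is how a program can detect that a descriptor does not support random access.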

 

 

 

Descriptor Management

  • Most processes expect three descriptors to be open already when they start running. These descriptors are 0, 1, and 2, more commonly known as standard input, standard output, and standard error, respectively. Usually, all three are associated with the user’s terminal by the login process (see Section 15.4) and are inherited through fork and exec by processes run by the user. Thus, a program can read what the user types by reading standard input, and the program can send output to the user’s screen by writing to standard output. The standard error descriptor also is open for writing and is used for error output, whereas standard output is used for ordinary output.
  • These (and other) descriptors can be mapped to objects other than the terminal; such mapping is called I/O redirection, and all the standard shells permit users to do it. The shell can direct the output of a program to a file by closing descriptor 1 (standard output) and opening the desired output file to produce a new descriptor 1. It can similarly redirect standard input to come from a file by closing descriptor 0 and opening the file.
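The close-then-open trick described above can be sketched as follows. To avoid disturbing the caller's own standard output, the redirection is done in a forked child, roughly as a shell would do before running a program. The name redirect_demo and the /tmp path are hypothetical; the guard for descriptor 0 is an addition of this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child redirects its descriptor 1 to a file and
 * writes to it; the parent reads the file back into `out`.
 * Returns the number of bytes read, or -1 on error. */
int redirect_demo(char *out, size_t outlen)
{
    const char msg[] = "redirected\n";
    char path[64];
    int status;

    if (outlen == 0)
        return -1;
    snprintf(path, sizeof path, "/tmp/redirect_demo.%ld", (long)getpid());

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Guard: if descriptor 0 is not open, occupy it first so that the
         * open below cannot land on 0 instead of 1. */
        if (fcntl(STDIN_FILENO, F_GETFD) < 0
            && open("/dev/null", O_RDONLY) != STDIN_FILENO)
            _exit(3);
        close(STDOUT_FILENO);
        /* open returns the lowest unused descriptor, which is now 1 */
        if (open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600) != STDOUT_FILENO)
            _exit(1);
        if (write(STDOUT_FILENO, msg, strlen(msg)) != (ssize_t)strlen(msg))
            _exit(2);
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        unlink(path);
        return -1;
    }
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        unlink(path);
        return -1;
    }
    ssize_t n = read(fd, out, outlen - 1);
    close(fd);
    unlink(path);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

The child never needs to know that its output goes to a file: it simply writes to descriptor 1.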

 

  • Pipes allow the output of one program to be input to another program without rewriting or even relinking of either program. Instead of descriptor 1 (standard output) of the source program being set up to write to the terminal, it is set up to be the input descriptor of a pipe. Similarly, descriptor 0 (standard input) of the sink program is set up to reference the output of the pipe instead of the terminal keyboard.
  • The resulting set of two processes and the connecting pipe is known as a pipeline. Pipelines can be arbitrarily long series of processes connected by pipes. The open, pipe, and socket system calls produce new descriptors with the lowest unused number usable for a descriptor. For pipelines to work, some mechanism must be provided to map such descriptors into 0 and 1. The dup system call creates a copy of a descriptor that points to the same file-table entry. The new descriptor is also the lowest unused one, but if the desired descriptor is closed first, dup can be used to do the desired mapping.
  • Care is required, however: If descriptor 1 is desired, and descriptor 0 happens also to have been closed, descriptor 0 will be the result. To avoid this problem, the system provides the dup2 system call; it is like dup, but it takes an additional argument specifying the number of the desired descriptor (if the desired descriptor was already open, dup2 closes it before reusing it).
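A two-stage pipeline built with pipe, fork, and dup2 might look like the sketch below. The source child writes "hello" to its standard output, and the sink child reads standard input and upper-cases it. A second pipe, which is not part of the text above, carries the sink's output back to the caller so the result can be inspected. A real shell would call exec in each child; here the children do their work inline to keep the sketch self-contained. It assumes descriptors 0 to 2 are already open, so all pipe ends are 3 or higher.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void close_all(int a[2], int b[2])
{
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
}

/* Hypothetical demo: returns the number of bytes produced by the pipeline
 * (stored NUL-terminated in `out`), or -1 on error. */
int pipeline_demo(char *out, size_t outlen)
{
    int link[2], result[2];

    if (outlen == 0 || pipe(link) < 0)
        return -1;
    if (pipe(result) < 0) {
        close(link[0]); close(link[1]);
        return -1;
    }

    pid_t src = fork();
    if (src == 0) {
        /* source: descriptor 1 becomes the input (write) end of the pipe */
        if (dup2(link[1], STDOUT_FILENO) < 0)
            _exit(1);
        close_all(link, result);        /* only 0, 1, 2 remain open */
        if (write(STDOUT_FILENO, "hello", 5) != 5)
            _exit(2);
        _exit(0);
    }

    pid_t sink = fork();
    if (sink == 0) {
        /* sink: descriptor 0 becomes the output (read) end of the pipe */
        if (dup2(link[0], STDIN_FILENO) < 0
            || dup2(result[1], STDOUT_FILENO) < 0)
            _exit(1);
        close_all(link, result);
        char c;
        while (read(STDIN_FILENO, &c, 1) == 1) {
            c = (char)toupper((unsigned char)c);
            if (write(STDOUT_FILENO, &c, 1) != 1)
                _exit(2);
        }
        _exit(0);
    }

    /* parent: drop its copies so the sink sees end of stream */
    close(link[0]); close(link[1]); close(result[1]);

    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1
           && (n = read(result[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;
    close(result[0]);
    out[total] = '\0';

    if (src > 0)
        waitpid(src, NULL, 0);
    if (sink > 0)
        waitpid(sink, NULL, 0);
    return (src < 0 || sink < 0) ? -1 : (int)total;
}
```

Using dup2 instead of close followed by dup means the result does not depend on which lower-numbered descriptors happen to be free.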

 

 

 

https://medium.com/swlh/getting-started-with-unix-domain-sockets-4472c0db4eb1

 

 

 

Socket IPC

 

 

 

  • The 4.2BSD kernel introduced an IPC mechanism more flexible than pipes, based on sockets. A socket is an endpoint of communication referred to by a descriptor, just like a file or a pipe. Two processes can each create a socket and then connect those two endpoints to produce a reliable byte stream. Once connected, the descriptors for the sockets can be read or written by processes, just as the latter would do with a pipe. The transparency of sockets allows the kernel to redirect the output of one process to the input of another process residing on another machine.
  • A major difference between pipes and sockets is that pipes require a common parent process to set up the communications channel. A connection between sockets can be set up by two unrelated processes, possibly residing on different machines. Fifos appear as an object in the filesystem that unrelated processes can open and send data through in the same way as they would communicate through a pair of sockets. Thus, fifos do not require a common parent to set them up; they can be connected after a pair of processes are up and running. Unlike sockets, fifos can be used on only a local machine; they cannot be used to communicate between processes on different machines.
  • The socket mechanism requires extensions to the traditional UNIX I/O system calls to provide the associated naming and connection semantics. Rather than overloading the existing interface, the developers used the existing interfaces to the extent that the latter worked without being changed and designed new interfaces to handle the added semantics. The read and write system calls were used for byte-stream-type connections, but six new system calls were added to allow sending and receiving addressed messages such as network datagrams. The system calls for writing messages include send, sendto, and sendmsg. The system calls for reading messages include recv, recvfrom, and recvmsg. In retrospect, the first two in each class are special cases of the others; recvfrom and sendto probably should have been added as library interfaces to recvmsg and sendmsg, respectively.
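One thing sendmsg and recvmsg can do that read and write cannot is carry ancillary data, which is the mechanism behind passing a descriptor in a message. The sketch below sends the read end of a pipe across a local socket pair using SCM_RIGHTS and then reads from the received copy. The helper names send_fd, recv_fd, and pass_fd_demo are made up for this note, and for brevity both ends live in one process, whereas in practice the receiver would be a different process.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor `fd` over local socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                    /* one data byte must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof fd);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns it, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET
        || c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}

/* Hypothetical demo: pass a pipe's read end through a socket pair, then
 * read from the received copy. Returns bytes read, or -1 on error. */
int pass_fd_demo(char *out, size_t outlen)
{
    const char text[] = "via passed fd";
    int sv[2], p[2], got = -1;
    ssize_t n = -1;

    if (outlen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]); close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0]) == 0
        && (got = recv_fd(sv[1])) >= 0
        && got != p[0]                  /* a new number for the same pipe */
        && write(p[1], text, strlen(text)) == (ssize_t)strlen(text))
        n = read(got, out, outlen - 1);

    if (got >= 0)
        close(got);
    close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

The received descriptor has a different number from the one that was sent, but both refer to the same underlying pipe.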

 

 

 

Multiple Filesystem Support

  • With the expansion of network computing, it became desirable to support both local and remote filesystems. To simplify the support of multiple filesystems, the developers added a new virtual node or vnode interface to the kernel. The set of operations exported from the vnode interface appears much like the filesystem operations previously supported by the local filesystem. However, these operations may be supported by a wide range of filesystem types:
    • Local disk-based filesystems
    • Files imported using a variety of remote filesystem protocols
    • Read-only CD-ROM filesystems
    • Filesystems providing special-purpose interfaces, for example, the /dev filesystem
  • By using loadable kernel modules (see Section 15.3), FreeBSD allows filesystems to be loaded dynamically when the filesystems are first referenced by the mount system call. The vnode interface is described in Section 7.3; its ancillary support routines are described in Section 7.4; several of the special-purpose filesystems are described in Section 7.5.
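The idea behind the vnode interface, one set of operations with many filesystem implementations behind it, can be illustrated with a table of function pointers. This is only a toy model written for this note: every name below (toy_vnode, toy_vnodeops, toy_vn_read, and the two toy filesystems) is hypothetical and none of it is the actual FreeBSD kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_vnode;

/* Per-filesystem operations table (hypothetical; one entry for brevity). */
struct toy_vnodeops {
    const char *fs_name;
    long (*read)(struct toy_vnode *vn, char *buf, size_t len);
};

/* A toy vnode: a pointer to its filesystem's operations plus private data. */
struct toy_vnode {
    const struct toy_vnodeops *ops;
    const char *data;           /* used only by the toy in-memory filesystem */
};

/* Toy in-memory filesystem: reading returns the stored string. */
long toy_memfs_read(struct toy_vnode *vn, char *buf, size_t len)
{
    size_t n = strlen(vn->data);

    if (n > len)
        n = len;
    memcpy(buf, vn->data, n);
    return (long)n;
}

/* Toy special-purpose filesystem: every read reports end of file. */
long toy_empty_read(struct toy_vnode *vn, char *buf, size_t len)
{
    (void)vn; (void)buf; (void)len;
    return 0;
}

const struct toy_vnodeops toy_memfs_ops = { "toy_memfs", toy_memfs_read };
const struct toy_vnodeops toy_empty_ops = { "toy_empty", toy_empty_read };

/* Filesystem-independent entry point: the caller never needs to know which
 * implementation sits behind the vnode. */
long toy_vn_read(struct toy_vnode *vn, char *buf, size_t len)
{
    return vn->ops->read(vn, buf, len);
}
```

Code above the interface calls toy_vn_read in the same way for both vnodes; adding another filesystem type means supplying another operations table, not changing the callers.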