[FreeBSD] The Process and Thread Structure


The Process Structure

In addition to the references to the substructures, the process entry shown in Figure 4.1 contains the following categories of information:

 

• Process identification: the PID and the parent PID 

• Signal state: signals pending delivery and summary of signal actions

• Tracing: process tracing information

• Timers: real-time timer and CPU-utilization counters 

 

The process substructures shown in Figure 4.1 have the following categories of information:

• Process-group identification: the process group and the session to which the process belongs

• User credentials: the real, effective, and saved user and group identifiers

• Memory management: the structure that describes the allocation of virtual address space used by the process; the virtual-address space and its related structures are described more fully in Chapter 6

• File descriptors: an array of pointers to file entries indexed by the process’s open file descriptors; also, the open file flags and current directory

• System call vector: the mapping of system call numbers to actions; in addition to current and deprecated native FreeBSD executable formats, the kernel can run binaries compiled for several other UNIX variants such as Linux and System V Release 4 by providing alternative system call vectors when such environments are requested (so even a binary built for a different OS can run, as long as a matching system call vector is provided!)

• Resource accounting: the rlimit structures that describe the utilization of the many resources provided by the system 

• Statistics: statistics collected while the process is running that are reported when it exits and are written to the accounting file; also includes process timers and profiling information if the latter is being collected

• Signal actions: the action to take when a signal is posted to a process

• Thread structure: the contents of the thread structure (described at the end of this section) 
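
As a rough picture of how these categories map onto the kernel's process structure, the sketch below lists one field per category. The field names loosely follow those in FreeBSD's sys/proc.h (p_pid, p_pptr, p_ucred, and so on), but the layout, the types, and the struct name are simplified for illustration and are not the actual kernel definition.

```c
#include <sys/types.h>   /* pid_t */
#include <sys/queue.h>   /* TAILQ_HEAD */

/* Simplified sketch only -- not the real struct proc from sys/proc.h. */
struct proc_sketch {
        pid_t               p_pid;       /* process identification */
        struct proc_sketch *p_pptr;      /* pointer to the parent process */
        struct pgrp        *p_pgrp;      /* process group and session */
        struct ucred       *p_ucred;     /* real/effective/saved credentials */
        struct vmspace     *p_vmspace;   /* virtual address space (Chapter 6) */
        struct filedesc    *p_fd;        /* open files and current directory */
        struct sysentvec   *p_sysent;    /* system-call vector */
        struct plimit      *p_limit;     /* rlimit resource accounting */
        struct pstats      *p_stats;     /* statistics and process timers */
        struct sigacts     *p_sigacts;   /* signal actions */
        TAILQ_HEAD(, thread_sketch) p_threads;  /* this process's threads */
};
```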

 

 

 

The state element of the process structure holds the current value of the process state. The possible state values are shown in Table 4.1. When a process is first created with a fork system call (fork: the system call that creates a process), it is initially marked as NEW. The state is changed to NORMAL when enough resources are allocated to the process for the latter to begin execution. From that point onward, a process’s state will be NORMAL until the process terminates. Its thread(s) will fluctuate among RUNNABLE - that is, preparing to be or actually executing; SLEEPING - that is, waiting for an event; and STOPPED - that is, stopped by a signal or the parent process. A deceased process is marked as ZOMBIE until it has freed its resources and communicated its termination status to its parent process. (i.e., it notifies the parent process that it has terminated)
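
A minimal sketch of the lifecycle just described. In the FreeBSD sources the state lives in the process structure's p_state field, whose values are named PRS_NEW, PRS_NORMAL, and PRS_ZOMBIE; the enum below is illustrative rather than the kernel's definition.

```c
/* Illustrative only: the process lifecycle described above. */
enum proc_state_sketch {
        STATE_NEW,      /* fork() has allocated the entry; not yet runnable */
        STATE_NORMAL,   /* resources allocated; its threads move among
                           RUNNABLE, SLEEPING, and STOPPED until it exits */
        STATE_ZOMBIE    /* terminated; waiting to report its exit status
                           to its parent via wait() */
};
```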

 

 

 

Table 4.1: Process states - NEW / NORMAL / ZOMBIE

The system organizes process structures into two lists. Process entries are on the zombproc list if the process is in the ZOMBIE state; otherwise, they are on the allproc list. The two queues share the same linkage pointers in the process structure, since the lists are mutually exclusive.

 

Segregating the dead processes from the live ones reduces the time spent both by the wait system call, which must scan the zombies for potential candidates to return, and by the scheduler and other functions that must scan all the potentially runnable processes.
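
A user-space sketch of the shared-linkage idea, built on the same sys/queue.h macros the kernel uses. Because a process is never on allproc and zombproc at the same time, a single LIST_ENTRY field can serve both lists; the structure and function names below are invented for illustration.

```c
#include <sys/queue.h>

struct procent {
        int pid;
        LIST_ENTRY(procent) p_list;     /* linkage shared by both lists */
};

LIST_HEAD(proclist, procent);
static struct proclist allproc  = LIST_HEAD_INITIALIZER(allproc);
static struct proclist zombproc = LIST_HEAD_INITIALIZER(zombproc);

/* At exit, a process migrates from the list of live processes to the
 * list of zombies waiting to be reaped by wait(). */
static void
mark_zombie(struct procent *p)
{
        LIST_REMOVE(p, p_list);                     /* off allproc */
        LIST_INSERT_HEAD(&zombproc, p, p_list);     /* onto zombproc */
}
```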

 

Most threads, except the currently executing thread (or threads if the system is running on a multiprocessor), are also in one of three queues: a run queue, a sleep queue, or a turnstile queue. (Most threads therefore end up on one of these three queues.)

Threads that are in a runnable state are placed on a run queue, whereas threads that are blocked while awaiting an event are located on either a turnstile queue or a sleep queue. Stopped threads awaiting an event are located on a turnstile queue or a sleep queue, or else they are on no queue.

 

The run queues are organized according to thread-scheduling priority and are described in Section 4.4. The sleep and turnstile queues are organized in a data structure that is hashed by an event identifier. This organization optimizes finding the sleeping threads that need to be awakened when a wakeup occurs for an event. The sleep and turnstile queues are described in Section 4.3.
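
The hashing idea can be sketched as follows. The bucket count and the hash function are made up for illustration; the real implementations live in kern/subr_sleepqueue.c and kern/subr_turnstile.c and hash the wait-channel (event) address in a similar spirit.

```c
#include <stdint.h>

#define SLEEPQ_BUCKETS 256      /* illustrative table size */

/* Map an event identifier (the address being slept on) to a bucket of
 * sleeping threads, so a wakeup on that event only scans one short queue. */
static unsigned
sleepq_hash(const void *wchan)
{
        uintptr_t key = (uintptr_t)wchan;

        /* Mix the pointer bits so nearby addresses spread across buckets. */
        return (unsigned)((key >> 8) ^ key) % SLEEPQ_BUCKETS;
}
```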

 

The p_pptr pointer and related lists (p_children and p_sibling) are used in locating related processes, as shown in Figure 4.2. When a process spawns a child process, the child process is added to its parent’s p_children list. The child process also keeps a backward link to its parent in its p_pptr pointer. If a process has more than one child process active at a time, the children are linked together through their p_sibling list entries. In Figure 4.2, process B is a direct descendant of process A, whereas processes C, D, and E are descendants of process B and are siblings of one another.

** Process B typically would be a shell that started a pipeline including processes C, D, and E.

** Process A probably would be the system-initialization process, init. 
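
The linkage in Figure 4.2 can be sketched with sys/queue.h in the same style the kernel uses. The p_pptr, p_children, and p_sibling names mirror the fields described above, but the structure name and the counting helper are invented for illustration.

```c
#include <sys/queue.h>

struct procnode {
        int                   pid;
        struct procnode      *p_pptr;       /* back pointer to the parent */
        LIST_HEAD(, procnode) p_children;   /* head of this process's children */
        LIST_ENTRY(procnode)  p_sibling;    /* linkage among siblings */
};

/* Walk a parent's children through their p_sibling entries. */
static int
count_children(struct procnode *parent)
{
        struct procnode *child;
        int n = 0;

        LIST_FOREACH(child, &parent->p_children, p_sibling)
                n++;
        return n;
}
```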

 

 

 

Figure 4.2: Process-group hierarchy

CPU time is made available to threads according to their scheduling class and scheduling priority. As shown in Table 4.2, the FreeBSD kernel has two kernel and three user scheduling classes. The kernel will always run the thread in the highest-priority class: kernel-interrupt threads run in preference to anything else, followed by any runnable real-time threads, and top-half-kernel threads run in preference to runnable threads in the timeshare and idle classes. (a strict priority hierarchy)

 

Runnable timeshare threads are run in preference to runnable threads in the idle class. The priorities of threads in the real-time and idle classes are set by the applications using the rtprio system call and are never adjusted by the kernel. (so the kernel never modifies them?) The bottom-half interrupt priorities are set when the devices are configured and never change. The top-half priorities are set based on predefined priorities for each kernel subsystem and never change.
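
For example, a process can place itself in one of these fixed-priority classes with the rtprio(2) system call. The snippet below is a small illustrative program that requests the idle class; RTP_PRIO_REALTIME would request the real-time class instead (and normally requires superuser privileges).

```c
#include <sys/types.h>
#include <sys/rtprio.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
        struct rtprio rtp;

        rtp.type = RTP_PRIO_IDLE;       /* idle class: kernel never adjusts it */
        rtp.prio = RTP_PRIO_MAX;        /* lowest priority within the class */
        if (rtprio(RTP_SET, 0, &rtp) == -1)     /* pid 0 = calling process */
                err(1, "rtprio");
        printf("now running in the idle scheduling class\n");
        return 0;
}
```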

 

 

 

Table 4.2: Thread-scheduling classes

The priorities of threads running in the timeshare class are adjusted by the kernel based on resource usage and recent CPU utilization. (determined by resource usage plus recent CPU utilization) A thread has two scheduling priorities: one for scheduling user-mode execution and one for scheduling kernel-mode execution. The td_user_pri field associated with the thread structure contains the user-mode scheduling priority, whereas the td_priority field holds the current scheduling priority. The current priority may be different from the user-mode priority when the thread is executing in the top half of the kernel.

 

Priorities range between 0 and 255, with a lower value interpreted as a higher priority (see Table 4.2). User-mode priorities range from 120 to 255; priorities less than 120 are used only by real-time threads or when a thread is asleep - that is, awaiting an event in the kernel - and immediately after such a thread is awakened. Threads asleep in the kernel are given a higher priority because they typically hold shared kernel resources when they awaken. The system wants to run them as quickly as possible once they get a resource so that they can use the resource and return it before another thread requests it and gets blocked waiting for it.

 

When a thread goes to sleep in the kernel, it must specify whether it should be awakened and marked runnable if a signal is posted to it. In FreeBSD, a kernel thread will be awakened by a signal only if it sets the PCATCH flag when it sleeps. The msleep() interface also handles sleeps limited to a maximum time duration and the processing of restartable system calls.

The msleep() interface includes a reference to a string describing the event that the thread awaits; this string is externally visible - for example, in ps. The decision of whether to use an interruptible sleep depends on how long the thread may be blocked. Because it is complex to handle signals in the midst of doing some other operation, many sleep requests are not interruptible; that is, a thread will not be scheduled to run until the event for which it is waiting occurs. For example, a thread waiting for disk I/O will sleep with signals blocked.

 

For quickly occurring events, delaying signal handling until after they complete is imperceptible. However, requests that may cause a thread to sleep for a long period, such as waiting for terminal or network input, must be prepared to have their sleep interrupted so that the posting of signals is not delayed indefinitely. Threads that sleep interruptibly may abort their system call because of a signal arriving before the event for which they are waiting has occurred.

 

To avoid holding a kernel resource permanently, these threads must check why they have been awakened. If they were awakened because of a signal, they must release any resources that they hold. They must then return the error passed back to them by sleep(), which will be EINTR if the system call is to be aborted after the signal or ERESTART if it is to be restarted. Occasionally, an event that is supposed to occur quickly, such as a disk I/O, will get held up because of a hardware failure. Because the thread is sleeping in the kernel with signals blocked, it will be impervious to any attempts to send it a signal, even a signal that should cause it to exit unconditionally. The only solution to this problem is to change sleep()s on hardware events that may hang to be interruptible.
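
The pattern can be sketched in kernel-style C as below. msleep(9), PCATCH, hz, and the EINTR, ERESTART, and EWOULDBLOCK return values are real kernel interfaces, but the softc structure, its mutex and flag, and the "exwait" wait message are invented for illustration; this is not code from the FreeBSD sources.

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>

struct example_softc {          /* hypothetical driver state */
        struct mtx sc_mtx;      /* assumed initialized with mtx_init() */
        int        sc_ready;    /* the condition being waited for */
};

static int
wait_for_event(struct example_softc *sc)
{
        int error;

        mtx_lock(&sc->sc_mtx);
        while (!sc->sc_ready) {
                /* PCATCH: wake early if a signal is posted to the thread.
                 * "exwait" is the string ps(1) shows while we sleep. */
                error = msleep(&sc->sc_ready, &sc->sc_mtx, PCATCH,
                    "exwait", 10 * hz);
                if (error == EINTR || error == ERESTART) {
                        /* Interrupted by a signal: drop what we hold and let
                         * the system call be aborted or restarted. */
                        mtx_unlock(&sc->sc_mtx);
                        return (error);
                }
                if (error == EWOULDBLOCK) {
                        /* Timed out after roughly 10 seconds. */
                        mtx_unlock(&sc->sc_mtx);
                        return (error);
                }
        }
        mtx_unlock(&sc->sc_mtx);
        return (0);
}
```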

In the remainder of this book, we shall always use sleep() when referring to the routine that puts a thread to sleep, even when one of the mtx_sleep(), sx_sleep(), rw_sleep(), or tsleep() interfaces is the one that is being used.

 

 

 

The Thread Structure

The thread structure shown in Figure 4.1 contains the following categories of information:

• Scheduling: the thread priority, user-mode scheduling priority, recent CPU utilization, and amount of time spent sleeping; the run state of a thread (runnable, sleeping); additional status flags; if the thread is sleeping, the wait channel, the identity of the event for which the thread is waiting (see Section 4.3), and a pointer to a string describing the event

• TSB: the user- and kernel-mode execution states

• Kernel stack: the per-thread execution stack for the kernel

• Machine state: the machine-dependent thread information
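
As with the process structure, a heavily simplified sketch shows how these categories map onto thread-structure fields. The td_* names (td_priority, td_user_pri, td_wchan, td_wmesg, td_kstack) follow the convention used in sys/proc.h, but the types and layout here are illustrative only.

```c
#include <sys/types.h>   /* u_char */
#include <stdint.h>      /* uintptr_t stands in for the kernel's vm_offset_t */

/* Simplified sketch only -- not the real struct thread from sys/proc.h. */
struct thread_sketch {
        /* Scheduling */
        u_char            td_priority;   /* current scheduling priority */
        u_char            td_user_pri;   /* user-mode scheduling priority */
        int               td_state;      /* runnable, sleeping, ... */
        const void       *td_wchan;      /* wait channel: event slept on, if any */
        const char       *td_wmesg;      /* string describing the event (ps shows it) */

        /* TSB: user- and kernel-mode execution state */
        struct pcb       *td_pcb;        /* saved kernel-mode machine state */
        struct trapframe *td_frame;      /* saved user-mode registers */

        /* Kernel stack and machine-dependent state */
        uintptr_t         td_kstack;     /* base of the per-thread kernel stack */
        int               td_kstack_pages;
};
```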

 

Historically, the kernel stack was mapped to a fixed location in the virtual address space. The reason for using a fixed mapping is that when a parent forks, its runtime stack is copied for its child. If the kernel stack is mapped to a fixed address, the child’s kernel stack is mapped to the same addresses as its parent kernel stack. Thus, all its internal references, such as frame pointers and stack-variable references, work as expected.

 

On modern architectures with virtual address caches, mapping the kernel stack to a fixed address is slow and inconvenient. (a limitation of a fixed kernel-stack address) FreeBSD removes this constraint by eliminating all but the top call frame from the child’s stack after copying it from its parent so that it returns directly to user mode, thus avoiding stack copying and relocation problems.

 

Every thread that might potentially run must have its stack resident in memory because one task of its stack is to handle page faults. If it were not resident, it would page fault when the thread tried to run, and there would be no kernel stack available to service the page fault. Since a system may have many thousands of threads, the kernel stacks must be kept small to avoid wasting too much physical memory. In FreeBSD on the Intel architecture, the kernel stack is limited to two pages of memory. Implementors must be careful when writing code that executes in the kernel to avoid using large local variables and deeply nested subroutine calls that could overflow the run-time stack. (beware of heavy use of local variables and deeply nested calls)
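
A contrived example (not from the FreeBSD sources) of the kind of code this rule forbids: with a kernel stack of only two pages (8 KB with 4-KB pages), a single large local variable can overflow it on its own, which is why kernel code allocates such buffers dynamically, for example with malloc(9).

```c
#include <string.h>

/* Contrived: a local buffer larger than the entire kernel stack.  In the
 * kernel this would run into the guard page (or, without one, silently
 * overwrite adjacent data structures). */
static void
bad_idea(void)
{
        char big_buffer[16384];         /* 16 KB on an 8-KB kernel stack */

        memset(big_buffer, 0, sizeof(big_buffer));
}
```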

 

As a safety precaution, some architectures leave an invalid page between the area for the run-time stack and the data structures that follow it. Thus, overflowing the kernel stack will cause a kernel-access fault instead of disastrously overwriting other data structures. It would be possible to simply kill the process that caused the fault and continue running. However, the cleanup would be difficult because the thread may be holding locks or be in the middle of modifying some data structure that would be left in an inconsistent or invalid state. So the FreeBSD kernel panics on a kernel-access fault because such a fault shows a fundamental design error in the kernel. By panicking and creating a crash dump, the error can usually be pinpointed and corrected.