Virtual Memory and Processes

Table of Contents

Multi-level Page Tables

Multi-level page tables improve space utilization

A process's virtual address space consists of a read-only text segment and a data segment, with the heap and the stack growing toward each other from opposite ends of the data segment, leaving a wide gap in between. A single flat page table must still hold entries for that gap, so most of its entries are wasted and unused.

Solution. Split the page table into multiple levels like sparse files in the UNIX filesystem. Take the following code:

int fd = open("foo", O_RDWR | O_CREAT | O_TRUNC, 0600);
lseek(fd, 1000000000, SEEK_CUR);
write(fd, "x", 1);

Recall that only the last data block, the one containing "x", is actually allocated; the surrounding entries in the indirect blocks are left as 0s. Similarly, the first-level page table keeps 0s in its entries and does not allocate the corresponding second-level tables until they are needed (see Figure).

The x86 architecture supports a 2-level page table; a 32-bit virtual address is split like so:

+---------+---------+---------+
| 10 bits | 10 bits | 12 bits |
+---------+---------+---------+
|  Virtual Page No. | Offset  |
+-------------------+---------+

The first 10 bits of the virtual page number (vpn) index the first-level table to select a second-level table, while the remaining 10 bits index the actual page within that second-level table. The following code sketches what the hardware does:

size_t pmap(int vpn) {
    int hi = vpn >> 10;               // first-level index
    int lo = vpn & ((1 << 10) - 1);   // second-level index
    int *lopage = PAGETABLE[hi];      // second-level table, or 0
    if (!lopage)
        return FAULT;                 // second-level table not allocated
    return lopage[lo];                // entry for the requested page
}

Page Faults

A page fault occurs when a requested page cannot be accessed. The hardware traps and gives control to the kernel, which can respond in one of the following ways:

  1. Terminate the process
  2. Send signal (SIGSEGV) to the process
  3. Load page from disk (only if the fault is due to an unloaded page)

Signal Handling Case Study 1

int i; // global variable
...
signal(SIGSEGV, sighandler);
a[i];  // invalid access causing a page fault
...
void sighandler(int sig) {
    i = 27;  // assume 27 is a valid subscript
}

This code works only if the compiler does not cache i in a register when referencing a[i]; otherwise the handler's assignment to i is never observed by the faulting code.

Solution. Make i volatile to prevent caching.

Signal Handling Case Study 2

Most people don't write code like the above, but the following example comes from the Bourne shell.

int * stack;
int stacksize;
stacksize = 8196;
stack = malloc_special(stacksize);  // Bourne's version of malloc
signal(SIGSEGV, fixstack);
...
stack[stacksize++] = x;  // can segfault
...
void fixstack(int sig) {
    realloc_using_mmap(stack, stacksize *= 2);
}

For convenience and performance, Bourne pushes with stack[stacksize++] and skips bounds checking entirely; the signal handler fixstack automatically grows the stack whenever a push runs off the end and segfaults. Because handling SIGSEGV this way can mask genuine invalid-pointer bugs, the technique demands perfect programming and is not for the faint of heart.

Aside.
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

mmap() alters the page table in a controlled way by mapping files or devices (commonly /dev/zero) into memory. The block of memory from addr to addr+length corresponds to the contents of the file specified by fd, starting at offset. prot specifies the rwx protection.

Kernel Fault Handling

The kernel keeps track of the disk address and memory size for each process. On page fault, the kernel may evict a page to make room for the new page. The removal policy (elaborated later) determines which page should be evicted. The following code shows what the kernel roughly does:

void pfault(int va /* virtual addr */,
            int access_type /* rwx */,
            proc cp /* current process */) {
    if (swapmap(cp, va) == FAIL)
        kill(cp);  // cp doesn't have access to this page
    else {  // page not in RAM; must choose a victim to evict
        (vp, vva) = removal_policy();  // victim process and its virtual addr
        physaddr = vp->pmap(vva);
        write page at physaddr to disk at swapmap(vp, vva)
        vp->pmap(vva) = FAULT;         // victim's mapping now faults
        read page at physaddr from disk at swapmap(cp, va)
        cp->pmap(va) = physaddr;       // map the faulting page into RAM
    }
}

Caveat: The kernel code that handles virtual memory must itself never be paged out. If pfault were paged out, nothing could page it back in, and we could no longer swap pages.

Removal Policy

Due to physical constraints, pages must eventually be evicted to make room for new ones. Which page is evicted is determined by the removal policy. If page accesses were random, every choice of victim would be equally good, but we can typically exploit locality of reference to minimize the number of faults, improving performance. Here, we explore several page replacement policies on the reference string "012301401234". Assume we have 3 physical pages (A-C) and 5 virtual pages (0-4).

First-In First-Out (FIFO)

FIFO is a simple and intuitive algorithm in which we evict the oldest page first.

ref:    0 1 2 3 0 1 4 0 1 2 3 4
A:      0 0 0 3 3 3 4 4 4 4 4 4
B:      ? 1 1 1 0 0 0 0 0 2 2 2
C:      ? ? 2 2 2 1 1 1 1 1 3 3
fault:  X X X X X X X     X X
Total: 9 faults.

ref:    0 1 2 3 0 1 4 0 1 2 3 4
A:      0 0 0 0 0 0 4 4 4 4 3 3
B:      ? 1 1 1 1 1 1 0 0 0 0 4
C:      ? ? 2 2 2 2 2 2 1 1 1 1
D:      ? ? ? 3 3 3 3 3 3 2 2 2
fault:  X X X X     X X X X X X
Total: 10 faults.

As a side note, FIFO with 4 physical pages here actually performs worse than with 3, showing that adding memory doesn't always improve performance (aka Bélády's Anomaly).

Least Recently Used (LRU)

LRU (or an approximation of it) is the policy typically implemented in real systems. We assume that the least recently used page is no longer needed and evict it.

ref:    0 1 2 3 0 1 4 0 1 2 3 4
A:      0 0 0 3 3 3 4 4 4 2 2 2
B:      ? 1 1 1 0 0 0 0 0 0 3 3
C:      ? ? 2 2 2 1 1 1 1 1 1 4
fault:  X X X X X X X     X X X
Total: 10 faults.

While LRU's performance is usually better than FIFO, this particular reference string shows that it isn't always the case.

Least Needed First (LNF)

LNF evicts the page needed farthest into the future. This algorithm is provably the most efficient page replacement policy but requires an oracle to predict which page is least needed; however, we can use statistics from previous runs to approximate this algorithm.

ref:    0 1 2 3 0 1 4 0 1 2 3 4
A:      0 0 0 0 0 0 0 0 0 2 2 2
B:      ? 1 1 1 1 1 1 1 1 1 3 3
C:      ? ? 2 3 3 3 4 4 4 4 4 4
fault:  X X X X     X     X X
Total: 7 faults.


Shrinking Wait Time with VM

To run a program, we usually
  1. copy the program file into RAM (a long time), and
  2. set the ip (instruction pointer) to the start of the file.
Copying the program makes up the bulk of the wait time. We can reduce this wait time with an alternate approach:
  1. mmap from the file to virtual memory (initializing the page table without copying any pages), then
  2. set the ip to the start of the file (causing a page fault).
The first page fault copies only the first page into RAM; subsequent pages are likewise loaded one at a time as needed. This is called demand paging. It makes programs start faster but raises total I/O cost, since pages can no longer be loaded in bulk. The trade-off favors demand paging for larger programs, especially ones that never touch many of their pages.

Optimization: Dirty Bit

The dirty bit records whether a page has been modified. Unmodified pages do not have to be written to disk, saving I/O time. Hardware usually implements the dirty bit for you, but we can implement it in software like so:
  1. Load page in read-only mode
  2. Attempt to write, causing a trap and yielding to kernel
  3. Have kernel modify permissions and set dirty bit