CS 111 Lecture 5 Scribe Notes (Winter 2013)

Scribe: Christine Kuo

Sections: Orthogonality | Processes | Files

Orthogonality

Example: GNU Emacs

(directory_files_and_attributes "/usr/src/kernel/fs")
result = (("ufs" 209361271 "eggert") ("README" 209361284 "eggert"))

This is slow now (in 24.2).
Here's why: The way Emacs implements this primitive is to call the following functions at the C level.

	DIR* d = opendir("/usr/src/kernel/fs"); //ptr into directory
	while((dp = readdir(d, ...)) { //while data stx is not null
		char buf[1024];
		strcpy(buf, "/usr/src/kernel/fs/"); //ending slash is impt
		strcat(buf, dp->d_name); //dp->d_name is "ufs"
		struct stat st;
		lstat(buf, &st);
		//convert st to Lisp form
	}

What's wrong? Let D = the length of the string "/usr/src/kernel/fs".
There is an overhead in calling strcpy, strcat on string.
Can partially rectify this by hoisting string functions above the loop.

	DIR* d = opendir("/usr/src/kernel/fs");
	char buf[1024];
	strcpy(buf, "/usr/src/kernel/fs/");
	dirlen = strlen(buf);
	while((dp = readdir(d, ...)) {
		strcpy(buf+dirlen, dp->d_name);
		struct stat st;
		lstat(buf, &st);
		//convert st to Lisp form
	}

The cost is O(D*N) = length of directory name * # of files in directory = O(N).
B/c each call to lstat passes in filename & interprets each directory; work proportional to D.
Instead of lstat, use fstatat, interprets paths relative to directory pointed to by extra file descriptor argument.

	DIR* d = opendir("/usr/src/kernel/fs");
	int fd = dirfd(d);
	char buf[1024];
	strcpy(buf, "/usr/src/kernel/fs/");
	dirlen = strlen(buf);
	while((dp = readdir(d, ...)) {
		strcpy(buf+dirlen, dp->d_name);
		struct stat st;
		fstatat(fd, dp->d_name), &st, O_SYMLINKS_NOFOLLOW); //fd points to directory
		//convert st to Lisp form
	}

What is orthogonality?

Orthogonality = ability to decompose problem into independent axes that do not interfere with each other.

Orthogonality problem in classic Unix design:

The primitives used to access the file content should be orthogonal to the primitives used to specify the file name. In Emacs, the orthogonality was breaking down b/c file names were read out of a directory; this can make code slow or hard to read.

Goal is an orthogonal operating system!
Tools:
Virtualizable processor
Classic Unix model


Processes

What is a process?

Process = program in execution, running in an isolated domain = a virtual computer with a subset of abilities of the host computer it is running on.

How do we specify how processes work?
API for dealing with processes (for application programs) + associated intuition (as "data structures")

API system calls that deal with processes

What do these do?

Example: fork, a primitive in API/Data structures

	pid_t fork(); //pid_t is a handle for a process
	sys/types.h:
	typedef long pid_t;
	 //This must be a signed integer type */

But if we just needed a process ID, we could just do:

	pid_t e = 97

But this is different from:

	pid_t c = fork(); //clones current process
	//yields the pid of the clone if parent process, 0 if you're the child, -1 if not enough resources
	if(c == 0) {
		//do what the child should do
	} else {
		//in the parent
	}

Example: Print time of day as inefficiently as possible, using fork

	bool printdate(void) {
		pid_t p = fork();
		if(p<0) return 0;
		if(p==0) {
			//in child, can run date command
			execvp("/usr/bin/date", (char**){"date", "-U", 0}); //only returns if fails
			exit(126); //nonzero exit status to indicate execution did NOT work
			//exit so child does not run parent code if execvp fails
		}
		//back in parent
		//if do return 1, parent & child run in parallel, printdate & date outputs interleaved
		int status;
		while(waitpid(p, &status, 0) < 0) continue; //continues if interrupted
		return WIFEXITED(status) && WEXITSTATUS(status)==0; //true if normal exit
	}

What can go wrong?

One way is to create process IDs on our own:

	waitpid(97, &st, 0)

Zombie processes

	init.c:
	//reap the zombies
	while(waitpid(-1, ~~~)) continue; //beware: -1 means wait for any child to exit

Signal alarm interrupt

	while(waitpid(p, &status, 0)<0) continue;
	//waitpid would hang and never exit
	//want to insulate caller even if this happens in callee
	signal(SIGALRM, donothing); // void donothing(int sig) {}
	alarm(5); //please wake me up in 5 seconds & deliver a signal, will call a function asynchronously
	//BUGGY! If alarm returns before waitpid is called, would hang forever...
	if(waitpid(p, &status, 0) < 0 && errno==EINTR) {
		//date is looping, alarm went off
		//must get rid of subsidiary process
		kill(p, SIGINT); //sends Ctrl+C signal to child
		waitpid(p, &status, 0); //politely exit date zombie
		//if do not trust date (ill behaved), use SIGKILL, but avoid bc extreme & prevents clean up
		//usually bit vector keeps track of signals and processes
	}

Data structure to implement this...

Inside kernel, there is a process table indexed by process id that keeps process info:
exit status, zombie flag, parent pid, start loc of RAM, size of RAM, ip, registers, uid, gid

In kernel, to resume pid 97:

	reti //exit kernel mode, enter user mode, set ip to 97's ip near top of stack
	

Example: Single-core machine running 10 processes numbered 1 to 10

10 rows in process table all filled with info
If 3 is running, a lot of entries are junk (e.g. ip is not up to date)
If 10 is not running, it is a virtual process (info saved in process table)


Files

API system calls that deal with files

Data structure to implement this...

How to represent inside a kernel?
Process table includes set # of file descriptors that point to info about opened file
Want orthogonality. Does fork(); clone file descriptors?