CS 111

Scribe Notes for 4/13/10

by William Lu and Adam LeWinter

Orthogonality

Why orthogonality?

Orthogonality is important because we want an interface that is simple, complete (able to specify any point in 3D space), and combinable (you can mix any combination of interfaces and the system still works). You can think of orthogonality as represented by the image below. The x-axis represents the file API, the y-axis represents the process API (fork, wait, etc.), and the z-axis represents memory. Any combination should make sense.

[Figure: Orthogonality]

Example:

Choice 1:
read(fd, buf, bufsize, offset);
Choice 2:
read(fd, buf, bufsize); lseek(fd, offset, flag);
Choice 2 makes more sense if you can claim lseek and read are interchangeable, but in reality, Choice 1 makes more sense.
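
To make the two choices concrete, here is a minimal sketch in C. It assumes POSIX pread as a stand-in for Choice 1 (the notes do not name a specific call), and the helper names read_combined, read_split, and demo_read_choices, as well as the scratch path under /tmp, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Choice 1: one call that carries the offset (POSIX pread). */
ssize_t read_combined(int fd, void *buf, size_t bufsize, off_t offset)
{
    return pread(fd, buf, bufsize, offset);
}

/* Choice 2: move the file offset with lseek, then read from there. */
ssize_t read_split(int fd, void *buf, size_t bufsize, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, bufsize);
}

/* Hypothetical self-check: returns 0 if both choices read "cde"
   from a scratch file containing "abcdefgh". */
int demo_read_choices(void)
{
    char path[] = "/tmp/readdemoXXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);                          /* file vanishes once closed */
    if (write(fd, "abcdefgh", 8) != 8) {
        close(fd);
        return -1;
    }

    char a[4] = {0}, b[4] = {0};
    ssize_t na = read_combined(fd, a, 3, 2);
    ssize_t nb = read_split(fd, b, 3, 2);
    close(fd);
    return (na == 3 && nb == 3 &&
            strcmp(a, "cde") == 0 && strcmp(b, "cde") == 0) ? 0 : -1;
}
```

One difference worth noticing: the split version changes the file offset as a side effect, while pread leaves it where it was.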

How Mechanisms Can Access (Model) OS Resources

A) You can think of an OS resource as an object. The application deals with references to those objects rather than the objects themselves.
In C:

struct pte {...}; //process description
struct pte *p; //how a process gets modeled

The upside to this is that the structure is simple and fast. The downside is that there is no protection against bad user programs, which can trash the data structures. Race conditions also emerge because two programs can modify the same structure inconsistently.

B) The OS resource can be referenced by an integer number.

pid_t (process id; some integer type)

dev_t (device number)

int (file descriptor)

ino_t (inode number)

This method uses opaque identifiers and is safer than A because the OS must interpret the identifiers. It is also a more flexible approach because you don't need to change applications every time you make a change to the kernel. However, this method is slower and much more complex.
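
As a rough illustration of why the integer approach is safer, here is a toy model in C. Every name in it (MAX_FDS, file_description, process_descriptor, lookup_fd) is hypothetical and is not taken from any real kernel; it only shows the OS validating an integer before touching its own data.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 32                 /* hypothetical table size */

struct file_description {          /* hypothetical per-open-file record */
    int in_use;
    int flags;
    long offset;
};

struct process_descriptor {        /* hypothetical per-process record */
    struct file_description fd_table[MAX_FDS];
};

/* The application can only hand over an integer, so a bad value is
   rejected here instead of being dereferenced as a pointer. */
struct file_description *lookup_fd(struct process_descriptor *proc, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !proc->fd_table[fd].in_use)
        return NULL;
    return &proc->fd_table[fd];
}
```

The extra check on every call is part of the cost that makes this approach slower than handing out raw pointers.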
Example:
int fd = open("/dev/null", O_RDONLY, 0);
The system call open has the following declaration:
int open (const char *pathname, int oflag, mode_t mode);
The system call open returns a file descriptor, through which you reach everything the OS knows about the file you just opened. The integer value is an index into a table of open files for the current process. The file descriptor can be modeled as follows:

So, using the above open call and image, our entry in the table has file descriptor number 17 and read-only permissions. That means when we make any system call on the file, such as read(17, buf, bufsize);, the OS goes from the process table to the process descriptor, to the file descriptor table, and then to the file descriptor to find out what to do, and executes it (in this case it returns EOF, because a read from the device file /dev/null always returns EOF). There are a few flags that can be used in the open call to specify how the file is opened.

O_RDWR = allow reading and writing

O_RDONLY = allow for reading only

O_WRONLY = allow for writing only

O_CREAT = create the file if it doesn't exist

O_TRUNC = If the file opens successfully and is not empty, make it empty

O_APPEND = when writing to the file, append to the end of the file

All the flags can be bitwise ORed together to get a unique integer pattern that tells the OS how to open the file. When you call close(fd), the OS goes to the file descriptor table, closes the file, and removes its entry from the table. The last argument to open has type mode_t and gives the permissions, written in octal, to use if open creates the file.
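
The flag combinations can be tried out with a short sketch like the following. The function name demo_open_flags and whatever path the caller passes are hypothetical; the calls themselves (open, write, fstat, close, unlink) are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes "one\n" to a fresh file, appends "two\n", removes the file,
   and returns the size it reached (or -1 on error). */
long demo_open_flags(const char *path)
{
    assert(path != NULL);

    /* Write-only, create if missing, empty it if not.
       0644 asks for rw-r--r-- (before the umask is applied). */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "one\n", 4) != 4) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Reopen; with O_APPEND every write lands at the end of the file. */
    fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    if (write(fd, "two\n", 4) != 4) {
        close(fd);
        return -1;
    }

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    unlink(path);
    return rc == 0 ? (long) st.st_size : -1;
}
```

The second open omits the mode argument because it does not pass O_CREAT, so no file can be created by that call.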

Side Note

int open(char const *name, int flags, ...);
The ... means the first two arguments must have the specified types, but after them the caller can pass whatever it wants; the compiler does not check the rest. The caller has to pass the types and order that the callee expects, or the program may crash.
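
Here is a minimal sketch of how a callee reads such extra arguments in C. The function mode_argument is hypothetical and is not the real open; it only mimics the idea that a third argument is read when O_CREAT is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>

/* Hypothetical helper: returns the extra mode argument when O_CREAT
   is set, and -1 when no extra argument is expected. */
int mode_argument(int flags, ...)
{
    int mode = -1;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        /* The callee trusts the caller here: nothing checks that an
           int-sized argument was really passed. */
        mode = va_arg(ap, int);
        va_end(ap);
    }
    return mode;
}
```

If a caller set O_CREAT but passed nothing after the flags, va_arg would read garbage, which is exactly the hazard described above.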

umask

When the caller asks for permissions in a system call, it passes them as an octal number. The umask is stored in the process descriptor and is a per-process value (it can differ from process to process). The OS bitwise ANDs the one's complement of the umask with the octal number sent by the caller to set the file permissions, so the umask can ONLY take away permissions, not add them. umask is an important protection mechanism because you need to create the file with the correct permissions from day 1: if you change the permissions of a file later and another process already has it open, that open file keeps the access it was opened with. A process inherits its umask from the process that created it; the kernel sets the original value for process 1. A process can change its own umask with the umask system call, which updates the value in its process descriptor.
Example
umask = 022; caller asks for 777; the file will have permissions 755, which amounts to rwxr-xr-x. This means the owner is the only one who can write to the file, and group members and other users can only read and execute it.
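
The arithmetic in this example can be checked with a small sketch. The helper names apply_umask and mode_string are hypothetical; the computation is just the AND with the complement described above.

```c
#include <assert.h>
#include <string.h>

/* Permissions the file ends up with: requested bits minus umask bits. */
unsigned apply_umask(unsigned requested, unsigned mask)
{
    return requested & ~mask & 0777;
}

/* Render the low nine permission bits as a string like "rwxr-xr-x". */
void mode_string(unsigned mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    memcpy(out, "---------", 10);      /* nine dashes plus terminator */
    for (int i = 0; i < 9; i++)
        if (mode & (1u << (8 - i)))
            out[i] = letters[i];
}
```

With requested = 0777 and mask = 022, apply_umask gives 0755, and mode_string turns that into rwxr-xr-x.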

Processes

Processes are also controlled by syscalls. Just as files have open to create them and close to destroy them, processes have fork to create them and exit to destroy them.

Files:
open(...)
.
.
.
close(...)

Processes:
fork()
.
.
.
_exit(n)

New processes are created using the fork system call:
pid_t fork(void);
-1 -> fork failed
0 -> fork succeeded, and we are now running in the child process
>0 -> fork succeeded, and we are now running in the parent process; the value is the child's pid

Here are several error codes that fork can leave in errno:
#include <errno.h>
ENOMEM - no memory
EAGAIN - low on resources, try again later

A process exits using this:
void _exit(int);

To enforce orthogonality we have access to system calls which control and provide information on processes

Process syscalls:

pid_t getpid(void); //return your own pid
pid_t getppid(void); //return your parent's pid
pid_t waitpid(pid_t p, int * status, int option); //wait on your child's process to finish
waitpid returns the child's process id, or -1 on failure.
waitpid takes p (the process id of the child), a pointer to where the exit status will be stored, and an option (ex. WNOHANG).
Note: we only allow waiting on child processes, to avoid deadlock.
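
The calls above fit together roughly as in this sketch. The function name demo_fork_wait and the exit status 7 are arbitrary choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that checks who its parent is. Returns the child's
   exit status (7 if getppid matched), or -1 if fork or waitpid failed. */
int demo_fork_wait(void)
{
    pid_t parent = getpid();
    pid_t p = fork();
    if (p < 0)
        return -1;                     /* fork failed; errno says why */
    if (p == 0)                        /* running in the child */
        _exit(getppid() == parent ? 7 : 1);

    /* Running in the parent: p is the child's pid. */
    int status;
    if (waitpid(p, &status, 0) != p)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child sees 0 from fork and learns its parent's pid from getppid, while the parent sees the child's pid and uses it in waitpid.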

int execvp(char const *file, char *const *argv);
The argv argument holds a NULL-terminated array of argument strings (ex. (char *[]) {"date", "-u", NULL}).

The function always returns -1 if it returns at all, because it only returns when it has failed (for example, the file is not available).

If the function succeeds, it blows everything away (your global variables, local variables, registers, everything) and the process starts running the new program given by file, with the arguments passed in argv.

Now let's try to create an example function that sorts its input to its output.
Here's a start:


int sortio(void) {
	execvp("/bin/sort", (char*[]) {"sort" , NULL});
}

This is not going to work, because execvp blows everything away and we still want to keep our current process. We conclude that the call to execvp should happen in a child process.


#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int sortio(void) {
	pid_t p = fork();

	switch (p) {
		case 0:   //currently running in child
			execvp("/bin/sort", (char *[]) {"sort", NULL});   //run the sort program
			_exit(1);  //execvp returned, so it failed; exit with an error
		case -1: //fork has failed
			return -1; //errno holds whatever fork reported
		default: { //currently in the parent
			int status;
			if (waitpid(p, &status, 0) < 0)
				return -1;

			//WIFEXITED is true if the child exited, false if it was killed
			//WEXITSTATUS is the child's exit status; 0 means success
			if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
				return -1;
		}
	}
	return 0;
}

Now that we have the sortio function, let's take a look at the sort program itself, or at least the parts of its code that execvp actually invokes.
Inside sort.c:

The call jumps to some assembly code that runs just before the main of sort.c; this area is called crt0.
It then pulls in the args from the execvp call (this is why you never want to pass too many arguments: the kernel has to copy all of them).


int main(int argc, char**argv){
	.
	.
	.
	return 0; //this is actually the argument to the exit call
}

There is more than one way to exit a process:
_exit(n); //notice the underscore: this is a quick exit, don't clean up, just exit
exit(n); //this is a clean exit: clean up first, then exit (ex. flush output buffers)
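
The difference can be observed with a sketch like this one, where a child writes through stdio and then leaves either way. The function name size_after_child_exit and the path the caller supplies are hypothetical, and the result assumes the usual behavior that a file stream is fully buffered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child writes "hello" through stdio, then leaves with exit (clean)
   or _exit (quick). Returns the resulting file size, or -1 on error. */
long size_after_child_exit(const char *path, int clean)
{
    assert(path != NULL);
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0) {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            _exit(1);
        fputs("hello", f);     /* sits in the stdio buffer for now */
        if (clean)
            exit(0);           /* flushes stdio buffers before exiting */
        _exit(0);              /* exits without flushing */
    }

    int status;
    if (waitpid(p, &status, 0) != p)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct stat st;
    long size = stat(path, &st) == 0 ? (long) st.st_size : -1;
    unlink(path);
    return size;
}
```

With the clean exit the five bytes reach the file; with the quick exit the buffered data is typically lost and the file stays empty.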

Now onto forking.
A fork clones the current process (from now on referred to as the parent process).
The child has all the properties of the parent except for:
-pid
-ppid
-file descriptors (the child gets its own copies, which share the parent's file descriptions)
-accumulated execution times
-file locks
-pending signals (ex. ctrl+c)
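
The sharing of file descriptions can be seen in a sketch like the following, where the child moves the file offset and the parent observes the change. The function name offset_seen_by_parent and the scratch path under /tmp are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child seeks on its copy of fd. Returns the offset the parent
   sees afterward through its own descriptor, or -1 on error. */
long offset_seen_by_parent(void)
{
    char path[] = "/tmp/forkdemoXXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);

    pid_t p = fork();
    if (p < 0) {
        close(fd);
        return -1;
    }
    if (p == 0) {
        /* The descriptor is a copy, but it names the same file
           description, so this seek is visible to the parent. */
        _exit(lseek(fd, 3, SEEK_SET) == 3 ? 0 : 1);
    }

    int status;
    if (waitpid(p, &status, 0) != p) {
        close(fd);
        return -1;
    }
    long off = (long) lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```

The parent never seeks itself, yet it reads back offset 3, because both descriptors refer to one shared file description.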

exec, in contrast, destroys everything in the current process except for the items listed above.

Fork and exec are essentially opposites of one another, but at the same time exec is almost always called right after fork. This has spawned a school of thought that combines the two (fork+exec) into a single function, spawnvp. Windows adopted this school of thought, and this is one of the big reasons why code ported from Linux to Windows can lose a lot of performance.
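
For comparison, POSIX also offers a combined call in the same spirit, posix_spawnp. The sketch below uses it as an illustration only; it is not the Windows API, and the wrapper name spawn_and_wait is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the caller's environment, passed through */

/* Starts `file` with `argv` in a new process in one step, waits for
   it, and returns its exit status (or -1 on failure). */
int spawn_and_wait(const char *file, char *const argv[])
{
    assert(file != NULL && argv != NULL);
    pid_t p;
    if (posix_spawnp(&p, file, NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(p, &status, 0) != p)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with the fork-then-execvp pattern in sortio, there is no window in which the child runs a copy of the parent's code.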