Winter 2013 CS 111 Lecture 4 Scribe Notes

Prepared by Ryan Chan and Jen-Hung Lo for a lecture by Professor Paul Eggert on January 16, 2013

Table of Contents

  1. Previous Lecture
  2. Motivation
  3. Introduction
  4. Protected Transfer of Control
  5. Virtualizable Processor
  6. Policy on Implementing System Calls
    1. Low-level approach
    2. High-level approach 1: Using FILE pointer
    3. High-level approach 2: Read line-by-line
    4. High-level approach 3: Read by sectors
    5. High-level approach 4: Read by Size in Bytes
    6. High-level approach 5: Read by offset
    7. High-level approach 6: Combining read + lseek methods
  7. Error Handling
    1. Trivial Cases
    2. Data Type Limitations
    3. Miscellaneous Issues
  8. Works Cited

Previous Lecture

From previous lecture (Lecture 3), we learned that we can improve our VSWC (Very Secure Word Count) program by using modularity. With modularity, our program will be easier to modify and easier to debug. There are three ways to modularize your program (Please check previous lecture’s scribe notes for more detail):

  1. Modularization by functions (Soft Modularity)
  2. Client/Server modularity (Hard Modularity)
  3. Virtualization (Hard Modularity)

From previous lecture, we know that the first way is categorized as soft modularity, the second and third way as hard modularity.

We want to observe how soft and hard modularity behave, and last lecture we did this by adding a new function called factl (factorial long).

Motivation

In the first half of the lecture, we will be covering the third way to modularize: Virtualization.

So why do we prefer virtualization over the other two ways to modularize?

Cons of Modularization by functions:

Modularization by functions is unsafe because the modules' integrity is maintained only by soft modularity. Soft modularity can lead to disaster if one of the modules violates the rules of the contract. So, trust between modules is required if you want to modularize by functions.

Cons of Client/Server modularity:

In client/server modularity, one computer is considered one module, and the network between these computers serves as a modularity enforcer (i.e. the network prevents hidden interactions between client and server). However, the disadvantage of using one computer per module is that the number of computers required is linearly proportional to the number of modules. For example, if a networked system implements n modules, then we would need n computers, which is not desirable. Moreover, there is high latency between modules, since messages have to travel through the network. Identity verification is also needed when a module receives a message from another module; this could lead to security problems if the authentication process is done incorrectly and exploited by a third party. Overall, this approach is slow and very complex.

Introduction

From previous lecture, we tried to add a new instruction to an existing machine (e.g. x86 + new instruction). Let's call this new instruction "unlink".

There are a couple of ways to add a new instruction to an existing machine. One way is to use function call. In assembly language, it would look like this:

pushl filename
call unlink
addl $4, %esp

This approach is not recommended because the caller can violate the contract and interfere with the callee. This can lead to disaster if the function acts as an interface between a user application (caller) and the OS kernel (callee).

A second way to add a new instruction is to make it part of the instruction set on the hardware level. In assembly language, it would look like this:

pushl filename
unlink
addl $4, %esp

This approach is not practical because the chance that Intel/AMD would let you add a new instruction to the x86 is essentially nonexistent. Even if the new instruction were added to the ISA, it would be very troublesome to debug, as you would have to constantly modify the hardware.

A third way to add a new instruction is to write an interpreter that emulates a virtual machine (e.g. an x86 emulator), and then add the new instruction to that emulator. This interpreter (written in microcode) along with the new instruction is built on top of a physical machine (e.g. Intel, AMD). Not only can an interpreter emulate a different machine, it can also be written to emulate a new machine still in development. However, there are some downsides:

  1. It's slower, because several physical instructions are needed for each virtual instruction. As a result, emulation can cost a factor of 10 in performance (CPU cost). Moreover, the interpreter requires memory of its own to simulate the other machine (memory cost).
  2. The interpreter itself can be buggy.
  3. The timing characteristics on the emulator can be different, and because of this difference, a bug can be unintentionally fixed or created.

A fourth way is to virtualize the processor, given that the processor can implement a virtual instance of itself (i.e. it is a virtualizable processor). Although this approach loses the portability of the emulation approach mentioned previously, it provides better performance.

Protected Transfer of Control

With a virtualizable processor, we have user applications, the kernel, the hardware, etc. Some applications are malicious and can potentially harm the kernel, so we want to restrict and control what an application can do. We can do this with protected transfer of control. This mechanism divides the hardware instructions into two categories: privileged and unprivileged. If an application executes a privileged instruction, the hardware passes control to the kernel and lets the kernel decide what to do. The hardware also passes control to the kernel if the application executes an invalid instruction. Unprivileged instructions, on the other hand, the application can execute without consulting the kernel. Here is an example in assembly code:

pushl filename
242
addl $4, %esp

The instruction "242" is invalid, and when an invalid instruction is encountered, it causes a trap. A trap is a type of interrupt; it usually results in transferring control to the kernel, which decides what to do with it. However, instead of using "242", we can use the int instruction, which by convention is the instruction that generates an interrupt:

pushl filename
int 0x80
addl $4, %esp

The 0x80 means that the int instruction wants to generate an interrupt of type 0x80 (or 128). There is an array of 256 entries, called the interrupt descriptor table, one entry per interrupt number (0x80 is one of them); each entry contains a pointer to the corresponding code that gets executed in the kernel. Each entry also has a privileged bit. When the interrupt fires, the hardware turns on the processor's privilege bit and transfers control to the kernel's handler for that interrupt number.
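The descriptor table can be pictured as a C array of 256 entries. Below is a hedged sketch of the 32-bit x86 entry layout; the field names are ours, but the layout follows the Intel manuals:

```c
#include <stdint.h>

/* One entry of the 32-bit x86 interrupt descriptor table (a sketch;
   field names are our own, the layout follows the Intel manuals). */
struct idt_entry {
    uint16_t offset_low;   /* low 16 bits of the handler's address     */
    uint16_t selector;     /* kernel code segment selector             */
    uint8_t  zero;         /* unused, must be zero                     */
    uint8_t  type_attr;    /* gate type, privilege bits, present bit   */
    uint16_t offset_high;  /* high 16 bits of the handler's address    */
} __attribute__((packed));

/* 256 entries, one per interrupt number; entry 0x80 points at the
   kernel's system-call service routine. */
struct idt_entry idt[256];
```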

Figure 1: Interrupt Descriptor Table

On the hardware side, when a trap occurs, the following happens:

  1. The following information is pushed onto the stack:
    ss: stack segment
    esp: stack pointer
    eflags: processor/privilege flags
    cs: code segment
    eip: instruction pointer
    error code: details of the trap
  2. eip is set to the address of the interrupt service routine.
  3. When reti is called, the above stack information is popped. (reti is to trap as ret is to call.)

This approach of calling the int instruction is a lot slower than just calling a function, but it offers more control and protection. For example:

Let's assume a.out contains the outb instruction, which is privileged:

$ ./a.out

Since a.out contains a privileged instruction (outb), the kernel will kill the process.

Now, let's assume a.out contains int:

$ ./a.out

Since a.out contains int, the kernel can decide whether to just kill the process or to do a protected transfer of control.

From the example above, the application is entirely at the kernel's mercy. This demonstrates the kernel's ability to protect itself from malicious applications (one-way protection).

Furthermore, the kernel has its own protected memory, so an application cannot overwrite structures like the interrupt descriptor table or the interrupt service routines.

However, what if the kernel itself hits an invalid instruction? Will the kernel trap recursively forever, or is there something like a "kernel trap"? The only solution is to design a perfect kernel without any such bugs.

Virtualizable Processor

With the virtualizable processor we have talked about so far, we get a layered system. Here is the first approach to a layered system.

Figure 2: First Approach to a Layered System

Here, we can see two layers of abstraction and the interfaces between each component.

We can further break the kernel into I/O device management and memory management. Here's the second approach:

Figure 3: Second Approach to a Layered System

Here, we can see more layers and more interfaces.

Finally, we have another approach, this time from the x86, which uses the ring-structured (or microkernel) approach. The x86 provides 4 rings, whereas Linux uses only 2 of them. So the x86 needs 2 bits to represent a privilege level, while Linux needs only 1 bit. The 4-ring approach gives extra levels of protection, but costs some performance.

Figure 4: x86 Ring Approach

Policy on Implementing System Calls

Using the above mechanism, which system call(s) should be implemented?

The following example demonstrates the policies involved in implementing system calls to read data off disk.

Low-level approach: Using the insl system call.

At the C level, the insl call might appear in a code segment like:

char buf[1024];
insl(192, buf, 256);

At the machine level, the above C code translates to (with the system call number in %eax and the arguments in %ebx, %ecx, and %edx):

movl $39, %eax
movl $192, %ebx
lea buf, %ecx
movl $256, %edx
int 0x80

However, this approach has two major disadvantages:

  1. The application is able to read data from any location on disk.
  2. The implementation is too low-level, and may not suit the app writer's needs.

High-level approach 1: Using FILE pointer

Here, f is a pointer to a FILE object, which is a high-level concept. The function returns a pointer to a sequence of characters:

char* readline(FILE* f);

In assembly code, the FILE variable f is represented via lower level machine words (%ebx):

movl $109, %eax
movl f, %ebx
int 0x80

This method is a simple, low-level representation of a high-level object.

The general ideas for the implementation:

  1. A pointer to an actual FILE object.
  2. An integer handed out by the OS.

However, idea 1 poses a security issue, as the pointer may point into the internals of the OS.

High-level approach 2: Read line-by-line

Another high-level approach is to read the file line by line, defining a "line" to be a sequence of bytes terminated by \n:

char* readline(int);

Here, the int specifies the line number to read in the file, and the function returns a pointer to a sequence of characters.

The main drawback for this approach is that the kernel has to allocate user memory when creating the return value char*. Moreover, other drawbacks include:

  1. The OS must keep track of the read location in the file.
  2. A partial line may be read.
  3. The read-in line may be really long and use up all the buffer space.
  4. The read-in line may have no newline character.

In fact, drawback 1 can be solved by passing another argument to the function; drawbacks 3 and 4 can be solved by returning a NULL pointer and an error message. However, drawback 2 cannot be solved if the read is line-by-line.

High-level approach 3: Read by sectors

This approach reads a specified number of sectors from a given FILE, saving the result into a designated buffer and returning nothing.

void read_sector(FILE* f, char* buf, int no_of_sectors);

Here, the FILE type is used instead of a device number because device numbers are not portable. Therefore, an auxiliary file-opening function should also be implemented.

Also, no memory allocation is needed, as no value is returned.

However, one main drawback is the specification of sectors: since sector size varies between systems, it is difficult for the developer to specify how many sectors to read from the disk.

High-level approach 4: Read by Size in Bytes

This approach differs from approach 3 in that the read size, in bytes, is specified when the function is called:

void read(int file, char* buf, int size);

Here, file is a file descriptor, buf is the pointer to the buffer and size is the specified size to read.

This approach is suitable for sequential reads: the caller never specifies a position in the file; the function simply reads the specified number of bytes starting from the current location.
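As a minimal sketch of sequential reading with this interface (the helper count_bytes and its buffer size are our own choices):

```c
#include <unistd.h>
#include <fcntl.h>

/* Count the bytes in a file by reading it sequentially. Because read()
   advances an offset kept by the OS, each call picks up where the last
   one left off; the application never specifies a position. */
long count_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[1024];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                 /* n is the number of bytes read */
    close(fd);
    return n < 0 ? -1 : total;
}
```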

High-level approach 5: Read by offset

This approach includes an offset in the function, and the read will be performed from that offset onwards.

void pread (int file, int offset, char* buf, int size);

Here, an integer offset is added, and the other arguments are kept the same as in approach 4. This method is frequently used in database systems for indexing.
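POSIX later standardized this idea as pread, though with a different argument order than the signature above: ssize_t pread(int fd, void *buf, size_t count, off_t offset). A hedged sketch of the database-style indexed read mentioned here (the record size and helper name are our own):

```c
#include <unistd.h>
#include <fcntl.h>

#define REC_SIZE 16   /* arbitrary fixed record size, for illustration */

/* Fetch record number i from a file of fixed-size records. pread reads
   from an explicit offset without moving the file position, so many
   readers can index into the same file descriptor safely. */
ssize_t get_record(int fd, long i, char buf[REC_SIZE]) {
    return pread(fd, buf, REC_SIZE, (off_t)(i * REC_SIZE));
}
```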

High-level approach 6: Combining read + lseek methods

This method was used in the original UNIX; it combines the read and lseek functions to achieve the effect of an offset:

void read(int file, char* buf, int size);
void lseek(int file, int offset, int flag);

Here, the lseek function's arguments include an offset and a flag.

When flag is 0, the offset is relative to the start of the file; when 1, relative to the current location; when 2, relative to the end of the file.
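These flag values are named SEEK_SET (0), SEEK_CUR (1), and SEEK_END (2) in <unistd.h>. The offset arithmetic can be sketched as follows; the function name is ours, and the real lseek additionally validates the result and updates the kernel's per-file position:

```c
/* Compute the new file offset lseek would produce, given the current
   position, the file size, the requested offset, and the flag.
   (A sketch of the semantics only, not the real implementation.) */
long new_offset(long cur, long file_size, long offset, int flag) {
    switch (flag) {
    case 0:  return offset;              /* SEEK_SET: from start   */
    case 1:  return cur + offset;        /* SEEK_CUR: from current */
    case 2:  return file_size + offset;  /* SEEK_END: from end     */
    default: return -1;                  /* invalid flag           */
    }
}
```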

Error Handling

Reading off disk may result in errors such as disk failures or concurrent reads.

Therefore, the read function should return an int type that reports if the read was successfully performed:

int read(int file, char* buf, int size);

Furthermore, the returned int can also report the number of bytes actually read.

To conclude, the return value equals the number of bytes read on success, or -1 on error.

Similarly, the lseek function should return an int that reports errors and useful information:

int lseek(int file, int offset, int flag);

The return value equals the new file offset on success, or -1 on error.

Trivial Cases

When the read function is called to read the last byte of a file, an error check should be performed:

if( lseek(fd, -1, SEEK_END ) < 0)
{
    error();
}

Here, fd is the file descriptor, -1 is the offset, and SEEK_END is the flag (an enumeration equal to 2) meaning relative to end-of-file.

Moreover:

if( read(fd, &c, 1) != 1 )
{
    error();
}

The above two segments of error-checking code ensure the last byte is read.


In fact, the lseek function provides another useful feature: it can return the current read position in the file:

int here_am_i = lseek(fd, 0, SEEK_CUR);

As the offset is 0 relative to SEEK_CUR, the file position does not move, and the return value is simply the current offset.

Data Type Limitations

A signed 32-bit int can only represent values up to 2^31 - 1, so a file larger than 2GB cannot be addressed if int is used.

The double type cannot be used because, as values grow larger, gaps appear between the integers it can represent exactly.

The long type cannot be used since it is 64 bits on x86-64 but only 32 bits on x86, so it is not portable.

For the above reasons, the types off_t, size_t, and ssize_t should be used. off_t is a 64-bit integer, declared in:

#include <sys/types.h>

Therefore, the function read and lseek should be defined as:

ssize_t read(int file, char* buf, size_t size);
off_t lseek(int file, off_t offset, int flag);
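On 32-bit x86 GNU/Linux, the 64-bit off_t typically has to be requested explicitly; a common glibc convention (an assumption about the platform, not something stated in lecture) is to define _FILE_OFFSET_BITS before any include:

```c
#define _FILE_OFFSET_BITS 64  /* ask glibc for a 64-bit off_t, even on 32-bit x86 */
#include <sys/types.h>

/* Width of off_t in bits; with the define above this is 64, so file
   offsets beyond the 2GB limit of a 32-bit int can be represented. */
long off_t_bits(void) {
    return (long)(8 * sizeof(off_t));
}
```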

Miscellaneous Issues

Works Cited

Here is a list of the previous Scribe Notes used for note/design reference: