CS 111 Operating Systems (Fall 09)

Scribe Notes for Nov 19, 2009. Lecture 16

Kung Kin Law, Siu Man Tang

Remote Procedure Calls (RPC)

Remote Procedure Calls (RPC) allow two machines to communicate with each other, and even to execute code on another machine over a network.
The goal of RPC is to make a remote call look like an ordinary library call. However, there are some major differences, which lead to several advantages and disadvantages.


Pros:

  1. It enforces hard modularity: since the caller and callee can be different machines, errors and bugs that occur on one machine do not affect the other.

  2. There is no call by reference; all data must be passed by value, since the caller and callee generally have different address spaces.

  3. The caller and callee can have different instruction sets (e.g., x86 and MIPS), allowing greater flexibility; however, this also adds a layer of complexity.

Cons:

  1. RPC is slower, since requests travel over the network. Relative to an ordinary library call:

       cost of RPC = cost of local call + 2*(marshalling + unmarshalling + network latency) + cost of actual work

    Note that the costs of marshalling, unmarshalling, and network latency are doubled, because RPC is a two-way action: data is first sent from the caller to the callee, and the result is then sent back.

  2. Instead of the API/ABI used for a library call, RPC is implemented with a network protocol. This, however, leads to the problem of marshalling, which is the subject of the next section.

Marshalling

  • Marshalling is the process of converting objects into a network byte representation that can be correctly interpreted by machines regardless of their architectures.
  • This process is essential for RPC because the caller and callee can have different architectures that interpret data differently (e.g., big endian versus little endian).
  • Marshalling is done on both ends of the communication: the object is converted to the network byte representation before it is transmitted, and transformed back into object form when it is received.
    However, sometimes people just give up if the object is too complicated to convert, for example, a hash table.

    The following example demonstrates client/server communication with RPC.

    RPC is used when a browser wants to access a web page: the browser sends an
    HTTP (Hyper-Text Transfer Protocol) request for a URL (Uniform Resource Locator) to the server. Here is such a request:

GET /foo HTTP/1.1\r\n
\r\n

The request is requesting /foo, using version 1.1 of the HTTP protocol.
The server would reply something like this:

HTTP/1.1 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 1097\r\n
\r\n
<html> … </html>

The first line specifies the protocol version and status: version 1.1, status 200 OK. The next two lines describe attributes of the response: the content type is HTML text, and the body is 1,097 bytes long. The blank line separates the headers from the body, which follows.

Problems of RPC and corresponding solutions

Problem:

  1. The server can be overloaded, resulting in a delayed response or no response at all, and the client has no way to distinguish which happened.

  2. Messages can get lost or corrupted when sent over the network.

  3. Messages can be delivered out of order, since the route packets take can differ every time.

Solution:

  1. When a corrupted or lost message is detected on the server (by methods like a checksum), the server simply drops it and does nothing; the client is responsible for resending.

  2. If no response is received from the server, we set a timeout so that we resend the request when the time expires. Note that there is no perfect timeout: a small limit may result in sending redundant requests, while a large limit may mean a long wait before resending. This approach can be further broken down into three categories, each suitable for particular kinds of requests:
  • At-least-once RPC: the request is sent to the server at least once, repeatedly until a response is received. This is suitable for idempotent requests, meaning that nothing breaks even if the request is executed multiple times. For instance, a read request is idempotent because reading the same file multiple times does no harm.

  • At-most-once RPC: the request is sent to the server at most once; if it fails, an error is reported and nothing is changed. This is suitable for non-idempotent requests, where something goes wrong if the request is executed multiple times. For instance, a request to transfer money from one account to another is non-idempotent, because executing it multiple times results in a wrong balance.

  • Exactly-once RPC: requires the server to maintain a cache of recent transactions, keyed by transaction ID, so that duplicated requests can be detected. But there is no perfect cache size: a cache that is too small or too large leads to problems of its own.

Performance of RPC

There are two types of RPC: synchronous and asynchronous. In the synchronous model, the client waits for the response to a request before sending a new one.
Suppose an X client wants to draw a box of 200 pixels; it sends 200 drawpixel() requests to the X server. In this case the latency is huge, since drawing the box requires 200 round trips (400 network messages).

One way to tackle this problem is to batch multiple requests, for example by adding a new request type, drawbox(), with width and height arguments. With this new request type, only 1 request needs to be sent instead of 200. But what if the object we want to draw has an irregular shape, such as a cloud, so that we cannot batch the requests easily?

We are back to sending multiple drawpixel() requests to the server, but instead of sending them one at a time (serially), we can send a collection of requests at once rather than waiting on responses, reducing the total network latency. This technique is sending requests asynchronously: the client no longer waits for a response before sending a new request; it sends many requests at once and receives all the responses later.

Although this model boosts performance, it is not suitable for all types of requests, because ordering now matters: we cannot send requests asynchronously if any request depends on the success of a previous one. For instance, suppose we want to send three requests to the server in order: create a file, write text to the file, and read from the file. Sending these asynchronously may result in errors, because each later request depends on the success of the previous ones, and the server need not receive and execute the requests in the order we intended!

 

Multiple File Systems

Frequently, we want to access file systems other than the kernel's own. For example, we may use ext3 as our kernel's file system but want to access files stored on a USB drive that uses FAT32/exFAT. This immediately raises the question of how to use multiple file systems simultaneously!

First approach: specify the file system name before each file

A:/etc/passwd
B:/usr/bin/sh
where A, B are file system name

This works, but clients or users need to know the file system of the target! In addition, what if two of the file systems we want to use have the same name (and the same directories and file names)? That is, what if A == B? Hence, let's try another approach...

Second approach: by mount() command

In Linux, the mount command allows us to access multiple file systems. It creates the illusion of one big file system, so users and clients do not need to specify the target file system for file operations; this solves the problem with the first approach. To reverse a mount, use the umount command.

Usage of mount: mount (device, directory)
example: mount /dev/mytemp /tmp/mytmp

General problems and solutions:

  1. Files are no longer uniquely identified by inode number alone: two different files, one on each of two mounted file systems, can have the same inode number. To solve this, include the device as an identifier in addition to the inode number: (inode #, device).

  2. link("a/b", "c/d") can fail if "c" and "a/b" are on different file systems; that is, we cannot hard-link files across file systems. To solve this, use a symbolic link instead; often this suffices.

  3. Loops can occur if an inner file system mounts an outer file system. To prevent this, the same file system cannot be mounted more than once.

  4. mount can render files inaccessible. Consider this example:

    my file is located at /usr/bin/myfile
    mount /dev/somedisk /usr

    Then everything previously under /usr, including myfile, is temporarily masked (hidden) by the newly mounted file system.

    To resolve this, a student suggested that mount should only be allowed to execute on an empty directory.
    In reality, mount is a privileged command, so as long as root does not make such an error, the problem does not arise. Keeping mount privileged also avoids possible exploits on permissions.

  5. What if there is still an open fd (file descriptor) on a file system when umount() is issued on it? In the current implementation, umount() fails if some processes are still using the file system.

How to implement mount:

We need an object-oriented system:

The file-system class defines methods for operations such as creating and linking files. In C we code in this object-oriented style using structs (of function pointers), as we did in lab 3.

Network File System (NFS)

A network file system is any computer file system that supports sharing of files, printers and other resources as persistent storage over a computer network (Wikipedia). To communicate with those file systems across the network, we need to use protocols:

List of NFS protocols:

Request                                 Response
LOOKUP(dir file handle, name)           file handle + attributes
CREATE(dir file handle, name, attr)     file handle + attributes
MKDIR(....)
RMDIR(....)
REMOVE(....)
READ(....) / WRITE(....)
A complete list can be found on page 4-46, table 4-1, of the course notes.

 

Property of NFS: "stateless" file server

The client cannot tell whether the server has crashed and rebooted (other than having to wait a while during the reboot). That is, the server loses no state; indeed, the server's state does not matter at all. To implement this, file handles must persist across reboots, so a file handle looks like (inode # + device identifier).

Since the server has no state, race conditions can cause problems. Consider this scenario: client 1 sends unlink("f"), and a later CREATE("b") from client 2 reuses f's freed inode for a new file "b". The problem is that we cannot control how long requests take to arrive at the server. If the requests arrive in the order they were sent, a late operation on the old handle should simply result in an error. However, if the network delays client 1's unlink("f") so that the server receives it after the CREATE("b"), the stale file handle now refers to b's inode, and the outcome is that file b gets removed!

To solve this, we include a serial number in the file handle, so that it becomes (inode # + device identifier + serial number).
The serial number counts up each time the inode is reused, so requests carrying a stale handle can be detected and rejected.

 

NFS performance:

NFS reads and writes are done asynchronously. Read requests are less problematic, but out-of-order writes can be, and in addition, an application may think a write succeeded when the server actually hit an I/O error!
This can be solved by writing synchronously, with the help of additional flags, but performance is sacrificed.

Solution with NFS Ver 4:

Cache writes on client for later replay on server

+ reliability
+ performance

- disk cost (varies with cache size; but a small cache does not improve performance significantly...)
- complexity: the server is no longer stateless. (Since cache space is limited, we need to know the server's state to decide whether cache space can be reclaimed.)

Predictions with NFS Ver 4.1:

Parallelism. Suppose we have many clients but only a single, relatively weak NFS server.

In this model, the NFS server stores only file metadata; a client talks to the NFS server once and thereafter communicates directly with the data servers for the file contents.