CS 111: Lecture 17

Scribe Notes for 11/30/2010

Damian Ancukiewicz

NFS

NFS (Network File System) is a protocol for sharing files over a network. A client computer sends a request for a file to a server, which finds the file on its local disk and sends the contents back to the client.

How do we implement NFS?

Special system calls
- nfs_open("name", ...)
However, this is very messy and violates modularity! Every program would have to be rewritten to take advantage of NFS.
Change high-level library functions
- FILE *f = fopen("name", ...)
However, this doesn't work well, because many programs (like our own shell) call lower-level system calls such as open() and read() directly instead of going through fopen().
Make NFS its own filesystem
This is the most reasonable choice, since the implementation then becomes transparent to the user. The files can be accessed as if they were in local storage, and the operating system takes care of sending requests to the NFS server via RPC.
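To make the "own filesystem" option concrete, here is a minimal C sketch of how a kernel might send one open() call either to a local filesystem or to an NFS client through a table of function pointers. The struct fs_ops, the /net/ mount point, and the stub functions are hypothetical simplifications, not a real kernel interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified per-filesystem operations table; a real
   kernel's virtual file system interface is much richer than this. */
struct fs_ops {
    const char *name;
    int (*open)(const char *path);
};

/* Stub: a real implementation would read the file from the local disk. */
static int local_open(const char *path) {
    (void)path;
    return 3; /* pretend file descriptor */
}

/* Stub: a real NFS client would send LOOKUP RPCs to the server here. */
static int nfs_client_open(const char *path) {
    (void)path;
    return 4; /* pretend file descriptor */
}

static const struct fs_ops local_fs = { "local", local_open };
static const struct fs_ops nfs_fs   = { "nfs",   nfs_client_open };

/* Hypothetical mount table: everything under /net/ is an NFS mount. */
const struct fs_ops *fs_for_path(const char *path) {
    return strncmp(path, "/net/", 5) == 0 ? &nfs_fs : &local_fs;
}

/* What the application calls: the same function and arguments whether
   the file is local or remote, so no program has to be rewritten. */
int vfs_open(const char *path) {
    assert(path != NULL);
    return fs_for_path(path)->open(path);
}
```

With this design, vfs_open("/home/me/notes.txt") and vfs_open("/net/server/notes.txt") look identical to the program; only fs_for_path() knows that the second one goes over the network.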

File handles in NFS

Many parts of the NFS protocol look very similar to Unix system calls:

LOOKUP(dirfh, name) → fh + attrs
CREATE(dirfh, name) → fh + attrs
MKDIR(dirfh, name) → fh + attrs
REMOVE(dirfh, name) → status
RMDIR(dirfh, name) → status
READ(fh, offset, count) → data + attrs
WRITE(fh, offset, data) → status
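Because LOOKUP takes a directory handle and a single name, a client resolves a pathname one component at a time. The sketch below assumes a hypothetical in-memory table standing in for the server and uses a plain integer as the file handle; attributes are omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical file handle: just an integer in this sketch. */
typedef int fh_t;
#define FH_NONE (-1)

/* Hypothetical server directory contents: (directory, name) -> handle. */
struct dir_entry { fh_t dir; const char *name; fh_t fh; };

static const struct dir_entry server_dir[] = {
    { 1, "home",      2 },
    { 2, "user",      3 },
    { 3, "notes.txt", 4 },
};

/* Stand-in for the LOOKUP(dirfh, name) RPC; a real client would marshal
   the arguments and send them to the server. */
fh_t nfs_lookup(fh_t dirfh, const char *name) {
    for (size_t i = 0; i < sizeof server_dir / sizeof server_dir[0]; i++)
        if (server_dir[i].dir == dirfh && strcmp(server_dir[i].name, name) == 0)
            return server_dir[i].fh;
    return FH_NONE;
}

/* Resolve a path starting from the root handle obtained at mount time,
   issuing one LOOKUP per path component. */
fh_t resolve(fh_t root, const char *path) {
    char buf[256];
    assert(strlen(path) < sizeof buf);
    strcpy(buf, path);
    fh_t cur = root;
    for (char *comp = strtok(buf, "/"); comp != NULL; comp = strtok(NULL, "/")) {
        cur = nfs_lookup(cur, comp);
        if (cur == FH_NONE)
            return FH_NONE;
    }
    return cur;
}
```

For example, resolving "/home/user/notes.txt" from root handle 1 issues three LOOKUPs (home, user, notes.txt) and yields handle 4.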

In these operations, "fh" stands for "file handle", the server's underlying identifier for a file. It is similar in concept to a Unix file descriptor in that it uniquely identifies a file, but it has one important additional property: it is persistent. If a client disconnects during a session and then reconnects, or the server crashes and reboots, the file handle the client previously used is still valid.

Because of these properties, the most natural way of representing an NFS file handle appears to be a Unix inode number, since inode numbers are also persistent and unique. There is one catch, however: an inode number is unique only within one physical filesystem, while an NFS server might export several different filesystems. Therefore, a better way to represent a file handle is as a (device, inode) pair, with a unique device ID for each physical filesystem and a unique inode number for each file within it. Although using the filesystem's inode numbers directly does have some security implications, this approach is feasible because the NFS server code usually runs at the kernel level.

Concurrency

Because NFS is implemented over a network, it raises nontrivial concurrency issues, since several users on different computers may be attempting to access the same file at once. Additionally, because networks are unreliable, there is no guarantee that every client's request will be processed.

In the diagram above, client A first opens the file, receiving a file handle. If client B renames that file while A has it open, there is no problem, since renaming doesn't change the (device, inode) identifier. The surprising fact, however, is that if client C removes the file while A has it open and is reading from it, that's allowed as well. On its next read, A receives a "stale file handle" error: -ESTALE. This is done for robustness: the NFS server doesn't have to decide whether A is still connected before giving C permission to remove the file, which would be very difficult to determine if A were on an unreliable connection. More generally, the NFS server is stateless: it doesn't keep track of any client information in memory, and RAM is used only for caching. There are no locks to worry about.
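The removal scenario can be modeled with a toy server in C. Each handle carries an inode number plus a generation counter (an assumption of this sketch, though real servers commonly do something similar) so that a handle to a removed file is detected even if its inode slot is later reused. Note that srv_remove() consults no list of clients: the server is stateless, and A learns about the removal only when its next read returns -ESTALE.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NINODES 16

/* Hypothetical in-memory inode table for one filesystem. */
struct inode {
    int in_use;
    unsigned gen;     /* bumped each time the inode is freed */
    char data[64];
};
static struct inode itable[NINODES];

/* Handle given to clients: inode number plus generation. */
struct fh { int ino; unsigned gen; };

/* Map a handle to its inode, or NULL if the handle is stale. */
static struct inode *lookup_inode(struct fh h) {
    if (h.ino < 0 || h.ino >= NINODES)
        return NULL;
    struct inode *ip = &itable[h.ino];
    return (ip->in_use && ip->gen == h.gen) ? ip : NULL;
}

int srv_create(const char *contents, struct fh *out) {
    for (int i = 0; i < NINODES; i++) {
        if (!itable[i].in_use) {
            itable[i].in_use = 1;
            strncpy(itable[i].data, contents, sizeof itable[i].data - 1);
            itable[i].data[sizeof itable[i].data - 1] = '\0';
            out->ino = i;
            out->gen = itable[i].gen;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Removal never asks whether any client still "has the file open". */
int srv_remove(struct fh h) {
    struct inode *ip = lookup_inode(h);
    if (ip == NULL)
        return -ESTALE;
    ip->in_use = 0;
    ip->gen++;        /* invalidates every outstanding handle */
    return 0;
}

/* Each READ carries the full handle; no per-client state is consulted. */
int srv_read(struct fh h, char *buf, size_t n) {
    assert(buf != NULL && n > 0);
    struct inode *ip = lookup_inode(h);
    if (ip == NULL)
        return -ESTALE;
    size_t len = strlen(ip->data);
    if (len >= n)
        len = n - 1;
    memcpy(buf, ip->data, len);
    buf[len] = '\0';
    return (int)len;
}
```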

Performance

Because the system is based on RPC, it is much faster if requests and responses can arrive in any order, without any handshaking. Additionally, there is usually extensive caching on both the client and the server. For both of these reasons, read/write consistency is not guaranteed: the result of a read() call may depend on when it is issued and on which client issues it. However, NFS does provide close-to-open consistency: once a client closes a file, any client that subsequently opens it sees the data written before the close. This guarantee is slow, though, since all relevant buffers must be flushed to disk.
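The difference between read/write consistency and close-to-open consistency shows up in a toy model of client-side caching. This is a hypothetical simplification: real clients revalidate on open by checking file attributes such as the modification time rather than refetching the whole file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one file on the server, and clients that cache it. */
struct server { char data[64]; };
struct client {
    struct server *srv;
    char cache[64];
    int dirty;
};

/* open(): revalidate by fetching the current contents from the server. */
void c_open(struct client *c) {
    strcpy(c->cache, c->srv->data);
    c->dirty = 0;
}

/* read(): served from the client's cache, with no RPC. */
const char *c_read(const struct client *c) {
    return c->cache;
}

/* write(): updates only the local cache. */
void c_write(struct client *c, const char *s) {
    assert(strlen(s) < sizeof c->cache);
    strcpy(c->cache, s);
    c->dirty = 1;
}

/* close(): flush dirty data back to the server. */
void c_close(struct client *c) {
    if (c->dirty) {
        strcpy(c->srv->data, c->cache);
        c->dirty = 0;
    }
}
```

Client B's reads return stale data until it reopens the file after A's close(), which is exactly the close-to-open guarantee and no more.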

Because an NFS server is stateless, write caching is tricky. The server cannot simply cache writes in RAM, since that introduces state: if the server acknowledges a write while the data is only in RAM and then crashes, the data is lost even though the client believes it was saved. One solution is to use non-volatile RAM (NVRAM), which keeps its contents across a power failure.
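Here is a minimal sketch of a server WRITE handler that follows this rule, using the POSIX calls pwrite() and fsync(). The path argument is a hypothetical stand-in for translating a real file handle; the point is that nothing is kept open between requests and success is reported only after fsync() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a stateless server's WRITE handler. "path" stands in for
   the file the handle refers to (handle translation is omitted).
   Returns 0 only once the data has reached stable storage. */
int handle_write(const char *path, off_t offset, const void *data, size_t len) {
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, data, len, offset);
    int ok = n >= 0 && (size_t)n == len
             && fsync(fd) == 0;   /* force the data out of RAM onto disk */
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;           /* only now is it safe to reply "OK" */
}
```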

Authentication

Because protocols such as NFS operate across different computers, some authentication scheme is needed. The traditional approach was to make sure that every user has the same user ID on all clients and that all clients are trusted. The modern approach is to use an authentication scheme such as Kerberos, which performs user ID remapping.

Security: not just a warm fuzzy feeling

In the real world, security defends against attacks via force and fraud. The main forms of attack via fraud target:

Privacy (unauthorized release of information)
Integrity (tampering with data)
Service (DoS - denial of service)

General goals in defense

Disallow unauthorized access (a negative goal; this is hard, and it is usually poorly tested)
Allow authorized access (a positive goal; usually well tested)

Threat modeling

Attacks can take many different forms:

Network attacks

Denial-of-service attacks from outside the network
Exploit bugs in Linux to break in (for example, a buffer overrun)
Packet sniffing for plaintext passwords, etc.
Brute-force password cracking
Viruses
Drive-by downloads (downloads done without the user's knowledge)
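The buffer overrun mentioned above is a classic C bug. The following minimal sketch shows the vulnerable pattern (in a comment, since running it on long input is undefined behavior) and a length-checked alternative; copy_name() and NAME_MAX_LEN are illustrative names, not a standard API.

```c
#include <assert.h>
#include <string.h>

#define NAME_MAX_LEN 16

/* Vulnerable pattern (do not use): strcpy() copies until it finds '\0',
   so input longer than the buffer overwrites adjacent memory, possibly
   including the function's saved return address on the stack:

       void greet(const char *input) {
           char name[NAME_MAX_LEN];
           strcpy(name, input);
       }
*/

/* Safer version: check the length before copying. Returns 0 on success,
   -1 if the input (plus its terminating '\0') would not fit in dst. */
int copy_name(char *dst, size_t dstsize, const char *input) {
    assert(dst != NULL && input != NULL && dstsize > 0);
    size_t len = strlen(input);
    if (len >= dstsize)
        return -1;               /* reject instead of overrunning dst */
    memcpy(dst, input, len + 1); /* copy including the '\0' */
    return 0;
}
```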

Device attacks

USB sticks with viruses
CDRs, DVDs, etc.

Other

Insiders (Wikileaks!)
Social engineering (very difficult to defend against)