NFS - Network File Systems

What is NFS?
NFS is essentially a client-server file system: files live on a server, and clients access them over the network.

Example performance figures of a real life NFS from spec.org:
Sun ZFS Storage 7320 Appliance (to be released in May)
- 2 * storage controllers (general purpose computers)
- 2 * 10 Gb Ethernet adapters
- 8 * 512 GB SSD (used for read acceleration)
- 8 * 73 GB SSD (used for write acceleration)
  Each type of SSD is tuned to make either reading or writing more efficient.
- 136 * 300 GB 15K rpm HDD

graphics1
Figure 1. NFS Architecture

Overall it has a 37 TB capacity, which is less than the sum of the space of the hard drives (136 * 300 GB = 40.8 TB) because there is inherent overhead in having to maintain a filesystem.
It is actually divided into 32 file systems of ~1 TB each, which helps improve its overall speed.

How does NFS compare to local disk performance?
8 ms response time on a local machine
1-3 ms response time using NFS

graphics2

NFS’s speedup of roughly 3x or more over a local machine arises because the overhead of reading from an SSD and sending a packet over a network is much less than that of waiting for a disk arm to seek to the correct location on the disk and reading the data off of it.

This example of an NFS is also extremely reliable. There are at least two of every component present in the system and thus there is no single point of failure in the system.
This is an important design property when it is desired to have a high level of reliability in a system.

This NFS emphasizes Redundancy and Performance.

Performance – how can we make it go fast?
RPC as part of NFS:
Traditionally, requests between the client and server were made sequentially: the client sent one request and waited for its response before sending the next. This method was extremely slow, so now we opt for the client to communicate with the server in parallel, for example through multiple threads. This parallel communication is known as pipelining. It works best when the requests are independent of one another, and it gets responses much faster than making each request sequentially.

graphics3

Example:
Old web browsers used to operate in a sequential manner, but now they send requests for multiple pages at once and get back responses in parallel, thus improving the speed at which a user can view all of the content on a webpage.
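
The difference between sequential and pipelined requests can be sketched with threads. This is only a toy model, not NFS code: fetch is a hypothetical RPC, and the 20 ms round-trip time is an assumed figure chosen for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROUND_TRIP = 0.02  # assumed network round-trip time per request, in seconds


def fetch(block_number):
    """Hypothetical RPC: pretend to ask the server for one block."""
    time.sleep(ROUND_TRIP)            # stands in for network + server latency
    return f"data-{block_number}"


def fetch_sequential(blocks):
    # Send a request, wait for its response, then send the next one.
    return [fetch(b) for b in blocks]


def fetch_pipelined(blocks):
    # Issue all requests at once; the waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return list(pool.map(fetch, blocks))   # map keeps results in request order


def timed(func, blocks):
    start = time.perf_counter()
    result = func(blocks)
    return result, time.perf_counter() - start


blocks = list(range(8))
seq_result, seq_time = timed(fetch_sequential, blocks)
pipe_result, pipe_time = timed(fetch_pipelined, blocks)
print(f"sequential: {seq_time:.3f}s  pipelined: {pipe_time:.3f}s")
```

Both versions return the same data; the sequential one pays the round-trip time once per request, while the pipelined one pays it roughly once in total.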

Pitfall:

Failed Out of Order (OOD) requests

A downside to sending multiple requests at once is that, once several requests are outstanding, an earlier request could fail while a later one succeeds. If this occurs, it can cause major problems for applications that expected the earlier operations to have succeeded.

write(fd, buf, 27);    /* if this fails, we have to notify the user that this operation failed */
write(fd, buf, 1000);
write(fd, buf, 96);

2 Strategies to overcome Failed OOD Requests:
1.) Don’t pipeline, and make the system slow. In this case, we wait for an actual response to ensure that each request has been processed.
2.) Pipeline and be fast, but lie to the user about whether or not the write worked.
Errors are reported when the application closes the file, so closing is slow. The philosophy behind this method is that the benefit of faster reads and writes outweighs the cost of a slow close; since the application reads and writes much more frequently than it closes, the application is quicker overall.

In practice, most people choose option 2 over option 1, as speed is very appealing and easy to sell. However, option 1 may be used if correctness is prioritized.
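
Strategy 2 can be sketched as a toy client-side file object. Everything here is hypothetical (the class PipelinedFile, the send_to_server callback, and the rule that 27-byte writes fail). For simplicity the sketch sends the queued writes at close time, whereas a real client would already have them in flight.

```python
class PipelinedFile:
    """Toy client-side file: writes return at once, errors surface at close()."""

    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # hypothetical RPC: True on success
        self.pending = []                     # writes not yet acknowledged

    def write(self, data):
        # Strategy 2: queue the request and claim success immediately,
        # even though the server has not confirmed anything yet.
        self.pending.append(data)
        return len(data)

    def close(self):
        # Only here do we learn the outcome of every write, so close() is slow.
        failed = [i for i, data in enumerate(self.pending)
                  if not self.send_to_server(data)]
        self.pending = []
        if failed:
            raise OSError(f"writes {failed} failed")


def flaky_server(data):
    # Assumed failure rule for the demo: the server rejects 27-byte writes.
    return len(data) != 27


f = PipelinedFile(flaky_server)
results = [f.write(b"x" * n) for n in (27, 1000, 96)]  # all appear to succeed
try:
    f.close()
    close_error = None
except OSError as err:
    close_error = str(err)
print(results, close_error)
```

An application that ignores the result of close never finds out that its first write was lost, which is the cost of this strategy.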

Issues with RPC
+ Hard modularity because it is run on different machines
- Message speed is limited by the speed of light
- Messages can be lost because networks are always lossy
- Messages can be corrupted and the receiving end may get the wrong information
  (this problem can be combated with checksums at the end of packets to check for errors)
- Network might be down or might be slow*
- Server might be down or might be slow*

*A huge issue with RPC is that you cannot tell the difference between the server being down and the server being slow.
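
The checksum idea mentioned above can be sketched as follows. The packet layout (payload followed by a 4-byte CRC-32 trailer) is an assumption made for illustration, not the format any particular protocol uses; zlib.crc32 is from the Python standard library.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    # Append a 4-byte CRC-32 of the payload to the end of the packet.
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def check_packet(packet: bytes):
    # Return the payload if the checksum matches, or None if it does not.
    payload, trailer = packet[:-4], packet[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        return None
    return payload


packet = make_packet(b"write(fh, 96 bytes)")
damaged = bytes([packet[0] ^ 0x01]) + packet[1:]   # flip one bit "in transit"
print(check_packet(packet), check_packet(damaged))
```

A receiver that gets None can drop the packet, which turns a corrupted message into a lost one.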

Options if one end receives no response in RPC
1.) At least once RPC
The philosophy is that if there is no response, try again and keep trying.
This method is OK for idempotent operations, such as reads, or writes that put the same data at the same location, since repeating them has no further effect. However, it is not valid for dangerous operations, where carrying out the request twice would cause harm.
2.) At most once RPC
This method returns an error to the caller if there is no response, such as “Sorry, rename failed because the network timed out.” This method is preferred for dangerous operations.
3.) Exactly once RPC
This is the ideal case, in which an operation is performed one time and only one time. It is difficult to guarantee in practice.
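
The first two options can be compared with a toy server that performs the work but loses its first few replies. All names here (LossyServer, at_least_once, at_most_once) are hypothetical, and the sketch shows why retrying is safe only for idempotent operations.

```python
class LossyServer:
    """Toy server whose first `drops` replies are lost after the work is done."""

    def __init__(self, drops):
        self.drops = drops
        self.counter = 0      # changed by the non-idempotent operation
        self.value = None     # changed by the idempotent operation

    def _reply(self, result):
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("no response")   # request ran, reply was lost
        return result

    def increment(self):          # not idempotent: repeating changes the result
        self.counter += 1
        return self._reply(self.counter)

    def set_value(self, value):   # idempotent: repeating is harmless
        self.value = value
        return self._reply(value)


def at_least_once(call, *args):
    while True:                   # keep retrying until a reply arrives
        try:
            return call(*args)
        except TimeoutError:
            continue


def at_most_once(call, *args):
    try:
        return call(*args)        # one attempt only
    except TimeoutError:
        return "error: timed out" # caller cannot tell whether the request ran


s1 = LossyServer(drops=2)
at_least_once(s1.set_value, "x")  # runs three times, but the state is still "x"
s2 = LossyServer(drops=2)
at_least_once(s2.increment)       # runs three times, so the counter is now 3
s3 = LossyServer(drops=2)
r3 = at_most_once(s3.increment)   # runs once, and the caller only sees an error
print(s1.value, s2.counter, s3.counter, r3)
```

Note that in the at most once case the increment did happen on the server; the caller simply has no way of knowing it.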

Robustness Constraint in Design:
NFS is built around a “stateless server”, which means that if the server crashes and reboots, the client shouldn’t know or care. The name comes from the fact that what is in the controller’s RAM doesn’t count as part of the state. The controller’s RAM is only a cache and must not hold any information whose loss would matter.

How do we get this to work?

graphics4

NFS Protocol RFCs
all have familiar messages, including
read(fh, bytes, data);
write(fh, bytes, data);
lookup(fh, name);
remove(fh, name);
create(fh, name, attribute) -> fh

What is a File Handle (fh)?
A file handle is a short identifier, conceptually an integer, that uniquely identifies a file on the server.
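
A minimal sketch of the stateless idea, assuming integer file handles and an explicit offset in each read and write (the offset parameter and the Disk and StatelessServer classes are illustrative assumptions, not the real protocol). Because each request carries everything the server needs, a rebooted server can keep serving handles the client obtained earlier.

```python
class Disk:
    """Stands in for stable storage: it survives a server crash."""

    def __init__(self):
        self.files = {}       # file handle -> bytearray of contents
        self.names = {}       # (directory handle, name) -> file handle
        self.next_fh = 2      # handle 1 is reserved for the root directory


class StatelessServer:
    """Keeps nothing important in RAM; every request names its file handle."""

    def __init__(self, disk):
        self.disk = disk

    def create(self, dir_fh, name):
        fh = self.disk.next_fh
        self.disk.next_fh += 1
        self.disk.files[fh] = bytearray()
        self.disk.names[(dir_fh, name)] = fh
        return fh

    def lookup(self, dir_fh, name):
        return self.disk.names[(dir_fh, name)]

    def write(self, fh, offset, data):
        contents = self.disk.files[fh]
        if len(contents) < offset:
            contents.extend(b"\0" * (offset - len(contents)))  # pad any gap
        contents[offset:offset + len(data)] = data
        return len(data)

    def read(self, fh, offset, count):
        return bytes(self.disk.files[fh][offset:offset + count])


ROOT = 1
disk = Disk()
server = StatelessServer(disk)
fh = server.create(ROOT, "notes.txt")
server.write(fh, 0, b"hello ")

server = StatelessServer(disk)      # simulate a crash and reboot: RAM is gone
server.write(fh, 6, b"world")       # the client's old handle still works
after_reboot = server.read(fh, 0, 11)
print(after_reboot)
```

Writing at an explicit offset also makes a repeated write harmless, which fits the retry-until-answered style of RPC.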

In actuality, part of the NFS file server lives in the kernel to make the system work; however, this compromises the modularity of the system.

How does NFS operate with 2 Clients?

graphics5

2 NFS Clients:

With multiple clients, NFS no longer guarantees write-to-read consistency: a read by one client may not see a recent write by another. It does guarantee close-to-open consistency, because closing and reopening is a much slower process that ensures all changes are accounted for.
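
Close-to-open consistency can be sketched with two toy clients that each keep a private cache. The Server and Client classes are hypothetical and greatly simplified: changes are flushed to the server on close, and the cache is refreshed from the server on open.

```python
class Server:
    def __init__(self):
        self.data = {}                 # file name -> contents


class Client:
    """Toy NFS client with a write-back cache that is flushed on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}                # private copies of the files we have open
        self.dirty = set()             # names with changes not yet on the server

    def open(self, name):
        # Opening refetches from the server, so earlier closes become visible.
        self.cache[name] = self.server.data.get(name, "")

    def write(self, name, text):
        self.cache[name] = text        # fast: only the local cache changes
        self.dirty.add(name)

    def read(self, name):
        return self.cache[name]        # may be stale while the file stays open

    def close(self, name):
        # Closing pushes cached changes to the server (the slow step).
        if name in self.dirty:
            self.server.data[name] = self.cache[name]
            self.dirty.discard(name)
        del self.cache[name]


server = Server()
a, b = Client(server), Client(server)
a.open("f")
b.open("f")
a.write("f", "new contents")
stale = b.read("f")        # b does not see a's unflushed write
a.close("f")
b.close("f")
b.open("f")
fresh = b.read("f")        # after a's close and b's reopen, b sees the write
print(repr(stale), repr(fresh))
```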

Reliability:
Issues:
- Could have a bad network.
- Could have a bad client.
- Could have a disk go bad (media fault)
If there is a media fault, the whole system is in jeopardy: if there are bad blocks in the write journal and the cell data is lost, the cell data cannot be properly reconstructed.

RAID (Redundant Arrays of Independent Disks):
RAID is a system in which multiple disks are combined into a single virtual drive. The original motivation behind RAID was to provide a cheaper alternative to expensive single drives of the same storage capacity. RAID’s current motivation is to increase the speed and/or reliability of a system. There exist different types of RAID, each tuned to prioritize certain system qualities.

RAID 0:
RAID 0 concatenates the disks to form a virtual disk drive whose capacity is the sum of the capacities of the individual disks.

+ Striping – can spread out data between disks so that contiguous blocks may be read in parallel thereby increasing the read speed

graphics6
RAID 0 with striping
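
The difference between plain concatenation and striping comes down to how a virtual block number is mapped to a physical disk. A minimal sketch, assuming 4 disks of 100 blocks each (both numbers are made up for illustration):

```python
def concatenated(block, blocks_per_disk):
    # Concatenation: fill disk 0 completely, then disk 1, and so on.
    # Returns (disk number, block number within that disk).
    return block // blocks_per_disk, block % blocks_per_disk


def striped(block, num_disks):
    # Striping: consecutive virtual blocks go to consecutive disks.
    return block % num_disks, block // num_disks


NUM_DISKS, BLOCKS_PER_DISK = 4, 100     # assumed sizes for illustration
concat_disks = [concatenated(b, BLOCKS_PER_DISK)[0] for b in range(4)]
stripe_disks = [striped(b, NUM_DISKS)[0] for b in range(4)]
print(concat_disks)   # [0, 0, 0, 0]: four contiguous blocks all hit one disk
print(stripe_disks)   # [0, 1, 2, 3]: the same blocks can be read in parallel
```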

RAID 1:
RAID 1 has a single virtual drive which is represented by two physical drives. Both physical drives contain the same data, which is called mirroring.

+ Faster read times because it can choose to read from the drive whose read head is closer to the desired sector

- Writes may be slower since each write must go to both disks and is not complete until the slower of the two finishes

With RAID you can stack different levels of striping, concatenation, and mirroring to obtain the system you desire.

graphics7
RAID 1
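
Mirroring can be sketched with two toy disks that each remember where their head is. The Disk and Mirror classes are hypothetical, and using the last block accessed as the head position is a simplifying assumption.

```python
class Disk:
    def __init__(self):
        self.blocks = {}
        self.head = 0                      # current head position (block number)

    def read(self, n):
        self.head = n
        return self.blocks[n]

    def write(self, n, data):
        self.head = n
        self.blocks[n] = data


class Mirror:
    def __init__(self):
        self.disks = [Disk(), Disk()]

    def write(self, n, data):
        for disk in self.disks:            # every write goes to both copies
            disk.write(n, data)

    def read(self, n):
        # Read from whichever disk's head is closer to the requested block.
        nearest = min(self.disks, key=lambda d: abs(d.head - n))
        return nearest.read(n)


m = Mirror()
for n, data in [(10, b"a"), (88, b"b"), (90, b"c")]:
    m.write(n, data)
first = m.read(10)     # both heads sit at 90, so the tie goes to disk 0
second = m.read(88)    # disk 1 (head at 90) is closer than disk 0 (head at 10)
heads = [d.head for d in m.disks]
print(first, second, heads)
```

Either disk alone still holds every block, which is where the reliability benefit comes from.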

RAID 4:
RAID 4 uses N-1 of its disks to hold data and one disk to hold the XOR (parity) of the data on all of the other disks. This is advantageous because if any single drive is lost, it can be reconstructed using the remaining drives.

For example, if Disk 2 in the picture below dies, we can reconstruct it using 2 = 1^3^(1^2^3), where (1^2^3) is the contents of the parity disk.

The catch is that to keep the benefits of RAID 4, one must always replace a drive as soon as it fails.
If a second drive fails before the first has been replaced and rebuilt, data will be lost.
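
The reconstruction rule 2 = 1^3^(1^2^3) can be checked directly with byte-wise XOR. A minimal sketch with three data disks whose contents are made up for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    # XOR equal-length blocks together, byte by byte.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


data = [b"disk1...", b"disk2...", b"disk3..."]   # made-up contents of disks 1-3
parity = xor_blocks(data)                         # contents of the parity disk

# Disk 2 dies: rebuild it from the surviving data disks plus the parity disk.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

This works because XORing a value with itself cancels it out, leaving only the missing disk's contents.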

Because the parity (Nth) drive must be updated any time any disk is written to, it is very busy. RAID 5 stripes the parity blocks across all of the disks so that no single disk is a hotspot.

graphics8
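
The hotspot argument can be illustrated by counting which disk receives the parity update for each stripe. The rotation rule below is an assumption (real RAID 5 layouts vary), as is the 4-disk array size.

```python
from collections import Counter

NUM_DISKS = 4     # assumed array size for illustration


def raid4_parity_disk(stripe):
    return NUM_DISKS - 1          # parity always lives on the last disk


def raid5_parity_disk(stripe):
    # Assumed rotation: parity moves one disk to the left on each
    # successive stripe and wraps around.
    return (NUM_DISKS - 1 - stripe) % NUM_DISKS


stripes = range(8)                # one small write to each of 8 stripes
raid4_load = Counter(raid4_parity_disk(s) for s in stripes)
raid5_load = Counter(raid5_parity_disk(s) for s in stripes)
print(dict(raid4_load), dict(raid5_load))
```

With RAID 4 every parity update lands on the same disk, while with the rotated layout the updates are spread evenly.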