CS 111 Scribe Notes November 23rd, 2010

Lecture 16: Media Faults, RAID, and Distributed Systems

Kevin Hutchins

Media Faults

Assumptions:
1. Assume a drive can fail, but power remains on.
2. Assume failures are detectable. Failures are typically detected on read, but can also be detected on write. To catch problems as quickly as possible, use read-after-write: read back every sector right after writing it to confirm it was stored correctly. This improves checking quality, but is much slower on disks, because we have to wait for the disk to rotate back to the same sector.
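
The read-after-write idea can be sketched as follows. This is a minimal illustration, not the lecture's code: the name write_and_verify, the 512-byte SECTOR_SIZE, and the temporary file standing in for a disk are all assumptions. On a real system the read-back may be served from an operating system cache rather than from the platter, so a true check would have to bypass that cache.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def write_and_verify(disk, sector, data):
    """Write one sector, then read it back and compare (read-after-write).

    Returns True if the data read back matches what was written.
    """
    if len(data) != SECTOR_SIZE:
        raise ValueError("data must be exactly one sector long")
    offset = sector * SECTOR_SIZE
    disk.seek(offset)
    disk.write(data)
    disk.flush()
    os.fsync(disk.fileno())  # push the write toward the device before checking
    disk.seek(offset)
    return disk.read(SECTOR_SIZE) == data


# Demo: a temporary file stands in for the disk.
with tempfile.TemporaryFile() as disk:
    ok = write_and_verify(disk, 3, b"\xab" * SECTOR_SIZE)
    print("verified" if ok else "write fault detected")
```

The extra seek and read after every write is where the slowdown comes from: on a rotating disk, the second access has to wait for the sector to come around again.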

RAID

RAID 0 - Concatenation
[Figure: RAID 0]
RAID 1 - Mirroring
[Figure: RAID 1]
RAID 0+1 and 1+0
[Figure: RAID 01_1] The odd drives are mirrors of the even drives. Each drive pair is concatenated together.
[Figure: RAID 01_2] Drives 0-4 are concatenated together, as are drives 5-9. Drives 5-9 are a mirror of drives 0-4.
David Patterson
RAID 4 - Block-level striping with dedicated parity
[Figure: RAID 4]
RAID 5 - Block-level striping with distributed parity
[Figure: RAID 5]
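
The parity used by RAID 4 and RAID 5 can be sketched as follows. The parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XORing the survivors with the parity. The helper names xor_blocks and parity_drive, the three-drive stripe, and the particular rotation used for distributed parity are assumptions for illustration; real arrays use several different parity layouts.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe, n_drives, distributed):
    """Which drive holds parity for a stripe (one possible layout).

    Dedicated parity (RAID 4): always the last drive.
    Distributed parity (RAID 5): rotate across all drives.
    """
    return stripe % n_drives if distributed else n_drives - 1


# One stripe across three hypothetical data drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on the parity drive

# Suppose the drive holding data[1] fails: rebuild from survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

With dedicated parity every write also touches the one parity drive, which can become a bottleneck. Rotating the parity location spreads that load over all drives.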

Failure Rates

Terminology:
[Figure: Failure Rates Diagram]

Distributed Systems

Dealing with architectural differences
Example of RPC
Scenario: A client talking to a window server.
Request: Draw 10 20 blue
Response: OK
Result: The pixel at coordinates (10,20) is turned blue.
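
The exchange above can be sketched in a single process. The client stub marshals the request into bytes, and the server stub unmarshals it, does the work, and replies. The wire format, the color codes, and the names marshal_draw and WindowServer are hypothetical. Packing the integers in a fixed (network, big-endian) byte order is one common way to cope with machines whose native byte orders differ.

```python
import struct

COLORS = {"blue": 1}  # hypothetical color codes
FORMAT = "!HHB"       # x, y, color code in network (big-endian) byte order


def marshal_draw(x, y, color):
    """Client stub: pack the request into bytes for the wire."""
    return struct.pack(FORMAT, x, y, COLORS[color])


class WindowServer:
    """Stand-in for the window server; pixels maps (x, y) to a color code."""

    def __init__(self):
        self.pixels = {}

    def handle(self, message):
        """Server stub: unpack the request, do the work, reply."""
        x, y, code = struct.unpack(FORMAT, message)
        self.pixels[(x, y)] = code
        return b"OK"


server = WindowServer()
reply = server.handle(marshal_draw(10, 20, "blue"))
print(reply)  # b'OK'
```

In a real RPC system the bytes would travel over a network between the two stubs, which is where lost messages and delays come in.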
RPC Failure Modes
Solutions for Lost Messages:
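AT-LEAST-ONCE RPC: If no response, resend the request. This isn't always the correct action to take: it is safe only when the request is idempotent, because the server may have already executed the original request and only the response was lost.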
AT-MOST-ONCE RPC: If no response, report an error.
EXACTLY-ONCE RPC: This is the ideal. It is also very hard.
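
The first two policies can be sketched with a simulated server whose replies are sometimes lost. LossyServer, at_least_once, at_most_once, and the max_tries limit are hypothetical names chosen for this illustration. A real client would detect a lost message with a timeout instead of a None return value.

```python
class LossyServer:
    """Hypothetical server whose replies are lost according to a script."""

    def __init__(self, reply_lost):
        self.reply_lost = list(reply_lost)  # one flag per incoming request
        self.executions = 0

    def call(self, request):
        self.executions += 1   # the request itself arrives and is executed
        if self.reply_lost.pop(0):
            return None        # reply dropped: the client sees only silence
        return "OK"


def at_least_once(server, request, max_tries=5):
    """Resend until a reply arrives; the request may run more than once."""
    for _ in range(max_tries):
        reply = server.call(request)
        if reply is not None:
            return reply
    raise TimeoutError("no response after retries")


def at_most_once(server, request):
    """Send once; on silence, report an error instead of resending."""
    reply = server.call(request)
    if reply is None:
        raise TimeoutError("no response")
    return reply


s = LossyServer([True, False])  # first reply lost, second delivered
print(at_least_once(s, "Draw 10 20 blue"), s.executions)  # OK 2
```

Here the server executed the request twice. That is harmless for drawing the same pixel blue again, but it would be wrong for a request that is not idempotent. With at_most_once the client gets an error and cannot tell whether the request ran or not.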

Performance Issues
Synchronous vs. Asynchronous Calls