Lecture 13: File System Robustness

Scribe notes for Feb 27, 2013

Muh-Wei Chern

File System Goals

Performance

Durability Atomicity (Correctness) What can go wrong with durability?
image1
What failures are we worried about?

We want to come up with a model for failures; this model is crucial because it will affect our design. We want the model to be simple and corresponds to reality.

For now, let us worry about a simple model that has no battery and the only failure it has is losing power.
image2

Suppose you have a write_sector, what happens if you lose power in the middle of writing write_sector?
1. It might still work!
2. Bad sector results because it is writing in process.

To model this:
image3
image8

"Classic" Solution
3 copies:
image4
Write all three copies, one at a time.

To read:
if F1 = F2
      return F1
else
      F3

We want less than 3 copies

Lampson-Sturgis Assumptions

Rename system call: rename("a","b")
image5
1. Read inode 357 into memory
2. Read data block 952736 into RAM
3. Change "a" to "b" in RAM
4. Write data block out to disk
5. Update inode time stamp
6. Write inode to disk

What can go wrong?
Suppose there is a crash between steps 4 and 5, after reboot, data will change but inode will not be updated.

CS111 Assumptions
Single data block writes are atomic; either all blocks write successfully or none gets written, all-or-nothing.

rename("d/a", "e/a"): Renaming a file that used to be in directory d to a file in directory e.
image6

1. Write 1012
2. Write 952736

We want to do step 1 before step 2 because if we do step 2 first and it crashes, the file "a" would disappear.
The advantage of doing step 1 before step 2 is that we can keep track of the file, but the disadvantage is if it crashes between steps 1 and 2, it will lose the link to the data, which is bad.

The solution would be to do something between steps 1 and 2.
1. Write 1012
2. Write "921 with a link count of 2"
3. Write 952736
4. Write "921 with a link count of 1"
This solution would cause memory leak sometimes and we can still lose data if it crashes between the new steps 1 and 2.

The solution to the case above would be to switch the order of steps 1 and 2.
1. Write "921 with a link count of 2"
2. Write 1012
3. Write 952736
4. Write "921 with a link count of 1"
If it crashes between the new steps 1 and 2, it would cause memory leak but will not lose data; having memory leak is better than losing the whole data.

Models of Underlying Systems

Example: for BSD-style file system
Recall that a BSD-style file system has superblock, bitmap, inode table, and data blocking containing directories, indirect blocks and ordinary data.

Invariants for our system
(1) Every block in the file system is used for exactly one purpose.
(2) All referenced blocks are properly initialized with data appropriate for their type.
(3) All referenced data blocks are marked used in bitmap.
(4) All unreferenced data blocks are marked as free in bitmap.

Consequences of Violation
Violating (1), (2) or (3) would cause a disaster. Violating (4) would cause memory leak, so (4) is the only candidate to be removed.

Two ideas for Building Robust File Systems
1. Commit record
A separate record that is used to mark whether an atomic action is done or not done.
image7
File system code for doing commits, which appears to be atomic from a user's point of view:
BEGIN
      Pre-commit phase (invisible to upper level)
COMMIT or ABORT (make changes visible)
      Post-commit phase (invisible to upper level)
END

2. Journaling
Append-only file system: The downside of this system is that it wastes storage, but the advantages are that there is no link count problem, it solves many inconsistency problems and it can avoid seeks if it is mostly writing.

Hybrid file system: Journal (append-only) + Cell storage

Write-head log

Write-behind log