CS 111 Spring 2010

Scribe Notes for 5/13/10, Lecture 13

by Joseph Silva, Kevin Deggelman, and Khai Nguyen

 
File System Robustness

Terminology

  • error - a mistake made by the designer (it exists only in the designer's head)
  • fault - a latent bug in the code which has yet to be triggered
  • failure - a fault that has been triggered, producing incorrect behavior noticed by users

File System Goals

  • durability - ability to survive limited failures in hardware (according to your failure model)
    • For example: a power failure while writing a block
  • atomicity - changes are either fully realized or not realized at all
  • performance - throughput & latency
    • example: larger batch sizes increase throughput at a cost to latency

Example system call

    rename("a","b")
a) find the directory's data block containing the entry for "a"
b) read that block into RAM
    b-1) check whether "b" already exists; if so, mark its entry as unused
           if we update the old inode's link count here, the count may end up 1 less than it should be (dangerous)
c) change "a" to "b" in the copy in RAM
d) write the block back to disk
    if we update the old inode's link count here, the count may end up 1 more than it should be (preferable)

Note: Assume sector writes are atomic but block writes are not. A small sketch of the in-RAM edit follows.
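As an illustration of steps (b)-(d), here is a minimal C sketch of the in-RAM edit; the 16-byte directory-entry layout and the function name are assumptions for the example, not the lecture's actual code:

    #include <string.h>

    enum { BLOCK_SIZE = 512, NAME_LEN = 14, ENTRIES = BLOCK_SIZE / 16 };

    /* Toy 16-byte directory entry: inode number plus fixed-width name. */
    struct dirent16 { unsigned short inum; char name[NAME_LEN]; };

    /* Steps (b)-(c): the directory block is already in RAM; find the entry
     * for "from" and overwrite its name with "to". Step (d) would then
     * write the whole block back to disk in one (non-atomic) block write. */
    void rename_in_block(struct dirent16 *block, const char *from, const char *to)
    {
        for (int i = 0; i < ENTRIES; i++) {
            if (block[i].inum != 0 && strncmp(block[i].name, from, NAME_LEN) == 0) {
                strncpy(block[i].name, to, NAME_LEN);   /* edit in place in RAM */
                return;
            }
        }
    }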


Why not have rename("a","b") fail if b exists?
  • Professor Eggert needed to update a set of small applications remotely
  • He wanted the update to the entire /usr/local directory to be atomic
  Solution:
  • create /usr/local2/ with the new contents (not atomic)
  • rename("/usr/local2", "/usr/local")
    • fails because they are directories
    • instead, use a symbolic link for /usr/local and atomically replace it
                $ cd /usr
                $ ln -s local2 tmp
                $ mv tmp local                // rename("tmp","local")
    • this works because rename() can atomically replace a symlink, even though it cannot replace a non-empty directory

Suppose we want to rename("d/f","e/g")
  • We will have to change
    • the data block in d (to remove the entry for f)
    • data blocks in e (to add the entry for g)
    • the inode for the old e/g (its link count drops)
    • the bitmap for the old e/g's blocks (if they are freed)
    • the bitmap for e (if e needs a new data block)

File System Correctness Invariants
  1. Every block in the file system is used for exactly one purpose:
    1. boot block
    2. superblock
    3. bitmap
    4. inode
    5. data
  2. All reachable blocks are properly initialized
  3. All reachable data blocks are marked as used in the bitmap
  4. All unreachable data blocks are marked as unused in the bitmap
  5. Every inode's link count is no less than the number of directory entries pointing to it
  6. Every inode's link count is no greater than the number of directory entries pointing to it
Violating invariants 4 or 6 only causes storage leaks; violating any of the others leads to disaster. Obviously we prefer a leak to a disaster, and leaks can be recovered from using fsck.


fsck

  • checks the file system for leaks, fixing the bitmap accordingly
  • the filesystem should be unmounted while fsck runs (see the example below)
  • to fix the root file system, boot into single-user mode with the root filesystem mounted read-only
  • can also fix potential disasters:
    • orphaned files (positive link count but no directory entries) are put into lost+found
    • a file with a 0 link count but existing directory entries has its link count increased to the number of directory entries
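For a non-root filesystem, a check might look like the following; the device name and mount point are just placeholders:

                $ sudo umount /dev/sdb1
                $ sudo fsck /dev/sdb1
                $ sudo mount /dev/sdb1 /mnt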
A common place for a malicious program to hide files is in free space (blocks marked unused in the bitmap), which is accessible only through the raw device.

Idea:
Think of a write as a slow transition between scene A (the old contents) and scene B (the new contents). During the transition the data is undefined and reads back as garbage.


How to implement atomic write atop hardware that doesn't write atomically
X = original file
Y = new file
? = middle of a write (unknown state)

Write 3 copies, updating them in the order A, B, C:

    Copy/Case   1   2   3   4   5   6   7
    A           X   ?   Y   Y   Y   Y   Y
    B           X   X   X   ?   Y   Y   Y
    C           X   X   X   X   X   ?   Y

When reading:
    if (A == B) return A
    else return C
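A minimal C sketch of this triple-copy scheme, with the three on-disk copies modeled as in-memory arrays (the names and block size are just for illustration):

    #include <string.h>

    enum { BLOCK_SIZE = 512 };

    /* Stand-ins for the three on-disk copies. */
    char A[BLOCK_SIZE], B[BLOCK_SIZE], C[BLOCK_SIZE];

    /* Write rule: update the copies strictly in order A, then B, then C. */
    void write3(const char *newdata)
    {
        memcpy(A, newdata, BLOCK_SIZE);   /* cases 2-3 in the table */
        memcpy(B, newdata, BLOCK_SIZE);   /* cases 4-5 */
        memcpy(C, newdata, BLOCK_SIZE);   /* cases 6-7 */
    }

    /* Read rule from the table above. */
    const char *read3(void)
    {
        if (memcmp(A, B, BLOCK_SIZE) == 0)
            return A;   /* A and B agree: the data is either all-old or all-new */
        return C;       /* a crash hit A or B mid-write; C is still consistent */
    }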

Write 2 copies, updating A then B:

    Copy/Case   1   2   3   4   5
    A           X   ?   Y   Y   Y
    B           X   X   X   ?   Y

When reading:
    if (A == B) return A
    else return B

Lampson-Sturgis assumptions
  1. Writes may fail, or may corrupt other pieces of storage (a later read will catch the bad write)
  2. Storage may spontaneously decay (error correction will catch it)
  3. Errors are rare enough that you can assume they are corrected as they happen, without worrying that they might cascade

Performance as the Enemy of Robustness
  • Suppose your file system needs to do 3 dozen writes:
    1. sort them by disk address
    2. write in sorted order
  • This gives a big performance increase, but reordering the writes may ruin decisions made at the higher level to prevent disaster
  • Solution: pass the dependencies down to the lower level and sort without violating them (see the sketch after this list)
  • Result: not as great a performance increase, but still much better than not sorting
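A minimal sketch of that idea in C: each request carries its disk address plus (optionally) the index of a request that must reach disk first, and the scheduler always issues the lowest-address request whose dependency is already done. The request layout and the block numbers are made up for the example:

    #include <stdbool.h>
    #include <stdio.h>

    /* One pending write: a disk address plus the index of a request that
     * must hit the disk first (-1 if there is no such dependency). */
    struct req { int addr; int dep; };

    void schedule(struct req *r, int n)
    {
        bool done[64] = { false };
        for (int issued = 0; issued < n; issued++) {
            int best = -1;
            for (int i = 0; i < n; i++) {
                if (done[i]) continue;
                if (r[i].dep >= 0 && !done[r[i].dep]) continue;   /* not ready yet */
                if (best < 0 || r[i].addr < r[best].addr)
                    best = i;
            }
            printf("write block %d\n", r[best].addr);
            done[best] = true;
        }
    }

    int main(void)
    {
        /* block 900 must reach disk before block 12 (e.g. a journal entry
         * before the corresponding cell write); block 450 is independent */
        struct req r[] = { { 12, 1 }, { 900, -1 }, { 450, -1 } };
        schedule(r, 3);          /* issues 450, 900, 12 */
        return 0;
    }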

Suppose we are saving a file in Emacs (^X^S), which overwrites the existing file with several write() calls:
    write()
    // CRASH HERE!
    write()
    write()
A crash in the middle loses the new contents and the old ones too.

This violates the GOLDEN RULE OF ATOMICITY:
Never write over your last copy of info.

Try again:

IDEA #1

  • Write to a new copy of the file system first, then copy back to the actual one
    • Again we have the problem of not knowing which copy is correct following a crash
  • Keep an extra bit of information, called the "commit record", that tells us which copy to use:
    1. write to the new file system
    2. set the bit (commit)
    3. copy to the actual file system
    4. clear the bit
  • Alternatively, we could alternate between the two copies and use the bit to keep track of which one is current, saving us an extra copy each time (a sketch follows)
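A minimal sketch of that alternating scheme, with the two copies and the commit record modeled in memory (all names and sizes are illustrative). Flipping the commit record is the single atomic step, and the current copy is never overwritten, so the golden rule is respected:

    #include <string.h>

    enum { BLOCK_SIZE = 512 };

    char copy[2][BLOCK_SIZE];   /* the two on-disk copies */
    int  commit_record;         /* 0 or 1: which copy is current */

    const char *read_current(void)
    {
        return copy[commit_record];
    }

    void write_atomic(const char *newdata)
    {
        int spare = 1 - commit_record;
        memcpy(copy[spare], newdata, BLOCK_SIZE);   /* never touches the current copy */
        commit_record = spare;   /* one (atomic) sector write commits the change */
    }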
High-level Atomic Operations

    BEGIN
        write write write
    COMMIT
        post-commit phase    // if needed for performance; fsck does this after a crash
    END

IDEA #2 : Journaling
  • Partition disk into two regions
    • cell data - normal filesystem content
    • journal - long array of commit records

      • writes to the log are sequential
      • could be put on a different physical drive for a performance increase
  • After a crash, fsck looks for everything that has been committed but not yet finished, and finishes it

Logging Protocol

A: Write-Ahead Logging (a short sketch follows at the end of this section)
    BEGIN
    log the update into the journal (e.g. "block #9637 should contain 'xyzw----'")
    COMMIT
    write the new data into the cells
    END

B: Write-Behind Logging
    BEGIN
    log all the old cell values
    install the new values into the cell data
    COMMIT
    END

Advantages of A
  • writes to the log are sequential
  • need not read old values
Advantages of B
  • better for buggy systems
  • fsck does less work, so reboot recovery is faster
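Finally, a minimal sketch of protocol A (write-ahead logging) for a single block update; the helper functions are stand-ins that just print, rather than real disk operations:

    #include <stdio.h>

    /* Hypothetical stand-ins for the real disk operations. */
    void journal_append(int blockno, const char *data) { printf("journal: block %d <- %s\n", blockno, data); }
    void journal_commit(void)                          { printf("journal: COMMIT\n"); }
    void cell_write(int blockno, const char *data)     { printf("cells:   block %d <- %s\n", blockno, data); }

    /* Protocol A for one block update. */
    void logged_update(int blockno, const char *data)
    {
        journal_append(blockno, data);   /* log the new value first (sequential write) */
        journal_commit();                /* the commit record is the atomic point of no return */
        cell_write(blockno, data);       /* then install it into the cell data */
    }

    int main(void)
    {
        logged_update(9637, "xyzw----");   /* the example update from the notes */
        return 0;
    }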