CS 111: Scribe Notes

File system design (11/04/2011)

Scott Johnson

Hard Links

Short Recap

link("a", "b") user$ ls -li
 9382313 -rw-r--r-- 2 user staff 9 Nov 8 10:45 a
 9382313 -rw-r--r-- 2 user staff 9 Nov 8 10:45 b
|-------|          |-|                        |-|
  inode        link count                  file name

unlink("a") user$ ls -li
 9382313 -rw-r--r-- 1 user staff 9 Nov 8 10:45 b
|-------|          |-|                        |-|
  inode        link count                  file name

User considerations

Can a user create a directory entry pointing to garbage?

Would result in a dangling pointer
A dangling pointer (or wild pointer) points to an unexpected value/object dangling pointer

What if we had a destroy syscall?

Motivation for destroy
User spock downloaded a video recording of a PBS special on the psychological effects of pon farr, and now wants to delete it because his home computer is almost out of disk space. He executes rm ponfarr.mpeg but no space has been freed! There must have been another hard link to it... Does he now have to search his computer for all hard links to the file?!

The destroy(inode) system call would Pros: This function could immediately remove a file
Cons: No practical way for the OS to where all the links are, so would have to either (1) search for the file starting at the root directory, or (2) complicate the directory structure by implementing a way to keep track of all hard links.

Partial solution: ftruncate(fd)
Destroys all blocks in a file (resulting size = 0). Hard links remain, but to a file of size 0. This can be done at shell level (assume big is a big file): user$ > big

Creating DAGs and cycles in directories

Don't worry, Unix doesn't allow you to do this! dag and cycle
Why doesn't Unix allow you to do this?
1) cd a; pwd  Should this report that you are in a or c?
2) cat a/d/a/d/a/d/a/d/b  Ridiculous! But would work.
3) find . -name b  Found it in a/b, a/d/a/b, a/d/a/d/a/b, ... That could go on forever.

The great Unix solution: no hard links to directories

Symbolic Links

General Introduction

What are symbolic links (symlinks)? creating a symlink
Take a look at the important directory entries for the above situation after the symbolic link is created (say, we want to access sh which is located in /usr/bin) directory structure after symlink
The bin symlink stores the path /usr/bin. If the user types /bin/sh from the root directory (designated / in Unix), the OS does textual substitution resulting in the path /usr/bin/sh. Note: symbolic links can be chained, resulting in multiple textual substitutions during name resolution.

Potential problems with symbolic links?

Connection back to GNU tar problem

What was the GNU tar problem?

New syscalls/ideas that attack this problem

File Name Resolution

  1. Look at the first byte. If /, start at the root directory, else start in the CWD (current working directory)
  2. Walk through the string left -> right and look for filename components
  3. For each filename component, resolve that component by looking for directory entries that match the name of the component
  4. If one of those entries is a symlink, splice in the contents of the symlink and recurse
    • Resolve the symlink relative to the symlink directory

Random notes on chdir and chroot

File System Robustness

Goals:
  1. Good performance - high-throughput/low-latency
  2. Durability - if there is a failure in the general underlying system, e.g. hardware, the data survives
  3. Atomicity - data remains consistent if there is a failure: actions are either done or not done, e.g. write("abc", ...) will never give just ab
Note: We must set certain limitations. We can't expect durability if someone detonates a thermonuclear bomb next to our system.

Robustness pertaining to SAVE in a text editor

SAVE writes the contents of RAM to disk, in general taking several writes. What if we assume a power failure can occur at any time?
Golden rule of atomicity: Never overwrite your only copy of the data!

Situation: You write to a copy and the power goes out. Which version is correct? The copy or the original?

Commit Records

What are commit records?

If we assume no atomicity in writing to the commit record

We want to write Y to the commit record to indicate that the contents of Y are correct instead of X. Let's look at a 1 block commit record over time if we assume the write is not atomic:
1 commit record
What if we add another block to the commit record? (1) We write Y to commit record 1 while commit record 2 indicates X; (2) Write Y to commit record 2. At every point we have either X or Y:
2 commit records
Ok, we can actually solve this with 3 records. At any given time, if (record 1) == (record 2), then use that value, else use the value in (record 3).
3 commit records

So, do we really triply redundant commit records?

Lampson-Sturgis Assumptions

  1. Writes can fail
  2. Storage (blocks) can spontaneously decay
  3. The read will tell you if a block is bad**
  4. Errors are rare
  5. ... there are more but not directly pertinent ...

** If this is true, then we only need doubly redundant commit records, because we can tell if one of the blocks is bad, then just use the other

Adventures with rename

rename("a", "b")
  1. Read directory block
  2. Modify "a" -> "b"
  3. Write directory block
If we assume (1) atomic writes and (2) atomic block writes, then we are fine - the system will never be in an inconsistent state.

rename("d1/a", "d2/c")
  1. Read d1's blocks into RAM
  2. Read d2's blocks into RAM
  3. Modify main memory copy
  4. Write d1's data
  5. Write d2's data
Fix by complicating the algorithm:

Correctness Invariants

  1. Every block should be used for exactly 1 purpose - data, indirect, bitmap, inode, superblock, boot block
  2. If a block is reachable, it should contain initialized data
  3. All reachable data blocks are marked as used in the bitmap
  4. All unreachable blocks are marked as free in the bitmap
Consequences of violating these (in same order as above):
  1. You lose data: DISASTER!
  2. You allocate data twice and lose data: DISASTER!
  3. You lose data: DISASTER!
  4. Actually not so bad, disk space leak, but the program fsck will try and fix this