Lecture 12 Scribe Notes – Computer Science 111, Winter 2012

Andrew Tan and Ryan Lewe

Lecture by Paul Eggert on February 27, 2012

Lecture 12: File System Implementation



Aside: Bug in GNU Grep



Example of a weird pattern that causes a problem:



grep '^.*x\(\)\|' file



which means: find a line that begins with an arbitrary number of characters followed by an 'x' followed by an empty string followed by a copy of the empty string. Suppose the file has a line containing 2^31 + 2 bytes such that the pattern matches. Grep internally records the length of the match and stores it in an integer variable.



int len = strlen ("...x");



Since int is a 32-bit quantity, the length of the match overflows, making len negative. Grep later uses this negative value to subscript an array, scribbles on arbitrary pieces of memory, and an attacker can get grep to run arbitrary code as root. The fix is to use size_t instead of int.



Assume the size_t fix is applied. An attacker can still mount a denial-of-service attack by taking advantage of the inode and file system structure, using holey files.



Holey Files

Holey files – files full of zero data that look large from the user's point of view. Holey files are cheap to create. To make a holey file, you can use:



$ echo x | dd of=holey bs=1MB seek=2040



which writes an x just past the 2 GB point. Using seek allows you to skip writing the earlier parts of the file, so they occupy no disk blocks.




Ideas to fix the denial of service problem by patching grep:



1.) grep is for text; it can do whatever it wants with non-text.

Idea: have grep skip zeros or treat them as newlines.



2.) Idea: detect holey files, using ls to compare the nominal versus physical sizes of files.



Levels of a Unix-like File System

1.) Symbolic links – files that contain file names

2.) File names – whole strings with slashes

3.) File name components – part of name without slashes, names of individual directory entries

4.) Inodes – type ino_t

5.) File systems

6.) Partitions

7.) Blocks – linear array (a block spans multiple sectors; a sector is, e.g., 1/16 the size of a block)

8.) Sectors on a disk – linear array (conceptually; the physical geometry differs)



Implementing Symbolic Links



A) Use an inode for a symlink



B) Put the symlink contents in the inode itself

Faster than A because you don't have to move the disk arm back and forth. However, this limits the symlink target to about 96 bytes.



C) Symlinks as merely directory entries, not inodes.

Advantage: when you follow a symlink from a directory (the normal place a symlink is used), you can look at its contents immediately, without wasting time looking in the inode table.



Linux ext2/ext3 (v2) Directory Entry Components:



1.) 32-bit inode number

2.) 16-bit directory entry length field

3.) 8-bit name length

4.) 8-bit file type




There are two length fields (directory entry length and name length) for the efficiency of the unlink system call: to remove a file, the preceding entry's length field can simply be extended to absorb the unlinked entry, without rewriting the rest of the directory.



Removing Files

Say each file has a respective inode. Suppose we want to delete all the files in a file system and do:



cd /my/file/system

rm -rf *




The problem is that even after the removal command, the files' data is still accessible: the data blocks still exist, and so do the inodes. Both the data and the metadata are recoverable. To wipe out the metadata, we can create files until the shell cannot create any more because the inode table is full, overwriting the old inode entries. However, the old data blocks still exist.



If we want to wipe out the data as well, we can pull data from /dev/random and overwrite the files. We can say:



dd if=/dev/random of=x

rm x



/dev/random is a device that gives you random bits; we write them into a file until the disk fills up, then remove the file. The data blocks still exist, but are now full of random data. However, /dev/random is very slow. To overwrite files faster, we can use:

dd if=/dev/urandom



If the entropy pool of the operating system runs low, /dev/urandom generates its own pseudorandom bits, which is faster but not as random as /dev/random. A potential problem arises if /dev/urandom's algorithm for generating random bits is known to an attacker. To overwrite files even faster, we can use:



dd if=/dev/zero



This overwrites the data blocks with zeros rather than with random data. It is much faster; however, there is a potential problem at the physical level. When you write a pattern onto a magnetic drive, traces of the previous patterns are still left over, and they are easier to recover if you overwrite with zeros. That is why overwriting files with random data is safer than overwriting with zeros.





Shred Command



We can remove files with a command-line application called shred, which uses its own random number generator (faster than /dev/random or /dev/urandom), overwrites the file multiple times, and then unlinks it.



$ shred file

$ shred /dev/dsk/a1



This leaves fewer remnants of the file on the magnetic drive, as the data becomes jumbled up and randomized.






Shred issues:



1.) Bad blocks – In practice, some sectors go bad. At the low level, each sector contains extra information that lets the system know when a block is going bad, so we can see that a block is about to fail before we lose its data. The system retrieves the data from such blocks before they go completely bad and remaps it to a spare region at the end of the disk. A bad-block table keeps track of all the bad blocks and where they got remapped to. The problem is that shred can only shred good blocks; data in remapped bad blocks can still exist. To fix this, we would need an extra hardware interface to overwrite the bad blocks as well.



2.) Shred is too slow – One way to implement faster shredding is to encrypt the disk. All the data on the disk is encrypted and the key is kept somewhere else. To "shred" the disk, throw away the key.



Symbolic Links vs. Hard Links

We use two different system calls for symbolic links and hard links. To create a symbolic link at the system command level, for example:



ln -s d/a e/b        symlink("d/a", "e/b")



To create a hard link, for example:



ln d/a e/b        link("d/a", "e/b")



The difference is in how the file system treats them. With a hard link, two different directory entries point to the same inode; somewhere in that inode is a link count, which is incremented each time a hard link is created. For a symbolic link, a separate inode contains the target name, so it does not point directly to the original file. Some main differences:



1.) Symbolic links can dangle; hard links cannot. With a hard link, after you remove the original name, the other link still exists, so the file still exists. If instead you create a symbolic link and then remove the target, you have a symbolic link that points to nothing.



2.) A symbolic link is relative to its containing directory: the interpretation of a symbolic link depends on the directory where it is found. (This does not apply if the target name starts with a slash, in which case the link is absolute.) We can do:



ln -s foo d/x

ln d/x e/y



cat d/x is then equivalent to cat d/foo; the symbolic link is resolved relative to d. Likewise, cat e/y is equivalent to cat e/foo; the same symbolic link is resolved relative to e.



3.) Symbolic links can loop. With symbolic links for example, we can do:



ln -s a b

ln -s b a



Opening the file "a" will fail because it cycles through a loop. The operating system detects loops by following the chain of symbolic links and giving up after a certain number of them (around 16).



4.) Hard links to directories are not allowed. This avoids hard-link loops: if we allowed them, programs like ls could loop forever, and it would be very difficult to implement special checking.






Name Look Up

The kernel walks through the file name, mapping each file name component to an inode. Each inode leads to directory data that directs the kernel to the next inode.



Special cases:

1.) Multiple slashes: "a//b/c" – treat multiple slashes as a single slash.

2.) Slash at the end: "a/b/c/" works only if c is a directory, so check that c is a directory; otherwise the name is invalid.

3.) Slash at the beginning: "/a/b/c" – start at the root directory.