CS 111 Lecture 2 Scribe Notes (Fall 2010)

prepared by Eric Bollens for a lecture by Professor Paul Eggert on September 28, 2010

Table of Contents

  1. Operating System Goals
  2. Example Problem: Current GNU tar Exploit
    1. tar in General
    2. Exploiting tar
    3. Race Conditions
    4. Extending the Exploit to VMs
    5. Fixing tar
    6. Issues Indicative of Complexity
  3. A Thought Experiment: Simple, Secure OS
  4. Bootstrapping
    1. BIOS
    2. Chain Loading
  5. Returning to the Thought Experiment
  6. Programmed I/O
    1. Reading IDE Sectors
    2. Main Program
    3. Where from Here?

Operating System Goals

In considering an operating system, first let's consider several goals:

Example Problem: Current GNU tar Exploit

A directory traversal vulnerability currently exists within GNU tar.

tar in General

General syntax for archival using GNU tar:

tar cf archive.tar dir

System administrators often use tar to save the state of a file system to another storage device. In the above command line, archive.tar could be a backup of a file system (or a portion of a file system) under dir. One might then store archive.tar on an external device. Later, at a future time, if one desired an archived version of dir or any of its contents, one could restore it from archive.tar, retaining the same permissions, structure and data as existed for dir at the moment the archive was created.

General syntax for extracint an archive using GNU tar:

tar xf archive.tar

More information about GNU tar in general is available here.

Exploiting tar

To exploit the vulnerability, let's set up the situation with a couple assumptions:

Generally, attacker cannot access the file /home/victim/private/secret.txt because the permission on /home/victim/private prohibits it. However, through an exploit in tar, this protection vanishes if an archive is executed over the home directories of both attacker and victim; in this case, the attacker can manuever a copy of the secret.txt file into their home directory.

To leverage the exploit, attacker should create /home/attacker/d and then wait for tar to scan it for archival. At the moment after tar assesses the type of the folder, the attacker should issue the following command line:

rm -f /home/attacker/d; ln -s /home/victim/private /home/attacker/d;

The tar command, having already assessed the file as type directory, prepares to handle the contents it finds within /home/attacker/d as directory contents. Because tar was executed with root permissions, presumably so it could archive all home directories, it can follow sym links that attacker could not explicitly follow, such as the one substituted in as /home/attacker/d. Now, as it passes through the sym link while handling it as a directory, it will take the contents of /home/victim/private and store them within /home/attacker/d. The tar image shall thus contain /home/attacker/d/secret.txt. Then, if attacker were to request the archived version of the home folder from this archive, they would gain access to the file originally stored at /home/victim/private/secret.txt.

The attacker has approximately a microsecond to leverage this exploit. However, because of the scheduled nature of many backups, a script could sit their swapping the directory and sym link, and a viable chance exists that the exploit would succeed. However, it is also worth note that if one simply set up the sym link before tar assessed it, then tar would see it typed as a sym link and treat it correctly, choosing not to archive the linked-to contents.

Race Conditions

At a conceptual level, GNU tar utility performs two steps to handle a node:

  1. tar looks at the type of the file system node
  2. tar acts based on type with a system call

If one can replace the directory node with a sym link node after (1) and before (2), then one can fool tar.

Technical Aside

Not covered during class, but appended after the lecture, here is a brief explanation of the exploit, as so far as the author sees it. Disclaimer: the author is just a student in the CS 111 class without expert knowledge of this package and thus encourages you to view the code for yourself and make your own opinion. To view the code for GNU tar, both for the code before and after the fix, please see the source repository here.

At a technical level, for the exploit case described above, this bug is visible as follows (in the GNU tar source file create.c with definitions in tar.h):

  1. The routine void dump_file (struct tar_stat_info *parent, char const *name, char const *fullname) is initiated for the maliciously set up directory.
  2. The dump_file() routine calls static void dump_file0 (struct tar_stat_info *st, char const *name, char const *p) for the file system node in question.
  3. The dump_file() routine, processing the node, invokes the system call int fstatat (int dirfd, const char *path, struct stat *buf, int flags) upon the node, which populates the stat for the maliciously set up directory.
  4. The dump_file() routine procedes through other logic and then reaches a point when, perceiving the node as a directory, it initiates static bool dump_dir (struct tar_stat_info *st), and from there it begins dumping the contents of the folder.

The exploit itself works by racing with the dump_dir() routine after GNU tar has stored the file, thus swapping out the directory in favor of a sym link after the tar program has already defined it as a directory. This thus bypasses the defined SYMTYPE behavior that it should apply to the new, malicious sym link, instead treating it as though the node is of type DIRTYPE, as it was at the time of the stat.

A large subset of reported bugs and vulnerabilities stem from time-driven exploits such as this one, colloquially and formally called race conditions/hazards, where a particular ordering of asynchronous events (such as processs interleaving) may influence the final result.

Extending the Exploit in VMs

Physical Host
Attacker Virtual Machine
Victim Virtual Machine

The GNU tar directory traversal vulnerability may be extended to the virtual machine paradigm if, in the case of isolated virtual machines, the physical host executes GNU tar at the system level over the whole of the virtual machines, or at least over the attacker and victim VMs. In this case, the protection afforded to the victim from the attacker is generally believed more strongly enforced than just a standard permission scheme, the two operating environments explicitly and physically partitioned from each other. However, while again the attacker cannot follow a sym link to another VM, GNU tar executed at the system level will if it is fooled as described in the previous section.

One may hypothesize that the exploit can actually have even greater impact in the VM context. Let us assume that victim has protected the secret.txt file properly with file system permissions, such as 600 (rw-------). In the basic case, should secret.txt have possessed these permissions, then attacker could not have accessed the file. However, in the VM context, this protection can become almost irrelevant. tar considers file ownership based on UID, and therefore, if attacker either controls root on their domain or can facilitate a UID collision with victim user in the other VM, then the file system permission security model completely collapses.

Fixing tar

How do we fix GNU tar?

  1. Replace every system call in tar with system calls that won't follow sym links.
  2. Make file system read-only before doing archive.
  3. Use a different operating system paradigm for sym links.

Of these three choices, (3) is the least viable. A paradigm could be used where the user must have permission to traverse through the sym link when they create the sym link, but this leads to integrity challenges after the file system has changed further. To avoid this, the operating system would have to continually ensure the integrity of the sym link, and this is non-viable. Of the two remaining choices, GNU tar was patched with (1), where it was rewritten to utilize a different set of system calls that would not follow sym links under any circumstance. This was likely done because of the broad range of uses tar sees, including cases where it archives a live file system (which one should note is generally not a reflection of the file system at any single moment). Option (2), meanwhile, would have worked in some cases, but for archiving a live system, it would not have worked.

That said, in general, how do we ensure the integrity of an archive?

Issue Indicative of Complexity

The GNU tar program has existed, in almost the same form, since the eighties. That said, how could a bug be discovered nearly twenty years later?

The complexity of a large system, in this case through a combination of features, is obstructing the operating system's goal of protection. In this case, the features involved include virtual machines, symbolic links, dumping a live file system and backup/restore.

This issue is also indicative of issues raised by performance: the operating system cannot spend all its cycles checking sym links, nor can it guarantee that the sequential operations used by tar shall occur in an atomic state such that a race isn't possible.

Ultimately, this is an issue of complexity.

A Thought Experiment: Simple, Secure OS

A professor working on a grant proposal aims to have a simple word count program that runs simply and securely, protected from any other code written to disk or that may execute on the processor. Let us assume the following:

From the most simplistic of perspectives, we need to write a program that will get the program off of disk into main memory. For a moment, we will thus shelve this discussion while we consider this challenge in general, and will return to it here.

Bootstrapping

“Suppose you’re trying to get up on a horse, and, if you don’t have any other way to get up on the horse, you just lift yourself up by your bootstrap.”

In the early days of computers, before starting the computer up, one would key in machine instructions from toggles on the front panel of the machine. In this way, one could pick the starting addresses of a program, and once stored, one could activate the machine and it would run the selected program. For the layman and with an increase in the complexity of programs, this approach has since proven unviable in almost all situations, and has thus been replaced by a mechanism that uses a program, the bootstrap, to initialize and process the basic hardware environment and begin the bootstrapping process that eventually hands off to a full operating system.

BIOS

The basic program that begins the bootstrap is known as the BIOS (Basic Input/Output System), stored at a predefined address commonly in EEPROM (Electrically Erasable Programmable Read-Only Memory) or a derivative. As opposed to random access memories like SRAM and DRAM, this non-volatile memory remains durable (does not change or degrade to zeroes) across a power outage. In most cases, the BIOS segment of memory is also read-only and modifiable either only at the system level or even in some systems only via a special switch in the hardware.

When a computer powers on, it first jumps to the EEPROM region. For a very simple program, one could place the executable code directly in the EEPROM region. However, such a solution will not scale, and thus instead the BIOS is meant to be the catalyst to start the boot process. The EEPROM-driven BIOS is generally OS-independent and is intended to contain a minimal set of instructions that any system shall need.

Goals of the BIOS:

  1. Hardware checks to determine the general environment.
  2. Integrity checks against itself, memory, devices, etc.
  3. Old, primitive drivers for backwards compatibility.*
  4. Survey devices for a bootable device, and boot the first such device.

* The BIOS contains some old drivers, but most modern operating systems neglect the code in the BIOS for handling hardware, instead using their own, OS-specific drivers for the hardware.

Chain Loading

Ultimately, the final purpose of the BIOS is to determine a bootable volume and then boot it. As it looks for devices, it checks each for a master boot record. The master boot record resides in the first sector of a device, and contains, as the last two bytes of the first sector, 0xAA55 (Little Endian).

The master boot record (MBR) is one sector (typically 512 bytes) containing the following three regions:

1 2 3
  1. 446-byte region at the front with executable code, optional disk signature, and optional nulls.
  2. 64-byte region after the executable region for a partition table.
  3. 2-byte region at the end of the sector to designate it as a master boot record.

When the BIOS finds a master boot record, it copies it into a well-known location in RAM (0x7c00) and sets the program counter to it. If an operating system fits into 446 bytes, could be included directory in the MBR, but more commonly, further bootstrapping is required. Generally, the MBR will then load a volume boot record (VBR), the first sector of a bootable partition, from the first bootable partition as defined in the partition table.

The progression through various boot layers is known as chain loading:

  1. BIOS starts running and reads in the master boot record.
  2. Master boot record starts running and loads in the volume boot record.
  3. Volume boot record starts running and loads in the kernel.

In Linux, one more intermediate step is included with GRUB (the GRand Unified Bootloader):

  1. BIOS starts running and reads in the master boot record.
  2. Master boot record starts running and loads in the volume boot record.
  3. Volume boot record starts running and loads in GRUB.
  4. GRUB starts running and loads in the kernel.

More information on GNU GRUB is available here.

One of the best ways to break into a machine is to substitute a hostile version of the BIOS. For this reason, some computers require you to flip a jumper to overwrite the EEPROM, but the vast majority allow flashing the BIOS through a software layer. The best way to infect a machine is to get as close to the hardware as possible. If not the BIOS, then the MBR, and if not the MBR, then the VBR, and if not the VBR, then the kernel. This paradigm continues up through the system stack.

Returning to the Thought Experiment

With the booting process now established through the bootstrap paradigm, let's return to the thought experiment. To ensure functionality, let alone to ensure security, the system now faces two major problems:

  1. Disk Layout - Because of the boot process, a couple things have already been set. An MBR resides at the front of the disk and, for simplicity, let's assume the disk contains only one partition with a VBR at the front of it. Further, let's then assume that the VBR reads the application code and places it into memory. In addition, let's assume that the file is located at 0x1000000.
  2. Memory Layout - The BIOS is loaded into memory and a copy of the MBR, via the bootstrap paradigm, resides in 0x7c00. Let us then assume that we place the VBR at position 0x10000 and the word count program at 0x100000.

We need the ability to copy data off of disk into memory. Because disk is almost certainly bigger than memory, we will need to have a buffer as part of our word count program that reads the sectors off of disk into the file incrementally.

Programmed I/O

The programmed I/O approach represents one of the oldest and most basic approaches of moving data from disk to memory. At the most basic level, it involves a number of registers that serve as registers of the disk controller, utilizing an instruction to copy one byte from the disk controller register to the CPU and vice versa. In x86, this functionality is provided by the inb and outb instructions.

The basic process:

  1. Wait until the controller is ready, signified when 0x1f7 is set to 0x40.
  2. Store the number of sectors for the controller to read into 0x1f2.
  3. Store the disk sector offset into 0x1f[3-6], a 32-bit value representing the offset on disk.*
  4. Store the read command into 0x1f7.
  5. Set results in CPU and store in RAM.

* The 32-bit offset implies 232 sectors times 29 bytes per sector for a total representable pool of 241 bytes, a la 2 TB. Therefore, recently there has been a push towards 4 KB sectors to extend this maximum.

Reading IDE Sectors

The following routine reads an IDE sector into the buffer:

void read_ide_sector ( int s, char buf[512] )
{
  while( (inb(0x1f7) & 0xc0) != 0x40 )
    continue;
  outb( 0x1f2, 1 );
  outb( 0x1f3, s&0xff );
  outb( 0x1f4, (s>>8)&0xff );
  outb( 0x1f5, (s>>16)&0xff );
  outb( 0x1f6, (s>>24)&0xff );
  outb( 0x1f7, 0x20 );
  while( (inb(0x1f7) & 0xc0) != 0x40 )
    continue;
  insl(0x1f0, buf, 128);
}

A few considerations about this routine:

This routine can thus be refined slightly optimally as:

void wait_for_ide ( )
{
  while( (inb(0x1f7) & 0xc0) != 0x40 )
    continue;
}
void read_ide_sector ( int s, char buf[512] )
{
  wait_for_ide( );
  outb( 0x1f2, 1 );
  outb( 0x1f3, s );
  outb( 0x1f4, s>>8 );
  outb( 0x1f5, s>>16 );
  outb( 0x1f6, s>>24 );
  outb( 0x1f7, 0x20 );
  wait_for_ide( );
  insl(0x1f0, buf, 128);
}

The M/VBR would then load the program as such:

for( i=0; i<size_of_program_in_sectors; i++ )
  read_ide_sector(i, 0x10000+(i-1)*512);
goto 0x10000;

Main Program

Once loaded, our main routine is relatively simple:

#include <ctype.h>
int main( void )
{
  int nwords = 0;
  bool inword = false;
  int s = 1000000000/512; /* number of sectors to starting address of file */
  for(;;){
    char buf[512];
    read_ide_sector(s, buf); /* read current sector to buffer */
    for(int j=0; j<512; j++){
      if(buf[j] == 0){ /* if null terminator is reached in buffer */
        nwords +=inword;
        write(nwords); /* write output to display */
        return 0;
      }
      bool thisalpha = isalpha((unsigned char) buf[j]); /* must cast to unsigned char for isalpha */
      nwords += inword & ~thisalpha;
      inword = thisalpha;
    }
  }
  return 0;
}

Where from Here?

0xB8000 is a memory-mapped region that is written to the display. With that given, consider for next time how to define a write() routine.

Putting all this together, this is a very minimal kernel for an operating system. That said, the next question to ask: what can go wrong here?