Lecture 6 Scribe Notes - Files and Signals

1/28/2013 - by Andy Huang and Stephen Phillips

Table of Contents

  1. Last Lecture
  2. The Big Idea of Unix
    1. Two Different Types of Files
  3. Processes and File Descriptors
    1. File Descriptors
  4. What Can Go Wrong With This Approach
    1. Write to a File Descriptor That is Closed
    2. Closing and Opening Aliases
    3. File Descriptor Leaks
    4. Devices that Vanish
    5. Reading/Writing to Files That Are Unlinked
  5. Creating a Temporary File
    1. Solutions
      1. System Call
      2. Library Function
      3. Library Function That Checks for Other Processes
      4. Library Function Without Race Condition
      5. Library Function That Doesn't Unlink the Temporary File
    2. File Locks
  6. Piping
    1. Advantages and Disadvantages of Pipes
    2. Implementing Piping in the Shell
    3. What Can Go Wrong With a Pipe
  7. Interrupts and Signals
    1. Power Supply
    2. How to Inform the User Processes of the Situation

0. Last Lecture

We discussed the orthogonal design of systems. Then we talked about the process API in Linux and how processes work in Unix-based systems. We ended by beginning to talk about files and file descriptors.

1. The Big Idea of Unix

1.1. Two Different Types of Files

Stream Oriented (like a keyboard)
  • Spontaneous data generation
  • Infinite input

Random Access (like disks)
  • Request/response
  • Storage
  • Finite capacity

read works for both. lseek works only on random-access files, and fails on streams.

2. Processes and File Descriptors

Last time, we talked about the process table.

File Descriptors on Process Table

2.1. File Descriptors

Some file descriptors are fixed by convention: 0 is standard input, 1 is standard output, and 2 is standard error.

These are important to remember.

How many file descriptors can you have? No matter what choice you make, it will be wrong. Too many, and you have too much overhead in the processes. Too few, and application writers can't do what they need to because they can't open files. Generally the compromise is that a process is allowed to have 1024 files open.
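The actual limit on a given system can be queried rather than assumed. A minimal sketch, assuming a POSIX system with getrlimit; fd_limit is a hypothetical helper name:

```c
#include <sys/resource.h>

/* Hypothetical helper: stores the soft limit on open file descriptors
   for this process in *soft. Returns 0 on success, -1 on failure. */
int fd_limit(rlim_t *soft) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    return -1;
  *soft = rl.rlim_cur;
  return 0;
}
```

The value is configurable per system and per process, so it will not always be 1024.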

3. What Can Go Wrong With This Approach

  1. 3.1. Write to a File Descriptor That is Closed

    close(1);
    ssize_t i = write(1, "x", 1);
    if (i < 0)
      fprintf(stderr, "%s\n", strerror(errno)); // errno is EBADF

    write simply returns -1 and sets errno, so this problem has a simple solution: check the return value.

  2. 3.2. Closing and Opening Aliases

    // Say this equaled 12
    int fd = open("foo", O_RDONLY);
    close(fd);
    // This will equal 12 too, since the kernel reuses file descriptors
    int fd1 = open("bar", O_RDONLY);
    read(fd, buf, sizeof buf); // You will read the wrong file

    No simple solution for this. The kernel can't just never use a file descriptor after it's been closed, since it has finite memory. The programmer must keep track of this himself/herself.

  3. 3.3. File Descriptor Leaks

    for (i = 0; i < N; i++) {
      int fd = open(file[i], O_RDONLY);
      if (fd < 0)
        error();
      Read_and_Copy(fd);
    }
    // No file descriptors were closed: leaking!

    This is a problem that also cannot be fixed by the kernel, but must be handled by the application writers

  4. 3.4. Devices that Vanish

    fd = open("/dev/usb/flash01", O_RDONLY);
    // ...
    // Physically remove flash drive
    read(fd, buf, sizeof buf);

    This returns -1 with a special errno value that is not standardized.

  5. 3.5. Reading/Writing to Files That Are Unlinked

    fd = open("/tmp/foo", O_RDONLY);
    unlink("/tmp/foo");
    // This can be done by your own application or another
    read(fd, buf, sizeof buf);

    This succeeds! The kernel keeps a pointer to the file in storage, but the file can no longer be found in the directory.

    Why is this? To keep unlink and read orthogonal, so they don't interfere with each other. If multiple processes are reading a file and one of them unlinks it, the other processes may continue reading the file. Once every process is done using the file (close, exit, etc.), the system reclaims the storage it used.
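The unlinked-file behavior can be checked with a short experiment. This is a sketch under the assumption of a POSIX system and a writable path; read_after_unlink is a hypothetical helper, not part of any library.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates path, unlinks it, then writes and reads
   through the still-open descriptor. Returns the number of bytes read
   back, or -1 on error. */
ssize_t read_after_unlink(const char *path, char *buf, size_t size) {
  int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -1;
  if (unlink(path) != 0) {           /* the name leaves the directory */
    close(fd);
    return -1;
  }
  struct stat st;
  ssize_t n = -1;
  if (stat(path, &st) != 0 &&        /* lookup by name now fails */
      write(fd, "hello", 5) == 5 &&  /* but the descriptor still works */
      lseek(fd, 0, SEEK_SET) == 0)
    n = read(fd, buf, size);
  close(fd);                         /* last reference: storage is reclaimed */
  return n;
}
```

Called with a fresh path and a buffer of at least 5 bytes, this should read back the 5 bytes it wrote even though the name is already gone.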

4. Creating a Temporary File

Since we can unlink files and still use them, we might be able to use that for temporary files. So, how do we do it?

4.1. Solutions

  1. 4.1.1. System Call

    // Not a real system call
    // Creates a temp file that is not visible to
    //   any directory, or to the system
    fd = mktempfile();
    // Now the application does stuff with the file
    write();
    read();
    close(fd); // Reclaims

    The problem with this is that we already have this functionality in the open system call

  2. 4.1.2. Library Function

    int mktmpfile(void) {
      int fd = open("/tmp/foo", O_RDWR | O_CREAT | O_TRUNC, 0600);
      // Flags:
      // O_RDWR - Give read write access
      // O_CREAT - Make a file if not already there
      // O_TRUNC - Truncate the file to length 0 if it already exists
      if (0 <= fd) // Race can happen here!
        unlink("/tmp/foo");
      return fd;
    }

    This library function is buggy! There is a race condition between the open and the unlink. What if two processes do this at the same time? Then they will both be reading and writing the same file. Obviously we need to handle this.

  3. 4.1.3. Library Function That Checks for Other Processes

    int mktmpfile(void) {
      struct stat st;
      if (stat("/tmp/foo", &st) == 0) {
        errno = EEXIST;
        return -1;
      }
      int fd = open("/tmp/foo", O_RDWR | O_CREAT | O_TRUNC, 0600);
      if (0 <= fd) // Race can happen here!
        unlink("/tmp/foo");
      return fd;
    }

    This still has the race condition! Another process can create or open the file between the stat check and the unlink.

  4. 4.1.4. Library Function Without Race Condition

    int mktmpfile(void) {
      // Make file name unique to process id
      char foo[sizeof("/tmp/foo") + sizeof(pid_t) * 8];
      sprintf(foo, "/tmp/foo%lld", (long long) getpid());
      int fd = open(foo, O_RDWR | O_CREAT | O_TRUNC, 0600);
      // The 0600 is for security, so only your user can read/write the file
      if (0 <= fd)
        unlink(foo);
      return fd;
    }

    However, there is still a problem with this code. If an application makes a lot of files and unlinks them, then we won't be able to see who is using disk space. If we run out of disk, we won't be able to find out which process to stop to free it. So it's actually best not to unlink the files.

  5. 4.1.5. Library Function That Doesn't Unlink the Temporary File

    int mktmpfile(char *name, size_t size) {
      struct stat st;
      do { // Can have the SAME race condition here
        generate_random_file_name(name, size);
      } while (stat(name, &st) == 0);
      int fd = open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
      // O_EXCL is a flag that says if file exists, fail
      return fd;
    }

    This code STILL has the race condition. So instead:

    int mktmpfile(char *name, size_t size) {
      int fd;
      do {
        generate_random_file_name(name, size);
      } while ((fd = open(name, O_CREAT | O_RDWR | O_EXCL, 0600)) < 0
               && ok_errno(errno));
      // ok_errno(errno) is checked so that if some error keeps every
      // attempt from succeeding, the loop recognizes the situation and
      // ends instead of generating file names forever
      return fd;
    }
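The notes leave generate_random_file_name and ok_errno undefined. The sketch below fills them in with assumed definitions so the final version can be compiled and tried: the "/tmp/tmp-<pid>-<random>" name format and the choice to retry only on EEXIST are illustrative assumptions, and rand() is used only for simplicity (it is not a secure source of names).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed helper: writes "/tmp/tmp-<pid>-<random>" into name. */
static void generate_random_file_name(char *name, size_t size) {
  snprintf(name, size, "/tmp/tmp-%ld-%08x", (long)getpid(),
           (unsigned)rand());
}

/* Assumed helper: only a name collision is worth another attempt. */
static int ok_errno(int e) { return e == EEXIST; }

/* Creates a new file that did not exist before and returns its
   descriptor, or -1 with errno set. The name used is left in name. */
int mktmpfile(char *name, size_t size) {
  int fd;
  do {
    generate_random_file_name(name, size);
    /* O_CREAT | O_EXCL makes "check and create" one atomic step. */
    fd = open(name, O_CREAT | O_RDWR | O_EXCL, 0600);
  } while (fd < 0 && ok_errno(errno));
  return fd;
}
```

Because the file is not unlinked, the caller is responsible for calling close and then unlink on the returned name when done. POSIX also provides mkstemp, which serves the same purpose.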

4.2. File Locks

This temporary file business is very complicated. Can't we lock the files somehow?

File locks do exist in POSIX/Unix

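fcntl(fd, cmd, p);

p is a pointer to a flock struct

struct flock { short l_type; off_t l_start; off_t l_len; /* ... */ };

cmd selects the lock operation (for example F_SETLK or F_SETLKW), and l_type says what kind of lock you want to get (F_RDLCK, F_WRLCK, or F_UNLCK)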

These are primitives, BUT:

Only in a place where everyone cooperates do these file locks work. That's why the database crowd likes these. However, in most environments, using these will be of no benefit to you, since there are many "impolite programs" out there.
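As an illustration of the cooperative case, here is a minimal sketch of taking and releasing a whole-file lock with fcntl on a POSIX system. The helper names try_lock_file and unlock_file are hypothetical. The lock is advisory: it only affects other processes that also ask fcntl for a lock.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: tries to take an exclusive advisory lock on the
   whole file without blocking. fd must be open for writing.
   Returns 0 on success, -1 with errno set otherwise. */
int try_lock_file(int fd) {
  struct flock fl = {0};
  fl.l_type = F_WRLCK;    /* exclusive; F_RDLCK would be a shared lock */
  fl.l_whence = SEEK_SET; /* l_start is relative to the start of the file */
  fl.l_start = 0;
  fl.l_len = 0;           /* 0 means "through the end of the file" */
  return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: releases any lock this process holds on the file. */
int unlock_file(int fd) {
  struct flock fl = {0};
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  return fcntl(fd, F_SETLK, &fl);
}
```

Using F_SETLKW instead of F_SETLK would make the call wait until the lock becomes available.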

5. Piping

To send the output of one program to the input of another, we can compare

a > t
b < t
versus
a | b

5.1. Advantages and Disadvantages of Pipes

Advantages of pipes:

Downsides of Pipe:

Pipe Objects

Pipe Object

5.3. What Can Go Wrong With a Pipe

If we have A | B then we might run into some problems.

  1. B reads but pipe is empty (i.e. A hasn’t written anything, yet)?

    Solution:
    • Read waits
  2. A writes but pipe is full (i.e. B hasn’t read recently)?

    Solution:
    • A hangs
  3. A writes, but B closed its end of pipe

    Solutions:
    • write returns -1, errno = EPIPE
    • Process A gets SIGPIPE signal (default) (more on signals later)
  4. B reads, but A has closed its end of the pipe

    Solution:
    • Read returns 0 (EOF). This is a very natural solution in this case.
  5. A is done generating output, but forgets to close its end

    Solution:
    • There is no solution. This is the application writer's issue. Read hangs forever
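Cases 3 and 4 above can be reproduced inside a single process. This is a sketch assuming a POSIX system; both function names are hypothetical. Note that the second function ignores SIGPIPE for the whole process so that write returns an error instead of terminating it.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Case 4: every write end is closed, so read reports EOF by returning 0. */
ssize_t read_after_writer_closed(void) {
  int fd[2];
  char c;
  if (pipe(fd) != 0)
    return -1;
  close(fd[1]);                 /* no write ends remain open */
  ssize_t n = read(fd[0], &c, 1);
  close(fd[0]);
  return n;
}

/* Case 3: every read end is closed. With SIGPIPE ignored, write fails
   and sets errno. Returns the errno seen, 0 if write succeeded, or -1
   if the pipe could not be created. */
int write_after_reader_closed(void) {
  int fd[2];
  if (pipe(fd) != 0)
    return -1;
  signal(SIGPIPE, SIG_IGN);     /* the default action would kill us */
  close(fd[0]);                 /* no read ends remain open */
  int err = write(fd[1], "x", 1) < 0 ? errno : 0;
  close(fd[1]);
  return err;
}
```

The first function should return 0 and the second should return EPIPE.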

5.2. Implementing Piping in the Shell

We want to be able to use a | b in shell, so to implement this, we have to use the dancing pipes.

3 processes:

So we’ll have to run fork twice

int fd[2];
// When you fork, the file descriptors are also passed over
pid_t bp = fork();
if (bp == 0) {
  pipe(fd);
  pid_t ap = fork();
  if (ap == 0)
    // This is a hand-wavy function
    run(a); // clone fd[1] into 1
  else
    run(b); // clone fd[0] into 0
}

Now suppose a exits and suppose b uses read(0, ...). Can this hang?

Yes, if b forgot to close fd[1]. In that case b itself still holds a write end of the pipe, so read waits for data that only b could write, and b will never write it.
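One way the hand-wavy run step could be spelled out is sketched below, assuming a POSIX system. The function name run_pipeline, the use of execvp, and the exit code 127 for setup failures are illustrative choices, not the lecture's implementation. The fork structure follows the outline above: the shell forks b's process, which creates the pipe and forks a's process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "a | b" and returns b's exit status, or -1 on failure.
   a and b are NULL-terminated argument vectors. */
int run_pipeline(char *const a[], char *const b[]) {
  pid_t bp = fork();
  if (bp < 0)
    return -1;
  if (bp == 0) {
    int fd[2];
    if (pipe(fd) != 0)
      _exit(127);
    pid_t ap = fork();
    if (ap < 0)
      _exit(127);
    if (ap == 0) {
      dup2(fd[1], 1);   /* a's stdout becomes the pipe's write end */
      close(fd[0]);
      close(fd[1]);
      execvp(a[0], a);
      _exit(127);       /* reached only if exec failed */
    }
    dup2(fd[0], 0);     /* b's stdin becomes the pipe's read end */
    close(fd[0]);
    close(fd[1]);       /* without this, b's read(0, ...) never sees EOF */
    execvp(b[0], b);
    _exit(127);
  }
  int status;
  if (waitpid(bp, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

For example, with a set to {"echo", "hello", NULL} and b set to {"grep", "-q", "hello", NULL}, the call should return 0 on a system where both programs are on the PATH.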

6. Interrupts and Signals

6.1. Power Supply

Uninterruptible Power Supply

How should the kernel deal with running out of power?

Take a snapshot of RAM and the registers, copy it to disk, then shut off power, and the programs don't even notice!

Downsides

For these reasons, we want the applications/processes to know what is going on, and to handle a power outage accordingly.

6.2. How to Inform the User Processes of the Situation

  1. Use a file "/dev/power/". Read from it, and if you get one character, say '!', then the power is low, if you read another, say ' ', then the power is OK.

    This process is called polling. Every program that cares about power must take every loop that might execute for more than a few seconds and insert a check:

      if(power_is_low())
        fix_situation();

    Problems

    • Pain to program
    • Can chew up CPU time
  2. Use signals

    With signals, the kernel "magically" inserts a check into your program when power gets low, and calls "fix_situation" appropriately.

And to find out how that works, we'll have to wait until next time