Lab 1. Time travel shell

Introduction

You are a programmer for Big Data Systems, Inc., a company that specializes in large backend systems that analyze big data. Much of BDS's computation occurs in a cloud or a grid. Computational nodes are cheap SMP hosts with a relatively small number of processors. Nodes typically run simple shell scripts as part of the larger computation, and you've been assigned the job of speeding up these scripts.

Many of the shell scripts have command sequences that look like the following (though the actual commands are proprietary):

  sort < a | cat b - | tr A-Z a-z > c
  sort -k2 d - < a | uniq -c > e
  diff a c > f

In this example, the standard POSIX shell executes the code serially: it waits for the command in the first line to finish before starting the command in the second line. Your goal is to speed this up by running commands in parallel when it is safe to do so. In this example, it is safe to start running the second command before the first has finished, because the second command does not read any file that the first command writes. However, the third command cannot be started until the first command finishes, because the first command writes a file c that the third command reads.

Your goal is to write a prototype for a shell that runs code like the above considerably faster than standard shells do, by exploiting the abovementioned parallelism. If this prototype works well, the idea is that you'll later (i.e., after CS 111 is over...) improve the prototype until it is production quality.

To simplify things, your prototype needs to exploit parallelism only at the top level, such as in the examples shown above. It need not parallelize subcommands. For example, it is OK if your prototype executes the following commands in sequence without parallelizing them, because all the commands are subsidiary to the parentheses at the top level:

  (sort < a | cat b - | tr A-Z a-z > c
   sort -k2 d - < a | uniq -c > e
   diff a c > f)

Your company's shell scripts all follow some simple rules which should make the prototype easier to write:

Your implementation will take three phases:

Implementation

A skeleton implementation will be given to you on CourseWeb. It comes with a makefile that supports the following actions. Your solution should have similar actions in its makefile.

Your solution should be written in the C programming language. Stick with the programming style used in the skeleton, which uses standard GNU style for C. Your code should be robust, for example, it should not impose an arbitrary limit like 216 bytes on the length of a token. You may use the features of C11 as implemented on the SEASnet GNU/Linux servers. Please prepend the directory /usr/local/cs/bin to your PATH, to get the versions of the tools that we will use to test your solution. Your solution should stick to the standard GNU C library that is installed on SEASnet, and should not rely on other libraries.

You can run your program directly by invoking, for example, ./timetrash -p foo. Eventually, you should put your own test cases into a file testsomething.sh so that it is automatically run as part of "make check".

More details on syntax and semantics of the shell subset

The "time travel" feature of your shell is feasible partly because of the restricted subset of the shell that you need to implement. Also, for this assignment, the shell has been simplified further so as to avoid some work that can be deferred until a production version.

Shell syntax subset

Your implementation of the shell needs to support only the following small subset of the standard POSIX shell grammar:

If your shell's input does not fall within the above subset, your implementation should output to stderr a syntax error message that starts with the line number and a colon, and should then exit.

Your implementation can have undefined behavior if any of the following features are used. In other words, our test cases won't use these features and your program need not diagnose an error if these features are used.

Time travel limitations on computations

Your implementation of the shell, when it is run in time-travel mode, can have undefined behavior if any of the following limitations are violated. When your shell is run in normal mode, it should not impose these limitations.

Don't care behaviors

Similarly, in some cases, your company's scripts don't care how your implementation behaves, and it's OK for it to depart from established semantics when it is run in time-travel mode.

You can simplify your shell in one other way, regardless of whether it is run in time-travel mode:

Submit

After you implement Lab 1a, submit via CourseWeb the .tar.gz file that is built by "make dist". Similarly for 1b and 1c. Your submission should contain a README file that briefly describes known limitations of your code and any extra features you'd like to call our attention to.

We will check your work on each lab part by running it on the SEASnet GNU/Linux servers, so make sure they work on there. Lab 1 parts are due at different times, but we will not grade each part separately; the lab grade is determined by your overall work on all three parts.

Design problem ideas

Here are some suggestions for design problems, if you have been assigned a design problem for Lab 1. You may implement one of them, or design your own. If you design your own, get approval from us before committing significant work to it. Your implementations should include test cases.

For Lab 1a:

For Lab 1b:

For Lab 1c:


© 2012 Paul Eggert. See copying rules.
$Id: lab1.html,v 1.6 2012/04/04 16:21:07 eggert Exp $