CS 111 Lecture 3 notes

Scribes: Moiz Mian, Sakib Shaikh, Mohammad Poswal, Syed Moizullah

Lecture date: Monday, January 23

Quick Links
Modularity & Abstraction
Mechanisms for Modularity

Issues with the standalone app we made last week:

Multiple copies of read sector exist in MBR, VBR, BIOS, and the app itself. We want to have only 1-2 copies max if we can.
The x86 code will not work on machines with other ISA's
Our data goes from disk-> CPU-> RAM, but we would like to have the disk controller send it directly to RAM. (This is called Direct Memory Access)
Since reading memory takes so long, we should take advantage of that downtime to do computations. To do this we can use Double Buffering: one buffer holds the sector of memory that we are currently counting, while the second buffer is accepting the next sector being written to it by the disk.

buffering

Other Software Engineering Issues:

It is difficult to change our code
It is difficult to reuse our code
It is difficult run simultaneous programs
It is difficult to debug the code

Modularity & Abstraction

Software Collection: In a large system, we can have so many functions that it is too hard to remember everything at once.
Complexity is the problem
Modularity is the key

Modularity: Breaks the big problem into smaller more manageable pieces
Abstraction: The pieces have "nice" boundaries between each other

A Side Note: Debugging Time

Lets say we have

L lines of code total
B bugs, scaled to L
T time to find a bug. Also scaled to L

So T = BL ~ L²
If we have M modules, there are B/M bugs per module, which gives us L/M lines per module. Therefore, T=L²/M and we can 'decrease' our running time.
Note- this is a very simplistic calculation. Obviously if we made one module per line it would be really impractical.

When to modularize?

Some metrics to think about:

Performance- Usually gets hurt by modularity.
Robustness- Tolerance of faults and errors. Often a benefit of modularity.
Simplicity- Ease with which the software system can be learned and used.
Flexibility- Lack of assumtions about the environment (neutrality).

Depending on the application, some metrics may be more important than others. For example, our word counting app did not need any extra flexibility because it was designed for only the English professor's machine. However, simplicity was a factor for the professor because he or she only wanted to have to boot up the computer and have the app do everything else.

How to make the program modular?
Lets look at the read_sector function as an example

void read_sector (int s, char* buf)
This is our original function. It assumes that we will pass in an address of the start of a sector of memory and it will store the result in buf. This code will have good performance because it does exactly what it was intended for and nothing more. But what if we don't necessarily want to read from the start of a sector?

void read_data(int o, int n, char* buf)
o is the bytecount from start of the disk. n is the number of bytes to read, starting from o. This function is more modular because it no longer assumes 512 as the size of the input to be read. Flexibility is increased, but this assumes that the read will be succesful, and that is of course not always the case.

int read_data_err(...)
This function returns the number of bytes read or -1 if there was an error. This code is more robust, but what if there is more than one device? We need more flexibility.

int read_device_data_err(int d, int o, char *buf, int n)
d is the device number, which is unique for each device. However, the problem with this one is that it assumes we have direct access to the place on the disk, but we could handle serial devices like keyboards if we get rid of the bytecount variable o. This code is a little more flexible, but still not enough.

int read_and_allocate(int d, int o, char **p, int *n)
read_and_allocate() stores to p and n, and returns how many bytes were read. However, this is not as simple as having separate functions for doing reading and writing. This function also has the same problem as above with serial devices, so its kind of a step backward.

int read_any_device(int d, int n, char *buf)
This function will read from an device, as long as it has the device number. Now we have a modular function that doesn't try to do too much, is able to accept a wide variety of input devices, and can tolerate errors. Let's compare this to one of the standard POSIX read functions:

int lseek(int d, int o)
This can read from any direct access device, and has even fewer arguments, making it simpler to use. However, it probably does more work than our original read_sector() does to read 512 bytes from a sector of the disk, and so it will not have as good performance. This hit in performance comes at the expense of all the other positive modular characteristics we have gained. The documentation for lseek() can be found here.

Some faults with using device numbers:

No way to agree on a standard for labeling devices
Multiple programs could try to access the same device and cause problems

Here's how the problem was solved:
int open(char *name, int flags)
This function assigns a file descriptor to a device as a level of abstraction. The documentation for open() can be found here.

ssize_t read(int fd, char *buf, size_t bufsize)
And this function reads the device as if it is a file, referring to its file descriptor (returned from open()). It reads the input into a buffer of size bufsize. The documentation for read() can be found here.

A quick note about the modular types:

int - signed 32-bit
size_t - unsigned 64-bit
ssize_t - signed 64-bit (used for reporting errors as negative numbers)

Mechanisms for Modularity

Function Call Modularity as implemented in UNIX program data

stack

With this model we can think of the API (Application Programming Interface) as a contract between caller and callee. There is also and ABI (Application Binary Interface)

Lets look at the ABI of an example function. The ABI for read() is enforced by machine code, but it is very complicated to use as an example. Instead, let's look at the factorial function:

int fact(int n)

{if(n)return n*fact(n-1); else return 1;}
This function is assuming that n will be positive and small enough that there's no overflow.

`fact:`	`pushl`	`%ebp`
	`movl`	`$1, %eax`
	`movl`	`%esp, %ebp`
	`subl`	`$8, %esp`
	`movl`	`%ebx, -4(%ebp)`
	`movl`	`8(%ebp), %ebx`
	`testl`	`%ebx, %ebx`
	`jne`	`L5`
`L1:`	`movl`	`-4(%ebp), %ebx`
	`movl`	`%ebp, %esp`
	`popl`	`%ebp`
	`ret`

The ABI makes the 0th argument of fact() the return address, then grows the stack and does the work, then cleans up.

`L5:`	`leal`	`-1(%ebx), %eax`	Subtract 1
	`movl`	`%eax, %esp`
	`call`	`fact`	Pushes return address onto stack
	`imull`	`%ebx, %eax`
	`jnp`	`L1`

The initial function call: fact(12)

`pushl`	`$12`
`call`	`fact`	Calls the function
`addl`	`$4, %esp`	Put answer in `%esp`

Now we have nice modularity between function calls, but what can go wrong here?

Callee can mess with the local variables in the caller's frame (This is a stack smash)
Callee might not return, or might go into an infinite loop
Caller can supply bad stuff to the callee. Also, some optimizations for things like tail recursion might do tricky stuff and ignore return calls for performance reasons
Callee can set caller's registers

This is an example of soft modularity, a voluntary wall that is not enforced. It relies on the cooperation of programs involved. In many cases we may want hard modularity, where the caller and callee don't trust each other blindly.