CS 111 Lecture 3 notes

Scribes: Moiz Mian, Sakib Shaikh, Mohammad Poswal, Syed Moizullah

Lecture date: Monday, January 23

Quick Links
Modularity & Abstraction
Mechanisms for Modularity

Issues with the standalone app we made last week:

buffering

Other Software Engineering Issues:

Modularity & Abstraction

Software Collection: In a large system, we can have so many functions that it is too hard to remember everything at once.
Complexity is the problem
Modularity is the key

Modularity: Breaks the big problem into smaller more manageable pieces
Abstraction: The pieces have "nice" boundaries between each other

A Side Note: Debugging Time

Lets say we have

So T = BL ~ L2
If we have M modules, there are B/M bugs per module, which gives us L/M lines per module. Therefore, T=L2/M and we can 'decrease' our running time.
Note- this is a very simplistic calculation. Obviously if we made one module per line it would be really impractical.

When to modularize?

Some metrics to think about:

Depending on the application, some metrics may be more important than others. For example, our word counting app did not need any extra flexibility because it was designed for only the English professor's machine. However, simplicity was a factor for the professor because he or she only wanted to have to boot up the computer and have the app do everything else.

How to make the program modular?
Lets look at the read_sector function as an example

void read_sector (int s, char* buf)
This is our original function. It assumes that we will pass in an address of the start of a sector of memory and it will store the result in buf. This code will have good performance because it does exactly what it was intended for and nothing more. But what if we don't necessarily want to read from the start of a sector?

void read_data(int o, int n, char* buf)
o is the bytecount from start of the disk. n is the number of bytes to read, starting from o. This function is more modular because it no longer assumes 512 as the size of the input to be read. Flexibility is increased, but this assumes that the read will be succesful, and that is of course not always the case.

int read_data_err(...)
This function returns the number of bytes read or -1 if there was an error. This code is more robust, but what if there is more than one device? We need more flexibility.

int read_device_data_err(int d, int o, char *buf, int n)
d is the device number, which is unique for each device. However, the problem with this one is that it assumes we have direct access to the place on the disk, but we could handle serial devices like keyboards if we get rid of the bytecount variable o. This code is a little more flexible, but still not enough.

int read_and_allocate(int d, int o, char **p, int *n)
read_and_allocate() stores to p and n, and returns how many bytes were read. However, this is not as simple as having separate functions for doing reading and writing. This function also has the same problem as above with serial devices, so its kind of a step backward.

int read_any_device(int d, int n, char *buf)
This function will read from an device, as long as it has the device number. Now we have a modular function that doesn't try to do too much, is able to accept a wide variety of input devices, and can tolerate errors. Let's compare this to one of the standard POSIX read functions:

int lseek(int d, int o)
This can read from any direct access device, and has even fewer arguments, making it simpler to use. However, it probably does more work than our original read_sector() does to read 512 bytes from a sector of the disk, and so it will not have as good performance. This hit in performance comes at the expense of all the other positive modular characteristics we have gained. The documentation for lseek() can be found here.

Some faults with using device numbers:

Here's how the problem was solved:
int open(char *name, int flags)
This function assigns a file descriptor to a device as a level of abstraction. The documentation for open() can be found here.

ssize_t read(int fd, char *buf, size_t bufsize)
And this function reads the device as if it is a file, referring to its file descriptor (returned from open()). It reads the input into a buffer of size bufsize. The documentation for read() can be found here.

A quick note about the modular types:

Mechanisms for Modularity

Function Call Modularity as implemented in UNIX program data

stack

With this model we can think of the API (Application Programming Interface) as a contract between caller and callee. There is also and ABI (Application Binary Interface)

Lets look at the ABI of an example function. The ABI for read() is enforced by machine code, but it is very complicated to use as an example. Instead, let's look at the factorial function:

int fact(int n)

{if(n)return n*fact(n-1); else return 1;}

This function is assuming that n will be positive and small enough that there's no overflow.

fact:  pushl %ebp  
  movl $1, %eax  
  movl %esp, %ebp  
  subl $8, %esp  
  movl %ebx, -4(%ebp)  
  movl 8(%ebp), %ebx  
  testl %ebx, %ebx  
  jne L5  
L1:  movl -4(%ebp), %ebx  
  movl %ebp, %esp  
  popl %ebp  
  ret    

The ABI makes the 0th argument of fact() the return address, then grows the stack and does the work, then cleans up.

L5:  leal -1(%ebx), %eax Subtract 1
  movl %eax, %esp  
  call fact Pushes return address onto stack
  imull %ebx, %eax  
  jnp L1  

The initial function call: fact(12) 

  pushl $12  
  call fact Calls the function
  addl $4, %esp Put answer in %esp


Now we have nice modularity between function calls, but what can go wrong here?

This is an example of soft modularity, a voluntary wall that is not enforced. It relies on the cooperation of programs involved. In many cases we may want hard modularity, where the caller and callee don't trust each other blindly.