CS 111

Brian Geffon and Daniel Collier

Lecture 9

October 22nd, 2009


Implementing Spinlocks

When a computer runs multiple threads, the synchronization of is critical. When multiple threads try to access and modify memory at the same time, you must in some way ensure that the threads do not run into each other. As was described in the previous lecture, one way to control the synchronization of threads is the use of locks. One type of lock that is commonly used is a spin lock. When using a spin lock, if a thread wants to acquire a lock, it will loop (spin) until the lock is available. In other words, a spin lock keeps the CPU busy until it receives the lock. Here is an initial attempt at the implementation of a spin lock:

typedef int mutex_t;

  void lock(mutext_t *m) {
    while(*m) continue;
    *m = 1;
  }

However, this above implementation is not quite right. Between the time that the value of the mutex is checked and the time that it is assigned a new value, the value in memory could have changed. This section needs to be done atomically, which is accomplished using the atomic x86 xchg instruction:

typedef int mutex_t

  void lock( mutex_t *m ) {
    while( xchg(m, 1) == 1 ) continue;
  }

  void unlock( mutex_t *m) {
    *m = 0;
  }

Preconditions: It is important to remember the preconditions for calling lock and unlock. Lock may only be called when you do not own the lock. Unlock can only be called when you own the lock.

Are there other atomic x86 instructions we could use to implement a spin lock? Actually, there are several x86 instructions that are atomic:

  1. lock incl x
  2. xchg
  3. Compare and Swap (cas)

Using compare and swap we can compute any function atomically

//compare and swap implementation in C
  bool cas( int *p, int old, int new ) {
    if( *p == old ) {
      *p = new;
      return true;
    } else {
      return false;
    }
  }

Here is an example of how we could compute a factorial atomically:

void factorial_transaction (int *p) {
    do {
      int old = *p;
      int new = fact(old);
    } while( !cas(*p, old, new) );
  }

Instead of the factorial function in the above example, we can compute any function there! Compare and swap allows us to implement any function atomically.

Where should we put the lock?

So does compare and swap solve all of our problems? Look at the following example of a bank account that is shared by Dr. Eggert and his brother where they both share an account that currently has $100. What happens if at the same time, Paul tries to withdraw $80 and Kurt tries to withdraw $40?

How can we solve the problem of both people withdrawing money from the account at the same time?

struct account {
    long long balance;
    ...
    volatile mutex_t lock;
  }

Withdraw money:

lock( &p -> lock );
  p->balance -= 8000;
  unlock( &p -> lock );

Now, let's try to apply what we learned from giving the bank account its own lock to an implementation of pipes in linux:

struct pipe {
    char buf [N];
    size_t r, w;
    mutex_t m;
  }

There are two possible ways to use locks with a pipe - either each pipe has its own lock or there is a global variable lock for all of the pipes:

  1. Course-grained (global mutex_t)

  2. Fine-grained (each pipe has its own mutex_t lock)

Adding Read/Write Locks

Is it possible to make a finer grained locking system? Right now each pipe has its own lock. However, a reader should not exclude others from being able to read as well. We can create greater parallelism and create a finer grained locking system by changing our single lock for the pipe to have two locks, one for reading and one for writing.

struct pipe {
    char buf[N];
    size_t r, w;
    mutex_t rm, wm;
  }

Now when we write, we get a write lock and when we read, we get a read lock.

bool writec( struct pipe *p, char c ) {
    lock( &p->wm );
    if( p->w - p->r == N ) {
      unlock( p->wm);
      return false;
    }
    p->buf[p->w++ % N] = c;
    unlock( &p->wm );
    return true;
  }

If you want writec to be a void function instead, you can change it to:

void writec( struct pipe *p, char c ) {
    lock( &p->wm );
    if( p->w - p->r == N ) {
      continue;
    }
    p->buf[p->w++ % N] = c;
    unlock( &p->wm );
  }

This function will spin until a reader helps us out and releases the lock, allowing us to claim the write lock until we finish writing.

Choosing where to place critical sections

We have now discussed the design decision of where to place locks within a data structure. However, equally important is the decision of where to call the lock() and unlock() functions within code.

How big should a critical section be?

How do we find the minimal critical section?

  1. Look for writes to shared objets ("shared writes")
  2. Look for reads used to compute those share writes ("dependent reads") Keep repeating this process of expanding the critical section until the small region possible is found. Put lock() / unlock() around smallest region containing these.

What if some threads want read-only access to the whole object? If they're only reading, we don't want to block all other readers. We want to allow N > 1 simultaneous readers OR 1 writer.

Our previous solution blocks readers unnecessarily. We need a new type of mutex that supports both readers and writers.

typedef {
    mutext_t m;
    int readers;
  } rwmutex_t;

If you want a read lock, grab the mutex, add 1 to readers, and return. Decrement readers when unlock.

If you want a write lock, grab the mutex, check if readers == 0, don't release the mutex until done writing.

Problem with spinlocks:

Possible solution:

while( xchg( ... ) )
    yield();

However, this stills up some CPU since it hurts context switching time.

Blocking mutexes

typedef struct {
    mutex_t m; //controls access to this lock
    bool locked; //someone owns this lock
    proc_t *blocked_list;
  } bmutex_t;

  void acquire( bmutex_t *b ) {
    for( ; ; ) {
    lock( &b -> m );
    //make self runnable
    //add self to b-> blocked list

    if( !b-> locked )
      break;
    unlock( &b -> m );
    schedule();
    }
    b-> locked = true
    //mark self runnable
    //remove self from b->blocked list

    unlock( &b->m );
  }

  void release( bmutex_t *b ) {
    lock( &b -> m);
    b->locked = false;
    //mark 1st process in p->blocked list as runnable
    unlock(&b->m);
  }

Note: two possible methods for which process on the blocked list to mark as runnable:

  1. Notify method: more intuitive since you only wake up the next item
  2. Notify all method: more flexibility for who is scheduled next

Semaphores

Acquire : lock--; //p - prolaag
Release: lock++; //v - verhoog

  struct pipe {
    bmutex_t b;
    char buf[N];
    size_t r, w;
  }

  void writec( struct pipe *p, char c ) {
    for( ; ; ) {
      acquire( &p->b );
      if( p->w - p->b != N)
        break;
      release( &p -> b );
    }
    p->buf[...]--;
    release(&p->buf);
  }

However, this still chews up CPU time trying to acquire and release over and over again.

Condition Variables

We need:

  1. A condition (Boolean expression)
  2. A blocking mutex protecting the condition
  3. A condition variable (OS data type) representing the condition
wait( condvar_t *c, bmutex_t *b ) notify( condvar_t *c ) notify_all( condvar_t *c )   struct pipe {
    bmutex_t b;
    char buf[N];
    size_t r,w;
    condvar_t notfull, notempty;
  }

  void writec( struct pipe *p, char c ) {
    acquire( &p-> b);
    for( ; ; ) {
      if( p-> w - p->r != N ) {
        break;
      }
      wait( &p -> nonfull, &p->b );
    }
    p->buf[...]--;
    release( &p->b );
    notify( &p->nonempty );
  }