CS 111 Lecture 9

CS 111

Brian Geffon and Daniel Collier

Lecture 9

October 22nd, 2009

Implementing Spinlocks

When a computer runs multiple threads, the synchronization of is critical. When multiple threads try to access and modify memory at the same time, you must in some way ensure that the threads do not run into each other. As was described in the previous lecture, one way to control the synchronization of threads is the use of locks. One type of lock that is commonly used is a spin lock. When using a spin lock, if a thread wants to acquire a lock, it will loop (spin) until the lock is available. In other words, a spin lock keeps the CPU busy until it receives the lock. Here is an initial attempt at the implementation of a spin lock:


typedef  int mutex_t;


  void lock(mutext_t *m) {

    while(*m) continue;

    *m = 1;

  }

However, this above implementation is not quite right. Between the time that the value of the mutex is checked and the time that it is assigned a new value, the value in memory could have changed. This section needs to be done atomically, which is accomplished using the atomic x86 xchg instruction:


typedef int mutex_t



  void lock( mutex_t *m ) {

    while( xchg(m, 1) == 1 ) continue;

  }



  void  unlock( mutex_t *m) {


    *m = 0;

  }

Preconditions: It is important to remember the preconditions for calling lock and unlock. Lock may only be called when you do not own the lock. Unlock can only be called when you own the lock.

Are there other atomic x86 instructions we could use to implement a spin lock? Actually, there are several x86 instructions that are atomic:

lock incl x
- incl is not atomic, but you can make an increment long instruction atomic by placing lock in front of it.
- lock incl is atomic, but it is very slow.
- The uses of this instruction are limited, since it only allows you to increment a variable and nothing else.
xchg
- As we have seen in previous examples, xchg works and is more general in its use than lock incl
- Works well for boolean variables.
Compare and Swap (cas)

One advantage of compare and swap is it does not modify the value if it has changed before being able to modify it.
Allows us to do any function atomically!

Using compare and swap we can compute any function atomically

//compare and swap implementation in C

  bool cas( int *p, int old, int new ) {


    if( *p == old ) {

      *p = new;

      return true;


    } else {

      return false;

    }

  }

Here is an example of how we could compute a factorial atomically:


void factorial_transaction (int *p) {

    do {


      int old = *p;

      int new = fact(old);

    } while( !cas(*p, old, new) );

  }

Instead of the factorial function in the above example, we can compute any function there! Compare and swap allows us to implement any function atomically.

Where should we put the lock?

So does compare and swap solve all of our problems? Look at the following example of a bank account that is shared by Dr. Eggert and his brother where they both share an account that currently has $100. What happens if at the same time, Paul tries to withdraw $80 and Kurt tries to withdraw $40?

How can we solve the problem of both people withdrawing money from the account at the same time?

Proposed solution 1: Use cas
Problem: cas only works on ints, but the balance needs to be a 64-bit value.
Proposed solution 2: Run cas twice, once for the lower 32 bits and once for the upper 32 bits
Problem: This does not work. There is no way to easily connect these two parts together accurately.
Actual solution: Add a lock to the account data structure.


struct account {

    long long balance;

    ...

    volatile mutex_t lock;

  }

Withdraw money:



lock( &p -> lock );

  p->balance -= 8000;

  unlock( &p -> lock );

Now, let's try to apply what we learned from giving the bank account its own lock to an implementation of pipes in linux:


struct pipe {


    char buf [N];

    size_t r, w;

    mutex_t m;

  }

There are two possible ways to use locks with a pipe - either each pipe has its own lock or there is a global variable lock for all of the pipes:

Course-grained (global mutex_t)
- Saves memory
Fine-grained (each pipe has its own mutex_t lock)
- More parallelism
- Helps prevent bottlenecks by not forcing pipes to wait for another unrelated pipe to give up a lock.
- Usually preferred, since memory is not a major concern.

Adding Read/Write Locks

Is it possible to make a finer grained locking system? Right now each pipe has its own lock. However, a reader should not exclude others from being able to read as well. We can create greater parallelism and create a finer grained locking system by changing our single lock for the pipe to have two locks, one for reading and one for writing.


struct pipe {

    char buf[N];

    size_t r, w;


    mutex_t rm, wm;

  }

Now when we write, we get a write lock and when we read, we get a read lock.


bool writec( struct pipe *p, char c ) {

    lock( &p->wm );

    if( p->w - p->r == N ) {


      unlock( p->wm);


      return false;

    }

    p->buf[p->w++ % N] = c;

    unlock( &p->wm );

    return true;

  }

If you want writec to be a void function instead, you can change it to:


void writec( struct pipe *p, char c ) {

    lock( &p->wm );

    if( p->w - p->r == N ) {


      continue;

    }

    p->buf[p->w++ % N] = c;

    unlock( &p->wm );


  }

This function will spin until a reader helps us out and releases the lock, allowing us to claim the write lock until we finish writing.

Choosing where to place critical sections

We have now discussed the design decision of where to place locks within a data structure. However, equally important is the decision of where to call the lock() and unlock() functions within code.

How big should a critical section be?

NOT TOO LARGE: If the critical section is too large, this will lead to a bottleneck. The lock will be held for a long time, preventing others from using it.

NOT TOO SHORT: The critical section ensures that locked item is not modified while operations are performed on it. A critical section must be long enough to include all of the necessary reads and writes that need to be performed.

How do we find the minimal critical section?

Look for writes to shared objets ("shared writes")
Look for reads used to compute those share writes ("dependent reads") Keep repeating this process of expanding the critical section until the small region possible is found. Put lock() / unlock() around smallest region containing these.

What if some threads want read-only access to the whole object? If they're only reading, we don't want to block all other readers. We want to allow N > 1 simultaneous readers OR 1 writer.

Our previous solution blocks readers unnecessarily. We need a new type of mutex that supports both readers and writers.


typedef {


    mutext_t m;

    int readers;

  } rwmutex_t;

If you want a read lock, grab the mutex, add 1 to readers, and return. Decrement readers when unlock.

If you want a write lock, grab the mutex, check if readers == 0, don't release the mutex until done writing.

Problem with spinlocks:

Spin locks chew up CPU. This is okay if there are few bottlenecks and critical systems are small.
Does not work when long waits are common

Possible solution:



while( xchg( ... ) )

    yield();

However, this stills up some CPU since it hurts context switching time.

Blocking mutexes

Avoid polling
Tell OS what you're waiting on so OS doesn't wake you up until that lock is available


typedef struct {

    mutex_t m; //controls access to this lock

    bool locked; //someone owns this lock

    proc_t *blocked_list;

  } bmutex_t;



  void acquire( bmutex_t *b ) {

    for( ; ; ) {

    lock( &b -> m );

    //make self runnable

    //add self to b-> blocked list

    if( !b-> locked )

      break;

    unlock( &b -> m );


    schedule();

    }

    b-> locked = true

    //mark self runnable

    //remove self from b->blocked list

    unlock( &b->m );

  }


  void release( bmutex_t *b ) {


    lock( &b -> m);

    b->locked = false;

    //mark 1st process in p->blocked list as runnable

    unlock(&b->m);

  }

Note: two possible methods for which process on the blocked list to mark as runnable:

Notify method: more intuitive since you only wake up the next item
Notify all method: more flexibility for who is scheduled next

Semaphores

A blocking mutex in which the locked variable is an integer (locked when == 0, unlocked when > 0)
Good for resource pools

Acquire : lock--; //p - prolaag
Release: lock++; //v - verhoog


  struct pipe {

    bmutex_t b;

    char buf[N];


    size_t r, w;

  }


  void writec( struct pipe *p, char c ) {


    for( ; ; ) {

      acquire( &p->b );

      if( p->w - p->b  != N)

        break;

      release( &p -> b );


    }

    p->buf[...]--;


    release(&p->buf);

  }

However, this still chews up CPU time trying to acquire and release over and over again.

Condition Variables

We need:

A condition (Boolean expression)
A blocking mutex protecting the condition
A condition variable (OS data type) representing the condition


wait( condvar_t *c, bmutex_t *b )

Precondition: caller must have acquired b
Action: release b, blocks until notified, reacquires b, returns


notify( condvar_t *c )

Wake up one waiter


notify_all( condvar_t *c )

Wake up all waiters


  struct pipe {

    bmutex_t b;


    char buf[N];

    size_t r,w;

    condvar_t notfull, notempty;

  }


  void writec( struct pipe *p, char c ) {


    acquire( &p-> b);

    for( ; ; ) {

      if( p-> w - p->r != N ) {


        break;

      }

      wait( &p -> nonfull, &p->b );


    }

    p->buf[...]--;

    release( &p->b );

    notify( &p->nonempty );


  }