CS111

by Liem Do

 

Lector 10: System Performance

 

BASIC IMPLEMENTATION OF LOCK/UNLOCK

We begin by examining a simple function to write to a pipe with blocking mutex. Recalling that the idea behind blocking mutex is to grab a lock if it is available or release the CPU.

     void writec(struct pipe *p, char c) {
    acquire(&p->v);
    // write a byte
    release(&p->b);  
}

struct pipe {  
    bmutex_t b;
}

 

We have to tell the kernel we are not only waiting on an object O but we are also waiting for an event condition C, a condition variable. So below is a revised version:

 void writec(struct pipe * p, char c) {
    acquire(&p->v, p->nonfull);
    // write a byte
    release(&p->b);
} struct pipe
{
    bmutex_t b;
    condvar_t nonfull, nonempty;
    // The write pointer minus the read pointer less than the size of buffer
    size_t r, w;
    char buff [N];
}

 

we need a condition defining the variable, a blocking mutex protecting this condition, a condition variable standing for this. Consequently, we added the condition to our structure.

After we modified the structure, we need API:

 

 void writec(struct pipe *p, char c)
{
    acquire(&p->v, pipe->nonfull);
    while(p->w - p-> == N)
         wait(&p>nonfull, &p->b);
    // not full, write to the pipe
    notify(&p->;nonempty);
    realease(&p->b);
}

 

CLASSIC SYNCHRONIZATION

Semaphore can be used to effectively avoid synchronization problem. In this technique, we block the mutex with int but not bool. By doing this, we can share with n (integer) people who are trying to access the resource. That means the object is locked when counter equals to 0 or unlocked when the counter is greater than 0.

DEAD LOCK

Dead lock itself is a race condition. In most cases, the dead lock occurs when processes lock each other. The interesting thing about dead lock is it is actually the cost to avoid race condition. Dead lock is hard to debug because it does not happen very often.

Four conditions needed for deadlock:

  1. Circular wait: a process that waits for something and eventually other processes in the chain waits for the first process.
  2. Mutual exclusion
  3. No preemption of locks: often happens when there is an important object that is not released
  4. Cannot hold and wait for others

We can decrease the deadlock by reengineer to eliminate as many as conditions above.

 

STRATEGIES TO AVOID DEADLOCK

If you need to acquire more than one lock, you get the lock in numeric order (eg. Their address). The following code demonstrates this observation.

/*  if fails, we release all the locks */
for (i = 0; i < 5; i++)
{
    if (try_acquire(&l[i]) < 0)
         for (j = 0; j < i; j++)
              release(&l[j]); 
    i = -1;
}

 

FILE SYSTEM PERFORMANCE

cpu vs disk

 

Barracuda XT 3TB
Corsair Force SSD 120 GB

Cost: $200
Data transfer rate: 6 Gb/s

Sustained data transfer rate: 138 Mb/s

 

Average latency: 4,16ms
- This is actually the spin speed

 

Random read seek time: 8.5ms
- The time moves to the correct track

 

Random write seek time: 9.5ms
- Longer because to make sure write to the correct sector

 

Unreadable read errors: 1 in 10^14

Average operating power: 9.23 W

Average idle power: 6.29W

Cost: $150
3 Gb/s
280 MB/s read, 270 MB/s write




50 K IOPS






 

 

 

 


2.0 W

0.5 W

 

hard drive components


Diagram labeling the major components of a computer hard disk drive

 

Barracuda XT 3TB

ST 330000 5N1A5AS

Capacity: 3TB = 3 * 10^12 bytes

Cache: 64MB -> The cache/disk ratio is really small

Sectors: 3, 907, 029, 168

Spin speed: 7200 rpm = 120Hz

I/O data transfer rate: 600MB/s

  • In average:
    Seek            8.5ms
    + latency      4.16ms
    + I/O           0.1 ms (out of cache)

                      0.4 ms (off disk)

                      0.1 ms CPU overhead

---------------------------------------------

                     13.72ms = 70 bytes/sec

 

Even though Corsair Force SSD is a lot faster than Barracuda XT, HDD is still a good choice now because of its price.

 

I/O PERFORMANCE ISSUES

With a hard drive that has I/O data transfer rate: 600 MB/s and interface speed: 6 Mb/s, how can we improve our read() function?

The answer is we can use a buffer read(fd, buf, 1024). But the question here is how large the buffer can be? If we increase the buffer size to 1GB, the seek time will grow as we read more sectors now. The CPU overhead time also grows. Furthermore, we're forcing the program to read more data than it needs.

User sees: read(fd, &c, 1)

OS sees: sys_read(disk, buf, 1024*1024*1024);

Another approach is to read more data than requested. By combining with some algorithms that can guess which data will be fetched next, we can significantly reduce the latency. But please keep in mind that when the throughput goes up, latency goes down.

Throughput

Latency

70 bytes/s

1372 ms

70 KB/s

1372 ms

130 MB/s

7017 ms

 

In most system, they don't go with too big block because it increases the latency.

 

Direct memory access - DMA

This is a feature that allows subsystems to access memory without sending the request to CPU, causing an interrupt so CPU causes next IO request. The interrupt = INT + RETI + etc… will take a lot of CPU. By using DMA with no interrupt, the CPU just pulls the device when needed.

 

DISK SCHEDULING

CPU is a limited resource and shared among different processes. We have a lot of accesses to the same disk from multiple processes. We need to figure out the correct order.

The following is a list of what we want out of disk scheduling:

 

Spring 2012 UCLA - CS 111 - Professor Paul Eggert