CS 111

Scribe Notes for 12/3/09

by Ryan Harris, Mike Hess

Cloud Computing

Mainframes - 1960's

  • Data-intensive
  • Had problems with data optimization - couldn't figure out how to stream data to CPU at right time
  • Reliability
  • IBM, Fujitsu

Clusters - 1990's

A cluster is basically a large number of Linux boxes linked together by an IP network. Each box is its own computer (it could be thought of as a mini-mainframe). The machines don't need to be identical, so a cluster can be heterogeneous.

  • Beowulf
  • SGE (Sun Grid Engine)

Clouds

Clouds can be thought of as "clusters of clusters." Since there are already preexisting clusters around the country, they can just be linked together through networks, so the clusters are physically separated from each other. This presents some problems, like ownership, because different people already own the individual clusters.

  • Not just one owner, user, or organization
  • Primary obstacles to clouds are political issues
    • Who controls the cloud? (since clusters are spread around)
    • Who pays?
  • Political issues get merged into technical issues
    • Security
    • Resource management
  • Amazon EC2, Globus

Advantages over clusters and grids

  • Short-term commitment
    • Buy computing power from someone else's cloud
    • Don't need capital investment
  • Pay as needed
  • No need to predict what resources you will need
  • Can grow quickly - fast scaling

Disadvantages

  • Price to make cloud
  • It all depends
    • If you know how much computing you need and it's relatively stable, buy a cluster because it's cheaper
    • Run the numbers vs. clusters
  • Privacy
    • Data confidentiality
    • Encrypt data to and from the cloud
    • Must trust whoever runs the cloud - they could run you on a bugged virtual CPU that lets them see your code
  • Network latency - don't want to run real-time applications on clouds
  • Data transfer bottlenecks
    • BIG unsolved problem
    • Archive data?
    • "sneakernet" - style technology
  • Bugs
    • Hard ones that show up as you scale
    • No easy solution - unsolved (if solving cheaply)
  • Other security issues
    • Denial of service attack
    • Physical attacks
  • Overload risks
    • Everyone's needs could exceed the cloud's capacity
    • Multiple suppliers
    • Societal risk
    • Overload of data access is often the biggest problem - scalable storage
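
Two of the points above ("run the numbers vs. clusters" and the data transfer bottleneck) come down to simple arithmetic. The following is a minimal sketch of both calculations; the function names and every dollar amount, data size, and link speed are hypothetical values chosen for illustration, not figures from the lecture.

```python
def break_even_hours(cluster_cost, cluster_hourly_upkeep, cloud_hourly_rate):
    """Hours of use after which owning a cluster is cheaper than renting a cloud."""
    saving_per_hour = cloud_hourly_rate - cluster_hourly_upkeep
    if saving_per_hour <= 0:
        return None  # renting is never more expensive per hour, so owning never pays off
    return cluster_cost / saving_per_hour


def transfer_days(data_bytes, bytes_per_second):
    """Days needed to move data_bytes at a sustained transfer rate."""
    return data_bytes / bytes_per_second / 86_400


# Hypothetical numbers, for illustration only.
hours = break_even_hours(cluster_cost=100_000.0,
                         cluster_hourly_upkeep=2.0,
                         cloud_hourly_rate=12.0)
print(f"cluster pays for itself after {hours:.0f} hours of use")

# 10 TB over a sustained 20 Mbit/s link (20e6 / 8 bytes per second).
days = transfer_days(10e12, 20e6 / 8)
print(f"network transfer takes about {days:.1f} days")
```

If the network transfer time comes out longer than an overnight courier, shipping disks ("sneakernet") wins, which is why the bottleneck matters for data-intensive work.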

Vendor Lock-In

  • Could get stuck to the vendor who owns the cluster, like Microsoft

Software Licensing

  • Can't license tons of Windows copies for millions of machines - too expensive!
  • Big-bucks problem (licensing formulas)
  • Proprietary
  • Free software - problem: you can take Linux, modify it, and run it in the cloud without ever distributing it, so license obligations triggered by distribution never kick in

Security Again

We need some sort of access control that prohibits "bad" accesses and allows "good" ones. To do that, we must be able to tell accurately which accesses are "good" and which are "bad": we don't want to mislabel an access and deny a "good" one or accept a "bad" one. Traditional Unix had permissions, with a user, group, and other part (rwxrwxrwx). In original Unix a user belonged to one group, but in BSD a user can belong to multiple groups.
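
The rwxrwxrwx check can be sketched in a few lines. This is a simplified model, not kernel code: the function name `may_access` and its parameters are made up for illustration, and the superuser bypass is ignored. The set `gids` models the BSD case where a user belongs to several groups.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Return True if a process with the given uid and group set may access the file.

    mode is the 9-bit permission word (e.g. 0o640); want is a mask of
    READ/WRITE/EXECUTE. Exactly one of the three classes applies.
    """
    if uid == file_uid:
        bits = (mode >> 6) & 7   # user (owner) bits
    elif file_gid in gids:
        bits = (mode >> 3) & 7   # group bits
    else:
        bits = mode & 7          # other bits
    return (bits & want) == want


# File with mode rw-r----- owned by uid 1000, group 50.
print(may_access(0o640, 1000, 50, uid=1000, gids={50}, want=READ | WRITE))
print(may_access(0o640, 1000, 50, uid=1001, gids={50, 60}, want=WRITE))
```

Note that the classes are exclusive: an owner is judged only by the owner bits, even if the group or other bits would be more generous.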

ACLs - Access Control Lists

  • Owner of a resource can specify an access list (list of principals & their accesses)
  • Key idea - make sure default ACLs are right when a resource is created
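
As a rough illustration of the two bullets above, here is a toy ACL attached to a resource. The class `Resource`, its methods, and the choice of default (owner gets read and write, everyone else gets nothing) are assumptions made for this sketch, not a description of any particular OS.

```python
class Resource:
    def __init__(self, owner, default_acl=None):
        self.owner = owner
        if default_acl is None:
            # Assumed safe default: only the owner has any access at creation.
            default_acl = {owner: {"read", "write"}}
        # Copy so later changes do not alter the caller's default.
        self.acl = {principal: set(rights) for principal, rights in default_acl.items()}

    def grant(self, requester, principal, rights):
        """Only the owner may extend the access list."""
        if requester != self.owner:
            raise PermissionError("only the owner may change the ACL")
        self.acl.setdefault(principal, set()).update(rights)

    def allows(self, principal, right):
        return right in self.acl.get(principal, set())


doc = Resource(owner="alice")
print(doc.allows("bob", "read"))      # bob is not on the default list
doc.grant("alice", "bob", {"read"})
print(doc.allows("bob", "read"))
```

The default matters because the resource is exposed from the moment it is created; a permissive default leaves a window before the owner tightens it.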

Role-Based Access Control (RBAC)

  • ACLs etc.: each resource has an ACL, etc. attached to it - all accesses are mediated by the OS
  • Capabilities: each principal has a "RCL" (set of ccap
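
For contrast with ACLs, here is a toy model of the capability idea: the principal holds an unforgeable token, and access is decided by presenting the token rather than by looking the principal up in a per-resource list. The `Kernel` class and the use of random strings as tokens are assumptions for this sketch; a real system would protect capabilities inside the kernel rather than hand out secret strings.

```python
import secrets


class Kernel:
    def __init__(self):
        self._caps = {}  # token -> (resource name, set of rights)

    def mint(self, resource, rights):
        """Create a capability for a resource and hand it to a principal."""
        token = secrets.token_hex(16)  # hard to guess, standing in for "unforgeable"
        self._caps[token] = (resource, frozenset(rights))
        return token

    def access(self, token, resource, right):
        """Access is granted by the token alone; no identity check is made."""
        entry = self._caps.get(token)
        return entry is not None and entry[0] == resource and right in entry[1]


kernel = Kernel()
cap = kernel.mint("payroll.db", {"read"})
print(kernel.access(cap, "payroll.db", "read"))
print(kernel.access("guessed-token", "payroll.db", "read"))
```

Whoever holds the token has the access, so a principal can delegate simply by passing the token along.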

Trusted Software

From an OS viewpoint, OSes don't trust applications, because they don't trust users, and applications run on behalf of users. However, there are some trusted applications, and login is one of them. Login uses the setuid(id) syscall to change which user the process is running as.
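
The core of a login-like program can be sketched as follows, assuming a Unix system. `os.setuid` and `os.setgid` are real Python wrappers for the syscalls; everything else (the account table, the unsalted SHA-256 password check, the function names) is a simplified stand-in for illustration and not how a real login stores passwords. The demo passes in a recorder instead of calling the real syscalls, which would require running as root.

```python
import hashlib
import hmac
import os


def become_user(uid, gid, setgid=os.setgid, setuid=os.setuid):
    """Give up root and become the given user."""
    setgid(gid)  # group first: once setuid succeeds we are no longer root
    setuid(uid)


def check_password(password, stored_hash):
    # Toy check for illustration only (no salt, fast hash).
    digest = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


def login(name, password, accounts, become=become_user):
    account = accounts.get(name)
    if account is None or not check_password(password, account["hash"]):
        return False  # identity never changes on a failed login
    become(account["uid"], account["gid"])
    return True


accounts = {"alice": {"uid": 1000, "gid": 1000,
                      "hash": hashlib.sha256(b"secret").hexdigest()}}
calls = []
ok = login("alice", "secret", accounts, become=lambda u, g: calls.append((u, g)))
print(ok, calls)
```

The program is trusted precisely because it runs with the power to become anyone, so the password check must come before the identity change.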

  • Which programs do we trust? - as few as possible, and each as small as possible
  • How can we trust login? - Cryptographic checksum of program
  • How does vendor trust login?
    • Reflections on Trusting Trust - K. Thompson
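    • Thompson explains why login cannot be fully trusted and demonstrates it.
    • He shows that he can change the C compiler so that it produces bugged code whenever it compiles login, letting him log in on any Unix system built with that compiler. The compiler can also be made to reinsert the change when it compiles itself, so the bug appears in no source code.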
  • Trusted Computing Base
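
The "cryptographic checksum of program" idea above can be sketched as follows. The helper names `file_sha256` and `is_trusted` are made up for this example, and SHA-256 is an assumed choice of hash; the notes do not say which checksum is meant.

```python
import hashlib
import hmac


def file_sha256(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_trusted(path, expected_hex):
    """Compare the binary on disk against the vendor's published checksum."""
    return hmac.compare_digest(file_sha256(path), expected_hex)
```

A call such as `is_trusted("/bin/login", published_hex)` would compare the installed binary against a value obtained from the vendor over a trusted channel (`published_hex` is a placeholder). A match only shows that the binary is the one the vendor shipped; it says nothing about whether the vendor's compiler was honest, which is the gap Thompson's paper points at.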