CS 111

Scribe Notes for 12/3/09

by Ryan Harris, Mike Hess

Cloud Computing

Mainframes - 1960s

Data-intensive
Had problems with data optimization - couldn't figure out how to stream data to the CPU at the right time
Reliability
IBM, Fujitsu

Clusters - 1990s

A cluster is basically a large number of Linux boxes linked together by an IP network. Each box is its own computer (it could be thought of as a mini-mainframe). The machines also don't need to be identical, so a cluster can be heterogeneous.

Beowulf
SGE (Sun Grid Engine)

Clouds

Clouds can be thought of as "clusters of clusters." Since there are already preexisting clusters around the country, they can just be linked together through networks, so the clusters are physically separated from each other. This presents some problems, like ownership, because different people already own the individual clusters.

Not just one owner, user, or organization
The primary obstacles to clouds are political issues
- Who controls the cloud? (since clusters are spread around)
- Who pays?
Political issues get merged into technical issues
- Security
- Resource management
Amazon EC2, Globus

Advantages over clusters and grids

Short-term commitment
- Buy computing power from someone else's cloud
- Don't need capital investment
Pay as needed
No need to predict what resources you will need
Can grow quickly - fast scaling

Disadvantages

Cost of building a cloud
It all depends
- If you know how much computing you need and it's relatively stable, buy a cluster because it's cheaper
- Run the numbers vs. clusters
Privacy
- Data confidentiality
- Encrypt data to and from the cloud
- Must trust whoever runs the cloud - they could run a bugged virtual CPU that lets them see your code
Network latency - don't want to run real-time applications on clouds
Data transfer bottlenecks
- BIG unsolved problem
- Archive data?
- "sneakernet" - style technology
Bugs
- Hard ones that show up as you scale
- No easy solution - unsolved (at least if you want to solve it cheaply)
Other security issues
- Denial of service attack
- Physical attacks
Overload risks
- Everyone's needs could exceed the cloud's capacity
- Multiple suppliers
- Societal risk
- Overload of data access is often the biggest problem - need scalable storage
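The "run the numbers vs. clusters" advice and the data-transfer bottleneck above both come down to back-of-the-envelope arithmetic. The sketch below is a minimal cost model, assuming a cluster has a fixed purchase price plus an hourly running cost while the cloud charges only an hourly rate. `break_even_hours` and `transfer_days` are hypothetical helpers, and every price or bandwidth is an input you would supply, not a real quote.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical cost model (all prices are caller-supplied assumptions):
 *   cluster_cost   - up-front purchase price of a cluster, in dollars
 *   cluster_hourly - cost per hour to run it (power, admin), in dollars
 *   cloud_hourly   - rental price per hour for equivalent capacity
 * Returns the hours of use after which owning the cluster becomes
 * cheaper than renting, or INFINITY if owning never catches up. */
double break_even_hours(double cluster_cost, double cluster_hourly,
                        double cloud_hourly)
{
    double savings_per_hour = cloud_hourly - cluster_hourly;
    if (savings_per_hour <= 0.0)
        return INFINITY;  /* renting is never more expensive per hour */
    return cluster_cost / savings_per_hour;
}

/* Days needed to push `terabytes` of data through a link running at
 * `megabits_per_sec`, assuming the link stays fully used.
 * Uses decimal units: 1 TB = 8e6 megabits. */
double transfer_days(double terabytes, double megabits_per_sec)
{
    assert(megabits_per_sec > 0.0);
    double megabits = terabytes * 8.0e6;
    double seconds = megabits / megabits_per_sec;
    return seconds / 86400.0;  /* seconds per day */
}
```

For example, a hypothetical $100,000 cluster that costs $2/hour to run, compared with an equivalent cloud rental at $12/hour, breaks even after 100,000 / (12 - 2) = 10,000 hours, a bit over a year of continuous use. Moving 10 TB over a 100 Mb/s link takes about 9.3 days even at full speed, which is why shipping disks ("sneakernet") can still win.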

Vendor Lock-In

Could get stuck with the vendor who owns the cloud, such as Microsoft

Software Licensing

Can't license tons of Windows copies for millions of machines - too expensive!
Big-bucks problem (licensing formulas)
Proprietary
Free software - problem: you can take Linux and run it in the cloud without ever distributing it, so the GPL's obligations (which are triggered by distribution) never kick in

Security Again

We need some sort of access control, so that we prohibit "bad" accesses and allow "good" accesses. However, we want to be able to tell accurately which are "good" and which are "bad": we don't want to mislabel an access, denying good ones or allowing bad ones. Traditional Unix had permissions with a user, group, and other part (rwxrwxrwx). In the original Unix a user belonged to only one group, but in BSD a user can belong to multiple groups.
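The user/group/other check described above can be sketched in C. This is a simplified model, not actual kernel code: `may_access` and `enum access` are hypothetical names, root's special privileges are ignored, and the BSD multi-group rule is modeled by passing the caller's full group list (original Unix corresponds to a list of length one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Requested access, using the same bit values as one rwx triple. */
enum access { ACC_READ = 4, ACC_WRITE = 2, ACC_EXEC = 1 };

/* Hypothetical helper: may a process with user ID `uid` and groups
 * gids[0..ngids-1] perform `want` on a file with permission bits
 * `mode`, owner `owner`, and group `group`?
 * Exactly ONE class is consulted: user if the caller owns the file,
 * else group if any of the caller's groups matches, else other. */
bool may_access(uid_t uid, const gid_t *gids, size_t ngids,
                mode_t mode, uid_t owner, gid_t group, enum access want)
{
    assert(ngids == 0 || gids != NULL);
    unsigned bits;
    if (uid == owner) {
        bits = (mode >> 6) & 7;            /* rwx------ */
    } else {
        bool in_group = false;
        /* BSD style: any of the user's groups can match. */
        for (size_t i = 0; i < ngids; i++)
            if (gids[i] == group)
                in_group = true;
        bits = in_group ? (mode >> 3) & 7  /* ---rwx--- */
                        : mode & 7;        /* ------rwx */
    }
    return (bits & (unsigned)want) != 0;
}
```

For a file with mode `rw-r-----` (0640) owned by user 1000 and group 20, a different user whose groups include 20 can read it but not write it. Note that the owner is judged only by the owner bits, even if the group bits happen to be more generous.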

ACLs - Access Control Lists

Owner of a resource can specify an access list (list of principals & their accesses)
Key idea - make sure default ACLs are right when a resource is created
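A per-resource ACL and the "right default" rule can be sketched as a small C data structure. Everything here is hypothetical and simplified: principals are plain integer user IDs, rights are a bit set, the list has a fixed maximum size, and the default ACL (creator gets read/write, everyone else gets nothing) is one reasonable choice, not a standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* Access rights as a bit set. */
enum { R_READ = 1, R_WRITE = 2, R_EXEC = 4 };

/* One ACL entry: a principal and the rights granted to it. */
struct acl_entry {
    int principal;
    unsigned rights;
};

/* Each resource carries its own ACL. */
struct resource {
    int owner;
    size_t nentries;
    struct acl_entry acl[MAX_ENTRIES];
};

/* Creation installs a conservative default ACL (creator only, read and
 * write), so a new resource is not exposed before its owner edits it. */
void resource_create(struct resource *r, int creator)
{
    assert(r != NULL);
    r->owner = creator;
    r->nentries = 1;
    r->acl[0].principal = creator;
    r->acl[0].rights = R_READ | R_WRITE;
}

/* Only the owner may add entries. Returns false on failure. */
bool acl_grant(struct resource *r, int caller, int principal,
               unsigned rights)
{
    if (caller != r->owner || r->nentries == MAX_ENTRIES)
        return false;
    r->acl[r->nentries].principal = principal;
    r->acl[r->nentries].rights = rights;
    r->nentries++;
    return true;
}

/* The check the OS would perform on every access; deny by default.
 * If a principal appears more than once, the first entry wins. */
bool acl_check(const struct resource *r, int principal, unsigned want)
{
    for (size_t i = 0; i < r->nentries; i++)
        if (r->acl[i].principal == principal)
            return (r->acl[i].rights & want) == want;
    return false;
}
```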

Role-Based Access Control (RBAC)

ACLs etc.: each resource has an ACL, etc. attached to it - all accesses are mediated by the OS
Capabilities: each principal has a "RCL" (set of capabilities)
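To contrast with ACLs, here is a hypothetical sketch in which the rights list hangs off the principal instead of the resource. The names (`struct capability`, `cap_grant`, `cap_check`) are illustrative only. A real capability system must also make capabilities unforgeable, for example by keeping them in kernel memory and giving user code only an index into the list (much as Unix file descriptors work); this sketch does not model that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CAPS 8

/* Rights a capability can convey, as a bit set. */
enum { C_READ = 1, C_WRITE = 2 };

/* A capability: "the holder may do `rights` to `resource`". */
struct capability {
    int resource;
    unsigned rights;
};

/* The list belongs to the principal, not to the resource. */
struct principal {
    size_t ncaps;
    struct capability caps[MAX_CAPS];
};

/* Hypothetical kernel-side grant: append a capability to p's list. */
bool cap_grant(struct principal *p, int resource, unsigned rights)
{
    assert(p != NULL);
    if (p->ncaps == MAX_CAPS)
        return false;
    p->caps[p->ncaps].resource = resource;
    p->caps[p->ncaps].rights = rights;
    p->ncaps++;
    return true;
}

/* Access check: search the caller's own capabilities. The resource
 * itself keeps no record of who may use it. */
bool cap_check(const struct principal *p, int resource, unsigned want)
{
    for (size_t i = 0; i < p->ncaps; i++)
        if (p->caps[i].resource == resource &&
            (p->caps[i].rights & want) == want)
            return true;
    return false;
}
```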

Trusted Software

From an OS viewpoint, the OS doesn't trust applications, because it doesn't trust users, and applications run on behalf of users. However, there are some trusted applications, and login is one of them. Login uses the setuid(uid) system call to change which user the process runs as.
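The privilege change that login makes can be sketched as follows. `become_user` is a hypothetical helper, not a library function: it assumes the process starts as root and has already checked the password, and it omits steps a real login also performs, such as setting supplementary groups (e.g. with `initgroups(3)` on Linux and BSD) and exec'ing the user's shell. It uses only the POSIX `setgid`, `setuid`, `getuid`, and `geteuid` calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch the process to user `uid` / group `gid`. Group first: once the
 * process is no longer root it loses the right to pick an arbitrary
 * group. Returns true only if the switch really happened. */
bool become_user(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0)
        return false;
    if (setuid(uid) != 0)
        return false;
    /* For a root caller, setuid() sets the real, effective, and saved
     * user IDs, so there should be no way back to root. */
    if (uid != 0 && setuid(0) == 0)
        return false;
    return getuid() == uid && geteuid() == uid;
}
```

Called with the caller's own IDs, the sketch succeeds even without root, which makes it easy to try. Called by root with an ordinary user's IDs, it leaves the process unable to regain root, which is exactly why the OS has to trust the program that makes this call.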

Which programs do we trust? - as few and as small as possible.
How can we trust login? - Cryptographic checksum of program
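How does the vendor trust login?
- Reflections on Trusting Trust - K. Thompson
- Thompson explains why login cannot be trusted just by reading its source code, and demonstrates it.
- He shows he could modify the C compiler so that it inserts a backdoor into login (and re-inserts itself whenever the compiler is recompiled), letting him log in on any Unix system built with that compiler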
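The "cryptographic checksum" idea amounts to recording a digest of the login binary when it is known to be good and refusing to trust a copy whose digest differs. The sketch below shows only that compare-against-known-good workflow. It deliberately substitutes 64-bit FNV-1a, which fits in a few lines but is NOT a cryptographic hash (an attacker can feasibly find a different file with the same digest); a real check would use something like SHA-256 from a vetted library. `file_digest` and `program_matches` are hypothetical names. Thompson's point still applies: a matching checksum only shows the binary is the one you recorded, not that the compiler that built it was honest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a parameters. Stand-in only: NOT cryptographic. */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Fold `len` bytes into running hash `h`. */
uint64_t fnv1a64_update(uint64_t h, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hash the whole file at `path` into *out. Returns false on I/O error. */
bool file_digest(const char *path, uint64_t *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;
    unsigned char buf[4096];
    uint64_t h = FNV_OFFSET;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        h = fnv1a64_update(h, buf, n);
    bool ok = !ferror(f);
    fclose(f);
    *out = h;
    return ok;
}

/* Trust the program only if its digest matches the recorded one. */
bool program_matches(const char *path, uint64_t expected)
{
    uint64_t actual;
    return file_digest(path, &actual) && actual == expected;
}
```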
Trusted Computing Base