Lecture 1: Course Introduction and Overview


CS194-3/CS16x
Introduction to Systems
Lecture 1
What is a “System”?
August 27, 2007
Prof. Anthony D. Joseph
http://www.cs.berkeley.edu/~adj/cs16x
Who am I?
Professor Anthony D. Joseph
• 465 Soda Hall (RAD Lab)
• adj AT cs.berkeley.edu
• Office hours Mon/Tue 1-2pm in 413 Soda
• Background:
– MIT undergrad and grad student
• Research areas:
– Current: Network security, OS security, very
large security testbeds
– Other: Mobile computing, wireless networking,
cellular telephony
8/27/07
Joseph CS194-3/16x ©UCB Fall 2007
Lec 1.2
Goals for Today
• Motivation for a new course
• Topics:
– Operating systems, Databases, Networking,
Security, Software engineering, Distributed
systems
• Complexity
Interactive is important!
Ask Questions!
Note: Some slides and/or pictures in the following are
adapted from slides ©2005 Silberschatz, Galvin, and Gagne.
Slides courtesy of Kubiatowicz, AJ Shankar, George Necula,
Alex Aiken, Eric Brewer, Ras Bodik, Ion Stoica, Doug Tygar,
and David Wagner.
Why Change CS 162?
• Only minor changes since early 1990’s…
– Slides!
– Java version of Nachos
– Content: More crypto/security, less databases and
distributed filesystems
– Time to update again!!
• Most CS students take CS 162 and 186
– But, not all take EE 122, CS 169/161
– We’d like all students to have a basic
understanding of key concepts from these classes
• Each class introduces the same topics with class-specific biases
– Concurrency in an Operating System versus in a
Database
– Introduce concepts with a common framework
Rapid Underlying Technology Change
• “Cramming More Components onto Integrated Circuits”
– Gordon Moore, Electronics, 1965
Computing Devices Everywhere
People-to-Computer Ratio Over Time
From David Culler
Increasing Software Complexity
From MIT’s 6.033 course
But, Latency Improves Slowly…
From MIT’s 6.033 course
Heat is a Major Problem!
From MIT’s 6.033 course
The Internet
The Dark Side of the Internet…
Zombie Networks
• Click on the link and you join the STORM zombie
network of 250K-10M “0wned” PCs
• Zombies used by malicious hackers (crackers) for
phishing, spamming, identity theft, extortion
• Crackers build zombie networks of 10K-1M
compromised machines & sell services
– Ex: Take down competitor's website for $1K
• Hugely profitable!
– Massive spamming, ID fraud through phishing
– Roughly half of all spam is sent by zombies
• How can we secure our machines against folks
like this?
Complexity
• How to manage complexity at all levels?
• Many issues and many tradeoffs
• Need a global view of systems
– Decompose into components
• Need a global understanding of systems
– Applications, networks, databases, operating
systems, security, software engineering…
Course Administration
• Instructor: Anthony D. Joseph (adj AT cs.berkeley.edu)
465 Soda Hall
Office Hours: M/Tu 1-2 in 413 Soda
• TAs:
Kai Xia
([email protected])
• Website:
http://www.cs.berkeley.edu/~adj/cs16x
• Reader and book: TBA
– Most likely: Silberschatz, Galvin, and Gagne,
Operating System Concepts, 7th Ed., 2005
• Projects: First project will likely be Nachos-based
• Grading: TBA
Topic Coverage
• Managing complexity (abstractions, layering, modularity)
• Team programming, IDEs, documentation style
• OS, memory, database, and network security
• Kernel and address spaces, Address translation, Caching,
TLBs, demand paging
• I/O Systems, File systems, directories, database buffer
pools, tuple layouts, files of tuples
• Internet evolution, architectures, protocols, routing, P2P
and overlay networks
• Concurrency, processes, threads, ACID
• Enforcing mutual exclusion, serializability, 2PL, logging,
recovery, deadlock
• Viruses, worms, and botnets, DDoS
• Cryptographic algorithms: RSA, MD5, DES
• Simple authentication protocols, PKI
• Query (dataflow) operators, map-reduce
Creating Software Is Awesome
• It’s like art:
– There’s a vision, a realization, an aesthetic appeal,
a sense of ownership and satisfaction
• It’s not like art:
– The end result is useful
» To you, and anyone else
• It’s immensely satisfying to do
– Your project is your baby
» It’ll keep you up at night, make you proud…
» But won’t disown you when it’s 14 (though you might
disown it)
• Good software engineering can be learned
– But it is hard to teach
– Most people only learn through experience (making
mistakes)
Group Project Simulates Industrial Environment
• Project teams have 4 or 5 members in same
discussion section
– Must work in groups in “the real world”
• Communicate with colleagues (team members)
– Communication problems are natural
– What have you done?
– What answers do you need from others?
– You must document your work!!!
– Everyone must keep an on-line notebook
• Communicate with supervisor (TAs)
– How is the team’s plan?
– Short progress reports are required:
» What is the team’s game plan?
» What is each member’s responsibility?
Typical Lecture Format
[Figure: audience attention plotted against time: 20 min., Break, 25 min., Break, 25 min., “In Conclusion, ...”]
• 1-Minute Review
• 20-Minute Lecture
• 5-Minute Administrative Matters
• 25-Minute Lecture
• 5-Minute Break (water, stretch)
• 25-Minute Lecture
• Instructor will come to class early & stay after to answer
questions
Academic Dishonesty Policy
• Copying all or part of another person's work, or using reference
material not specifically allowed, are forms of cheating and will
not be tolerated. A student involved in an incident of cheating will
be notified by the instructor and the following policy will apply:
http://www.eecs.berkeley.edu/Policies/acad.dis.shtml
• The instructor may take actions such as:
– require repetition of the subject work,
– assign an F grade or a 'zero' grade to the subject work,
– for serious offenses, assign an F grade for the course.
• The instructor must inform the student and the Department Chair
in writing of the incident, the action taken, if any, and the
student's right to appeal to the Chair of the Department
Grievance Committee or to the Director of the Office of Student
Conduct.
• The Office of Student Conduct may choose to conduct a formal
hearing on the incident and to assess a penalty for misconduct.
• The Department will recommend that students involved in a second
incident of cheating be dismissed from the University.
Computer System Organization
• Computer-system operation
– One or more CPUs, device controllers connect
through common bus providing access to shared
memory
– Concurrent execution of CPUs and devices
competing for memory cycles
Example: Some Mars Rover Requirements
• Serious hardware limitations/complexity:
– 20MHz PowerPC processor, 128MB of RAM
– cameras, scientific instruments, batteries,
solar panels, and locomotion equipment
– Many independent processes work together
• Can’t hit reset button very easily!
– Must reboot itself if necessary
– Always able to receive commands from Earth
• Individual Programs must not interfere
– Suppose the MUT (Martian Universal Translator Module)
is buggy
– Better not crash antenna positioning software!
• Further, all software may crash occasionally
– Automatic restart with diagnostics sent to Earth
– Periodic checkpoint of results saved?
• Certain functions time critical:
– Need to stop before hitting something
– Must track orbit of Earth for communication
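The restart and checkpoint requirements above can be sketched as a small supervisor loop. This is only an illustration under stated assumptions, not rover flight software: `run_with_restart`, `flaky_survey`, and the in-memory `checkpoint` dictionary are hypothetical names, and the crash is simulated.

```python
def run_with_restart(task, checkpoint, max_restarts=3):
    """Run task(checkpoint); on a crash, record a diagnostic and restart it.

    The task is expected to record its progress in `checkpoint`, so a
    restart resumes from the last saved step instead of from scratch.
    """
    diagnostics = []  # stand-in for "diagnostics sent to Earth"
    for attempt in range(max_restarts + 1):
        try:
            return task(checkpoint), diagnostics
        except Exception as exc:  # isolate the fault to this one task
            diagnostics.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
    raise RuntimeError(f"task failed after {max_restarts} restarts: {diagnostics}")


def flaky_survey(checkpoint):
    """Hypothetical task: process 5 samples, crashing once at sample 3."""
    for i in range(checkpoint.get("next", 0), 5):
        if i == 3 and not checkpoint.get("crashed_once"):
            checkpoint["crashed_once"] = True
            raise ValueError("sensor glitch")
        checkpoint.setdefault("done", []).append(i)
        checkpoint["next"] = i + 1  # periodic checkpoint of results
    return checkpoint["done"]


result, log = run_with_restart(flaky_survey, {})
print(result)  # each sample is processed exactly once despite the crash
print(log)     # one diagnostic entry for the simulated failure
```

A real system would keep the checkpoint in storage that survives a reboot; the dictionary here only shows the control flow.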
How do we tame complexity?
• Every piece of computer hardware different
– Different CPU
» Pentium, PowerPC, ColdFire, ARM, MIPS
– Different amounts of memory, disk, …
– Different types of devices
» Mice, Keyboards, Sensors, Cameras, Fingerprint
readers
– Different networking environment
» Cable, DSL, Wireless, Firewalls,…
• Questions:
– Does the programmer need to write a single program
that performs many independent activities?
– Does every program have to be altered for every
piece of hardware?
– Does a faulty program crash everything?
– Does every program have access to all hardware?
OS Tool: Virtual Machine Abstraction
[Figure: layered stack, top to bottom: Application, Virtual Machine Interface, Operating System, Physical Machine Interface, Hardware]
• Software Engineering Problem:
– Turn hardware/software quirks → what programmers want/need
– Optimize for convenience, utilization, security,
reliability, etc…
• For Any OS area (e.g. file systems, virtual memory,
networking, scheduling):
– What’s the hardware interface? (physical reality)
– What’s the application interface? (nicer abstraction)
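The two questions above can be made concrete with a toy storage example. The sketch below assumes a made-up device that can only read and write whole 16-byte sectors (the "physical reality") and layers a byte-addressable file on top of it (the "nicer abstraction"). `BlockDevice` and `ByteFile` are hypothetical names, not part of any real OS API.

```python
class BlockDevice:
    """Hypothetical hardware interface: fixed-size sectors, nothing else."""

    SECTOR_SIZE = 16

    def __init__(self, num_sectors):
        self.sectors = [bytes(self.SECTOR_SIZE) for _ in range(num_sectors)]

    def read_sector(self, n):
        return self.sectors[n]

    def write_sector(self, n, data):
        if len(data) != self.SECTOR_SIZE:
            raise ValueError("must write exactly one full sector")
        self.sectors[n] = bytes(data)


class ByteFile:
    """Hypothetical application interface: read/write any byte range."""

    def __init__(self, device):
        self.dev = device

    def write(self, offset, data):
        size = self.dev.SECTOR_SIZE
        for i, byte in enumerate(data):
            sector, pos = divmod(offset + i, size)
            buf = bytearray(self.dev.read_sector(sector))  # read-modify-write
            buf[pos] = byte
            self.dev.write_sector(sector, buf)

    def read(self, offset, length):
        size = self.dev.SECTOR_SIZE
        out = bytearray()
        for i in range(length):
            sector, pos = divmod(offset + i, size)
            out.append(self.dev.read_sector(sector)[pos])
        return bytes(out)


f = ByteFile(BlockDevice(num_sectors=4))
f.write(10, b"hello, world")  # spans the boundary between sectors 0 and 1
print(f.read(10, 12))
```

The caller never sees sector boundaries. The byte-at-a-time loop is deliberately naive; it trades efficiency for clarity.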
Interfaces Provide Important Boundaries
[Figure: software above hardware, with the instruction set as the boundary between them]
• Why do interfaces look the way that they do?
– History, Functionality, Stupidity, Bugs, Management
– CS152 → Machine interface
– CS160 → Human interface
– EE122 → Protocol stack
– CS169 → Software engineering/management
• Should responsibilities be pushed across boundaries?
– RISC architectures, Graphical Pipeline Architectures
Virtual Machines
• Software emulation of an abstract machine
– Make it look like hardware has features you want
– Programs from one hardware & OS on another one
• Programming simplicity
– Each process thinks it has all memory/CPU time
– Each process thinks it owns all devices
– Different Devices appear to have same interface
– Device Interfaces more powerful than raw hardware
» Bitmapped display → windowing system
» Ethernet card → reliable, ordered, networking (TCP/IP)
• Fault Isolation
– Processes unable to directly impact other processes
– Bugs cannot crash whole machine
• Protection and Portability
– Java interface safe and stable across many platforms
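"Software emulation of an abstract machine" can be illustrated with a toy interpreter. The instruction set below (`PUSH`, `ADD`, `MUL`, `PRINT`) is invented for this sketch and is far simpler than a real virtual machine such as the JVM.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a stack machine."""
    stack = []
    output = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# Computes (2 + 3) * 4 on the abstract machine.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None), ("PRINT", None)]
print(run(program))
```

The same `program` runs unchanged wherever the interpreter runs, which is the portability argument the slide makes for the Java interface.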
Four Components of a Computer System
Definition: An operating system implements a virtual
machine that is (hopefully) easier and safer to
program and use than the raw hardware.
Virtual Machines (cont’d): Layers of OSs
• Useful for OS development
– When OS crashes, restricted to one VM
– Can aid testing programs on other OSs
What does an Operating System do?
• Silberschatz and Galvin:
“An OS is similar to a government”
– Begs the question: does a government do anything useful by
itself?
• Coordinator and Traffic Cop:
– Manages all resources
– Settles conflicting requests for resources
– Prevents errors and improper use of the computer
• Facilitator:
– Provides facilities that everyone needs
– Standard Libraries, Windowing systems
– Makes application programming easier, faster, less error-prone
• Some features reflect both tasks:
– E.g. File system is needed by everyone (Facilitator)
– But File system must be Protected (Traffic Cop)
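The "traffic cop" role of settling conflicting requests can be sketched at user level with a lock. This is an analogy, not kernel code: `Printer` is a hypothetical shared resource, and the threads stand in for programs competing for it.

```python
import threading


class Printer:
    """Hypothetical shared resource that only one job may use at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def print_job(self, owner, pages):
        with self._lock:  # the arbiter: one job holds the printer at a time
            for page in range(pages):
                self.log.append((owner, page))


printer = Printer()
threads = [threading.Thread(target=printer.print_job, args=(name, 100))
           for name in ("alice", "bob", "carol")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Count how often the owner changes between consecutive pages in the log.
owners = [owner for owner, _ in printer.log]
switches = sum(1 for a, b in zip(owners, owners[1:]) if a != b)
print(len(printer.log), switches)
```

Because each job holds the lock for its whole run, the pages of different jobs never interleave, whichever order the threads happen to be scheduled in.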
OS Systems Principles
• OS as illusionist:
– Make hardware limitations go away
– Provide illusion of dedicated machine with infinite
memory and infinite processors
• OS as government:
– Protect users from each other
– Allocate resources efficiently and fairly
• OS as complex system:
– Constant tension between simplicity and
functionality or performance
• OS as history teacher:
– Learn from past
– Adapt as hardware tradeoffs change
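The illusion of a dedicated machine is commonly created by time-multiplexing one processor among many tasks. The sketch below uses Python generators as stand-ins for processes and a round-robin ready queue. `worker` and `round_robin` are hypothetical names, and the switching here is cooperative (each task yields voluntarily), whereas a real OS typically preempts tasks with a timer interrupt.

```python
from collections import deque


def worker(name, steps):
    """A 'process' that runs for `steps` time slices, yielding after each."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(tasks):
    """Run each task for one slice in turn until all have finished."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run one time slice
            ready.append(task)        # go to the back of the ready queue
        except StopIteration:
            pass                      # task finished; drop it
    return trace


print(round_robin([worker("A", 3), worker("B", 1), worker("C", 2)]))
```

Each worker is written as if it had the CPU to itself; only the scheduler knows that the slices are interleaved.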
BREAK
Data Complexity
• Need to store information?
• Can put it in a file
• Too big or too complex?
– Use a database
• How big is the web?
– 400 million hosts
– 15-30 billion pages
(http://www.pandia.com/sew/383-web-size.html)…
• With a billion users looking for information
What is a Database System Today?
More Complex Database Systems
So… What is a Database?
• We will be broad in our interpretation
• A Database:
– A very large, integrated collection of data.
• Typically models a real-world “enterprise”
– Entities (e.g., teams, games)
– Relationships (e.g. The A’s are playing in the World
Series)
• Might surprise you how flexible this is
– Web search:
» Entities: words, documents
» Relationships: word in document, document links to
document.
– P2P filesharing:
» Entities: words, filenames, hosts
» Relationships: word in filename, file available at host
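The web-search case above can be sketched directly as entities and relationships. The three documents are made-up sample data, and `search` is a hypothetical helper; real search engines add ranking, crawling, and much more.

```python
from collections import defaultdict

# Entities: documents (hypothetical sample data) and the words they contain.
documents = {
    "doc1": "operating systems manage hardware",
    "doc2": "database systems manage data",
    "doc3": "networks move data",
}

# Relationship "word in document", stored as (word, doc_id) tuples.
word_in_doc = {(word, doc_id)
               for doc_id, text in documents.items()
               for word in text.split()}

# An inverted index is the same relationship grouped by word.
index = defaultdict(set)
for word, doc_id in word_in_doc:
    index[word].add(doc_id)


def search(*words):
    """Return the ids of documents that contain every query word."""
    results = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*results)) if results else []


print(search("systems", "manage"))
print(search("data"))
```

Keyword search is then a set intersection over the "word in document" relationship.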
What is a Database Management System?
• A Database Management System (DBMS) is:
– A software system designed to store, manage, and
facilitate access to databases.
• Typically this term is used narrowly
– Relational databases with transactions
» E.g. Oracle, DB2, SQL Server
– Mostly because they predate other large
repositories
» Also because of technical richness
– When we say DBMS in this class we will usually
follow this convention
» But keep an open mind about applying the ideas!
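What "relational databases with transactions" buys you can be seen in a few lines using SQLite through Python's standard `sqlite3` module. SQLite is used here only because it ships with Python; the `accounts` table, the names, and the balances are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) was violated
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the failed transfer the credit to `bob` has already been applied when the debit violates the constraint, yet the rollback undoes both updates, so no money is created.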
Is the WWW a DBMS?
• Fairly sophisticated search available
– Crawler indexes pages on the web
– Keyword-based search for pages
• But, currently
– data is mostly unstructured and untyped
– search only:
» can’t modify the data
» can’t get summaries, complex combinations of data
– few guarantees provided for freshness of data, consistency
across data items, fault tolerance, …
– Web sites typically have a (relational) DBMS in the
background to provide these functions.
• The picture is changing quickly
– Information Extraction to get structure from unstructured
– New standards e.g., XML, Semantic Web can help data
modeling
“Search” versus Query
• What if you wanted to find out which actors donated to
John Kerry’s presidential campaign?
• Try “actors donated to john kerry” in your favorite
search engine.
• If it isn’t “published”, it can’t be searched!
A “Database Query” Approach
“Yahoo Actors” JOIN “FECInfo”
(Courtesy of the Telegraph research group @Berkeley)
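The idea of joining an actor list with campaign-finance records can be sketched with a SQL JOIN. Everything below is a stand-in: the `actors` and `donations` tables, their columns, and the placeholder names and amounts are hypothetical, and this is not the Telegraph group's system or real FEC data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actors (name TEXT);
    CREATE TABLE donations (donor TEXT, candidate TEXT, amount INTEGER);
    INSERT INTO actors VALUES ('Actor A'), ('Actor B');
    INSERT INTO donations VALUES
        ('Actor A',   'Candidate X', 1000),
        ('Citizen C', 'Candidate X',  250),
        ('Actor B',   'Candidate Y',  500);
""")

# Which actors donated to Candidate X? Neither table alone can answer this.
rows = conn.execute("""
    SELECT a.name, d.amount
    FROM actors AS a
    JOIN donations AS d ON d.donor = a.name
    WHERE d.candidate = ?
    ORDER BY a.name
""", ("Candidate X",)).fetchall()
print(rows)
```

A keyword search can only find a page where someone has already published this combination; the query computes it from two independent sources.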
Why Study Systems – OS/Net/DB/Sec/SE?
• Learn how to build complex systems:
– How can you manage complexity for future projects?
• Engineering issues:
– Why is the web so slow sometimes? Can you fix it?
– What features should be in the next Mars Rover?
– How do large distributed systems work? (e.g. Skype)
• Business issues:
– Will my web services application scale to 1M users?
• Buying and using a personal computer:
– Why do different PCs with the same CPU behave differently?
– Should you upgrade to Vista or wait?
– Why does Microsoft have such a bad name (and Apple
a good name)?
• Security, viruses, and worms
– What exposure do you have to worry about?