Privacy-Aware Computing

CEG7380 Cloud Computing
Lecture 1
Keke Chen
Outline
 Syllabus
   Scope of this course
   Tentative schedule
   Prerequisites
   Resources
   Assignments
 Introduction
Scope of this course
 Understand the basic ideas of cloud computing
 Get familiar with
 Tools
 Systems
 Gain exposure to some research topics
Two major parts:
 Processing large data with the cloud
 Scaling web applications up and down with the cloud
Note: some programming parts require self-study
Prerequisites
 Some programming skills
 Java, Python, shell
 Comfortable with learning new programming frameworks
 Sufficient knowledge of
 Data structures and databases
 Operating systems
 Distributed systems
Assignments and Grading
 Reading papers (~3) (10%)
 Some miniprojects (4~5) (60%)
 Help you master the concepts
 Learn to use tools and systems
 Self-motivated research projects are strongly encouraged!
 Final exam (20%)
 Class attendance and discussion (10%)
Resources
 Updated reference list
 In-house Hadoop cluster
 AWS access
 Coupon code for each student
 Pilot
 Submitting reading assignments and projects
Tentative Schedule
 Parallel data processing
 Distributed file systems (GFS, HDFS)
 MapReduce
 High-level distributed data management
 Cloud infrastructures
 Virtualization
 AWS and Eucalyptus
 Interactive front-end – Google App Engine
 Cloud security and privacy
 Research topics
In projects, we will learn to use
 Hadoop
 MapReduce, Pig Latin
 AWS
 Google App Engine
Cloud Computing
Lecture 1-2
Some slides are borrowed from UC Berkeley RAD Lab
Keke Chen
Outline
 What is cloud computing?
 Why now?
 Cloud killer applications
 Cloud economics
 Challenges and opportunities
   “Above the Clouds”
   “Claremont Report”
What is Cloud Computing?
 Old idea: Software as a Service (SaaS)
 Def: delivering applications over the Internet
 Recently: “[Hardware, Infrastructure, Platform] as a service”
 Utility Computing: pay-as-you-use computing
 Illusion of infinite resources
 No up-front cost
 Fine-grained billing (e.g. hourly)
Cloud computing vs. grid computing
 Cloud computing = virtualization + grid + services + utility computing
 Grid computing: resource provisioning, load balancing, parallel processing
 Views of different users
 System admin/Hadoop users: grid
 Application owners/service users: service, utility
Users and cloud providers
Why Now?
 Experience with very large datacenters: profitable for cloud providers
 Economies of scale
 Pervasive broadband Internet
 Fast x86 virtualization
 Pay-as-you-go billing model
 Online payment
 Online ads
 Content distribution
 Web 2.0 lowers the entry point to e-business
 More small e-business owners
 Large user base of clouds
 Large user base
Spectrum of Clouds
 Instruction Set VM (Amazon EC2, 3Tera)
 Bytecode VM (Microsoft Azure)
 Framework VM
 Google AppEngine, Force.com
[Figure: spectrum from lower-level, less management (EC2) through Azure to higher-level, more management (AppEngine, Force.com)]
Cloud Killer Apps
 Mobile and web applications
 Batch processing / MapReduce
 Data analytics (big data)
 E.g., OLAP, data mining, machine learning
 Extensions of desktop software
 Matlab, Mathematica
Cloud Economics
• Pay by use instead of provisioning for peak
[Figure: two plots of resources against time showing capacity and demand; panels: "Static data center" and "Data center in the cloud"; label: "Unused resources"]
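The pay-by-use argument can be checked with a small calculation. The sketch below is illustrative only: the hourly demand series and the price are made-up numbers, and it assumes the same price per server-hour for both options, which real providers need not match.

```python
def static_cost(demand, price_cents_per_server_hour):
    """Cost when capacity is fixed at the peak demand for every hour."""
    peak = max(demand)
    return peak * len(demand) * price_cents_per_server_hour


def pay_per_use_cost(demand, price_cents_per_server_hour):
    """Cost when each hour is billed for the servers actually used."""
    return sum(demand) * price_cents_per_server_hour


# Hypothetical demand (servers needed in each hour) and price (cents).
hourly_demand = [2, 2, 3, 10, 4, 3]
price = 10

print("provision for peak:", static_cost(hourly_demand, price), "cents")
print("pay by use:        ", pay_per_use_cost(hourly_demand, price), "cents")
```

With this spiky demand the peak-provisioned data center pays for 60 server-hours while only 24 are used, so pay-by-use stays cheaper until its hourly price is about 2.5 times higher.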
Economics of Cloud Users
• Risk of over-provisioning: underutilization
[Figure: resources against time for a static data center, showing capacity, demand, and unused resources]
Economics of Cloud Users
• Heavy penalty for under-provisioning
[Figure: three plots of resources against time (days 1 to 3) showing capacity and demand; labels: "Lost revenue", "Lost users"]
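Both risks on the last two slides can be quantified for a fixed capacity. The following is a minimal sketch with a hypothetical demand series; the function name and the returned fields are assumptions made for illustration.

```python
def provisioning_report(demand, capacity):
    """Summarize how a fixed capacity fares against a demand series."""
    # Capacity that sits idle (over-provisioning).
    unused = sum(max(capacity - d, 0) for d in demand)
    # Demand that cannot be served (under-provisioning).
    unserved = sum(max(d - capacity, 0) for d in demand)
    served = sum(min(d, capacity) for d in demand)
    utilization = served / (capacity * len(demand))
    return {"unused": unused, "unserved": unserved, "utilization": utilization}


# Hypothetical demand per time slot, in servers.
demand = [4, 6, 10, 6, 4]

print(provisioning_report(demand, capacity=10))  # sized for the peak
print(provisioning_report(demand, capacity=5))   # sized below the peak
```

Sizing for the peak leaves capacity idle, while sizing below the peak turns part of the demand away. The slides' further point, that turned-away users may not come back, is not modeled here.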
Economics of Cloud Providers
 5-7x economies of scale [Hamilton 2008]

Resource        Cost in Medium DC     Cost in Very Large DC   Ratio
Network         $95 / Mbps / month    $13 / Mbps / month      7.1x
Storage         $2.20 / GB / month    $0.40 / GB / month      5.7x
Administration  ≈140 servers/admin    >1000 servers/admin     7.1x
 Extra benefits
 Amazon: utilize off-peak capacity
 Microsoft: sell .NET tools
 Google: reuse existing infrastructure
Adoption Challenges
Challenge                        Opportunity
Availability                     Multiple providers & DCs
Data lock-in                     Standardization
Data confidentiality,            Encryption, VLANs, firewalls;
auditability, and privacy        geographical data storage;
                                 privacy-preserving data outsourcing
Growth Challenges
Challenge                        Opportunity
Data transfer bottlenecks        FedEx-ing disks, data backup/archival
Performance unpredictability     Improved VM support, flash memory,
                                 scheduling VMs
Scalable storage                 Invent scalable store
Bugs in large distributed        Invent debugger that relies on
systems                          distributed VMs
Scaling quickly                  Invent auto-scaler that relies on ML;
                                 snapshots
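A quick calculation shows why shipping disks can beat the network for large datasets. This is a sketch with illustrative numbers only: the 10 TB size and the 20 Mbps sustained bandwidth are assumptions, not measurements from the slides.

```python
def transfer_days(size_terabytes, bandwidth_mbps):
    """Days needed to move a dataset over a link at a sustained rate."""
    bits = size_terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (bandwidth_mbps * 1e6)   # megabits per second to bits per second
    return seconds / 86400                    # seconds per day


# Hypothetical scenario: 10 TB over a 20 Mbps wide-area link.
days = transfer_days(10, 20)
print(f"network transfer: {days:.1f} days")
```

Under these assumptions the upload takes about a month and a half, so a courier that delivers disks within a few days offers far higher effective bandwidth.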
Policy and Business Challenges
Challenge                        Opportunity
Reputation fate sharing          Offer reputation-guarding services
                                 like those for email
Software licensing               Pay-for-use licenses; bulk use sales
Research Challenges Mentioned by Database Community (Claremont Report)
Functionality and operational cost
 Background: compare massive-scale data-intensive computing systems with today’s DBMS
 Limited functionality
 Simple APIs (e.g. MapReduce)
 Pushes more burden onto developers
 Benefits
 Easier to manage
 Lower operational cost
 Service Level Agreement (SLA) that is hard to provide for a SQL DBMS
P.S. DB systems are notorious for their installation and maintenance costs.
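To make "simple API, more burden on developers" concrete, here is a minimal single-process sketch of the MapReduce programming model. It is plain Python, not Hadoop's API; the function names are chosen for illustration, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cloud", "The grid and the cloud"]))
```

The developer writes only the map and reduce functions. Anything a SQL DBMS would provide declaratively, such as joins or multi-step aggregation, has to be composed by hand from such jobs.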
Manageability
 Features of cloud systems
   Limited human intervention
   High variance workloads
   A variety of shared infrastructures
   No DBAs or administrators to assist developers
 Systems need to do work automatically
 Self-managing
 Adaptive (autonomous) computing
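One simple form of self-management is a reactive scaling rule that sizes the server pool from the observed load. The sketch below is an assumption-laden illustration: the per-server capacity, the 70% target utilization, and the workload values are all hypothetical, and production auto-scalers add smoothing and cool-down periods.

```python
import math


def desired_servers(load, per_server_capacity, target_utilization=0.7, min_servers=1):
    """Servers needed to keep each one near the target utilization."""
    needed = math.ceil(load / (per_server_capacity * target_utilization))
    return max(needed, min_servers)


# Hypothetical high-variance workload, in requests per second.
workload = [100, 450, 1000, 300]

for load in workload:
    servers = desired_servers(load, per_server_capacity=100)
    print(f"load={load:4d} req/s -> {servers} servers")
```

No administrator is in the loop: the rule reacts to each load sample on its own, which is the kind of automatic adjustment the slide asks for.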
Data security and privacy
 Users sharing physical resources in a cloud
 Protect from each other (security)
 Protect from curious cloud providers (privacy)
 Successes may depend on specific target usage scenarios
 Examples
 Query based services
 Mining based services
Datasets over multiple clouds
 Interesting datasets might be available in different clouds
 Different cloud providers
 Private or public clouds
 Services mashing up datasets
 Inevitably crossing clouds
 Federated cloud architectures
Algorithms on Big data
 Working on “Big Data”
 Data mining
 Machine learning
 Visualization
 Traditionally assume data is in flat files or relational databases
 Distributed data organization poses new challenges
 Redesign algorithms
 Redesign frameworks
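As a small example of redesigning an algorithm for distributed data, mean and variance can be computed from per-partition summaries instead of one pass over a single file. This is a sketch under stated assumptions: the partitions are plain Python lists standing in for data blocks on different nodes, and the sum-of-squares formula is used for brevity even though it can be numerically unstable on large values.

```python
def partial_stats(partition):
    """Computed locally on each node: count, sum, and sum of squares."""
    n = len(partition)
    s = sum(partition)
    ss = sum(x * x for x in partition)
    return n, s, ss


def combine(stats):
    """Merge the small per-partition summaries into global statistics."""
    n = sum(p[0] for p in stats)
    s = sum(p[1] for p in stats)
    ss = sum(p[2] for p in stats)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance


# Hypothetical dataset split across three nodes.
partitions = [[1, 2, 3], [4, 5], [6]]
mean, variance = combine([partial_stats(p) for p in partitions])
print(mean, variance)
```

Only three numbers per partition cross the network, regardless of partition size, which is the property that makes the computation fit a framework such as MapReduce.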