Privacy-Aware Computing
CEG7380 Cloud Computing
Lecture 1
Keke Chen
Outline
Syllabus
Scope of this course
Tentative schedule
Prerequisites
Resources
Assignments
Introduction
Scope of this course
Understand the basic ideas of cloud computing
Get familiar with
Tools
Systems
Be exposed to some research topics
Two major parts:
Processing large data with the cloud
Scaling up/down web applications with the cloud
Note: some programming parts require self-study
Prerequisites
Some programming skills
Java, Python, shell
Comfortable with learning new programming frameworks
Sufficient knowledge of
Data structures and databases
Operating systems
Distributed systems
Assignments and Grading
Reading papers (~3) (10%)
Mini-projects (4-5) (60%)
Help you master the concepts
Learn to use tools and systems
Self-motivated research projects are strongly encouraged!
Final exam (20%)
Class attendance and discussion (10%)
Resources
Updated reference list
In-house Hadoop cluster
AWS access: a coupon code for each student
Pilot: submitting reading assignments and projects
Tentative Schedule
Parallel data processing
Distributed file systems (GFS, HDFS)
MapReduce
High-level distributed data management
Cloud infrastructures
Virtualization
AWS and Eucalyptus
Interactive front-end – Google App Engine
Cloud security and privacy
Research topics
In projects, we will learn to use
Hadoop
MapReduce, Pig Latin
AWS
Google App Engine
Cloud Computing
Lecture 1-2
Some slides are borrowed from the UC Berkeley RAD Lab
Keke Chen
Outline
What is cloud computing?
Why now?
Cloud killer applications
Cloud economics
Challenges and opportunities
“Above the Clouds”
“Claremont Report”
What is Cloud Computing?
Old idea: Software as a Service (SaaS)
Definition: delivering applications over the Internet
Recently: “[Hardware, Infrastructure, Platform] as a Service”
Utility Computing: pay-as-you-use computing
Illusion of infinite resources
No up-front cost
Fine-grained billing (e.g. hourly)
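To make fine-grained billing concrete, here is a minimal Python sketch of per-hour billing. It assumes a hypothetical price in cents per instance-hour and the classic rule that a partially used hour is charged as a full hour; real providers' billing granularity and rules differ and change over time.

```python
def billed_hours(usage_minutes: int) -> int:
    """Round usage up to whole hours (a partial hour is billed as a full hour)."""
    if usage_minutes < 0:
        raise ValueError("usage_minutes must be non-negative")
    return -(-usage_minutes // 60)  # integer ceiling division


def bill_cents(usage_minutes_per_instance: list[int], price_cents_per_hour: int) -> int:
    """Pay-as-you-go total: each instance is billed independently for the
    hours it ran, with no up-front cost for idle capacity."""
    hours = sum(billed_hours(m) for m in usage_minutes_per_instance)
    return hours * price_cents_per_hour


if __name__ == "__main__":
    # Hypothetical usage: three instances run for 150, 12, and 600 minutes,
    # at a hypothetical price of 10 cents per instance-hour.
    usage = [150, 12, 600]
    total = bill_cents(usage, price_cents_per_hour=10)
    print(f"billed hours: {[billed_hours(m) for m in usage]}, total: ${total / 100:.2f}")
```

Working in integer cents avoids floating-point rounding in the bill itself.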
Cloud computing vs. grid computing
Cloud computing = virtualization + grid + services + utility computing
Grid computing: resource provisioning, load balancing, parallel processing
Views of different users
System admins/Hadoop users: grid
Application owners/service users: service, utility
Users and cloud providers
Why Now?
Experience with very large datacenters: profitable for cloud providers due to economies of scale
Pervasive broadband Internet
Fast x86 virtualization
Pay-as-you-go billing model
Online payment
Online Ads
Content distribution
Web 2.0 lowers the barrier to entry for e-business: more small e-business owners
Large user base of clouds
Spectrum of Clouds
Instruction Set VM (Amazon EC2, 3Tera)
Bytecode VM (Microsoft Azure)
Framework VM (Google AppEngine, Force.com)
Figure: spectrum from lower-level, less management (EC2) through Azure to higher-level, more management (AppEngine, Force.com)
Cloud Killer Apps
Mobile and web applications
Batch processing / MapReduce
Data analytics (big data)
E.g., OLAP, data mining, machine learning
Extensions of desktop software
Matlab, Mathematica
Cloud Economics
• Pay by use instead of provisioning for peak
Figure: resources vs. time, comparing capacity with demand, for a static data center (with unused resources between capacity and demand) and for a data center in the cloud
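The comparison in the figure can be put into numbers with a small Python sketch. It assumes demand is measured in servers needed per time slot, that a static data center must provision its peak in every slot, and that the cloud charges the same unit price times an optional premium (a hypothetical parameter for a higher per-server cloud price). It ignores start-up latency and all other costs.

```python
def static_cost(demand: list[int], unit_price: float) -> float:
    """Static data center: pay for peak capacity in every time slot."""
    return max(demand) * len(demand) * unit_price


def cloud_cost(demand: list[int], unit_price: float, premium: float = 1.0) -> float:
    """Cloud: pay only for the servers used in each slot, possibly at a
    higher per-server price (premium > 1)."""
    return sum(demand) * unit_price * premium


def utilization(demand: list[int]) -> float:
    """Average demand divided by the provisioned (peak) capacity."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    # Hypothetical demand: servers needed in each 4-hour slot of one day.
    demand = [20, 10, 30, 80, 100, 60]
    print("static :", static_cost(demand, unit_price=1.0))
    print("cloud  :", cloud_cost(demand, unit_price=1.0, premium=1.5))
    print("utilization:", utilization(demand))
    # Pay-by-use stays cheaper as long as premium < 1 / utilization.
    print("break-even premium:", 1 / utilization(demand))
```

Under these assumptions, pay-by-use wins whenever the cloud premium is below 1/utilization: here utilization is 0.5, so the cloud could charge up to twice the per-server price and still cost less than provisioning for peak.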
Economics of Cloud Users
• Risk of over-provisioning: underutilization
Figure: resources vs. time for a static data center; the gap between capacity and demand is unused resources
Economics of Cloud Users
• Heavy penalty for under-provisioning
Figure: three panels of resources vs. time (days 1-3) comparing capacity with demand, marking lost revenue and lost users
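Both risks of fixed capacity (over- and under-provisioning) can be sketched in a few lines of Python. The demand figures, the capacity of 100 units, and the revenue per unit of served demand are hypothetical. The model counts only demand turned away in each period and does not model users who leave for good, which the slide also lists as a cost.

```python
def idle_capacity(demand: list[int], capacity: int) -> list[int]:
    """Over-provisioning: capacity paid for but not used in each period."""
    return [max(0, capacity - d) for d in demand]


def unserved_demand(demand: list[int], capacity: int) -> list[int]:
    """Under-provisioning: demand above capacity that is turned away."""
    return [max(0, d - capacity) for d in demand]


def lost_revenue(demand: list[int], capacity: int, revenue_per_unit: float) -> float:
    """Revenue forgone on unserved demand (ignores users who never return)."""
    return sum(unserved_demand(demand, capacity)) * revenue_per_unit


if __name__ == "__main__":
    # Hypothetical demand over three days against a fixed capacity of 100 units.
    demand = [80, 120, 150]
    print("idle    :", idle_capacity(demand, 100))
    print("unserved:", unserved_demand(demand, 100))
    print("lost revenue:", lost_revenue(demand, 100, revenue_per_unit=2.0))
```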
Economics of Cloud Providers
5-7x economies of scale [Hamilton 2008]
Resource        Cost in medium DC     Cost in very large DC   Ratio
Network         $95 / Mbps / month    $13 / Mbps / month      7.1x
Storage         $2.20 / GB / month    $0.40 / GB / month      5.7x
Administration  ≈140 servers/admin    >1000 servers/admin     7.1x
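As a worked example of the price figures in the table above, this Python sketch prices a hypothetical workload (1,000 Mbps of bandwidth, 100 TB of storage, 2,000 servers) in both kinds of data center. Prices are kept in integer cents, and "≈140" and ">1000" servers per admin are treated as exactly 140 and 1000 for illustration; administrator salaries are not included since the table does not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceList:
    network_cents_per_mbps_month: int
    storage_cents_per_gb_month: int
    servers_per_admin: int


# Figures from the Hamilton 2008 table; admin ratios taken as exactly 140 and 1000.
MEDIUM_DC = PriceList(9500, 220, 140)
LARGE_DC = PriceList(1300, 40, 1000)


def monthly_cost_cents(p: PriceList, mbps: int, gb: int) -> int:
    """Monthly network + storage cost (administration salaries excluded)."""
    return mbps * p.network_cents_per_mbps_month + gb * p.storage_cents_per_gb_month


def admins_needed(p: PriceList, servers: int) -> int:
    """Administrators required, rounded up to whole people."""
    return -(-servers // p.servers_per_admin)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 Mbps, 100 TB (100,000 GB), 2,000 servers.
    for name, p in [("medium DC", MEDIUM_DC), ("very large DC", LARGE_DC)]:
        cost = monthly_cost_cents(p, mbps=1000, gb=100_000)
        print(f"{name}: ${cost / 100:,.0f}/month, {admins_needed(p, 2000)} admins")
```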
Extra benefits
Amazon: utilize off-peak capacity
Microsoft: sell .NET tools
Google: reuse existing infrastructure
Adoption Challenges
Challenge                                        Opportunity
Availability                                     Multiple providers & DCs
Data lock-in                                     Standardization
Data confidentiality, auditability, and privacy  Encryption, VLANs, firewalls; geographical data storage; privacy-preserving data outsourcing
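Why multiple providers help availability can be shown with a short calculation: a service spread across providers is down only when all of them are down. The Python sketch below assumes provider failures are independent, which is optimistic (correlated outages from shared software bugs or regional events reduce the benefit); the 99% figure is hypothetical.

```python
from math import prod


def combined_availability(availabilities: list[float]) -> float:
    """Probability that at least one provider is up, ASSUMING provider
    failures are independent (correlated outages make this optimistic)."""
    return 1 - prod(1 - a for a in availabilities)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per (365-day) year with no provider available."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    # Hypothetical: two providers, each available 99% of the time.
    single = 0.99
    both = combined_availability([single, single])
    print(f"one provider : {single:.4f}, {downtime_hours_per_year(single):.1f} h/yr down")
    print(f"two providers: {both:.4f}, {downtime_hours_per_year(both):.2f} h/yr down")
```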
Growth Challenges
Challenge                          Opportunity
Data transfer bottlenecks          FedEx-ing disks; data backup/archival
Performance unpredictability       Improved VM support, flash memory, scheduling VMs
Scalable storage                   Invent scalable store
Bugs in large distributed systems  Invent debugger that relies on distributed VMs
Scaling quickly                    Invent auto-scaler that relies on ML; snapshots
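The "FedEx-ing disks" entry is easier to appreciate with the arithmetic. This Python sketch compares the time to push a dataset over a network link with the effective bandwidth of shipping disks. It assumes decimal terabytes (10^12 bytes), a link used at full rate with no protocol overhead, and hypothetical figures of 10 TB, 20 Mbps, and one-day shipping.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_tb: float, bandwidth_mbps: float) -> float:
    """Days to move data_tb terabytes over a bandwidth_mbps link at full rate."""
    bits = data_tb * 10**12 * 8
    return bits / (bandwidth_mbps * 10**6) / SECONDS_PER_DAY


def shipping_bandwidth_mbps(data_tb: float, shipping_days: float) -> float:
    """Effective bandwidth of shipping disks that arrive after shipping_days."""
    bits = data_tb * 10**12 * 8
    return bits / (shipping_days * SECONDS_PER_DAY) / 10**6


if __name__ == "__main__":
    # Hypothetical: 10 TB over a sustained 20 Mbps link vs. overnight shipping.
    print(f"network : {network_transfer_days(10, 20):.1f} days")
    print(f"shipping: {shipping_bandwidth_mbps(10, 1):.0f} Mbps effective")
```

With these numbers the network transfer takes about 46 days, while overnight shipping behaves like a link of roughly 900 Mbps.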
Policy and Business Challenges
Challenge                Opportunity
Reputation fate sharing  Offer reputation-guarding services like those for email
Software licensing       Pay-for-use licenses; bulk-use sales
Research Challenges Mentioned by the Database Community (Claremont Report)
Functionality and operational cost
Background: compare massive-scale, data-intensive computing systems with today’s DBMSs
Limited functionality
Simple APIs (e.g., MapReduce)
Pushes more burden onto developers
Benefits
Easier to manage
Lower operational cost
Service Level Agreements (SLAs) that are hard to provide for a SQL DBMS
P.S. DB systems are notorious for their high installation and maintenance costs.
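The "simple API, more burden on developers" point can be illustrated with word count, the standard MapReduce example. The sketch below is a single-process Python simulation of the map, shuffle, and reduce phases, not Hadoop's actual Java API; the function names are hypothetical. The developer supplies only `map_fn` and `reduce_fn`, and anything a DBMS would give for free (joins, multi-step queries) has to be hand-built from such functions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(records: Iterable[str]) -> dict[str, int]:
    """Single-process stand-in for the framework: map, shuffle, reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:                  # map phase
        for key, value in map_fn(record):
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase: one reduce_fn call per key, keys in sorted order
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["the cloud", "The data the cloud"]))
```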
Manageability
Features of cloud systems
Limited human intervention
High variance workloads
A variety of shared infrastructures
No DBAs or administrators to assist developers
Systems need to do work automatically
Self-managing
Adaptive (autonomous) computing
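As a small illustration of a system "doing work automatically", here is a minimal rule-based auto-scaler in Python. It is not ML-based; it uses a simple proportional rule of the same form as Kubernetes' Horizontal Pod Autoscaler (desired = ceil(current × observed metric / target)). The 60% target and the bounds of 1 to 20 instances are hypothetical defaults, and real scalers add cooldowns and smoothing that are omitted here.

```python
def desired_instances(current: int, cpu_pct: int, target_pct: int = 60,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Proportional scaling rule: pick the instance count that would bring
    average CPU utilization back to target_pct, clamped to [min_n, max_n]."""
    if current < 1 or not 0 < target_pct <= 100:
        raise ValueError("need current >= 1 and 0 < target_pct <= 100")
    desired = -(-current * cpu_pct // target_pct)  # integer ceiling division
    return max(min_n, min(max_n, desired))


if __name__ == "__main__":
    # Hypothetical readings: (instances running, observed average CPU %).
    for current, cpu in [(4, 90), (4, 30), (10, 95), (20, 100)]:
        print(current, cpu, "->", desired_instances(current, cpu))
```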
Data security and privacy
Users share physical resources in a cloud
Protect from each other (security)
Protect from curious cloud providers (privacy)
Success may depend on specific target usage scenarios
Examples
Query-based services
Mining-based services
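One very simple way to hide identifiers from a curious provider while still supporting equality queries is keyed pseudonymization: the data owner replaces each identifier with an HMAC-SHA256 value under a secret key it never uploads. The Python sketch below is only an illustration of that idea, with a hypothetical key and record layout; it is not a complete privacy-preserving outsourcing scheme. Other fields stay in the clear, and deterministic pseudonyms still reveal which records share a value (and are open to frequency analysis).

```python
import hashlib
import hmac

# Hypothetical secret key: held only by the data owner, never uploaded.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256, hex) for an identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def outsource(records: list[dict]) -> list[dict]:
    """Replace the 'name' field before records are sent to the cloud."""
    return [{**r, "name": pseudonymize(r["name"])} for r in records]


def query_by_name(cloud_records: list[dict], name: str) -> list[dict]:
    """Equality query: the owner computes the token; the matching itself
    could run on the cloud side without learning the plaintext name."""
    token = pseudonymize(name)
    return [r for r in cloud_records if r["name"] == token]


if __name__ == "__main__":
    stored = outsource([{"name": "alice", "visits": 3}, {"name": "bob", "visits": 5}])
    print(query_by_name(stored, "alice"))
```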
Datasets over multiple clouds
Interesting datasets might be available in different clouds
Different cloud providers
Private or public clouds
Services mashing up datasets
Inevitably crossing clouds
Federated cloud architectures
Algorithms on Big Data
Working on “Big Data”
Data mining
Machine learning
Visualization
Algorithms traditionally assume data is in flat files or relational databases
Distributed data organization poses new challenges
Redesign algorithms
Redesign frameworks
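A small example of redesigning an algorithm for partitioned data: instead of pulling every value to one machine, each partition computes a compact summary (count, mean, sum of squared deviations) and the summaries are merged. The Python sketch below uses the pairwise merge formula commonly attributed to Chan, Golub, and LeVeque; the partitions are hypothetical, and in a MapReduce setting the local step would run in mappers or combiners and the merge in a reducer.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    n: int        # number of values
    mean: float   # mean of the values
    m2: float     # sum of squared deviations from the mean


def local_moments(values: list[float]) -> Moments:
    """Summary computed independently on one partition (e.g., one node)."""
    if not values:
        return Moments(0, 0.0, 0.0)
    mean = sum(values) / len(values)
    return Moments(len(values), mean, sum((x - mean) ** 2 for x in values))


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partition summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    return Moments(n, a.mean + delta * b.n / n,
                   a.m2 + b.m2 + delta * delta * a.n * b.n / n)


def population_variance(m: Moments) -> float:
    return m.m2 / m.n


if __name__ == "__main__":
    # Hypothetical data split across three partitions.
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    total = reduce(merge, (local_moments(p) for p in partitions))
    print(total.n, total.mean, population_variance(total))
```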