The Claremont Report on Database Research
Intro to Cloud Computing
Source: http://www.free-pictures-photos.com/
Cloud Computing
• No longer the next big thing – the current big
thing
– Term “cloud computing” first used in 1996
• Presentation by Compaq Computer Company
– Name cloud inspired by cloud symbol
representing internet in diagrams
– Amazon popularized idea of the cloud
What is Cloud Computing?
• But what is it?
• Everyone has a different opinion on what it is
• Is it trendy?
• “The computer industry is the only industry that is
more fashion-driven than women’s fashion”
– Larry Ellison
Questions to answer
• What clouds have you used today (yesterday)?
Cloud Computing
• Everyone has an opinion on what to use a
cloud for
– Applications on the internet – email, tax prep
– Storage for business, personal data
– Web services for photos, maps, GPS
– Rent a virtual server, load software on it, turn it
on/off, clone it if sudden workload demand
– Store, secure data for authorized access (really?)
– Use a platform including OS, Apache, MySQL,
Python, PHP
Questions to answer
• What is a cloud?
Cloud Computing Characteristics
• So what are its characteristics?
• Described as: On-demand computing, pay as you
go, software as a service, utility computing
• Usually costs, but cost-effective
• Emphasizes availability
• Virtualization
• Scalable (expand on current hardware)
• Elastic (dynamically add hardware as needed)
• Distributed and highly parallel approach
• Replication, replication, replication …
Cloud Computing Hands On Approach
Definition
• Definition – the first five are from NIST:
– On-demand self-service – no interaction with provider needed
– Broad network access – over network using standard access, platform independent
– Resource pooling – virtual resources
– Rapid elasticity – scales horizontally (number) or vertically (capacity)
– Measured service – usage charge based on metric
– Performance – improvement
– Reduced cost
– Outsourced management – IT infrastructure, software
– Reliability – 99.99% uptime guarantee
– Multi-tenancy – virtual: computing and storage resources shared;
organic: every component shared (OS, DB servers, etc.)
– Ease of utilization (no licenses), QoS, simplified, low barrier to entry
What is Cloud Computing?
• Cloud is a metaphor for the internet
• Internet is:
For the cloud user - Applications
• What does cloud computing actually do?
– Consider applications you may currently be
running on laptop, desktop, phone, server
– Cloud has them also, or can potentially bring them
to you
– Brings applications, views, manipulates, shares
data
Applications
• Allow access to applications other than on
local computer or internet connected device
• Instead, cloud provider hosts your application
- Advantages?
– No more licenses, service packs, etc.
– Less hardware, etc.
– Can access anywhere
Clouds
• Allow access to applications other than on
local computer or internet connected device
• But only as long as you have an internet connection
Potential Problems
• Internet connection
• Cloud site failure
• Sensitive information
• Application integration – (exchange info when
local and on cloud)
What Motivated Cloud Computing
Initial motivation:
– Web-scale problems
Solutions:
– Large data centers
How to access:
– Highly-interactive Web applications (thin client)
Next Step:
– Different models of computing
Initial motivation: Web-Scale
Problems
• Characteristics:
– Definitely data-intensive
– May also be processing intensive
• Examples:
– Crawling, indexing, searching, mining the Web
– “Post-genomics” life sciences research
– Other scientific data (physics, astronomers, etc.)
– Sensor networks
– Web 2.0 applications
–…
How much data?
• Google processes over 100 PB a day; 3M servers
PB = 1,000,000,000,000,000 bytes = 10^15 bytes
– Stack of 1 TB hard disks that is 25,400 km high
• CERN’s LHC generates 15 PB a year
• Rendering ‘Avatar’ required > 1 PB of storage
• Facebook – 300 PB + growing at 600TB per day; 35% of
world’s photos
• Sloan Digital Sky Survey – 0.5 PB /month in 2015
• “All words ever spoken by human beings” ~ 5 EB; EB = 10^18 bytes (probably 8,400 times larger)
• LARGE data is the next frontier
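The unit arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) units as on the slide (1 PB = 10^15 bytes), and the helper name `to_unit` is made up for illustration.

```python
# Decimal (SI) storage units, as used on the slide: 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15
EB = 10**18


def to_unit(num_bytes: int, unit: int) -> float:
    """Express a byte count in the given unit (hypothetical helper)."""
    return num_bytes / unit


# Facebook figure from the slide: growing at 600 TB per day.
daily_growth_bytes = 600 * TB
yearly_growth_pb = to_unit(daily_growth_bytes * 365, PB)
print(f"600 TB/day is about {yearly_growth_pb:.0f} PB/year")  # 219 PB/year

# Number of 1 TB disks needed to hold 100 PB (Google's daily volume on the slide).
disks_for_100_pb = (100 * PB) // TB
print(f"100 PB fills {disks_for_100_pb:,} one-terabyte disks")  # 100,000
```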
How much computation?
• Facebook > 180,000 servers (2012)
• Google is thought to have > 1M servers,
planning for 10 M
Solution: Large Data Centers
• Web-scale problems? Throw more machines at it!
• Need for scalability; same current services
• In the future businesses will not need to invest in a data
center – Cloud providers will take care of it
• History
– Decades ago – computing power in mainframes in computer rooms
– Personal computers changed that
– Now, network data centers with centralized computing are back in
vogue
– BUT, no more mainframes
Scaling up
• From PCs to clusters
Clusters
• Many machines, close interconnection
• Special standardized hardware (racks, blades)
• Owned by a single organization
Improvements since ‘80s
• Disk capacity
– From 10s MB to several TB – orders of magnitude
– Largest drive: Seagate 60 TB SSD
– IBM built 120PB storage array – 200,000 individual
hard drives:
http://www.theregister.co.uk/2011/09/01/ibm_120pb_array/
• Latency
• Bandwidth
– 50X
From clusters to?
• Clusters can be too power hungry for one
building
• Build separate facility
• Lots of cooling and power
• Result: Data center
Figure: data center with racks, networking, emergency power supplies, and cooling (photo: Maximilien Brice, © CERN)
Cloud Components
• Datacenter
• Clients
Google’s Large Data Centers
• Although Google famous for innovating web
searching, Google’s architecture as much a
revolution
– Instead of few expensive servers, use many cheap servers
($5000 instead of $100,000)
• 1/2 M servers in ~12 locations
• With thin, wide network
– Derive more from scale of the whole than any one
part – no hub
• Cloud – robust and self-healing
– Uses a lot of power
• Need cheaper power solutions
Google’s Data Centers
• Redundancy
– Redundancy is the key to the success of clouds
– Google approach – cheap components that fail, so
replicate all processing and storage
• Efficiency
• Utilization
• Management
• Virtualization
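The redundancy argument (cheap components that fail, so replicate everything) can be made concrete with a small calculation. The sketch below assumes replicas fail independently, which real systems only approximate; the function name and the 95% figure are illustrative, not taken from Google.

```python
def availability_with_replicas(single_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up.

    Assumes failures are independent (a simplification).
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("single_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All replicas must be down at once for the service to be unavailable.
    return 1.0 - (1.0 - single_availability) ** replicas


# A cheap component that is up only 95% of the time (hypothetical number):
for n in (1, 2, 3):
    print(n, f"{availability_with_replicas(0.95, n):.6f}")
```

With three replicas the combined availability under these assumptions is about 0.999875, which is why many unreliable machines can still add up to a reliable service.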
Cloud Components - Clients
• Clients
– Mobile
• Phones, PDAs
– Thin
• no internal hard drives, lets servers do all work, displays
info
– Thick
• Laptops, web browsers
– Which is the best?
• Thin - lower costs, security, power consumption, easy
to replace, less noise
Data Center
• Data Center
– What if the data center is not big enough?
Modular data center
Data Centers
– Distributed data centers
• geographically disparate
• Dynamic datacenter so can increase as needed
– Need to be close to users (physics)
– Cheaper resources
– Protect against failures
Types of Clouds
• Public, Private, Hybrid Clouds
• Names do not necessarily dictate location
• Type may depend on whether temporary or
permanent
Public Clouds
• Third party; applications from different
customers mixed together
• Typically hosted away from customer premises
Private Cloud
• Built for exclusive use of one client – utmost
control over data, service, QOS
• Company owns infrastructure – may be
located at enterprise or at colocation
• Built and managed by
enterprise IT or
cloud provider
Virtual private data center
• Can create a virtual private data center for
single client within public cloud
– Located in same facility
Hybrid Cloud
• Combine both private and public models
• Augment private cloud with public resources
– Cloud services for CRM, mail, word processing
– Business application inventory-tracking software on own private cloud
• Good for surge computing
• How to distribute data across both challenging
• Best for smaller data – why?
Community Clouds
• Shares infrastructure among several organizations
from specific community with same concerns
• Hosted Internally or externally
• Can be managed internally or by 3rd party
• Cost spread over members of community/share
infrastructure
• An even more complex hybrid cloud …
A cloud within a cloud within…
• Cloud cell – provides a distinct fundamental
service which may be re-used by other clouds
or cloud cells
– Web server cell
– DB cell
– Storage cell
Really?
• Personal cloud
– Network attached storage device that backs up
your data
Questions to answer this semester
1. IS CLOUD COMPUTING JUST A BUSINESS
MODEL AND NOT A COMPUTING MODEL?
2. IS THERE ANYTHING NEW IN CLOUD
COMPUTING OR IS IT JUST DISTRIBUTED
COMPUTING WITH A DIFFERENT NAME?
3. IS IT REALLY ALL ABOUT MONEY??
• http://www.salesforce.com/cloudcomputing/
The Result of Clouds:
Different Computing Model
“Why do it yourself if you can pay someone to do it for you?”
Software-as-a-Service (SaaS)
Platform-as-a-Service (PaaS)
Infrastructure-as-a-Service (IaaS)
IaaS
• Infrastructure as a Service (IaaS) – aka Hardware as a
Service (HaaS) and Utility computing
– Why buy machines when you can rent cycles?
– Utility computing billing – based on what used
– Provides basic storage and compute capabilities as a
service
• Servers, storage systems, CPU cycles, switches,
routers, etc.
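Utility-style billing ("based on what used") amounts to metering each resource and multiplying by a rate. A minimal sketch follows; the resource names and hourly prices are hypothetical and do not reflect any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    resource: str          # e.g. a VM or a storage volume
    hours_used: float      # metered usage for the billing period
    rate_per_hour: float   # hypothetical price in dollars per hour


def monthly_bill(records: list[UsageRecord]) -> float:
    """Pay-as-you-go: the bill is the sum of usage times rate."""
    return sum(r.hours_used * r.rate_per_hour for r in records)


usage = [
    UsageRecord("small-vm", hours_used=100, rate_per_hour=0.10),
    UsageRecord("storage-volume", hours_used=720, rate_per_hour=0.02),
]
print(f"${monthly_bill(usage):.2f}")  # $24.40
```

A VM that is switched off for most of the month simply accrues fewer metered hours, which is the contrast with buying the machine outright.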
IaaS
• Does not provide applications to customers
(SaaS and PaaS do)
• Saves cost of purchasing
• Infrastructure can be scaled up or down
• Multiple tenants can use equipment at the
same time – called multitenant
• Device independence – access systems on
different hardware
• Low barriers to entry
IaaS Components
– Computer hardware – rented out, provider set up
as a grid for scalability
• Network – hardware for firewalls, routers, etc.
• Internet connectivity so user can access hardware
– Allows clients to run the VM they want
Questions/Problems
• How do you use this hardware?
• If they provide the hardware and software to
use it, is it no longer IaaS?
• If you want to use their servers, do you have
to create your own VM? Do they have VMs
available?
Comment
• If you create your own VMs, etc., this is not
easy …
IaaS Examples
– Look for IaaS, get cost estimates
• Ex: Oracle’s Ravello vs. Amazon’s EC2
PaaS
• Platform as a Service (PaaS) aka cloudware
– Give me nice API and take care of the implementation
– Supplies all resources needed to build apps and services
without having to download or install software
– Provides a computing platform and solution stack
• E.g for web application need OS, web server, DB, prog language
• Provides support to create user interfaces (HTML, Javascript)
• Provides automatic facilities for concurrency management,
scalability, failover, and security
– Services include:
• app design, development, testing, deployment, hosting
PaaS
– Customer interacts with platform through API
– Encapsulated layer of software provided as a service to
build higher-level services
– Runtime services – allows application to leverage
infrastructure
– Platform manages and scales
– Team collaboration, web service integration, database
integration, security, scalability, storage, state
management, versioning
PaaS
• Supports web development interfaces
– SOAP (simple object access protocol), REST
(Representational state transfer), allow construction of
multiple web services (mashups)
– Interfaces able to access DBs, reuse services
• Options:
– Add-on development facilities
• Stand-alone environments for general development
• Customize SaaS applications
• Application delivery-only environments for hosting level
services (e.g. security, on-demand scalability) not
development, debugging and testing
PaaS provides
• Development teams across world to work
together
• Merge web services from multiple sources
• Cost savings from using built-in security,
scalability and failover
• Cost-savings from using higher-level
programming abstractions
Problems with PaaS
• Vendors used proprietary services or
languages – developer may be locked in
• Lack of portability and interoperability – if
develop on one cloud, can’t move to another
(unless pay …) – Lock-in
• What if provider goes out of business?
Zimki
– Zimki hosted JavaScript environment
• One of the original PaaS around 05-06
– Announced 9/2007 would close 12/2007
– Wanted to go open source, but parent company
had other ideas
– Interesting blog by former CEO
• Look for Examples of PaaS, costs
• Ex: Google App Engine
SaaS
• Software as a Service (SaaS) – web based
applications
– Just run it for me!
– Software available on cloud for use
– Application hosted as a service to customers who
access via the internet
– Single instance runs and services multiple end
users
SaaS
• Good candidates for SaaS:
– Simple task with little interaction with other
systems
– Customers who want high powered apps but do
not want to develop
• Customer relationship management (CRM)
• Video conferencing
• IT service management
• Accounting
• Web analytics
• Web content management
SaaS
• Unlike earlier distributed computing tools,
SaaS is built specifically to use web tools
• Built to be multitenant
• Can access from anywhere as long as have
internet
• SaaS often used as a component of another
application – mashup or plugin
Benefits to SaaS
• Everyone knows WWW, little training needed
• Smaller IT staff needed
• Easier to customize
• Better marketing by providers, accommodate more
• Web reliability
• Security (SSL used), don’t need VPNs (virtual private
networks) on back-end
• More bandwidth – low latencies
SaaS
• Pros/Cons
– Customer doesn’t have to maintain or support SW
– Out of customer’s hands when hosting service
changes it
– Use software out of box
– Billed on a recurring basis instead of paying for it just once
– Don’t have to pay as much up front, cheaper more
reliable
– Stronger protection of intellectual property
(no more open source??)
Obstacles to SaaS
• Specific computational need not addressed –
may have to buy own
• Lock-in – can’t move to new vendor without
penalty
• Open source and cheaper hardware
Example Applications Benefiting
• Using Hadoop tool, open-source MapReduce
– NY Times converted 11 M articles, images in
archive to PDF
– Instead of 7 weeks, using Hadoop took 24 hours, <
$300
• Animoto’s mashup tool – create videos from
set of images and music
– Scaled from 50 to 3500 servers in 3 days
– Application built to be horizontal
• Look up Examples SaaS, costs
• Ex: salesforce.com, Gmail
Future of SaaS
• Move all processing power to the cloud and
carry ultralight input device
– Already starting to happen?
• E-mail
• Google Docs
• Implications for Microsoft, software as purchasable
local application
– Windows Live (Microsoft’s cloud)
– Adobe web based photoshop
IaaS, PaaS, SaaS
In summary - IaaS, PaaS, SaaS
• With IaaS
– Provider doesn’t know what you are going to do
with HW
– Just ask for resources
– So you can specify how many machines, how
many VMs per machine, etc.
– Can create your own PaaS, or SaaS on Iaas
IaaS, PaaS, SaaS
• With PaaS
– Ask for specific web services, DBs, etc.
– Restricted to using only those, can modify only
within constraints of platform
– System decides what hardware and how many
VMs you get, e.g. scaling
• With SaaS
– Just say which software and you use it
When should you use Cloud
(Public) Computing?
• Consider
– Cost/benefit ratio
– Speed of delivery
– How much capacity
will be used
– Whether data
is regulated
– Organization’s
corporate IT structure
Cost Effective?
Cost_CLOUD = ∑(UnitCost_CLOUD × (Revenue − Cost_CLOUD))
– UnitCost – cost of resource per hour
– Doesn’t scale linearly
• Compare Cost_CLOUD to cost of running own datacenter
Cost_DATACENTER = ∑(UnitCost_DATACENTER × (Revenue − Cost_DATACENTER / Utilization))
May have to calculate the above for each system, e.g. networking, cooling, etc.
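The key difference between the two formulas is the division by Utilization: an owned datacenter is paid for whether or not it is busy. The sketch below isolates just that term rather than evaluating the full sums. All dollar figures are hypothetical, and the function names are chosen for illustration only.

```python
def effective_datacenter_cost(cost_per_hour: float, utilization: float) -> float:
    """Cost per *useful* hour of an owned server: cost / utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cost_per_hour / utilization


def cheaper_option(cloud_cost_per_hour: float,
                   datacenter_cost_per_hour: float,
                   utilization: float) -> str:
    """Compare the cloud's hourly price with the utilization-adjusted owned cost."""
    effective = effective_datacenter_cost(datacenter_cost_per_hour, utilization)
    return "cloud" if cloud_cost_per_hour < effective else "datacenter"


# Hypothetical numbers: owned server at $0.10/hour, cloud at $0.25/hour.
print(cheaper_option(0.25, 0.10, utilization=0.25))  # cloud (0.10 / 0.25 = 0.40)
print(cheaper_option(0.25, 0.10, utilization=0.80))  # datacenter (0.10 / 0.80 = 0.125)
```

Under these assumed prices, the cloud wins when the owned hardware would sit idle most of the time, and the datacenter wins when it is kept busy.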
SLA
• Service level agreements between provider and client
– Standardized – little negotiation
– Some SLAs enforceable as contracts, but more like Operating Level
Agreements
– May not have the force of law
– SLAs address:
• Availability (uptime)
• Response times or latency
• Reliability of components
• Responsibilities of each party
• Warranties
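An availability percentage in an SLA translates directly into a downtime budget. The following sketch does that conversion; the function name is made up, and a 365-day year is assumed.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365.0) -> float:
    """Minutes of downtime permitted per period at a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% uptime allows {allowed_downtime_minutes(pct):.2f} minutes/year")
```

At 99.99% uptime the budget is roughly 52.56 minutes per year, while 99.9% allows about 525.6 minutes (close to nine hours).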
SLA
– Specifics:
• The specific parameters, minimum levels required for each
element of the service, remedies for failure to meet requirements.
• Affirms ownership of data stored on the service provider’s system,
specifies your rights to get it back.
• System infrastructure and security standards to be maintained by
the service provider, your rights to audit their compliance.
• Specifies your rights and cost to continue and discontinue using
the service.
• More detailed SLA
Virtualization
• Allows infrastructure to be:
– Flexible, scalable, economical
• What is virtualization?
– Software implementation of a computer that
executes programs like a physical machine
– Installation of one machine runs on another
– All software runs on a server within virtual machine
– First appeared in 1967 with IBM CP-40 system
– Intel Virtualization Technologies (IVT) extensions
and AMD-Virtualization made it doable
Virtualization
• Why is it useful?
– Abstracts hardware so software stacks can be deployed
without tied to specific physical server
• Can
– Share computer among multiple users
– Run applications and different operating systems
on same machine
– Isolate users from each other and control
program
– Emulate software and/or hardware on another
machine
Virtualization
• Virtual Machine VM
– isolated guest OS installation within a normal host OS
– Runs on top of the OS of the server machine
– Object of deployment
Virtualization
• Hypervisor – Virtual Machine Manager VMM
• One level higher than supervisory program
• Installed directly on server hardware or run within an OS
• Hypervisor is specialized OS, runs VMs instead of apps
• Easily create copies of existing environments
• Can exist on same servers or different machines
• Single server multiple OS instances, minimize CPU idle
time
Figure: Traditional Stack (apps on one operating system on hardware) vs. Virtualized Stack (apps on multiple guest OSs on a hypervisor on hardware)
Hypervisor
• Hypervisor comprised of
– Memory manager
– Process scheduler
– I/O stack
– Device drivers
– Security manager
– Network stack
– Etc.
Types of Hypervisors
Virtualization
• In a cloud - application needs a VM on which
to run
• Application will be associated with that VM
• Entire user interface resides in single window
– Provide all facilities of OS inside a browser
• Program must continue running even as
number of users grows
Virtualization
• Virtual Machine VM
– isolated guest OS installation within a normal host OS
– Runs on top of the OS of the server machine
– Object of deployment
• Virtual Machine Image –
– Static data containing software (OS, apps, data files) the VM will run
once started
– Used to create VM instance
– Typically stored on disk
• Amazon: AMI
• Virtual Machine Instance –
– Running virtual machine
– Started from image, runs OS and processes, computes, etc.
– dynamic object you can interact with
– snapshot of a VM at a given time
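The image/instance distinction resembles the relationship between a class and its objects: one static image, many running instances. A minimal sketch, with hypothetical class and field names that are not any provider's API:

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class MachineImage:
    """Static template: the software a VM will run once started."""
    image_id: str
    os_name: str
    packages: tuple[str, ...] = ()


@dataclass
class MachineInstance:
    """Running VM started from an image; a dynamic object you can interact with."""
    instance_id: int
    image: MachineImage
    state: str = "running"

    def stop(self) -> None:
        self.state = "stopped"


_instance_ids = itertools.count(1)


def launch(image: MachineImage) -> MachineInstance:
    """Create a new running instance from a stored image."""
    return MachineInstance(instance_id=next(_instance_ids), image=image)


web_image = MachineImage("img-web", "Linux", ("apache", "mysql", "php"))
a = launch(web_image)
b = launch(web_image)   # same image, independent instance
a.stop()
print(a.state, b.state)  # stopped running
```

The image is declared frozen to mirror the "static data" description, while each instance carries its own mutable state.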
Virtualization
• Virtual Appliance
– pre-configured virtual machine that includes software
partially or fully configured to perform a specific task
– Built to host a single application
• VMs are deployed – copy image from Appliance Library to
machine (hypervisor) with specific Virtual Appliance
configuration
Full Virtualization
• Full virtualization
– Complete installation of one machine runs on
another
– emulate entire system
– Guest OS unaware it is a virtualized environment
Paravirtualization
• Virtualization may not be efficient
• Paravirtualization instead
– Doesn’t emulate entire system like in full (e.g. BIOS, drive)
– uses resources efficiently
– OS adjusted to work in virtual machine
– Better performance, only emulate some elements
– Guest OS is aware it is a guest
– Instead of issuing HW commands, issues commands to
host OS (hypervisor)
Paravirtualization
• Better scaling,
– Allows multiple OS to run on a single hardware device at
same time
• but, sacrifices security and flexibility
– Software running inside VM is limited to the resources and
abstractions provided by VM
– Cannot break out of environment
Full vs para?
• Seems like full virtualization is still dominant
• If guest OS is same as host OS, can share the
kernel
• Windows runs unmodified as a guest OS, but
paravirtualization open-source drivers are
being developed
Amazon
• Amazon Machine Images (AMI) use 2 types of virtualization:
– Paravirtual PV (Micro instance doesn’t imply paravirtualization)
– Hardware Virtual Machine HVM (efficient full) – can use HW
extensions
• Only Linux AMIs can use PV
– Used to have better performance than HVM but no longer true
• Linux and Windows AMIs can use HVM
– Same as if OS run on a bare metal machine
– Take advantage of hardware extension to provide fast access to
underlying hardware on host
Amazon
• PV used to perform better – used special drivers for I/O
avoiding overhead of emulating network and disk hardware
• HVM had to translate to emulate hardware
• Now PV drivers available for HVM guests
– OS like Windows can get advantages in storage and
network I/O by using them
• http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html
Virtualization – KVM paper
• Read: KVM paper
• Architectural features once available only on
mainframes are now available on Intel x86
– Multithreading with 8 or more cores
– Large memory systems with NUMA
Virtualization of x86
• Security of x86 platform
– Ring 0 – OS kernel access to HW, most privileged
– Ring 3 – user apps
• Problem with virtualization – run in ring 3
– If VM makes privileged call, HW traps instruction and issues a fault,
destroys VM
Solution:
– OS running in VM, move OS kernel to ring 1
– Early x86 hypervisors emulated CPU in software
• slow
Binary Translation
• Another model pioneered by VMware:
– Run VM directly on CPU
– On a privileged instruction, a trap is issued, handled by the
hypervisor, and the instruction is emulated
– Any x86 OS can run unmodified on the hypervisor
– Complex to implement, better performance
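The trap-then-emulate flow in the bullets above can be illustrated with a toy model. This is not how VMware's binary translation or real x86 hardware works internally; the instruction names (`ADD`, `OUT`, `HLT`) and the classes are invented for the sketch. Unprivileged instructions run directly, while privileged ones are handed to the hypervisor, which emulates them against virtual devices.

```python
PRIVILEGED = {"OUT", "HLT"}  # hypothetical privileged opcodes


class Hypervisor:
    """Handles traps from the guest and emulates privileged operations."""

    def __init__(self) -> None:
        self.trap_count = 0
        self.virtual_console: list[int] = []
        self.guest_halted = False

    def handle_trap(self, op: str, value: int) -> None:
        self.trap_count += 1
        if op == "OUT":
            # Emulate device output against a virtual console, not real hardware.
            self.virtual_console.append(value)
        elif op == "HLT":
            # Halt only this guest; the host keeps running.
            self.guest_halted = True


def run_guest(program: list[tuple[str, int]], hypervisor: Hypervisor) -> int:
    """Run guest instructions directly, trapping on privileged ones."""
    acc = 0
    for op, arg in program:
        if hypervisor.guest_halted:
            break
        if op in PRIVILEGED:
            hypervisor.handle_trap(op, acc)   # trap into the hypervisor
        elif op == "ADD":
            acc += arg                        # unprivileged: runs directly
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return acc


hv = Hypervisor()
result = run_guest([("ADD", 2), ("ADD", 3), ("OUT", 0), ("HLT", 0), ("ADD", 100)], hv)
print(result, hv.virtual_console, hv.trap_count)  # 5 [5] 2
```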
Xen
• Paravirtualization
– Modify guest OS in VM and replace all privileged
instructions with direct calls to hypervisor
– Modified guest OS can cooperate with hypervisor
for improved scheduling and I/O, no need to
emulate hardware
– Paravirtualization requires changes to OS to be
implemented by OS vendor
Xen
• Xen Hypervisor platform: Two components
– Xen hypervisor
• CPU, memory virtualization, power management, VM
scheduling
• Loads privileged VM – Dom0, direct access to HW,
device drivers, I/O management for VM
• Implemented using Linux, modified kernel
• Not in upstream kernel, Vendors ship Xen as a forked
copy of Linux kernel
• Thin Type-1 hypervisor
Xen
– VMs
• unprivileged domU
• Modified Linux kernel interfaces with hypervisor not
HW
• domU runs in ring1, user space ring 3
HW Assisted Virtualization
• Extensions to x86 developed by Intel and AMD
to simplify CPU virtualization
– New operating mode added to CPU - Host or guest
mode
– Guest mode – rings 0-3, CPU traps instructions
then return control to hypervisor
– Reduces overhead but more resources expended
by the hypervisor
• Guest OS cannot access memory - hypervisor provides
virtualized memory implementation
– Implemented in HW (MMU) faster than SW implementation
KVM
• Latest generation of open source virtualization
• Converts Linux kernel into bare metal hypervisor
• Hypervisor is a specialized OS, differs only because
runs VMs rather than apps
• Leverages HW features to assist virtualization
– Does not support legacy hardware
• Build on what already there, don’t reinvent
• In upstream Linux kernel
KVM
• VM implemented as a regular Linux process,
scheduled by Linux scheduler
• Same security model as any Linux process
– SELinux – Security-Enhanced Linux (NSA)
– Any virtual environment only as secure as the
hypervisor
KVM
• Inherits memory management of Linux, same
as any other Linux process
– NUMA non-uniform memory access, large
amounts of memory
• Can access local memory faster than non-local
• Also, extended page table, merges pages
shared by same VM
• Since part of Linux, leverages all HW devices
KVM
• Can use any storage supported by Linux
• Supports VM images on shared file systems
• Supports live migration
• Supports:
– OpenBSD, FreeBSD, MS Windows Server, Red Hat
Enterprise Linux
– Hybrid virtualization, can use I/O interface rather
than emulation
– Uses VirtIO – standard for device drivers
• hypervisor independent
KVM
• Performance and scalability of Linux
• Process scheduler – completely fair scheduler
– Guarantees resources to VMs
– Real-time extensions for mission critical workloads
– Kernel processes with long CPU time slices divided
into smaller components for scheduling
Open-source Cloud - Hypervisors
• Hypervisors
– KVM, Xen, VMware, Oracle VM
• Run on a host OS, but can emulate using
virtualization many guest OSs
– E.g. KVM host must be Linux, but supports guest
OSs Linux, Windows, Solaris, BSD
Open-source Cloud - Hypervisors
• Hypervisors
– KVM, Xen, VMware, Oracle VM
– KVM: host OS has to be Linux
• Can’t use in older CPUs before virtualization extensions
– Xen: been around a lot longer
• Can use on machines that don’t have virtualization
extensions
• Currently better performance
• Amazon’s EC2 uses Xen
– VMware
• Geared towards performance
Type 1 vs Type 2 hypervisors
• Xen, MS HyperV, Oracle VM are type 1,
• VMware workstation and VirtualBox are type
2
• KVM type 1 or type 2? – debatable
Figure: virtualization approaches compared – traditional stack; 1st generation of virtualization (ring 1, ring 3); binary translation (rewrites privileged code before it executes); Xen paravirtualization (CPU, mem, scheduler); KVM (CPU, mem, scheduler incorporated into kernel)
Scalable
• Use what you need
– Hardware, platform (OS), software
• Company has a temporary surge in business, use
cloud resources instead of invest in new computing
equipment
(because of virtualization)
Impact on Software Developers
• Scalability - WHAT IMPACT ON SW DEVELOPERS?
• Developer:
– Chooses load balancer, DB server, Web server
– configures each component to make custom
image
– Chooses pattern for the images and deploys them
– Secure high available Web application up and
running
– Layers code into new architecture
Impact on Software Developers
• Layers code into new architecture
• Shifts responsibility for architecture decisions
from architects to developers
• Developer creates initial composition on virtual
machine using provider’s API
– see how scales and evolves to accommodate
workload changes
– Used to create new threads, now can create new
virtual machines
– When do one versus the other?
Elastic
• Cloud infrastructure used depends on
application
– Only need one server to run small job OR
– Massive number of servers needed
• ELASTIC – unlimited resources
• Cloud provider keeps adding hardware to
satisfy your demand
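Elasticity can be pictured as a sizing rule that is re-evaluated as load changes. A simple sketch, assuming each server handles a fixed number of requests per second (the capacity figure and load trace are hypothetical, and real autoscalers use richer policies):

```python
import math


def servers_needed(requests_per_second: float,
                   capacity_per_server: float,
                   min_servers: int = 1) -> int:
    """Smallest server count that covers the load, never below min_servers."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(min_servers, math.ceil(requests_per_second / capacity_per_server))


# Hypothetical load trace (requests/second) with a sudden surge.
load_trace = [50, 400, 2500, 300]
fleet_sizes = [servers_needed(load, capacity_per_server=100) for load in load_trace]
print(fleet_sizes)  # [1, 4, 25, 3]
```

The fleet grows for the surge and shrinks afterwards, so the customer pays for 25 servers only while they are needed.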
Public Cloud Providers - Amazon
• Amazon
– One of the first to offer cloud services to public
• Elastic Compute Cloud EC2 – VM and CPU cycles
– Which as a service?
– IaaS
• Simple Storage Service S3
– Store items up to 5GB
• Simple Queue Service (SQS)
– Allows machines to talk to each other using message passing
API
– Which as a service?
– PaaS
Amazon
• Simple DB
– Web service for running queries on structured data in RT
– Works with S3 and EC2 to store, process, query
– Now has an interface
– Root privilege
– Applications written on own machine and
uploaded to cloud
– http://aws.amazon.com
Public Cloud Provider - Google
• Google
– Compute Engine
• https://cloud.google.com/compute/
– App Engine
• Must use Google’s DB to store data
• Write a layer of python between user and DB
• Which as a service?
• PaaS
• Handy debugging features
• http://code.google.com/appengine/
– Cost?
• http://cloud.google.com/pricing/
Public Cloud Provider - Microsoft
• Microsoft
– Windows Azure
• Service hosting, low-level scalable storage, networking
• Operating System that allows clients to run Windows
apps and store files and data
• Which as a service?
• IaaS
Microsoft
– Azure Services Platform
• Developers can establish user identities, manage workflows,
synchronize data
• Microsoft SQL Services
– DB services and reporting
• Microsoft .NET Services
– Service-based implementations of the .NET framework
• Live Services
– To share, store and synchronize documents, photos, and files
• Microsoft Sharepoint Services and Dynamics CRM Services
– For collaboration, solution development for business
• Which as a service?
• PaaS
Microsoft
– Also offer
• Office 365, Basecamp, Salesforce
• Which as a service?
• SaaS
• Browser-based Office – not all features?
• http://www.windowsazure.com
• Cost?
– http://www.windowsazure.com/en-us/pricing/calculator/
Public Cloud - Scenarios
• Cloud Storage
– One of first cloud offerings
– 100s of cloud storage vendors
• Compute clouds
– Amazon EC2, Google App Engine, Berkeley Open
Infrastructure for Network Computing
– May not be good for large organizations, do not
offer monitoring and governance capabilities
– Amazon offers enterprise-class support
Public Cloud - Scenarios
• Cloud Applications
– Utilize software apps that rely on cloud
infrastructure
• SaaS (Google Apps)
• P2P (BitTorrent and Skype)
• Web apps (Facebook and YouTube)
• Software plus services (MS Online Services)
When not to use a Cloud
• Server Control
– If you need control over everything running, e.g.
amount of memory, CPU, hard drive specs or
interfaces, cloud not for you
• Hardware Dependencies
– If you need specific drivers, chips, etc.
– Cloud may not have or may change chipsets in
future
When not to use a Cloud
• Cost
– Over time cloud may cost more
• Lack of need
– If current solution OK, don’t worry about fashion
• Integration with existing apps
– Should not have one locally and one on cloud
• Security, speed, reliability problems
• Latency Concerns
– Slower in the cloud
• Throughput Demands
– Cost increases as throughput increases
• E.g. high def video over 100 sources
When not to use a Cloud
• Legislative Issues
– Laws and policy allow freer access to data on a cloud
than private server
• FBI can access data without warrant or owner’s consent
• Geopolitical concerns
– If in Canada, cannot store data on U.S. cloud – Why?
• (because of Patriot Act…)
https://www.techsoupcanada.ca/en/community/blog/cloud_privacy_law
– What about storing your data on clouds outside of
USA?
When not to use a Cloud
• Health data
– HIPAA data could commingle on a server with
another organization’s data
– Still cloud providers were offering personal
ehealth records
– MS HealthVault
– Google Health – permanently discontinued
– Penalties (from AMA website):
Civil monetary penalties (tier, then penalty)
• Tier 1: Covered entity or individual did not know (and by exercising reasonable diligence would not have known) the act was a HIPAA violation.
– $100-$50,000 for each violation, up to a maximum of $1.5 million for identical provisions during a calendar year
• Tier 2: The HIPAA violation had a reasonable cause and was not due to willful neglect.
– $1,000-$50,000 for each violation, up to a maximum of $1.5 million for identical provisions during a calendar year
• Tier 3: The HIPAA violation was due to willful neglect but the violation was corrected within the required time period.
– $10,000-$50,000 for each violation, up to a maximum of $1.5 million for identical provisions during a calendar year
• Tier 4: The HIPAA violation was due to willful neglect and was not corrected.
– $50,000 or more for each violation, up to a maximum of $1.5 million for identical provisions during a calendar year
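The civil tiers above amount to a per-violation range plus an annual cap. The sketch below encodes the listed figures; the function name and the choice to clamp a proposed per-violation amount into the tier's range are illustrative assumptions, not a description of how HHS actually sets a penalty.

```python
# Civil penalty tiers as listed above: (minimum, maximum) per violation in dollars.
# Tier 4 reads "$50,000 or more", so no per-violation maximum is recorded for it.
TIERS = {
    1: (100, 50_000),
    2: (1_000, 50_000),
    3: (10_000, 50_000),
    4: (50_000, None),
}
ANNUAL_CAP = 1_500_000  # maximum for identical provisions in a calendar year


def estimate_civil_penalty(tier, violations, per_violation):
    """Clamp the per-violation amount to the tier range, then apply the cap.

    Hypothetical helper for illustration only.
    """
    low, high = TIERS[tier]
    amount = max(per_violation, low)
    if high is not None:
        amount = min(amount, high)
    return min(amount * violations, ANNUAL_CAP)


if __name__ == "__main__":
    print(estimate_civil_penalty(tier=1, violations=10, per_violation=100))
    print(estimate_civil_penalty(tier=4, violations=40, per_violation=50_000))
```

Because the cap is stated per identical provision per calendar year, totals that span several provisions or several years can exceed $1.5 million.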
Criminal penalties (tier, then potential jail sentence)
• Unknowingly or with reasonable cause
– Up to one year
• Under false pretenses
– Up to five years
• For personal gain or malicious reasons
– Up to ten years
HHS Imposes a $4.3 Million Civil Money Penalty for
HIPAA Privacy Rule Violations
The HHS Office for Civil Rights (OCR) has issued a Notice
of Final Determination finding that a covered entity,
Cignet Health of Prince George’s County, MD (Cignet),
violated the Privacy Rule of the Health Insurance
Portability and Accountability Act of 1996 (HIPAA). HHS
has imposed a civil money penalty (CMP) of $4.3 million
for the violations, representing the first CMP issued by the
Department for violations of the HIPAA Privacy Rule. The
CMP is based on the violation categories and increased
penalty amounts authorized by Section 13410(d) of the
Health Information Technology for Economic and Clinical
Health (HITECH) Act.
Cloud Limitations
• Certain applications not ready
– Need a lot of bandwidth to communicate
(expensive)
– Effort to integrate with other applications
– Mashup – data from multiple sources (although
becoming easier)
– May not be compatible with a variety of browsers
or operate over SSL
– Cannot communicate securely
– SECURITY
Cloud Benefits
• Scalability
• Simplicity – don’t have to configure new
equipment
• Knowledgeable vendors
• More internal resources – hire fewer people
• Security – strict privacy policies, employ
proven cryptographic methods
The same old things or new contributions?
– Can use own data center or clouds
– Illusion that resources are infinite
– Predominant model – Infrastructure as a Service
(IaaS)
– Builds on established trends for driving down the
cost of delivery
– Increases speed and agility in going from sketching
an application architecture to actual deployment
– Virtualization, on-demand deployment, internet
delivery of services and open source software
Different view on what is new about clouds
• Builds on established practices, but changes
how we
– Invent, develop, deploy, scale, update, maintain
and pay for applications and infrastructure
• See if you agree with this at the end of the
semester
Can you create your own local/private cloud?
• IaaS
• Local – if stored in-house
• Private – only used by enterprise
• Everyone wants to be compatible with AWS EC2
(most popular public cloud)
• APIs are consistent with the AWS API, so tools,
images and scripts can be reused
• 70% of "private clouds" aren't really clouds at all
Open-source Clouds
• Open-source cloud wars
• Why do they all have “stack” in their name?
– Because they are moving up the stack from layer 1
(physical) to layer 7 (applications) of the OSI (Open
Systems Interconnection) model
• So how do they make money?
• All of them use hypervisors
Open-source Clouds
• OpenStack
– Started by Rackspace (storage files) and NASA in 2010
– OpenStack Foundation
– Both Ubuntu and Red Hat distributions
– Written in Python
– Hypervisors: KVM, Xen and VMware
• CloudStack being revived under Apache (stable release 6/16)
– Citrix, an early OpenStack backer, bought Cloud.com and its CloudStack (2011),
then “turned it over” to Apache (2012)
– More “Amazon like”
– Written in Java
– Hypervisors: KVM, vSphere, XenServer, Oracle VM
– Better for enterprises
Open-source Clouds
• OpenNebula
– Developed 2008 – European (stable release 6/16)
– Written in C++, C, Ruby, Java, shell script, etc.
– Hypervisors: Xen, KVM, VMware
– Interfaces: EC2, OGF OCCI, vCloud
• Nimbus
– Developed 2009
– Written in Java, Python
– EC2/S3 compatible
– Xen, KVM
– Combine with OpenStack, Amazon, others
Open-source Clouds
• Eucalyptus
– “Elastic Utility Computing Architecture for Linking Your
Programs To Useful Systems”
– Developed 2008 (stable release 4/16)
– Written in C, Java
– Hypervisors: VMware, Xen, KVM
– Only project based on GPL (GNU General Public License)
and not ASL (Apache Software License)
– Implements AWS API on top of Eucalyptus
– AWS agrees to support Eucalyptus; users can migrate
workloads between the two, and applications are compatible
with both
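One way to picture what AWS API compatibility buys: the same script can build the same EC2-style request and only swap the endpoint. The sketch below is a minimal illustration using the standard library only. The private endpoint URL and the helper name are hypothetical, the API version string is an assumption, and request signing (which any real call needs) is left out.

```python
# Sketch: one EC2-style Query API request aimed at two different endpoints.
# The private endpoint is hypothetical; authentication/signing is omitted.
from urllib.parse import urlencode


def build_query_url(endpoint, action, version="2016-11-15", params=None):
    """Return an unsigned Query API URL for `action` at `endpoint`."""
    query = {"Action": action, "Version": version}
    query.update(params or {})
    return f"{endpoint.rstrip('/')}/?{urlencode(sorted(query.items()))}"


ENDPOINTS = {
    "aws": "https://ec2.us-east-1.amazonaws.com",
    "private": "https://cloud.example.internal/services/compute",  # hypothetical
}

if __name__ == "__main__":
    for name, endpoint in ENDPOINTS.items():
        print(name, build_query_url(endpoint, "DescribeInstances"))
```

Only the endpoint differs between the two URLs; the action name and parameters stay the same, which is why existing tools and scripts can carry over to a compatible private cloud.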