cloud computing


Experience with fuzzy distributed computing
(cloud computing) in geoinformatics
M.N. Zhizhin
Geophysical Center RAS and
Space Research Institute RAS
New technologies and innovations
• Long-term preservation with metadata and lineage (Virtual
Observatories)
• Parallel/distributed data storage with interactive data query
and network transfer of large datasets (MapReduce)
• Relational -> Object -> XML -> Array databases (SciDB)
• HPC data processing and modeling algorithms (Grid)
• Event detection, interrelation and data mining
(AlphaSearch)
• Web technologies for visualization of different data types
with geolocation (Neogeography)
• Collaborative data visualization (Videowalls)
• Scalable virtualization of CPU/network/storage resources
(Cloud Computing)
Multiplets of regional earthquakes
Downhole multipoint measurement at the
Soultz geothermal reservoir
Global Lambda Integrated Facility
Available Advanced Network Resources
GLIF is a consortium of institutions, organizations, consortia and country National
Research & Education Networks who voluntarily share optical networking resources
and expertise to develop the Global LambdaGrid for the advancement of scientific
collaboration and discovery.
Visualization courtesy of Bob Patterson, NCSA; data compilation by Maxine Brown, UIC.
www.glif.is
Source: Joe Mambretti
GLORIAD: 10 Gbps Worldwide Ring
Source: Natalia Bulashova
USA-Russia Lightpath for Fast Data Transfer of
Terabyte-sized Scientific Datasets
• National Center for Data Mining (NCDM) at the University of Illinois at
Chicago, Geophysical Center RAS and Space Research Institute RAS have
successfully moved 1.4 TB of data in 4.5 hours over a 1 Gbps lightpath
between Chicago and Moscow as part of the Teraflow Network initiative
• Using NCDM’s open-source UDP-based Data Transfer protocol (UDT), we
were able to transfer the MS SQL Server database with the SDSS astronomy
catalog. The 2.5 TB database dump was compressed to 1.4 TB, split into
60 files, transferred over a 1 Gbps lightpath, then decompressed in
Moscow and loaded back into MS SQL Server
• The SkyServer portal and the SDSS database were developed by Jim Gray
at MSR and Alex Szalay at JHU. A Russian-language mirror now resides at
www.skyserver.ru in Moscow
• A direct lightpath link from IKI in Moscow to NGDC NOAA in Boulder has
been successfully tested
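The transfer figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) and assuming the 4.5 hours cover only the network transfer, not compression or loading; the function names are hypothetical.

```python
# Back-of-the-envelope check of the Chicago-Moscow transfer figures.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bit/s),
# and the quoted time covers only the network transfer.

def average_throughput_gbps(size_tb: float, hours: float) -> float:
    """Average goodput in Gbit/s for moving size_tb terabytes in `hours`."""
    bits = size_tb * 1e12 * 8
    seconds = hours * 3600
    return bits / seconds / 1e9

def link_utilization(size_tb: float, hours: float, link_gbps: float = 1.0) -> float:
    """Fraction of the nominal link capacity used by the transfer."""
    return average_throughput_gbps(size_tb, hours) / link_gbps

def chunk_size_gb(total_tb: float, n_files: int) -> float:
    """Average size of each piece when the dump is split into n_files files."""
    return total_tb * 1000 / n_files

if __name__ == "__main__":
    print(f"average rate: {average_throughput_gbps(1.4, 4.5):.2f} Gbps")  # 0.69 Gbps
    print(f"utilization:  {link_utilization(1.4, 4.5):.0%}")
    print(f"compression:  {2.5 / 1.4:.2f}x")
    print(f"per file:     {chunk_size_gb(1.4, 60):.1f} GB")
```

So UDT sustained roughly 0.69 Gbps on average, about two thirds of the nominal 1 Gbps lightpath, with each of the 60 files averaging around 23 GB.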
Russian Skyserver mirror:
www.skyserver.ru
Past Observations + Predictive Model =
Reanalysis
1. Direct observations in the past, including raw and processed data, e.g. from meteorological stations or satellites: about 10^5 observations of the atmosphere every 6 h
2. Predictive numerical model: "knows" the physics and uses direct observations as boundary values, e.g. a Global Circulation Model with 360 lon × 180 lat × 20 levels × 100 parameters = 1.3 × 10^8 data values every 6 hours
3. Reanalysis: the accumulated output of the numerical model forecasts, each corrected for the available direct observations, over a long time period (50 years at a 6 h time step)
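Multiplying out the grid above gives a feel for how large a 50-year reanalysis becomes. This is a rough sketch under an assumption not stated on the slide: every value is stored uncompressed as a 4-byte float, and leap days are ignored.

```python
# Rough size of a reanalysis archive built from the model grid above.
# Assumptions (not from the slides): 4-byte floats, no compression,
# 365-day years.

LON, LAT, LEVELS, PARAMS = 360, 180, 20, 100   # 1-degree global grid
STEPS_PER_DAY = 24 // 6                         # one model state every 6 h
YEARS = 50
BYTES_PER_VALUE = 4                             # assumed float32

values_per_step = LON * LAT * LEVELS * PARAMS
steps = STEPS_PER_DAY * 365 * YEARS             # ignores leap days
total_values = values_per_step * steps
total_tb = total_values * BYTES_PER_VALUE / 1e12

print(f"values per 6 h step: {values_per_step:.2e}")  # 1.30e+08
print(f"time steps:          {steps}")                 # 73000
print(f"total values:        {total_values:.2e}")
print(f"approx. volume:      {total_tb:.1f} TB")
```

Under these assumptions the model output alone is on the order of 40 TB, before adding the archived direct observations.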
Why OGSA-DAI service container?
• Standard tool in the Grid community
• Supports distributed workflows (in version 3.*)
• Built-in support for asynchronous transactions
• Compatible with Web service (Axis) and Grid (OMII, UNICORE, GT4) stacks
• We looked at alternatives such as OPeNDAP, WCS, …; documentation of our analysis is available
• Problem 1: it is very complex
– Solution: REST wrapper
• Problem 2: supports only File, SQL and XML data types and queries
– Solution: implement additional data sources and functions for data in multidimensional arrays
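The slide does not describe the interface of the REST wrapper or of the array data sources, so the following is only an illustration of the idea: a hyperslab (sub-array) request encoded in a URL query string and applied to a multidimensional array. The URL layout `/arrays/<name>?dim=start:stop` and both function names are hypothetical and are not part of OGSA-DAI.

```python
# Hypothetical REST-style subsetting of a multidimensional array.
# The query syntax "dim=start:stop" is an illustration, not an OGSA-DAI API.
from urllib.parse import urlparse, parse_qs

def parse_hyperslab(url, dims):
    """Turn ?time=0:2&lat=1:3 into one slice per dimension (missing -> full)."""
    query = parse_qs(urlparse(url).query)
    slices = []
    for dim in dims:
        if dim in query:
            start, stop = (int(x) for x in query[dim][0].split(":"))
            slices.append(slice(start, stop))
        else:
            slices.append(slice(None))
    return slices

def subset(data, slices):
    """Apply the slices to a nested list, one nesting level per dimension."""
    if not slices:
        return data
    return [subset(item, slices[1:]) for item in data[slices[0]]]

if __name__ == "__main__":
    # Toy 3-D array indexed as [time][lat][lon].
    cube = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(3)]
            for t in range(2)]
    url = "http://example.org/arrays/temperature?time=1:2&lat=0:2&lon=2:4"
    print(subset(cube, parse_hyperslab(url, ["time", "lat", "lon"])))
    # -> [[[102, 103], [112, 113]]]
```

Keeping the request in the URL is what makes such a wrapper "RESTful": each subset is an addressable resource that a browser, a script or a cache can fetch with a plain HTTP GET.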
Web technologies for visualization of different data types with geolocation
• KML & GeoRSS
• Web services for CDM data sources
• OGC Web Map Services (WMS/WFS/WCS)
• MS Virtual Earth
• Google Maps
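KML is the simplest of these formats to produce from a data service: a geolocated item becomes a `Placemark` that Google Earth or Google Maps can display directly. A minimal sketch using only the Python standard library and the KML 2.2 namespace; the function name and the sample station are hypothetical.

```python
# Minimal KML document with one point placemark (standard library only).
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemark_kml(name, lon, lat, description=""):
    """Return a KML string with a single point placemark (degrees, WGS84)."""
    ET.register_namespace("", KML_NS)          # serialize without a prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
    ET.SubElement(pm, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
    # KML coordinates are longitude,latitude[,altitude]; note the order.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical station near Moscow.
    print(placemark_kml("Station", 37.62, 55.75, "hypothetical example"))
```

The longitude-first coordinate order is the most common pitfall when converting from latitude/longitude tables.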
TerraServer tile server by Jim Gray, 1998
http://terraserver.microsoft.com
• Large database on the Web (3 TB)
• Operational since June 1998
• Public access to USGS topo maps and aerial images
• Low-resolution images
• No global coverage
• GPS market not ready
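A tile server answers requests for small, fixed-size images addressed by zoom level and grid position. The sketch below does not reproduce TerraServer's own tiling scheme; it uses the Web Mercator "XYZ" scheme later popularized by Google Maps and Virtual Earth to show how a point maps to a tile address. The function name is hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator (XYZ) tile column and row containing a point."""
    n = 2 ** zoom                                    # tiles per axis
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to 0..n-1 (east/south edges, latitudes beyond about ±85.05°).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

if __name__ == "__main__":
    for zoom in (0, 1, 5, 10):
        print(zoom, lonlat_to_tile(37.62, 55.75, zoom))
```

Because each zoom level doubles the grid in both directions, a tile server only needs to precompute and store images per (zoom, x, y) key, which is what makes very large image databases practical to serve on the Web.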
Core box set image pre-processing
At the core warehouse, images are acquired for the whole box set
To visualize them, we split them into separate samples
(Figure panels: Original box sets; Processed)
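The slide does not say how the box images are split. As a minimal sketch, assuming each box holds parallel horizontal core trays of equal height, an image (here a list of pixel rows) can be cut into one strip per tray; the function name and the equal-height assumption are hypothetical.

```python
def split_box_image(pixels, n_rows):
    """Split a box image (list of pixel rows) into n_rows horizontal strips.

    Assumes the cores lie in parallel horizontal trays of equal height;
    strip boundaries are rounded so that all pixel rows are kept.
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    height = len(pixels)
    bounds = [round(i * height / n_rows) for i in range(n_rows + 1)]
    return [pixels[bounds[i]:bounds[i + 1]] for i in range(n_rows)]

if __name__ == "__main__":
    image = [[row] * 3 for row in range(7)]   # toy 7x3 "image"
    for strip in split_box_image(image, 3):
        print(len(strip), "rows")
```

A real pipeline would more likely detect tray dividers in the image rather than assume equal spacing, but the output is the same: one image per core sample.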
New ways to mash up raster data
Above the Clouds:
A Berkeley View of Cloud Computing
Cloud Computing refers to both the applications delivered as services over the
Internet and the hardware and systems software in the datacenters that
provide those services.
• The services themselves have long been referred to as Software as a Service
(SaaS).
• The datacenter hardware and software is what we call a Cloud.
When a Cloud is made available in a pay-as-you-go manner to the public, we
call it a Public Cloud; the service being sold is Utility Computing:
• Amazon Web Services,
• Google AppEngine, and
• Microsoft Azure.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html
Amazon AWS vs. Microsoft Azure vs. Google AppEngine
VM
• Amazon AWS: x86 32- and 64-bit architecture via Xen VM; computation elasticity allows scalability, but the developer must build the machinery, or a third party must provide it
• Microsoft Azure: Microsoft Common Language Runtime (CLR) VM; automatic load balancing
• Google AppEngine: predefined application in Python; persistent state stored in MegaStore; automatic scaling
Storage
• Amazon AWS: range of models from block store (EBS) to augmented key/blob store (SimpleDB); scaling varies from no scaling (EBS) to fully automatic (SimpleDB, S3); APIs vary from standardized (EBS) to proprietary (S3)
• Microsoft Azure: SQL Data Services (restricted view of SQL Server); Azure storage service
• Google AppEngine: MegaStore/BigTable
Network
• Amazon AWS: declarative specification of topology; Security Groups; Availability Zones; Elastic IP addresses provide a persistent name
• Microsoft Azure: automatic, based on roles
• Google AppEngine: fixed topology for 3-tier web apps; automatic scaling
How to deploy SPIDR in the Cloud?
Single instance:
• SPIDR webapp & web services
• MySQL databases
• Amazon services: EC2, EBS, S3
• Saved state: database dump, file system snapshot, VM snapshot bundle, VM image
Can we support multiple SPIDRs?
In different Amazon cloud regions?
Yes!
• Launch several instances of the SPIDR VM
• Configure DNS round-robin for load balancing
• Run MySQL master on the first instance, and MySQL
slaves on others
or
• Use third-party high-availability products for Amazon
cloud, such as RightScale
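The multi-instance setup above splits traffic two ways: DNS round-robin spreads clients over all SPIDR instances, while writes must reach the single MySQL master. A minimal sketch of that routing rule from a client's point of view; the class name and hostnames are hypothetical, and real DNS round-robin happens in the resolver, not in application code.

```python
import itertools

class SpidrCluster:
    """Hypothetical client-side view of several SPIDR VM instances.

    Reads rotate over all instances (as DNS round-robin would spread
    clients); writes always go to the MySQL master on the first instance.
    """

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one instance")
        self.master = hosts[0]
        self._readers = itertools.cycle(hosts)

    def route(self, sql):
        """Pick the instance that should execute this SQL statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in ("SELECT", "SHOW"):
            return next(self._readers)   # any replica can serve reads
        return self.master               # writes go to the master only

if __name__ == "__main__":
    cluster = SpidrCluster(["spidr1.example.org", "spidr2.example.org",
                            "spidr3.example.org"])
    for stmt in ("SELECT * FROM stations", "SELECT 1", "INSERT INTO log VALUES (1)"):
        print(cluster.route(stmt), "<-", stmt)
```

Because MySQL replication is asynchronous, a read served by a slave may briefly lag behind the latest write on the master; for a mostly read-only data portal this is usually acceptable.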
Clouds above Grid: Cumulus Nimbus
experiment in SKIF-Grid, fall 2009
• Cloud VMs managed as Grid jobs
• Condor Grid deployed in the Cloud