Cloud Storage in the Czech Republic
Czech national Cloud Storage and
Data Repository project
Cloud Computing and Cloud Storage and Data Repository
Grid Computing
and Storage
Web Services in the Cloud
What is Cloud Storage and Data
Understanding Cloud Storage and Data Repository
current and future requirements
Cloud Computing and Storage
Grid Computing
Refers to resource-pooled environments for running compute jobs (such as
image processing) rather than long-running processes (such as a Web
site or e-mail server)
Utility Computing
Refers to resource-pooled environments for hosting long-running
processes; it tends to focus on meeting service levels with the
optimal amount of resources necessary to do so
Cloud Computing
Refers to a variety of services available over the Internet that deliver
compute functionality on the service provider's infrastructure
Its environment (infrastructure) may actually be hosted on either a grid
or utility computing environment, but that doesn't matter to a service
The data in the cloud, as “Intel inside” (or intelligence inside), is often
an important part of the services
Cloud Computing – Simple Definition
Cloud Computing = Software as a Service
+ Platform as a Service
+ Infrastructure as a Service
+ Data as a Service
• Software as a Service (SaaS)
From the end user’s point of view
Apps are located in the cloud
Software experiences are delivered through the Internet
• Platform as a Service (PaaS)
From the developer’s point of view (i.e. cloud users)
Cloud providers offer an Internet-based platform to
developers who want to create services but don't want to
build their own cloud
• Infrastructure as a Service (IaaS)
- Cloud providers build datacenters
• Power, scale, hardware, networking, storage, distributed systems, etc.
- Datacenter as a service
- Cloud users rent storage, computation, and maintenance from
cloud providers (pay-as-you-go, like a utility)
Infrastructure of Mega Datacenters
Not us! We plan 3 + 1 datacenters
(3 PB + 6 PB + 12 PB + ? PB)
in 3 Czech cities. All will be housed
on Czech university campuses in
rebuilt server rooms.
Knowledge & Data Intelligence as a Service
Cloud Computing = Software as a Service
+ Platform as a Service
+ Infrastructure as a Service
+ Data as a Service
Data → Information → Knowledge → Intelligence
- Infrastructure for Web-scale data mining and knowledge
- Empower people with knowledge
- Empower applications and services with intelligence
• The real underlying value of “cloud + clients” is
that it transparently makes software, data, and
computing available everywhere
Czech National Storage Cloud and Data
Repository
• Funding is provided by the EU (85%) and the Czech government
(15%). The whole project totals about 24 million euros:
16 million for the 40+ Gb/s networking, 4 million for the data
repository and storage cloud, and the rest for the small
computing grids and cloud computing systems collocated
with the 3 main data repository sites (Pilsen 3+ PB,
Pardubice near Prague 5 PB and Brno 9 PB)
• The design, investment, testing and realization phase runs 2011-2013. The sustainability phase lasts until 2018. After that, the whole project
has to be self-sufficient, funded by the Czech government
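The budget split quoted above can be checked with a few lines of arithmetic. This is only a sketch based on the rounded figures on the slide (about 24 million euros in total, 16 for networking, 4 for storage, 85% EU share); the function name `funding_breakdown` is made up for illustration.

```python
# Rounded figures from the slide, in millions of euros.
TOTAL = 24.0
NETWORKING = 16.0
STORAGE = 4.0
EU_SHARE = 0.85  # the Czech government covers the remaining 15%


def funding_breakdown(total=TOTAL, networking=NETWORKING,
                      storage=STORAGE, eu_share=EU_SHARE):
    """Return the budget split implied by the slide's rounded numbers."""
    computing = total - networking - storage  # "the rest" on the slide
    return {
        "networking": networking,
        "storage": storage,
        "computing": computing,
        "eu": total * eu_share,
        "czech": total * (1 - eu_share),
    }


if __name__ == "__main__":
    for name, amount in funding_breakdown().items():
        print(f"{name:>10}: {amount:.1f} M EUR")
```

With these inputs, "the rest" for the computing grids and cloud systems comes to about 4 million euros, and the EU and Czech contributions to roughly 20.4 and 3.6 million euros respectively.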
Access protocols
• Standard protocols
- All standard protocols required by users will be supported
if possible
- CIFS, SMB, NFS v4, WebDAV, FTP, HTTP
• Non-standard protocols
- Many special protocols required by users will be supported
if possible, depending on the project
- Open source
• xrootd (LHC, CERN), iRODS middleware and others
Authentication and Security
• Community-based authentication provided by CESNET and
Czech universities will be used when possible
• VPN tunnels will be used for less secure but standard
protocols like CIFS, FTP or non-secure HTTP
• We will research other means of authentication together
with either Czech universities and academia or emerging
worldwide authentication and security standards
• We will consider client-side encryption of transferred and
stored user data
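The VPN rule above can be sketched as a small lookup. The slides only say that less secure standard protocols such as CIFS (written CIF in places), FTP and non-secure HTTP go through VPN tunnels. Treating the remaining standard protocols as "direct" and everything else as "negotiate" is an assumption made for this illustration, and the name `access_policy` is hypothetical.

```python
# Standard protocols listed on the access-protocols slide.
STANDARD = {"cifs", "smb", "nfsv4", "webdav", "ftp", "http"}

# Protocols the slides explicitly name as needing a VPN tunnel.
VPN_REQUIRED = {"cifs", "ftp", "http"}


def access_policy(protocol: str) -> str:
    """Classify a protocol as 'vpn', 'direct' or 'negotiate'.

    'direct' and 'negotiate' are illustrative assumptions; the slides
    only state which protocols are tunnelled through a VPN.
    """
    name = protocol.strip().lower().replace(" ", "")
    if name not in STANDARD:
        # Non-standard protocols (e.g. xrootd) are decided per project.
        return "negotiate"
    return "vpn" if name in VPN_REQUIRED else "direct"


if __name__ == "__main__":
    for proto in ["FTP", "NFS v4", "WebDAV", "xrootd"]:
        print(proto, "->", access_policy(proto))
```

A table-driven rule like this keeps the policy in one place, so changing it (for example, if HTTPS-only access is adopted) means editing a set rather than scattered conditionals.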
A typical site (1 of 3)
tier 0 (first site 0 PB)
tier 1 – fast FC or SAS disks, 15k rpm (first site 50 TB)
tier 2 – cheap SATA disks, 7.5k and 5k rpm (first site 400 TB)
tier 3 – FC tape robots (first site 5 LTO-5 drives, 3+ PB)
dual dedicated DWDM 2 x 10 Gbit/sec (future 40, 100 Gbit/sec)
• several front end servers
• HSM (ORACLE SUN SAM, GPFS Tivoli, DMF, etc.)
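To see how the first site's capacity is spread over the tiers, the numbers from the slide can be totalled. This is a rough sketch: it assumes decimal units (1 PB = 1000 TB) and takes "3+ PB" of tape as a lower bound of 3000 TB. The helper `tier_shares` is a made-up name.

```python
# First-site capacities from the slide, in terabytes (1 PB = 1000 TB assumed).
TIERS_TB = {
    "tier 0": 0,
    "tier 1 (FC/SAS disk)": 50,
    "tier 2 (SATA disk)": 400,
    "tier 3 (LTO-5 tape)": 3000,  # "3+ PB" taken as a lower bound
}


def tier_shares(tiers):
    """Return total capacity and each tier's fraction of that total."""
    total = sum(tiers.values())
    shares = {name: size / total for name, size in tiers.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = tier_shares(TIERS_TB)
    print(f"total: {total} TB")
    for name, share in shares.items():
        print(f"{name:>22}: {share:6.1%}")
```

Under these assumptions the disk tiers make up only a small fraction of the site, which is why an HSM layer that migrates data between disk and tape is central to the design.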
• All three sites will be connected by a dual dedicated
2 x 10 Gb/s network. This will be upgraded to dual (or
more) 40 (or 100?) Gb/s
• All three sites will create one big virtual data repository
with the possibility of remote replicas
• We prefer the replica concept to a classical backup
• All three sites will look like one big virtual data repository
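For remote replicas, the inter-site link speed sets how long it takes to copy data between sites. The sketch below estimates the ideal transfer time, assuming decimal units (1 PB = 10^15 bytes) and a link that is fully used unless an `efficiency` factor is given. The function name and the `efficiency` parameter are assumptions for illustration only.

```python
def transfer_days(petabytes: float, gbit_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Ideal time in days to move `petabytes` over a `gbit_per_s` link.

    `efficiency` (0..1] scales the usable bandwidth; 1.0 means the link
    is fully saturated, which real transfers rarely achieve.
    """
    bits = petabytes * 1e15 * 8
    seconds = bits / (gbit_per_s * 1e9 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # Aggregate bandwidth of the dual links: 2 x 10, 2 x 40 and 2 x 100 Gb/s.
    for link in (20, 80, 200):
        print(f"1 PB over {link} Gb/s: {transfer_days(1, link):.2f} days")
```

At the initial 2 x 10 Gb/s, replicating 1 PB takes about 4.6 days even with a perfectly saturated link, which gives a sense of why the upgrade to 40 or 100 Gb/s matters for multi-petabyte replicas.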
• All usage will be free for academic and non-profit users
• Data curation will be set to 7+ years (possibly even
indefinitely, if the funding model works)
• Catch-all users go to the CatchAll virtual organization
• Special users will negotiate special services and conditions
• Special SLDs (Service Level Declarations)
• Open to international collaborations in EU and elsewhere
• [email protected]