Hemera KickOff
October 5th, 2010
Working Group B5
Efficient management of very large
volumes of information for data-intensive applications
Gabriel Antoniu, Jean-Marc Pierson
Challenges
• Tremendous volumes of data (up to
Petabytes), increasing every year
• Cloud infrastructures reinforce this trend
• Large span of diverse applications
• Different modalities of data: images, text,
video, raw values
• Distributed, heterogeneous, structured or not,
semantically (en-)riched, confidential
• Stored in distributed file systems (DFS) or distributed databases (DDB),
Cloud storage services, Warehouses
Aim of the WG
• Explore research issues related to high-level services
for information management (search, mining,
visualization, processing)
• For large volumes of distributed data
• Taking into account
– security, efficiency and heterogeneity
– application requirements
– and the execution infrastructure (grids, clouds)
Issues to be addressed
• Low-level:
– Fault-tolerance, caching, transport, security
(encryption, confidentiality), consistency, location
transparency
• Intermediate-level:
– Interoperability among storage systems
– Data indexing
• High-level:
– Data mining, data classification, data assimilation,
knowledge extraction, data visualization
– Metadata management
Communities involved
• Distributed applications
• Distributed systems
– clusters, grids, P2P, clouds
• Fault-tolerant systems
• Databases, data mining
• Security
• Numerical algorithms
Roadmap
• Identify research teams
– Active in the area of the WG
– With experience in data-intensive
applications on Aladdin-G5K
– And newcomers…
• Organize workshops and possibly
schools to share and disseminate
experience and knowledge