Lars Arge - Department of Computer Science
Download
Report
Transcript Lars Arge - Department of Computer Science
Lars Arge
Lars Arge
1/12
Massive Data
Pervasive use of computers and sensors
Increased ability to acquire/store/process data
→ Massive data collected everywhere
Society increasingly “data driven”
→ Access/process data anywhere any time
Nature issue 2/06: “2020 – Future of computing”
Trend continues as e.g. sensors increasingly pervasive
→ Exponential growth of scientific data
Paradigm shift: Science will be about mining data
→ Computer science paramount in all sciences
Increased data availability: “nano-technology-like” opportunity
Lars Arge
2/12
Algorithm Inadequacy
Importance of scalability/efficiency
→ Algorithmics core computer science area
Traditional algorithmics:
Transform input to output using simple machine model
Inadequate with e.g.
Massive data
Small/diverse devices
Continually arriving data
→ Software inadequacies!
Communities addressing inadequacies have emerged
But much work still needs to be done
Lars Arge
3/12
Center for Massive Data Algorithmics
Established 2007
Initially funded for 5 years by the national research foundation
High level objectives:
Advance algorithmic knowledge in “massive data” processing area
Train researchers in world-leading international environment
Be catalyst for multidisciplinary/industry collaboration
Building on:
Strong international team
Establish vibrant international environment (focus on people)
Focus areas:
I/O-efficient, streaming, cache-oblivious algorithms
Algorithm engineering
Lars Arge
4/12
I/O-Efficient Algorithms
Problems involving massive data on disk
track
read/write head
read/write arm
magnetic surface
Disk access is 106 times slower than main memory access
“The
difference
in speed between
modern large
CPU and
diskof data
Large
access
time amortized
by transferring
blocks
technologies
is analogous
difference
in of
speed
in
→ Important
to store/access
datato
tothe
take
advantage
blocks
sharpening a pencil using a sharpener on one’s desk or by
taking an airplane to the other side of the world and using a
I/O-efficient
algorithms:
sharpener
on someone else’s desk.” (D. Comer)
Move as few disk blocks as possible to solve given problem
Lars Arge
5/12
I/O-Efficient Algorithms Matters
Example: Traversing linked list (List ranking)
Array size N = 10 elements
Disk block size B = 2 elements
Main memory size M = 4 elements (2 blocks)
1 5 2 6 3
8 9 4 7 10
1 2 10 9 5
Algorithm 1: N=10 I/Os
6 3 4 8 7
Algorithm 2: N/B=5 I/Os
Difference between N and N/B large since block size is large
Example: N = 256 x 106, B = 8000 , 1ms disk access time
N I/Os take 256 x 103 sec = 4266 min = 71 hr
N/B I/Os take 256/8 sec = 32 sec
Lars Arge
6/12
Streaming Algorithms
Problems involving truly massive data
track
read/write head
read/write arm
magnetic surface
Sequential read of disk blocks much faster than random read
In many modern (sensor) applications data arrive continually
→ (Massive) problems often have to be solved in one sequential scan
Streaming algorithms:
Use single scan, handle each element fast, using small space
Lars Arge
7/12
Cache-Oblivious Algorithms
Problems to be solved on unknown and/or changing devices
Block access important on all levels of memory hierarchy
But memory hierarchies are very diverse
Cache-oblivious algorithms:
Use blocks efficiently on all levels of any hierarchy
Lars Arge
8/12
Algorithm Engineering
Algorithm engineering:
Design/implementation of practical algorithms
Experimentation
Center motivated by theory inadequacy
Center wants to promote interdisciplinary/industry work
→ Natural to consider algorithm engineering
Algorithm engineering work can lead to practical breakthroughs
Example: Flow simulation on massive terrain models
~18 billion points at 1 meter (>>1TB)
Implementation of I/O-efficient alg.
→ two weeks to three hours!!
Lars Arge
9/12
Center Team
International core team of algorithms researchers
Including top ranked US and European groups
Arge
Brodal
Mehlhorn
Meyer
Demaine
Indyk
AU
MIT
MPI
Leading expertise in focus areas
AU: I/O, cache and algorithm engineering
MPI: I/O (graph) and algorithm engineering
MIT: Cache and streaming
Lars Arge
10/12
Center Activities
Visits of core researchers
Exchange of AU, MPI and MIT Post Docs and PhD students
Visiting faculty, Post doc and students from other institutions
Frequent summer schools: Streaming algorithms this week!
Major international events:
25th Annual Symposium on Computational Geometry in 2009
New conference: Symposium on Algorithms for Massive Datasets
Theme Workshops to foster multidisciplinary/industry collaboration
Lars Arge
11/12
Initially funded for 5 years by the National Research Foundation
Collaboration between three leading international groups
Addresses inadequacies of traditional theory
When processing massive data
When using diverse devices
Initial research focus on I/O-efficient, streaming, cache-oblivious
Also focus on algorithm engineering and multidisciplinary work
Focus on people: Establish vibrant international environment
www.madalgo.au.dk
Lars Arge
12/12