Leiden Institute of Advanced Computer Science
LIACS
Joost N. Kok, Scientific Director
Leiden University. The university to discover.
LIACS
- The Computer Science Institute of Leiden University
- Part of the Faculty of Mathematics and Natural Sciences
LIACS
- Management Team
- Scientific Director
- Director of Education
- Managing Director
- Programme Committee
- Institute Council
- Board of Examiners
- Supervisory Board
Research Clusters of LIACS
- Algorithms
- Foundations of Software Technology
- Computer Systems
- Imagery and Media
- Technology Innovation Management
LIACS Research Clusters
- Algorithms - prof.dr. Thomas Bäck & prof.dr. Joost Kok
- Computer Systems - prof.dr. Ed Deprettere & prof.dr. Harry Wijshoff
- Foundations of Software Technology - prof.dr. Farhad Arbab & prof.dr. Joost Kok
- Imaging - dr. Michael Lew & dr.ir. Fons Verbeek
- Technology Innovation Management - prof.dr. Bernhard Katzy
Professors @ LIACS
LIACS
Full Professors
Associate Professors
Assistant Professors
Postdocs
PhD students
Support staff
Tasks
- 40% research, 40% teaching, 20% management
- 80% teaching, 20% research
- First, second, and third funding streams
Education
- Bachelor Computer Science (Informatica)
- Master Computer Science
- Master Media Technology
- Master ICT in Business
Master Degrees
- Three Masters
  - Computer Science (including Bioinformatics Track)
  - Media Technology
  - ICT in Business
- Two years
PhD Education
- 60 PhD students @ LIACS
- PhD candidates
- External PhD candidates
- Graduate School
- Research schools: IPA, ASCI, SIKS
Algorithms Cluster @ LIACS
Natural computing
- Natural computing focuses on
computational methods gleaned from
natural models, such as evolutionary
computation, molecular computing, neural
computing, cellular automata, and swarm
intelligence.
Natural Computing
- Computers are to Computer Science as
Comic Books to Literature
Evolutionary Algorithms for Multi-Parameter Physics
- Evolutionary algorithms are applied to problems in multi-parameter physics, such as the control of femtosecond lasers to influence molecules in a desired way.
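The slide does not say which evolutionary algorithm is used for this kind of problem. As a minimal sketch only, the following assumes a (1+1) evolution strategy with a simple step-size rule, and a toy objective (`toy_objective`, hypothetical) standing in for the expensive laser experiment. The parameter vector, the target values, and all function names are illustrative assumptions.

```python
import random

def one_plus_one_es(objective, start, sigma=0.5, generations=500, seed=0):
    """Minimise `objective` with a simple (1+1) evolution strategy."""
    rng = random.Random(seed)
    parent = list(start)
    parent_score = objective(parent)
    for _ in range(generations):
        # Mutation: perturb every parameter with Gaussian noise.
        child = [x + rng.gauss(0.0, sigma) for x in parent]
        child_score = objective(child)
        # Selection: keep the child only if it is at least as good.
        if child_score <= parent_score:
            parent, parent_score = child, child_score
            sigma *= 1.5  # success: widen the search
        else:
            sigma *= 0.9  # failure: narrow the search
    return parent, parent_score

def toy_objective(params):
    # Hypothetical stand-in for a measurement: squared distance
    # to a hidden optimum that the algorithm does not know.
    target = [1.0, -2.0, 0.5, 3.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))

best, score = one_plus_one_es(toy_objective, start=[0.0] * 4)
print(best, score)
```

The algorithm only needs to evaluate the objective, never its gradient, which is why this family of methods suits experiments where each evaluation is a physical measurement.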
The Fourth Paradigm
- Data-Intensive Scientific Discovery
“One of the greatest challenges for
21st-century science is how we
respond to this new era of data-intensive science. This is recognized
as a new paradigm beyond
experimental and theoretical research
and computer simulations of natural
phenomena—one that requires new
tools, techniques, and ways of
working.”
— Douglas Kell, University of
Manchester
Data Mining definitions
- Secondary analysis of data
- Induction of understandable useful models
and patterns from data
- Algorithms for large quantities of data
- Data Mining is the non-trivial process of
identifying valid, novel, potentially
useful, and ultimately understandable
patterns in data
useful
novel, surprising
comprehensible
valid (accurate)
Typical Data Mining Results
- Forecasting what may happen in the future
- Classifying people or things into groups by recognizing patterns
- Clustering people or things into groups based on their attributes
- Associating what events are likely to occur together
- Sequencing what events are likely to lead to later events
From “Querying” to “Mining”
Questions:
- Are there any occurrences of GAAT in this string?
- How many occurrences of AAT are there in this string?
- Which substrings of length 4 occur at least 2 times?
- Which substrings (of any length) occur significantly more often in the white string than in the black string?
- Why is the virus to the left resistant to my drug, and the one to the right not?
What answers them:
- Standard database technology solves such questions
- Data mining technology can sometimes solve such questions (computations may be (too) heavy)
- Science fiction
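The first three string questions on this slide can be sketched in a few lines of Python. The example string `dna` and the helper names are assumptions made for illustration. The point is the shift from looking up one given pattern to enumerating all patterns that meet a frequency threshold.

```python
from collections import Counter

def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, overlaps included."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))

def frequent_substrings(text, length, min_count):
    """Return every substring of the given length seen at least min_count times."""
    counts = Counter(text[i:i + length]
                     for i in range(len(text) - length + 1))
    return {s: c for s, c in counts.items() if c >= min_count}

dna = "GAATTCGAATGAAT"  # hypothetical example string

# Query: is a given pattern present?
print("GAAT" in dna)                     # True

# Query: how often does a given pattern occur?
print(count_occurrences(dna, "AAT"))     # 3

# Mining: which patterns of length 4 occur at least twice?
print(frequent_substrings(dna, 4, 2))    # {'GAAT': 3}
```

For a fixed length the enumeration is cheap. Allowing substrings of any length, and comparing frequencies between two strings, enlarges the search space considerably, which is where the computations may become heavy.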
Subgroup Discovery
- How to find comprehensible subgroups
in large amounts of data?
- As an example: subtypes in complex
diseases.
- Different types of input.
[Figure labels: Class A, Class B; regions 1, 2, 3]
Grand Challenges
- Learning Cars
Robosail
- Website
Intelligent Bridge (InfraWatch)
Sensor network to monitor bridge behaviour
145 sensors, 100Hz: 5GB/day
Strain (longitudinal)
Vibration
Strain (transverse)
Temperature
What are we looking for, and why?
Looking for:
- Signs of decay (long term)
- Effects of traffic, weather, ... (medium term)
- Individual ‘events’ (short term)
- Relationships between different signals
Why:
- Plan maintenance
- What kind of forces is the bridge subjected to?
- How does the bridge respond?
- How does one affect the other?
What info do we really need? Fewer/different sensors?
The Challenge
- Sensor network: 145 sensors, 5 GB/day, 8 TB in 4 years
- Reading the disk at 50 MB/s takes 2 days
- We would like 2 minutes/seconds
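The figures on this slide can be checked with a short back-of-the-envelope calculation. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and the 100 Hz sampling rate quoted on the sensor slide. The bytes-per-sample figure is derived here, not stated in the source.

```python
SENSORS = 145
SAMPLE_RATE_HZ = 100          # per sensor, from the sensor slide
BYTES_PER_DAY = 5e9           # 5 GB/day (decimal units assumed)
ARCHIVE_BYTES = 8e12          # 8 TB after 4 years
DISK_BYTES_PER_S = 50e6       # 50 MB/s sequential read
SECONDS_PER_DAY = 86_400

# How many readings arrive per day, and how large is each one?
samples_per_day = SENSORS * SAMPLE_RATE_HZ * SECONDS_PER_DAY   # 1,252,800,000
bytes_per_sample = BYTES_PER_DAY / samples_per_day              # about 4 bytes

# Four years at 5 GB/day is about 7.3 TB, which the slide rounds to 8 TB.
four_year_bytes = BYTES_PER_DAY * 365 * 4

# One full pass over the archive at disk speed.
scan_seconds = ARCHIVE_BYTES / DISK_BYTES_PER_S                 # 160,000 s
scan_days = scan_seconds / SECONDS_PER_DAY                      # about 1.85 days

# Speed-up needed to bring a full pass down to 2 minutes.
speedup_for_two_minutes = scan_seconds / 120                    # about 1,333x

print(f"{samples_per_day:,} samples/day, {bytes_per_sample:.2f} bytes each")
print(f"4 years of data: {four_year_bytes / 1e12:.1f} TB")
print(f"full scan: {scan_days:.2f} days, need {speedup_for_two_minutes:.0f}x for 2 min")
```

A speed-up of three orders of magnitude is out of reach for faster disks alone, which suggests the need for summaries, indexes, or reduced representations of the signals.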
Sensor Viewer
- We constructed a “media player” to view the data over time.
Graph Mining
- Internet Map [lumeta.com]
- Friendship Network [Moody]
- Hyves
- Protein Interactions [genomebiology.com]
Graph Mining Tasks
- Object-Related
  - Link-Based Object Ranking
  - Link-Based Object Classification
  - Object Clustering (Subgroup Detection)
  - Object Identification (Entity Resolution)
- Link-Related
  - Link Prediction
- Graph-Related
  - Subgraph Discovery
  - Graph Classification
  - Generative Models for Graphs
Visualisation
- Intelligent/Intelligible Data Analysis
- Intelligent = Methods
- Intelligent = Human Interaction
- Intelligible = Understandable
- First step:
- Visualisation of the data
DNA Visualisation
- Long patterns over small alphabets are hard to find …
- ababababababababababababababababababababababa . . . = (ab)^ω
- abbbababaaababbabbbababaaababbabbbababaaababb . . . = (abbbababaaababb)^ω
- abaaaababbbbabaaaababbbbabaaaababbbbabaaaabab . . . = (abaaaa · babbbb)^ω
DNA Visualisation
- Associate each nucleotide A, C, T, G with
a dimension
- Four nucleotides => four dimensions
- Build a structure in four dimensions
- Project to three dimensions
DNA Visualisation
- Expectation:
- A non-predictable walk for information
rich parts of the DNA
- A true random walk for random parts
- Lines (or approximate lines) for
repeating parts of the DNA
- Large identical substrings in the DNA
can easily be detected
DNA Visualisation
- Select four three-dimensional vectors.
- The vectors should be of comparable
length
- The four vectors should add up to 0
- Every subset of three vectors should be
independent.
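The conditions above can be met, for example, by the four vertices of a regular tetrahedron centred at the origin. The slides do not state which vectors were actually used, so the assignment in `STEP` below is an assumption, as are the helper names. The sketch checks the stated conditions and then walks a sequence one step per nucleotide.

```python
from itertools import combinations

# Assumed choice: tetrahedron vertices, all of length sqrt(3).
STEP = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def check_vectors(step):
    """True if the vectors sum to 0 and every three are independent."""
    vectors = list(step.values())
    sums_to_zero = all(sum(axis) == 0 for axis in zip(*vectors))
    independent = all(det3(*triple) != 0
                      for triple in combinations(vectors, 3))
    return sums_to_zero and independent

def dna_walk(sequence):
    """Return the list of 3D points visited, starting at the origin."""
    position = (0, 0, 0)
    path = [position]
    for nucleotide in sequence:
        dx, dy, dz = STEP[nucleotide]
        position = (position[0] + dx, position[1] + dy, position[2] + dz)
        path.append(position)
    return path

print(check_vectors(STEP))        # True
print(dna_walk("AC" * 4)[-1])     # (8, 0, 0)
print(dna_walk("ACGT")[-1])       # (0, 0, 0)
```

With this choice the repeating sequence ACACACAC zigzags along the x-axis, so a repeat shows up as an approximate line. Because the vectors sum to zero, a stretch with equal counts of all four nucleotides returns to its starting point.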
DNA Visualisation
The first 160,000 nucleotides of the human Y-chromosome
Nucleotides 40,000–100,000 of human chromosome 1
Algorithms Cluster @ LIACS