The Information Universe of the (Near) Future

Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
Frank van Harmelen
Vrije Universiteit Amsterdam
What it will look like
Why it needs infinite scalability
and how to achieve this with the Large Knowledge Collider
The Current Information Universe
[Figure: linked web pages labelled “a web page in English about Frank”, “and another web page about Frank”, “this page is about Stefano”, “this page is about LarKC”, “this page is about the Vrije Universiteit”, with “?” marks]
Linked web-pages, written by people, written for people, used by people...
Many of these pages already come from data,
but we can’t link the data....
The Future Information Universe
Linked data, usable by computers, useful for people!
How far away is this?
Not very far away!
• a rapidly growing Linked Open Data cloud
• already many billions of facts & rules
• it gets bigger every month
Full Web-style decoupling:
re-usability, independence
• All identifiers are URLs (= on the Web)
– Allows total decoupling of
• data
• vocabulary
• meta-data
Example: [<x> IsOfType <T>], e.g. <T> = <person>, where x and T can have different owners & locations
For the first time ever,
it is now possible:
to re-use somebody else's knowledge base
• without having to talk to them first
(syntax, semantics)
• without having to make copies
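As a minimal sketch of this decoupling, the Python below keeps an instance identifier, a vocabulary term and the statement connecting them in three separately owned sources, and only refers to them by URL. All URLs, names and facts are hypothetical (none come from the slides), and plain tuples stand in for a real triple store.

```python
# Hypothetical URL prefixes, each imagined to belong to a different owner.
DATA = "http://data.example.org/"      # owner of the identifier x
VOCAB = "http://vocab.example.org/"    # owner of the vocabulary term T
IS_OF_TYPE = VOCAB + "IsOfType"

# Three independently published sets of (subject, predicate, object) triples.
instance_source = {(DATA + "frank", VOCAB + "label", "Frank")}
vocabulary_source = {(VOCAB + "person", VOCAB + "subClassOf", VOCAB + "agent")}
statement_source = {(DATA + "frank", IS_OF_TYPE, VOCAB + "person")}


def merge(*sources):
    """Combine sources by reference; shared URLs make the triples line up."""
    return set().union(*sources)


def types_of(graph, x):
    """Return every T such that [x IsOfType T] is in the graph."""
    return {o for (s, p, o) in graph if s == x and p == IS_OF_TYPE}


graph = merge(instance_source, vocabulary_source, statement_source)
print(types_of(graph, DATA + "frank"))
```

Because the three sources agree only on URLs, any one of them can be swapped for a different publisher's data without changing the other two.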
Rapid growth: "billion triple challenge"
(= machine-reason with a billion facts and rules)
• 2006: “where do we get a billion facts from?”
• 2008: “which billion shall we choose?”
What to do when success
is becoming a problem?
The Large Knowledge Collider
a platform for infinitely scalable
reasoning on the data-web
Infinite scalability?
parallelisation
• cluster computing
distribution
• “Thinking@home”,
“self-computing semantic Web”
approximation
• “almost” is often good enough
• gets better with more resources
First result: MaRVIN
[Diagram: Data Preparation; a network of Nodes, each with InputPool, Reasoning, OutputPool and Routing; Result Storage; statistics & visualisation]
MaRVIN scales by:
• distribution (over many nodes)
• approximation (sound but incomplete)
• anytime convergence (more complete over time)
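The three properties above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not MaRVIN's actual implementation: the only inference rule is transitivity over pairs, routing is uniformly random, and the function names (`closure`, `reason`, `run`) are invented for this example.

```python
import random


def closure(pairs):
    """Reference answer: the full transitive closure of a set of (a, b) pairs."""
    result = set(pairs)
    while True:
        new = {(a, d) for (a, b) in result for (c, d) in result if b == c} - result
        if not new:
            return result
        result |= new


def reason(pool):
    """One reasoning step on one node, using only the pairs it holds locally."""
    return {(a, d) for (a, b) in pool for (c, d) in pool if b == c}


def run(pairs, n_nodes=3, rounds=10, seed=0):
    rng = random.Random(seed)
    pools = [set() for _ in range(n_nodes)]
    for p in sorted(pairs):                 # data preparation: spread the input
        pools[rng.randrange(n_nodes)].add(p)
    storage, history = set(pairs), []
    for _ in range(rounds):
        for pool in pools:                  # reasoning on every node
            derived = reason(pool)
            pool |= derived
            storage |= derived              # result storage
        for pool in pools:                  # routing: copy pairs to random nodes
            for p in sorted(pool):
                pools[rng.randrange(n_nodes)].add(p)
        history.append(len(storage))        # statistics: results found so far
    return storage, history


pairs = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
storage, history = run(pairs)
print(history, len(closure(pairs)))
```

Every stored pair is a valid consequence of the input (sound), the result set can be read at any round (anytime), and it never shrinks. In this sketch completeness depends on which pairs happen to meet on the same node, so it is not guaranteed after a fixed number of rounds.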
Use case: Drug Discovery
Problem: pharmaceutical R&D in early clinical development is stagnating
FDA white paper Innovation or Stagnation (March 2004):
• “developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”
• “industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs”
Example query, combining three sub-queries (Q1, Q2, Q3):
“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.”
• Genetics: “Show me all liver toxicity associated with the target or the pathway.”
• Chemistry: “Show me all liver toxicity associated with compounds with similar structure”
• Literature: “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population”
Current NCBI: linking but no inference
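The difference between linking and inference can be sketched on a handful of facts. Everything below is invented for illustration: the identifiers, the facts and the single toy rule are assumptions, not real pharmacological data or the method of any existing system.

```python
# Invented facts as (subject, predicate, object) triples.
FACTS = {
    ("compoundX", "hasTarget", "targetT"),
    ("compoundX", "similarStructureTo", "compoundY"),
    ("targetT", "associatedWith", "liverToxicity"),
    ("compoundY", "associatedWith", "liverToxicity"),
}


def direct_links(facts, subject):
    """Linking only: what is stated explicitly about the subject."""
    return {(p, o) for (s, p, o) in facts if s == subject}


def inferred_associations(facts, compound):
    """Toy rule: follow hasTarget and similarStructureTo one step, then
    collect whatever those related entities are associated with."""
    related = {o for (s, p, o) in facts
               if s == compound and p in {"hasTarget", "similarStructureTo"}}
    return {(s, o) for (s, p, o) in facts
            if s in related and p == "associatedWith"}


print(direct_links(FACTS, "compoundX"))
print(inferred_associations(FACTS, "compoundX"))
```

Following links alone shows the target and the similar compound, but nothing about toxicity. The one-step rule is what surfaces the two potential liver-toxicity associations, each with the entity it came from.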
Use Case: City on-line
• Our cities face many challenges
• Urban Computing is the ICT way to address them (improve the quality of life)
• Where are people concentrating?
• Where is traffic moving?
• Which landmarks attract more people?
• Which location attracts most people right now?
• Is public transportation where people are?
• Is public transportation where people will be?
Is anybody doing this for real?
• OpenCalais:
– enrich text (news items) with semantic meta-data
– recognise people, places, events, organisations,...
– useful for searching, selecting, personalising, aggregating,
summarising, etc
• From early ’09:
– identify “people, places, events, organisations,...”
by linking to the Open Data cloud:
[Figure: “this page is about LarKC”, “this page is about Stefano”]
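The idea of enriching text by linking recognised names to Web identifiers can be sketched as follows. This is not the OpenCalais API: the lookup table, the URLs and the `enrich` function are hypothetical, and exact string matching stands in for real entity recognition.

```python
# Hypothetical table from known names to identifiers in a data cloud.
ENTITY_URLS = {
    "Frank van Harmelen": "http://example.org/id/frank-van-harmelen",
    "Vrije Universiteit Amsterdam": "http://example.org/id/vu-amsterdam",
    "LarKC": "http://example.org/id/larkc",
}


def enrich(text, entity_urls=ENTITY_URLS):
    """Return (mention, offset, url) for every known name found in the text,
    ordered by position."""
    found = []
    for name, url in entity_urls.items():
        start = text.find(name)
        while start != -1:
            found.append((name, start, url))
            start = text.find(name, start + len(name))
    return sorted(found, key=lambda mention: mention[1])


for mention in enrich("Frank van Harmelen works on LarKC."):
    print(mention)
```

Once a mention carries a URL instead of a bare string, two pages that name the same person or project can be joined on that identifier.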
Summarising
The Information Universe of the Future will be a Web of Data
• This Web of Data is rapidly taking shape
• There are compelling use-cases
• Industrial take-up is beginning to happen
• We are building new infrastructure to deal with the required scale
Contact Info
Want to ask questions?
Want to play with LarKC?
Want to contribute plugins?
Want to run a use-case?
[email protected]
http://www.larkc.eu