The Cooperative Web A Step towards Web Intelligence


The Cooperative Web
A Step towards Web Intelligence
Daniel Gayo Avello
University of Oviedo
Web Intelligence?
• Multidisciplinary effort
– Artificial Intelligence
– Information Retrieval
– Software Agents
– ...
• Early stages
• Goal → The Wisdom Web
– A new Web
– More useful
– Truly “intelligent”
The Semantic Web (in a nutshell)
• Standardized conventions (ontologies)
– objects
– attributes
– relations
• Semantic tags
– Document authors → mark up
– Software agents → (basic) reasoning
So...
• Semantic Web ~ Web Intelligence approach
• Cooperative Web ~ Web Intelligence approach
Is the Cooperative Web
just-another-proposal?
• Not really...
• Semantic Web
– just beginning...
– human-made (ontologies, at this moment)
– time to reach the whole Web (5-10 years?)
• “I know what I want and I want it now!”
• The Web ~ Legacy System
• Something...
– fully automatic
– simple
– built on top of the current web (legacy)
– between the current web (legacy) and The Wisdom Web (future)
• ...wouldn’t that be nice?
Cooperative Web proposal (in a nutshell)
• Simple, cheap, automatic
• Intermediate: Web → ¿? → Wisdom Web
• “Squeeze out” the current Web a little more...
• Main ideas:
– Concept extraction
– Automatic document taxonomies
– Computational biology
Concepts
• Let’s study these samples...
...Betelgeuse, a red supergiant star about 600 light years distant,
is seen in this Hubble Space Telescope image - the first direct
picture of the surface of a star other than the Sun...
...Designer Jim Wallace, who is developing the PlayStation 2 fighting
title "Rise to Honor" with martial-arts star Jet Li, said celebrity
involvement boosts the reputation of gaming in general...
...His reputation as one of America's greatest actors secured,
Hoffman proceeded to star in a series of films that disappointed
at the box office...
...The actor Arnold Schwarzenegger has signed for a record-setting
$30 million to star in "Terminator 3"...
Concepts
• They’re results from the Google query “star”...
Concepts
• But they talk about different kinds of
“stars”...
Concepts
• From those (and other) documents we could
extract something like these “word bags”...
0:{red supergiant, star, Sun, ...}
1:{actor, actors, celebrity, films, star, ...}
• There are plenty of techniques to obtain these “word
bags” or “concepts”, for instance:
– Latent Semantic Indexing (Foltz, 1990)
– Concept Indexing (Karypis and Han, 2000)
Conceptually related documents
• The documents shown before...
Conceptually related documents
• ...could be transformed by dropping the “stop words”...
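A minimal Python sketch of the stop-word step, assuming a tiny hand-picked stop-word list. STOP_WORDS and drop_stop_words are hypothetical names used only for illustration; real systems use much longer lists.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for
# illustration; real systems use lists with hundreds of entries).
STOP_WORDS = {
    "a", "about", "an", "and", "as", "at", "by", "for", "has", "in",
    "is", "of", "on", "other", "than", "that", "the", "this", "to",
    "who", "with",
}


def drop_stop_words(text: str) -> list[str]:
    """Lower-case the text, split it into words and drop the stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


snippet = ("The actor Arnold Schwarzenegger has signed for a record-setting "
           "$30 million to star in Terminator 3")
print(drop_stop_words(snippet))
```

The tokenizer is intentionally naive: it splits “record-setting” into two words and keeps numbers, which is good enough to show the idea.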
Conceptually related documents
• And then into this...
?00???????00
????????1?1????
??1??11??
1???1?
• The last three documents are closely related, while the
first one has nothing to do with them...
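The slides do not say exactly how these strings are built. One plausible reading, which is only an assumption, is that each remaining word is replaced by the number of the “word bag” it belongs to, or by ? when it belongs to none, and that a word found in several bags (such as star) takes the concept that dominates the rest of the document. A minimal Python sketch under that assumption; the word bags, STOP_WORDS and encode are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop words and word bags, loosely modeled on bags 0 and 1
# above; "star" deliberately appears in both.
STOP_WORDS = {"a", "an", "has", "in", "is", "of", "other", "than", "the", "to"}
CONCEPTS = {
    "0": {"red", "supergiant", "star", "sun", "hubble", "telescope"},
    "1": {"actor", "actors", "celebrity", "films", "star"},
}


def encode(text: str) -> str:
    """Turn free text into a string of concept ids ('?' = no concept)."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if w not in STOP_WORDS]
    # For each word, every concept whose word bag contains it.
    matches = [[cid for cid, bag in CONCEPTS.items() if w in bag]
               for w in words]
    # The concept most often matched by unambiguous words is used
    # to resolve ambiguous ones.
    votes = Counter(ids[0] for ids in matches if len(ids) == 1)
    dominant = votes.most_common(1)[0][0] if votes else None
    out = []
    for ids in matches:
        if len(ids) == 1:
            out.append(ids[0])
        elif dominant in ids:      # ambiguous word: use the dominant concept
            out.append(dominant)
        else:                      # no concept, or unresolvable ambiguity
            out.append("?")
    return "".join(out)


print(encode("a red supergiant star other than the Sun"))
print(encode("The actor Arnold Schwarzenegger has signed to star in Terminator 3"))
```

Under these assumptions the astronomy snippet is dominated by 0s and the film snippet by 1s, and “star” is resolved differently in each.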
Text strings...
• This way of representing free text could be well
suited to determining the distance between documents.
• Let’s look at a simpler technique for computing the distance between text
strings...
Text strings...
• Three simple strings:
– BENJI
– DANI
– HENRY
• How closely are they related?
• Let’s define the distance between two strings as the minimum of
the number of letters to delete +
the number of letters to change +
the number of letters to insert...
• ...needed to transform one string into the other.
Text strings...
• Distance between BENJI and DANI: 3
BENJI → DENJI (1), DENJI → DANJI (2), DANJI → DANI (3)
• Distance between DANI and HENRY: 4
DANI → HANI (1), HANI → HENI (2), HENI → HENRI (3), HENRI → HENRY (4)
• Distance between BENJI and HENRY: 3
BENJI → HENJI (1), HENJI → HENRI (2), HENRI → HENRY (3)
• This is known as the Levenshtein (edit) distance, and it will help
us better understand the next step...
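The definition above translates directly into the classic dynamic-programming algorithm for the Levenshtein distance. A minimal Python sketch (the function name is illustrative) that reproduces the three distances worked out above:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter deletions, changes and
    insertions needed to turn a into b (dynamic programming)."""
    # prev[j] = distance between the first i-1 letters of a
    # and the first j letters of b (starts as the row for i = 0).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # change ca into cb (free if equal)
            ))
        prev = curr
    return prev[-1]


for x, y in [("BENJI", "DANI"), ("DANI", "HENRY"), ("BENJI", "HENRY")]:
    print(x, y, levenshtein(x, y))  # prints 3, 4 and 3
```

Only two rows of the table are kept in memory, so the cost is O(len(a) × len(b)) time and O(len(b)) space.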
Someone’s in the kitchen with DNA
• DNA is a highly complex molecule made from only 4 different
kinds of components:
– Adenine - A
– Cytosine - C
– Guanine - G
– Thymine - T
• So, DNA molecules ~ simple (but huge) text strings
– CCAAGGA...
– CCAAGGAAACTCACTA...
– GATTACA...
Someone’s in the kitchen with DNA
• If DNA ~ text string then
distances between two or
more strings can be easily
computed...
(Ursing and Arnason, 1998)
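Ursing and Arnason’s study relies on full phylogenetic analyses; the sketch below is not their method. It only illustrates the general idea with UPGMA, one of the simplest distance-based tree-building algorithms, using the Levenshtein distance on made-up toy sequences. It returns only the tree topology as nested tuples and omits branch lengths; upgma and the sequences are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (deletions, changes, insertions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def upgma(seqs: list[str]):
    """Group sequences into a rooted tree of nested tuples (UPGMA).

    At each step the two clusters with the smallest average pairwise
    distance between their members are merged.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    # Each cluster is (subtree, list of member sequences).
    clusters = [(s, [s]) for s in seqs]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            avg = (sum(levenshtein(x, y) for x in mi for y in mj)
                   / (len(mi) * len(mj)))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        # Replace the two closest clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), mi + mj))
    return clusters[0][0]


toy = ["GATTACA", "CCAAGGA", "GATTACC", "CCAAGGT"]
print(upgma(toy))
```

The two near-identical pairs end up as sister groups, which is the kind of grouping a document taxonomy would also need.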
What if...
Would it be possible to adapt
computational biology algorithms to
distill semantics from the web in an
automatic fashion?
Cooperative Web architecture

[Figure: Cooperative Web architecture diagram with User, Browsing history, Software agent and Document taxonomy]
So, the Cooperative Web would be...
A layer over the Web
to provide semantics
in an automatic fashion
“inspired” by computational biology
Work in progress...
• The Cooperative Web is just a proposal
(at this moment)
• Some prototypes soon (I hope...)
The Cooperative Web
A Step towards Web Intelligence
Thank you!
Any questions?
References
• Foltz, P.W. (1990), "Using Latent Semantic Indexing for Information Filtering", Proceedings of
the ACM Conference on Office Information Systems, Boston, USA, pp. 40-47.
• Karypis, G., and Han, E. (2000), "Concept indexing: A fast dimensionality reduction algorithm
with applications to document retrieval and categorization", Technical Report TR-00-0016,
University of Minnesota.
• Ursing, B.M., and Arnason, U. (1998), "Analyses of mitochondrial genomes strongly support a
hippopotamus-whale clade", Proceedings of the Royal Society of London. Series B, Biological
Sciences, 265:2251-2255.