Powerpoint - CANIS: Community Architectures for Network


The Evolution of the Net:
Predicting Global Infrastructure
Bruce R. Schatz
CANIS Laboratory
Graduate School of Library & Information Science
www.canis.uiuc.edu
Department of Computer Science seminar
University of Illinois, February 14, 2005
Art of Physical Architecture
Art of Logical Architecture
The Evolution of the Net
Niels Bohr on Quantum Theory
“Prediction is very Difficult,
especially about the Future”
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
Computer Science and Infrastructure


Transparent Federation across Sources
Generic Protocols for Global Infrastructure
Ultimate Goal is cyberspace visions of
“being one with all the world’s knowledge”
Computer Science and Infrastructure




1985: Operating Systems (caching)
1995: Database Management (tagging)
2005: Information Retrieval (clustering)
2015: Artificial Intelligence (recognizing)
Linguistics Levels and Universal Units




1985: Syntax, Files (wholes)
1995: Structure, Records (parts)
2005: Semantics, Concepts (meaning)
2015: Pragmatics, Features (reality)
Evolution of Information Retrieval across the Net
[Figure: Evolution of Information Retrieval. Levels: Text Search, Document Search, Concept Search, Grand Visions. Labels: Syntax, Structure, Semantics.]
from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net”
cover article in Science, vol 275, Jan 17, 1997 special issue on Bioinformatics
1985 Syntax Federation




Same Query into Multiple Sources
Results return Uniform Packages
Packets are for Bits, but Objects need more
Information Units are for Database Items
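A minimal sketch of syntax federation as described above: the same query goes to every source, and each source's native records are wrapped into one uniform package. The two sources, their record layouts, and the names `InformationUnit` and `federated_search` are assumptions made for illustration, not the Telesophy design.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InformationUnit:
    """Uniform package returned to the user, whatever the source format."""
    source: str
    title: str
    body: str


def search_catalog(query: str) -> List[dict]:
    # Hypothetical bibliographic source that stores dict records.
    records = [{"ti": "Packet switching basics", "ab": "Notes on packet networks"}]
    q = query.lower()
    return [r for r in records if q in (r["ti"] + " " + r["ab"]).lower()]


def search_memos(query: str) -> List[tuple]:
    # Hypothetical memo source that stores (subject, text) tuples.
    memos = [("Fiber trial", "Packet loss measured on the fiber link")]
    q = query.lower()
    return [m for m in memos if q in (m[0] + " " + m[1]).lower()]


# Each source pairs its native search with a wrapper that yields (title, body).
SOURCES = {
    "catalog": (search_catalog, lambda r: (r["ti"], r["ab"])),
    "memos": (search_memos, lambda m: (m[0], m[1])),
}


def federated_search(query: str) -> List[InformationUnit]:
    """Send the same query to every source and merge the wrapped results."""
    units = []
    for name, (search, unwrap) in SOURCES.items():
        for hit in search(query):
            title, body = unwrap(hit)
            units.append(InformationUnit(name, title, body))
    return units


results = federated_search("packet")
```

The caller only ever sees `InformationUnit` objects, so adding a source means adding one search function and one wrapper.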
1985 Technology Environment
CMU Computer Science – Andrew
 Apollo Domain – distributed file system
 Xerox Star – multimedia document system
Bellcore Network Systems – Fibers
 Telenet – International Packet Switches
 Dialog – Bibliographic Text Searches
Telesophy Prototype
Distributed Documents
 Distributed Collections
 Multimedia Documents
Networked Hypertext
 Document Browsing (links across sources)
 Document Search
(texts across sources)
Telesophy Session
Telesophy Implementation
Bitmapped Workstation with Custom Software
 $30K Apollo with 10Mb/s WAN
 Windows via Brown [hypertext]
 Objects via Xerox [Smalltalk]
 Information Units and Data Items
 300K Units across 20 sources
 Bellcore R&D, $2.5M 1984-1988
Operating System Research
Browsing requires Caching across Internet
 Raw bandwidth insufficient
 200ms Ping versus 250ms Saccade
 Lookahead Application-Specific Protocols
 1987 Internet Research Task Force
 1989 ARPANET 20th Anniversary
 1990 Dissertation on Interactive Retrieval
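One reading of the 200ms ping versus 250ms saccade comparison is that a single network round trip uses up most of the time a reader takes to shift their gaze, so documents have to be fetched before they are requested. The sketch below illustrates lookahead caching under that assumption: browsing a document also prefetches its link targets. The link graph, the class name `LookaheadCache`, and the counters are hypothetical.

```python
# Hypothetical link graph: document id -> ids of documents it links to.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}


class LookaheadCache:
    def __init__(self, links):
        self.links = links
        self.cache = {}
        self.remote_fetches = 0  # total round trips over the network
        self.user_waits = 0      # round trips the user actually waited for

    def _fetch_remote(self, doc_id):
        self.remote_fetches += 1
        return f"contents of {doc_id}"

    def browse(self, doc_id):
        # A miss here is the only case where the user pays the latency.
        if doc_id not in self.cache:
            self.user_waits += 1
            self.cache[doc_id] = self._fetch_remote(doc_id)
        # Lookahead: prefetch every link target not yet cached.
        # A real system would do this in the background.
        for target in self.links.get(doc_id, []):
            if target not in self.cache:
                self.cache[target] = self._fetch_remote(target)
        return self.cache[doc_id]


cache = LookaheadCache(LINKS)
for step in ["a", "b", "d"]:
    cache.browse(step)
```

In this walk the user waits only for the first document; the later ones were already prefetched, at the cost of fetching one document ("c") that was never read.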
1995 Structure Federation




Search using Parts of Documents
Transparent merge of different Schemas
Results return Complete Displays
Displayers invoked for all types
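A minimal sketch of merging different schemas through a canonical tag set, so that one fielded search covers every publisher. The publisher names and tag names below are invented for illustration and are not the DeLIver DTD.

```python
# Hypothetical publisher-specific tag names mapped onto one canonical schema.
TAG_MAPS = {
    "physics_publisher": {"AU": "author", "TI": "title", "ABS": "abstract"},
    "math_publisher": {
        "AUTHOR-NAME": "author",
        "ARTICLE-TITLE": "title",
        "SUMMARY": "abstract",
    },
}


def to_canonical(source, record):
    """Translate one record's tags into canonical field names."""
    mapping = TAG_MAPS[source]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


def search_field(collection, field, term):
    """Search one canonical field (a part of the document, not the whole)."""
    term = term.lower()
    return [doc for doc in collection if term in doc.get(field, "").lower()]


collection = [
    to_canonical("physics_publisher", {"AU": "Ada Smith", "TI": "Lattice models"}),
    to_canonical("math_publisher", {"AUTHOR-NAME": "Ada Smith", "ARTICLE-TITLE": "Graph minors"}),
]
hits = search_field(collection, "author", "smith")
```

The query names only the canonical field `author`; the translation tables hide the fact that the two publishers tag authors differently.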
1995 Technology Environment
NCSA and the World-Wide Web
 Mosaic – multimedia document browsing
 HTTP – standard query protocol
University Library and Online Retrieval
 Ovid – full-text journal searching
 SGML – standard document protocol
DeLIver System
Full Distributed Documents
 Full Displays with tables and equations
 Distributed Collections from publishers
Single Federated Collection
 Streamlined search using tag structure
 Canonical tag schema with translation
DeLIver Session
DeLIver Implementation
Desktop PC plus Custom Software Integration
 $5K IBM Personal Computer
 Mosaic via NCSA [hypertext]
 Displays via SoftQuad [viewers]
 Custom DTD and SSL for tags and styles
 100K articles for 3000 users
 NSF DLI, $5M 1994-1998
Database Management Research
Metadata Extraction for Structure Federation
 Raw schema insufficient
 Different names and different types
 Author tags in physics vs mathematics
 1995 interactive databases using Mosaic
 1997 Beat Elsevier using canonical tags
 1999 production distributed XML federation
2005 Semantic Federation




Search using Concepts above Words
Extraction of Concepts from Documents
Statistical Index on Community Collections
Concept Navigation across Collections
2005 Technology Environment
Web Portals and statistical NLP
 Google – statistical linked contexts
 NLP – statistical generic parsers
Fast Processors and Big Disks
 Gigaflops – Beowulfs and cluster computing
 Terabytes – RAIDs and literature scaling
BeeSpace System
Fully Parsed Documents
 Concepts and Entities auto generated
 Distributed Collections from communities
Fully Related Concepts
 Switching across Community Repositories
 Automatic Links to Entity Databases
BeeSpace Session
BeeSpace Implementation
Commodity PC plus Custom Software
 $1K Dell Personal Computer
 $15K Server 1 Gflops 2 TBytes
 Semantic Indexing generic scalable
 Concept Extraction and Normalization
 Concept Co-occurrence on Collections
 50M articles across 50K repositories
Information Retrieval Research
Statistical Clustering of Equivalent Phrases
 Raw phrases insufficient
 Phrase parsing with normalization
 Entity recognition with normalization
 1998 semantic indexing
(concepts from terms)
 1999 information spaceflight
(categories from documents)
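A small sketch of why raw phrases are insufficient and what normalization buys. Note that this uses simple rule-based normalization (lowercasing, stopword removal, a crude plural rule, word sorting) as a stand-in; it is not the statistical clustering named on the slide, and the stopword list and suffix rule are assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"of", "the", "in", "a", "an", "for"}


def singular(word):
    # Crude suffix rule, for illustration only.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(phrase):
    """Map a raw phrase to a key shared by its surface variants."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    kept = sorted(singular(t) for t in tokens if t not in STOPWORDS)
    return " ".join(kept)


def group_equivalent(phrases):
    """Group raw phrases that normalize to the same key."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[normalize(phrase)].append(phrase)
    return dict(groups)


groups = group_equivalent([
    "Gene expression",
    "expression of genes",
    "Expression of the gene",
    "protein folding",
])
```

Three surface forms collapse into one group, which can then be indexed as a single concept.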
CONCEPT SPACES

from Objects to Concepts

from Syntax to Semantics

Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
LEVELS OF INDEXES
FORMAL (manual): Technology, Engineering, Electrical, IEEE
INFORMAL (automatic): communities, groups, individuals
Technology Trends
IEEE Computer for January 2002
Information Infrastructure for Trends issue





Document Representation (Semantic Web)
Language Parsing (TIPSTER)
Statistical Indexing (TREC)
Peer-Peer Networking (SETI@home)
Vocabulary Switching (UMLS)
SCALABLE SEMANTICS

Automatic indexing
Domain-Independent indexing
Statistical clustering

Compute Context of




concepts within documents
documents within repositories
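Computing the context of concepts within documents can be sketched as a co-occurrence table. The weighting below (documents containing both concepts divided by documents containing the first) is a simple assumption chosen for illustration, not necessarily the formula used for the concept spaces in this talk.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(documents):
    """documents: list of concept lists, one list per document."""
    doc_freq = Counter()
    pair_freq = Counter()
    for concepts in documents:
        unique = sorted(set(concepts))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    space = {}
    for (a, b), n in pair_freq.items():
        # Asymmetric weight: how strongly a suggests b, and b suggests a.
        space.setdefault(a, {})[b] = n / doc_freq[a]
        space.setdefault(b, {})[a] = n / doc_freq[b]
    return space


def related(space, concept, k=3):
    """Return up to k concepts most strongly suggested by `concept`."""
    neighbors = space.get(concept, {})
    return sorted(neighbors, key=lambda c: (-neighbors[c], c))[:k]


docs = [
    ["arthritis", "analgesic", "pain"],
    ["arthritis", "pain"],
    ["analgesic", "bleeding"],
]
space = build_concept_space(docs)
suggestions = related(space, "arthritis")
```

Because the method only counts co-occurrences, it needs no domain knowledge, which is what makes the indexing automatic and domain-independent.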
COMPUTING CONCEPTS
‘92: 4,000 (molecular biology)
‘93: 40,000 (molecular biology)
‘95: 400,000 (electrical engineering)
‘96: 4,000,000 (engineering)
‘98: 40,000,000 (medicine)
SIMULATING A NEW WORLD

Obtain discipline-scale collection
MEDLINE from NLM, 10M bibliographic abstracts
human classification: Medical Subject Headings
Partition discipline into Community Repositories
4 core terms per abstract for MeSH classification
32K nodes with core terms (classification tree)
Community is all abstracts classified by core term
40M abstracts containing 280M concepts
concept spaces took 2 days on NCSA Origin 2000
Simulating World of Medical Communities
10K repositories with > 1K abstracts (1K w/ > 10K)
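The partitioning step can be sketched as follows: each abstract is placed in the repository of every core term it carries, so 10M abstracts with about 4 core terms each give roughly 40M placements, which appears to be where the 40M figure comes from. The sample records and the size threshold are illustrative assumptions.

```python
from collections import defaultdict


def partition_by_core_terms(abstracts):
    """abstracts: iterable of (abstract_id, core_terms) pairs."""
    repositories = defaultdict(list)
    for abstract_id, core_terms in abstracts:
        # One abstract lands in one repository per core term.
        for term in core_terms:
            repositories[term].append(abstract_id)
    return dict(repositories)


# Tiny illustrative sample; the real collection is MEDLINE.
abstracts = [
    (1, ["Arthritis, Rheumatoid", "Analgesics"]),
    (2, ["Analgesics", "Gastrointestinal Hemorrhage"]),
    (3, ["Arthritis, Rheumatoid"]),
]
repositories = partition_by_core_terms(abstracts)

# Total placements exceed the number of abstracts because of multiple terms.
placements = sum(len(ids) for ids in repositories.values())

# Keep only communities above a size threshold (2 here, > 1K on the slide).
large = {term: ids for term, ids in repositories.items() if len(ids) >= 2}
```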
COMMUNITY PROCESSING
INTERSPACE NAVIGATION

Semantic Indexes for Community Repositories

Navigating Abstractions within Repository


concept space & category map
Interactive browsing by Community experts
*www.canis.uiuc.edu/interspace-prototype
Interspace Remote Access Client
Navigation in MEDSPACE
For a patient with Rheumatoid Arthritis


Find a drug that reduces the pain (analgesic)
but does not cause stomach (gastrointestinal) bleeding
Choose Domain
Concept Search
Concept Navigation
Retrieve Document
Navigate Document
Retrieve Document
Concept Navigation
SWITCHING
In the Interspace…

each Community maintains its own repository

Switching is navigating Across repositories

use your vocabulary to search
another specialty
CONCEPT SWITCHING

“Concept” versus “Term”


set of “semantically” equivalent terms
Concept switching

region to region (set to set) match
Semantic region
term
Concept Space
Concept Space
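A region-to-region (set-to-set) match can be sketched by treating a term plus its related terms as a semantic region and comparing regions across two concept spaces. Jaccard overlap is used here as an assumed similarity measure, and the two small vocabularies are invented for illustration.

```python
def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(term, source_space, target_space):
    """Find the target-space term whose region best matches the source region."""
    region = {term} | set(source_space.get(term, ()))
    best, best_score = None, 0.0
    for target_term, neighbors in target_space.items():
        target_region = {target_term} | set(neighbors)
        score = jaccard(region, target_region)
        if score > best_score:  # on ties, the first match is kept
            best, best_score = target_term, score
    return best, best_score


# Hypothetical concept spaces for two specialties.
rheumatology = {"joint pain": ["arthritis", "inflammation", "analgesic"]}
pharmacology = {
    "nsaid": ["analgesic", "inflammation", "ulcer"],
    "antibiotic": ["infection", "bacteria"],
}
match, score = switch_concept("joint pain", rheumatology, pharmacology)
```

The user searches with a term from their own vocabulary ("joint pain") and is switched to the closest region in the other specialty, even though the term itself never appears there.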
Biomedical Session
Categories and Concepts
Concept Switching
Document Retrieval
THE NET OF THE 21st CENTURY



Beyond Objects to Concepts
Beyond Search to Analysis
Problem Solving via Cross-Correlating
Multimedia Information across the Net

Every community has its own special library
Every community does semantic indexing

The Interspace approximates Cyberspace

2015 Pragmatics Federation




Beyond Words and Concepts to Reality
Feature Vectors describing Situation
Each Individual has Vector (< Community)
Discrete Samples into Continuous Monitors
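Comparing an individual's feature vector against cohorts can be sketched as follows. The three features, the 1 to 5 scale, the cohort names, and the use of cosine similarity against a cohort centroid are all assumptions for illustration; the talk does not specify a method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_cohort(individual, cohorts):
    """Return the name of the cohort whose centroid is most similar."""
    return max(cohorts, key=lambda name: cosine(individual, centroid(cohorts[name])))


# Hypothetical vectors: [energy, sleep, pain], each scored 1 to 5.
cohorts = {
    "rested": [[5, 4, 1], [4, 5, 1]],
    "fatigued": [[1, 2, 4], [2, 1, 5]],
}
person = [2, 2, 5]
assigned = nearest_cohort(person, cohorts)
```

Each new sample from a continuous monitor would update the individual's vector, and the cohort assignment can be recomputed as the situation changes.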
2015 Technology Environment
Continuous Vector Recording
 Health Grid – personal lifestyle monitors
 Peer-to-Peer – beyond Napster and Amazon
Individual User Modeling
 Cohort Grouping – custom clustering
 Adaptable Interfaces – multiple levels
Lifestyle Monitor System
Continuous Monitoring
 Adaptive Questionnaires full-spectrum
 Distributed Collections from individuals
Situational Analysis
 Structured Vectors custom for Individuals
 Population Cohorts for Decision Support
Lifestyle Monitor Questions
How good is your health?
What is your typical energy level?
Do you eat well-balanced foods?
How much do you eat?
Do you exercise for at least half an hour?
How often are you tired without exercising?
How much do you sleep a night?
Do you get enough sleep (to not be tired)?
How often are you in pain?
Do you feel happy with your life?
Can you lead a full life with your current health?
Can you deal adequately with all your problems?
Are you worried about things you cannot control?
Do you feel too tired to function properly?
Does time hang heavy on you in an average day?
Sample General Health Questions for User Modeling
Lifestyle Monitor Session
Artificial Intelligence Research
Structured Vectors customized for Individuals
 Raw concepts insufficient
 Adaptive Concepts for individual situations
 Structured Vectors for cohort clustering
 Situational Analysis infrastructure support
 2007 Internet Health Monitors prototypes
 2011 Population Health Monitors for
chronic illness regionally deployed
THE DISTRIBUTED WORLD



Community Repositories in the Interspace
Peer to Peer Networking Infrastructure
Every Person performs Every Role
USER
LIBRARIAN
INDEXER
PUBLISHER
AUTHOR
request
reference
classify
quality
generate
FEATURE VECTORS

from Concepts to Features

from Semantics to Pragmatics

Infrastructure is Interaction with Abstraction
Interspace is concept navigation across repositories
Intermind is feature comparison across individuals
Towards the Intermind



Beyond Concepts to Features
Beyond Analysis to Synthesis
Problem Solving via Cross-Correlating
Universal Knowledge across the Net

Every individual has its own special vector
Every viewpoint does semantic clustering

The Intermind is true Cyberspace

Today the Hive
Tomorrow the HiveMind