Powerpoint - CANIS: Community Architectures for Network
Download
Report
Transcript Powerpoint - CANIS: Community Architectures for Network
The Evolution of the Net:
Predicting Global Infrastructure
Bruce R. Schatz
CANIS Laboratory
Graduate School of Library & Information Science
[email protected], www.canis.uiuc.edu
Department of Computer Science seminar
University of Illinois, February 14, 2005
Art of Physical Architecture
Art of Logical Architecture
The Evolution of the Net
Niels Bohr on Quantum Theory
“Prediction is very Difficult,
especially about the Future”
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
Computer Science and Infrastructure
Transparent Federation across Sources
Generic Protocols for Global Infrastructure
Ultimate Goal is cyberspace visions of
“being one with all the world’s knowledge”
Computer Science and Infrastructure
1985
1995
2005
2015
Operating Systems
Database Management
Information Retrieval
Artificial Intelligence
caching
tagging
clustering
recognizing
Linguistics Levels and Universal Units
1985
1995
2005
2015
Syntax
Structure
Semantics
Pragmatics
Files (wholes)
Records (parts)
Concepts (meaning)
Features (reality)
Evolution of Information
Retrieval across
the Net
Evolution
of Information
Retrieval
Concept Search
Document Search
Text Search
Grand Visions
Syntax
Structure
Semantics
from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net”
cover article in Science, vol 275, Jan 17, 1997 special issue on Bioinformatics
1985 Syntax Federation
Same Query into Multiple Sources
Results return Uniform Packages
Packets are for Bits, but Objects need more
Information Units are for Database Items
1985 Technology Environment
CMU Computer Science – Andrew
Apollo Domain – distributed file system
Xerox Star – multimedia document system
Bellcore Network Systems – Fibers
Telenet – International Packet Switches
Dialog – Bibliographic Text Searches
Telesophy Prototype
Distributed Documents
Distributed Collections
Multimedia Documents
Networked Hypertext
Document Browsing (links across sources)
Document Search
(texts across sources)
Telesophy Session
Telesophy Implementation
Bitmapped Workstation with Custom Software
$30K Apollo with 10Mb/s WAN
Windows via Brown [hypertext]
Objects via Xerox [Smalltalk]
Information Units and Data Items
300K Units across 20 sources
Bellcore R&D, $2.5M 1984-1988
Operating System Research
Browsing requires Caching across Internet
Raw bandwidth insufficient
200ms Ping versus 250ms Saccade
Lookahead Applications Specific Protocols
1987 Internet Research Task Force
1989 ARPANET 20th Anniversary
1990 Dissertation on Interactive Retrieval
1995 Structure Federation
Search using Parts of Documents
Transparent merge different Schema
Results return Complete Displays
Displayers invoked for all types
1995 Technology Environment
NCSA and the World-Wide Web
Mosaic – multimedia document browsing
HTTP – standard query protocol
University Library and Online Retrieval
Ovid – full-text journal searching
SGML – standard document protocol
DeLIver System
Full Distributed Documents
Full Displays with tables and equations
Distributed Collections from publishers
Single Federated Collection
Streamlined search using tag structure
Canonical tag schema with translation
DeLIver Session
DeLIver Implementation
Desktop PC plus Custom Software Integration
$5K IBM Personal Computer
Mosaic via NCSA [hypertext]
Displays via SoftQuad [viewers]
Custom DTD and SSL for tags and styles
100K articles for 3000 users
NSF DLI, $5M 1994-1998
Database Management Research
Metadata Extraction for Structure Federation
Raw schema insufficient
Different names and different types
Author tags in physics vs mathematics
1995 interactive databases using Mosaic
1997 Beat Elsevier using canonical tags
1999 production distributed XML federation
2005 Semantic Federation
Search using Concepts above Words
Extraction of Concepts from Documents
Statistical Index on Community Collections
Concept Navigation across Collections
2005 Technology Environment
Web Portals and statistical NLP
Google – statistical linked contexts
NLP – statistical generic parsers
Fast Processors and Big Disks
Gigaflops – Beowulfs and cluster computing
Terabytes – RAIDs and literature scaling
BeeSpace System
Fully Parsed Documents
Concepts and Entities auto generated
Distributed Collections from communities
Fully Related Concepts
Switching across Community Repositories
Automatic Links to Entity Databases
BeeSpace Session
BeeSpace Implementation
Commodity PC plus Custom Software
$1K Dell Personal Computer
$15K Server 1 Gflops 2 TBytes
Semantic Indexing generic scalable
Concept Extraction and Normalization
Concept Co-occurrence on Collections
50M articles across 50K repositories
Information Retrieval Research
Statistical Clustering Equivalent Phrases
Raw phrases insufficient
Phrase parsing with normalization
Entity recognition with normalization
1998 semantic indexing
(concepts from terms)
1999 information spaceflight
(categories from documents)
CONCEPT SPACES
from Objects to Concepts
from Syntax to Semantics
Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
LEVELS OF INDEXES
Technology
Engineering
FORMAL
(manual)
Electrical
IEEE
communities
INFORMAL
groups
(automatic)
individuals
Technology Trends
IEEE Computer for January 2002
Information Infrastructure for Trends issue
Document Representation
Language Parsing
Statistical Indexing
Peer-Peer Networking
Vocabulary Switching
(Semantic Web)
(TIPSTER)
(TREC)
(SETI@home)
(UMLS)
SCALABLE SEMANTICS
Automatic indexing
Domain-Independent indexing
Statistical clustering
Compute Context of
concepts within documents
documents within repositories
COMPUTING CONCEPTS
‘92: 4,000 (molecular biology)
‘93: 40,000 (molecular biology)
‘95: 400,000 (electrical engineering)
‘96: 4,000,000 (engineering)
‘98: 40,000,000 (medicine)
1992
1993
1995
1996
1998
SIMULATING A NEW WORLD
Obtain discipline-scale collection
Partition discipline into Community Repositories
4 core terms per abstract for MeSH classification
32K nodes with core terms (classification tree)
Community is all abstracts classified by core term
MEDLINE from NLM, 10M bibliographic abstracts
human classification: Medical Subject Headings
40M abstracts containing 280M concepts
concept spaces took 2 days on NCSA Origin 2000
Simulating World of Medical Communities
10K repositories with > 1K abstracts
(1K w/ > 10K)
COMMUNITY PROCESSING
INTERSPACE NAVIGATION
Semantic Indexes for Community Repositories
Navigating Abstractions within Repository
concept space & category map
Interactive browsing by Community experts
*www.canis.uiuc.edu/interspace-prototype
Interspace Remote Access Client
Navigation in MEDSPACE
For a patient with Rheumatoid Arthritis
Find a drug that reduces the pain (analgesic)
but does not cause stomach (gastrointestinal) bleeding
Choose Domain
Concept Search
Concept Navigation
Retrieve Document
Navigate Document
Retrieve Document
Concept
Navigation
SWITCHING
In the Interspace…
each Community maintains its own repository
Switching is navigating Across repositories
use your vocabulary to search
another specialty
CONCEPT SWITCHING
“Concept” versus “Term”
set of “semantically” equivalent terms
Concept switching
region to region (set to set) match
Semantic region
term
Concept Space
Concept Space
Biomedical Session
Categories and Concepts
Concept Switching
Document Retrieval
THE NET OF THE 21st CENTURY
Beyond Objects to Concepts
Beyond Search to Analysis
Problem Solving via Cross-Correlating
Multimedia Information across the Net
Every community has its own special library
Every community does semantic indexing
The Interspace approximates Cyberspace
2015 Pragmatics Federation
Beyond Words and Concepts to Reality
Feature Vectors describing Situation
Each Individual has Vector (< Community)
Discrete Samples into Continuous Monitors
2015 Technology Environment
Continuous Vector Recording
Health Grid – personal lifestyle monitors
Peer-to-Peer – beyond Napster and Amazon
Individual User Modeling
Cohort Grouping – custom clustering
Adaptable Interfaces – multiple levels
Lifestyle Monitor System
Continuous Monitoring
Adaptive Questionnaires full-spectrum
Distributed Collections from individuals
Situational Analysis
Structured Vectors custom for Individuals
Population Cohorts for Decision Support
Lifestyle Monitor Questions
How good is your health?
What is your typical energy level?
Do you eat well-balanced foods?
How much do you eat?
Do you exercise for at least half an hour?
How often are you tired without exercising?
How much do you sleep a night?
Do you get enough sleep (to not be tired)?
How often are you in pain?
Do you feel happy with your life?
Can you lead a full life with your current health?
Can you deal adequately with all your problems?
Are you worried about things you cannot control?
Do you feel too tired to function properly?
Does time hang heavy on you in an average day?
Sample General Health Questions for User Modeling
Lifestyle Monitor Session
Artificial Intelligence Research
Structured Vectors Individual customized
Raw concepts insufficient
Adaptive Concepts for individual situations
Structured Vectors for cohort clustering
Situational Analysis infrastructure support
2007 Internet Health Monitors prototypes
2011 Population Health Monitors for
chronic illness regionally deployed
THE DISTRIBUTED WORLD
Community Repositories in the Interspace
Peer to Peer Networking Infrastructure
Every Person performs Every Role
USER
LIBRARIAN
INDEXER
PUBLISHER
AUTHOR
request
reference
classify
quality
generate
FEATURE VECTORS
from Concepts to Features
from Semantics to Pragmatics
Infrastructure is Interaction with Abstraction
Interspace is concept navigation across repositories
Intermind is feature comparison across individuals
Towards the Intermind
Beyond Concepts to Features
Beyond Analysis to Synthesis
Problem Solving via Cross-Correlating
Universal Knowledge across the Net
Every individual has its own special vector
Every viewpoint does semantic clustering
The Intermind is true Cyberspace
Today the Hive
Tomorrow the HiveMind