A Risk Minimization Framework for Information Retrieval


BeeSpace Informatics Research:
From Information Access to
Knowledge Discovery
ChengXiang Zhai
Nov. 7, 2007
BeeSpace Technology: From V3 to V4
[Diagram. Labels: Query, Docs, Genes, Function; Search & Navigation; Function Analysis; Literature; Question, Answers; ER Graph Mining; Inference Engine; Entities; Relations; Knowledge Base; Expert Knowledge]
New Functions in V4
• Massive Entity/Relation Extraction
• Graph Indexing and Mining
• Integration of Expert Knowledge & Reasoning
• Personalization & Info/Knowledge Sharing
• “Plug and Play” (PnP)
Massive Entity Recognition
• Class1: Small Variation (Dictionary/Ontology)
– Organism, Anatomy, Biological Process, Pathway,
Protein Family
• Class2: Medium Variation
– Gene, cis Regulatory Element
• Class3: Large Variation
– Phenotype, Behavior
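For Class1 entities, where surface forms vary little, recognition can be approximated by dictionary lookup. The following is a minimal sketch under that assumption; the dictionary terms and the names `DICTIONARY`, `build_matcher`, and `tag_entities` are illustrative and not part of BeeSpace.

```python
import re

# Hypothetical mini-dictionary; a real one would be loaded from an ontology.
DICTIONARY = {
    "Organism": ["Apis mellifera", "honey bee"],
    "Anatomy": ["mushroom body", "antenna"],
    "Pathway": ["insulin signaling pathway"],
}


def build_matcher(dictionary):
    """Compile one case-insensitive regex over all terms, longest term first."""
    term_to_class = {}
    for cls, terms in dictionary.items():
        for term in terms:
            term_to_class[term.lower()] = cls
    # Longest-first ordering makes multi-word terms win over shorter ones.
    ordered = sorted(term_to_class, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern, term_to_class


def tag_entities(text, dictionary=DICTIONARY):
    """Return (start, end, surface form, entity class) for each dictionary hit."""
    pattern, term_to_class = build_matcher(dictionary)
    return [
        (m.start(), m.end(), m.group(0), term_to_class[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag_entities("In the honey bee, the mushroom body is large."):
        print(hit)
```

Class2 and Class3 entities vary too much for exact lookup, so a sketch like this would only cover the first class.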
Massive Relation Extraction
• Expression Location
– the expression of a gene in some location (tissues, body parts)
• Homology/Orthology
– one gene is homologous to another gene
• Biological process
– one gene has some role in a biological process
• Genetic/Physical/Regulatory Interaction
– one gene interacts with another gene in a certain fashion (3 types of relations)
– a simple case: Protein-Protein Interaction (PPI)
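One simple way to approximate gene-gene relation extraction is to look for a trigger phrase between two recognized gene mentions in a sentence. The sketch below assumes gene names have already been recognized; the trigger phrases and the names `TRIGGERS` and `extract_relations` are hypothetical and only illustrate the idea, not the method used in BeeSpace.

```python
import re
from itertools import combinations

# Hypothetical trigger phrases per relation type (illustrative only).
TRIGGERS = {
    "Interaction": ["interacts with", "binds"],
    "Homology": ["homologous to", "ortholog of"],
}


def extract_relations(sentence, genes, triggers=TRIGGERS):
    """Return (gene1, relation, gene2) when a trigger phrase occurs
    between the first mentions of two known genes in the sentence."""
    positions = []
    for gene in genes:
        m = re.search(r"\b" + re.escape(gene) + r"\b", sentence)
        if m:
            positions.append((m.start(), m.end(), gene))
    positions.sort()  # left-to-right order of mentions

    found = []
    for (_, end1, g1), (start2, _, g2) in combinations(positions, 2):
        between = sentence[end1:start2].lower()
        for relation, phrases in triggers.items():
            if any(p in between for p in phrases):
                found.append((g1, relation, g2))
    return found


if __name__ == "__main__":
    print(extract_relations("GeneA is homologous to GeneB.", ["GeneA", "GeneB"]))
    # [('GeneA', 'Homology', 'GeneB')]
```

A pattern like this trades recall for simplicity; relations expressed across sentences or without a listed trigger phrase are missed.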
Entity Relation Graph Mining
• The extracted entities and relations form a
weighted graph
• Need to develop techniques to mine the graph
for knowledge
– Store graphs
– Index graphs
– Mining algorithms (neighbor finding, path finding,
entity comparison, outlier detection, frequent
subgraphs, ...)
– Mining language
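Two of the listed mining operations, neighbor finding and path finding, can be sketched on a small in-memory weighted graph. This is a minimal sketch assuming undirected edges and that a higher weight means stronger evidence, so path cost is taken as 1/weight; the class `ERGraph` and its method names are hypothetical.

```python
import heapq
from collections import defaultdict


class ERGraph:
    """Toy entity-relation graph: entity -> {neighbor: (relation, weight)}."""

    def __init__(self):
        self.adj = defaultdict(dict)

    def add_relation(self, a, relation, b, weight):
        """Store an undirected, positively weighted relation edge."""
        self.adj[a][b] = (relation, weight)
        self.adj[b][a] = (relation, weight)

    def neighbors(self, entity, k=None):
        """Neighbor finding: neighbors ranked by descending edge weight."""
        ranked = sorted(self.adj.get(entity, {}).items(), key=lambda kv: -kv[1][1])
        return [name for name, _ in ranked[:k]]

    def best_path(self, source, target):
        """Path finding: Dijkstra's algorithm with edge cost 1/weight."""
        heap = [(0.0, source, [source])]
        seen = set()
        while heap:
            cost, node, path = heapq.heappop(heap)
            if node == target:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt, (_, weight) in self.adj.get(node, {}).items():
                if nxt not in seen:
                    heapq.heappush(heap, (cost + 1.0 / weight, nxt, path + [nxt]))
        return None  # target not reachable


if __name__ == "__main__":
    g = ERGraph()
    g.add_relation("GeneA", "interacts_with", "GeneB", 0.9)
    g.add_relation("GeneB", "expressed_in", "brain", 0.5)
    g.add_relation("GeneA", "expressed_in", "brain", 0.1)
    print(g.neighbors("GeneA"))           # ['GeneB', 'brain']
    print(g.best_path("GeneA", "brain"))  # ['GeneA', 'GeneB', 'brain']
```

In the example, the weak direct edge (cost 10) loses to the two-step path through GeneB (cost about 3.1), which shows how edge weights change what counts as the best connection.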
Integration of Expert Knowledge
• How can we combine expert knowledge with
knowledge extracted from literature?
• Possible strategies:
– Interactive mining (human knowledge is used to
guide the next step of mining)
– Trainable programs (focused miner, targeting a
certain kind of knowledge)
– Inference-based integration
Inference-Based Discovery
• Encode all kinds of knowledge in the same knowledge representation language
• Perform logic inferences
• Example
– Regulate(GeneA, GeneB, ContextC). [Literature mining]
– SeqSimilar(GeneA, GeneA’) [Sequence mining]
– Regulate(X,Y,C) ← Regulate(Z,Y,C) & SeqSimilar(X,Z) [Human knowledge]
– ? Regulate(GeneA’,GeneB,ContextC)
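The example can be traced with a small forward-chaining loop. This is a sketch under two assumptions that the slide does not state: the human-knowledge rule is read as an implication (Regulate(X,Y,C) holds if Regulate(Z,Y,C) and SeqSimilar(X,Z) hold), and SeqSimilar is treated as symmetric. The function name `infer_regulation` is hypothetical.

```python
def infer_regulation(regulate, seq_similar):
    """Apply the rule
        Regulate(X, Y, C) <- Regulate(Z, Y, C) & SeqSimilar(X, Z)
    repeatedly until no new Regulate facts can be derived."""
    # Assumption: sequence similarity is symmetric.
    similar = set(seq_similar) | {(b, a) for a, b in seq_similar}
    known = set(regulate)
    changed = True
    while changed:
        changed = False
        for (z, y, c) in list(known):
            for (x, z2) in similar:
                if z2 == z and (x, y, c) not in known:
                    known.add((x, y, c))
                    changed = True
    return known


if __name__ == "__main__":
    regulate = {("GeneA", "GeneB", "ContextC")}   # from literature mining
    seq_similar = {("GeneA", "GeneA'")}           # from sequence mining
    facts = infer_regulation(regulate, seq_similar)
    # Query: ? Regulate(GeneA', GeneB, ContextC)
    print(("GeneA'", "GeneB", "ContextC") in facts)  # True
```

A derived fact of this kind is a hypothesis suggested by the rule, so it would still need checking against evidence.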
Personalization & Workflow Management
• Different users have different tasks →
personalization
– Tracking a user’s history and learning a user’s
preferences
– Exploiting the preferences to customize/optimize
the support
– Allowing a user to define/build special function
modules
• Workflow management
Information/Knowledge Sharing
• Different users may perform similar tasks →
Information/Knowledge sharing
– Capturing user intentions
– Recommending information/knowledge
– How do we solve the problem of privacy?
• Massive collaborations?
– Each user contributes a small amount of knowledge
– All the knowledge can be combined to infer new
knowledge
Plug and Play
• Users’ tasks vary significantly
• Need flexible combinations of basic modules
• Need to move toward a “discovery
workbench”
– How do we design basic modules?
– How do we support synthesis of information and
knowledge?
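One possible reading of "flexible combinations of basic modules" is a registry of small functions that can be chained into task-specific workflows. The sketch below assumes that reading; the registry `MODULES`, the decorator `module`, the toy `CORPUS`, and the three example modules are all hypothetical and stand in for real BeeSpace services.

```python
from collections import Counter

MODULES = {}  # name -> callable taking one input and returning one output


def module(name):
    """Register a function as a pluggable module under the given name."""
    def register(fn):
        MODULES[name] = fn
        return fn
    return register


# Toy stand-in for a literature collection.
CORPUS = [
    "GeneA regulates GeneB in the brain",
    "GeneC is expressed in the antenna",
    "GeneA is homologous to GeneC",
]


@module("search")
def search(query):
    """Return documents that contain the query string (case-insensitive)."""
    return [doc for doc in CORPUS if query.lower() in doc.lower()]


@module("extract_genes")
def extract_genes(docs):
    """Return gene-like tokens (toy rule: tokens starting with 'Gene')."""
    return [tok for doc in docs for tok in doc.split() if tok.startswith("Gene")]


@module("count")
def count(items):
    """Return a frequency table of the items."""
    return dict(Counter(items))


def compose(names):
    """Chain registered modules, feeding each output to the next module."""
    steps = [MODULES[name] for name in names]

    def workflow(value):
        for step in steps:
            value = step(value)
        return value

    return workflow


if __name__ == "__main__":
    gene_profile = compose(["search", "extract_genes", "count"])
    print(gene_profile("GeneA"))  # {'GeneA': 2, 'GeneB': 1, 'GeneC': 1}
```

Keeping every module to a single input and a single output is what makes arbitrary chaining possible here; modules with richer interfaces would need an explicit contract for what each step consumes and produces.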
BeeSpace V4
[Architecture diagram. Labels as extracted: User (three times); Vertical Search Services; Search & Navigation; "Xin Xu, PnP"; Function Analyzers (Peixiang, Bio); Text Mining (Yue); Literature; Customized Knowledge Base; ER Graph Mining (Peixiang); Inference Engine (Yue, Xin, Bio); Entities; Relations; Knowledge Base; Expert Knowledge]
Discussion
• Task Model?
• PnP Modules?
• Massive Collaboration?