IJCAI-20050731 - Kansas State University

Relational Graphical Models for
Collaborative Filtering and Recommendation
William H. Hsu
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
Sunday, 31 July 2005
Multi-Agent Learning from Portal User Data
Relational Representation
Collaborative Recommendation, Information Retrieval & Extraction
IJCAI-2005 Workshop W20, Multi-Agent Information Retrieval
This presentation is: http://www.kddresearch.org/KSU/CIS/IJCAI-20050731.ppt
Joint work with: Jeffrey M. Barber, Haipeng Guo, Andrew L. King, Julie A. Thornton
Kansas State University KDD Lab (www.kddresearch.org)
Kansas State University
Department of Computing and Information Sciences
Outline
• Application: Workflow Modeling in Bioinformatics
– Collaborative recommendation (CR)
– Shallow CR: market basket analysis for cross-selling
– Domain: gene expression modeling, proteomics, metabolomics
• Methodology: Relational Graphical Models (RGMs)
– Workflow basics
– DESCRIBER project: using RGMs for CR and info retrieval (IR)
– Input, desired output, application, methodology, criteria
• Link Analysis Applications
– Finding dynamic relational attributes
– Identity uncertainty in spatial data cleaning
• Software for Building Graphical Models: BNJ
• Infrastructure and Preliminary Experiments
“Classical” Collaborative Recommendation:
Clickstream Mining
• Explanation from Recommender (Decision Support) System
• Classification and Regression based upon Historical Customer Data (Market Basket Analysis)
© 2003 Amazon.com, Inc.
Shallow Collaborative Recommendation:
Market Basket Analysis for Cross-Selling
• Cross-Selling based upon Market Basket Analysis – Apriori (Agrawal, 1993)
• Basis for Collaborative Recommendation
© 2002 Amazon.com, Inc.
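Market basket analysis with Apriori can be sketched in a few lines. This is a minimal illustration, not the system shown on the slide: the function names `apriori` and `cross_sell`, the toy baskets, and the thresholds are all assumptions made for the example. It finds itemsets that occur in at least `min_support` baskets, then ranks "customers who bought X also bought Y" rules by confidence.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: count} for every itemset in >= min_support baskets."""
    baskets = [frozenset(b) for b in baskets]
    # Level 1 candidates: every single item seen in any basket.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count the baskets that contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def cross_sell(item, frequent, min_confidence):
    """Rank rules {item} -> other by confidence = count(pair) / count(item)."""
    base = frequent.get(frozenset([item]), 0)
    rules = []
    for itemset, count in frequent.items():
        if base and len(itemset) == 2 and item in itemset:
            (other,) = itemset - {item}
            confidence = count / base
            if confidence >= min_confidence:
                rules.append((other, confidence))
    return sorted(rules, key=lambda r: (-r[1], r[0]))


# Hypothetical purchase histories, one set of items per customer basket.
baskets = [
    {"book", "cd"},
    {"book", "cd", "dvd"},
    {"book", "dvd"},
    {"cd"},
    {"book", "cd"},
]
frequent = apriori(baskets, min_support=2)
print(cross_sell("book", frequent, min_confidence=0.5))
```

The pruning step is what makes this "shallow": it uses only co-occurrence counts, with no model of why items are bought together.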
Application to Computational Grid Portal:
DESCRIBER Design
DESCRIBER: Data Entity, Service, and Component Repository Index for Bioinformatics Experimental Research
• Users of Information Grid & Scientific Workflow Repository
– Example Queries:
• What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
• What codes and microarray data were used? How and why?
• Personalized Interface
– User Queries & Evaluations
– Domain-Specific Collaborative Recommendation
• Learning over Workflow Instances and Use Cases (Historical User Requirements)
– Decision Support Models
– Use Case & Query/Evaluation Data
• Interface(s) to Distributed Repository
– Domain-Specific Workflow Repositories
– Workflows: Transactional, Objective Views
– Workflow Components: Data Sources, Transformations; Other Services
Computational Genomics and
Microarray Gene Expression Modeling
[Figure: learning a gene network over genes G1–G5 from microarray data]
• Learning Environment
– Treatment 1 (Control)
– Treatment 2 (Pathogen)
– Messenger RNA (mRNA) Extract 1, cDNA
– Messenger RNA (mRNA) Extract 2, cDNA
– DNA Hybridization Microarray (under LASER)
• [A] Structure Learning
– D: Data (User, Microarray)
– G = (V, E)
• [B] Parameter Estimation
– B = (V, E, Θ)
• Dval (Model Validation by Inference)
– Specification Fitness (Inferential Loss)
Nir’s Invited Talk at IJCAI: Wednesday, 0900 GMT 03 Aug 2005
Adapted from Friedman et al. (2000)
http://www.cs.huji.ac.il/labs/compbio/
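Step [B] on the slide above fits parameters to a graph whose structure is already fixed. The sketch below assumes discrete variables and maximum-likelihood estimation by counting; the slide does not say which estimator was used. The function name `estimate_cpts`, the `parents` mapping, and the toy expression data are hypothetical.

```python
from collections import Counter


def estimate_cpts(parents, data):
    """Estimate P(node | parents) by relative frequency.

    parents: {node: [parent nodes]} describing the fixed graph G = (V, E).
    data:    list of {node: value} observations.
    Returns {node: {parent_values: {value: probability}}}.
    """
    cpts = {}
    for node, pa in parents.items():
        joint = Counter()     # counts of (parent values, node value)
        marginal = Counter()  # counts of parent values alone
        for row in data:
            key = tuple(row[p] for p in pa)
            joint[(key, row[node])] += 1
            marginal[key] += 1
        table = {}
        for (key, value), n in joint.items():
            table.setdefault(key, {})[value] = n / marginal[key]
        cpts[node] = table
    return cpts


# Hypothetical two-gene network G1 -> G2 with binarized expression levels.
parents = {"G1": [], "G2": ["G1"]}
data = [
    {"G1": 1, "G2": 1},
    {"G1": 1, "G2": 1},
    {"G1": 1, "G2": 0},
    {"G1": 0, "G2": 0},
]
cpts = estimate_cpts(parents, data)
print(cpts["G2"])
```

Parent configurations that never occur in the data get no table entry here; a practical estimator would usually add a prior (smoothing) to cover them.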
Bioinformatics: Data Mining
from DNA Hybridization Microarrays
How do we get from microarray data (and other expression data) to a linked network?
© G. Simpson (1999), used with permission
Finding Dynamic Relational Attributes:
From Workflows to Class Diagrams
• Transactional View (cf. UML Sequence Diagram)
– TAVERNA Workbench, myGrid Project, © 2003 Oinn et al.
• Objective View (cf. UML Class Diagram)
– DESCRIBER example schema, © 2003 Hsu
– Classes: cDNA, MicroarrayExperiment, Gene, Protein, Pathway
– Attributes (as labeled in the diagram): DNA-sequence, protein-ID, cDNA-sequence, protein-product, canonical-name, role, treatment, pathway (appears twice), hybridization, accession-number, data, functional-description, normalization, pathway-ID, regulation, pathway-name, pathway-descriptor
– Legend: Relational Link (Reference Key); Probabilistic Dependency
DESCRIBER:
Preliminary Overview of System
• Inputs: Workflow Logs, Instances, Templates, Components (Services, Data Sources)
• Module 1: Collaborative Recommendation Front-End
– Personalized Interface
– User Queries
– Recommendations/Evaluations (Before and After Use)
• Module 2: Learning & Validation of Relational Graphical Models (RGMs) for Experimental Workflows and Components
• Module 3: Estimation of RGM Parameters from Workflow and Component Database
• Module 4: Learning & Validation of RGMs for User Requirements
• Module 5: RGM Parameters from User Query Data
• Data-flow labels between modules
– Structure & Data
– Training Data
– RGMs of Workflows
– RGMs of Queries
– Complete RGMs of Workflows (Data-Oriented)
– Complete RGMs of User Queries
Workflow Management [1]:
Input and Representation
• Input: Implemented Workflows
– Workflow: operational aspect of work procedure
• Data sources: relational databases, object stores (what)
• Structure of tasks (what/how)
• Operations: structured queries, data transformations (how)
• Agents to perform tasks: web services/enactment history (who/where)
– Examples
• Desktop: TIGR TM4 (gene expression data analysis suite)
• Intranet: groupware (e.g., business process management,
ORACLE Workflow, IBM WebSphere MQ Workflow)
• Online: Computational science (grid) portals
• Representation
– SCUFL (Stevens, 2002): language (DAML+OIL, now OWL)
– TAVERNA (Oinn, 2003): editor
Workflow Management [2]:
Problem Specification: Output, Criteria
• Output
– Relational abstraction over workflow classes
– Underlying graphical models representing workflow instances
• Goals
– Personalize UI
– Assist in retrieval, development and repurposing
• Workflows and components
• Decrease time, maintain quality
• Criteria
– The hard part!
– Classical evaluation measures: accuracy, precision vs. recall,
likelihood – “just a start” (Langley, 2000)
– Utility measures: user ratings, performance
– User modeling: usability, accessibility of grid portal
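The classical measures named above can be made concrete for a recommender that returns a list of workflows. A minimal sketch, assuming set-based relevance judgments; the function names and the workflow identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision: share of recommendations that were relevant.
    Recall: share of relevant items that were recommended."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical workflow IDs: what the system suggested vs. what the user needed.
recommended = ["wf1", "wf2", "wf3", "wf4"]
relevant = ["wf2", "wf4", "wf7"]
p, r = precision_recall(recommended, relevant)
print(p, r, f1_score(p, r))
```

These numbers say nothing about utility or usability, which is why the slide treats them as only a starting point.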
Methodology [1]:
from Collaborative Recommendation to IR
• Applications to Information Retrieval
– Development of new workflows
– Repurposing of prefabricated workflows
– Personalization of interfaces
• What is Collaborative?
– Filtering of workflow components by usage
– Recommendation via ratings: EachMovie (McJones et al., 1997),
Jester (Goldberg et al., 2001), MovieLens (Miller et al., 2003)
• Multi-Agent Aspects
– Brokered services (W3C’s Simple Object Access Protocol v1.2)
http://www.w3.org/TR/soap/
– Modeling context of data transformations, services, clients
– Heterogeneous data at multiple levels of abstraction
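Recommendation via ratings, as in the datasets cited above, is commonly illustrated with user-based collaborative filtering. The sketch below is a generic textbook version, not the DESCRIBER method: it assumes cosine similarity between users' rating vectors and a similarity-weighted average for prediction. The user names, component names, and ratings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (missing rating = 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    numerator = denominator = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        numerator += sim * their[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical ratings (1-5) of workflow components by three users.
ratings = {
    "ann": {"blast": 5, "clustal": 3},
    "bob": {"blast": 5, "clustal": 3, "tm4": 4},
    "cat": {"blast": 1, "clustal": 5, "tm4": 2},
}
print(predict(ratings, "ann", "tm4"))
```

Because ann's ratings match bob's more closely than cat's, the prediction lands nearer to bob's rating of 4 than to cat's rating of 2.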
Methodology [2]:
Relational Models for Multi-Agent IR
• Probabilistic Inference and Representation
– Probabilistic Relational Models (Friedman et al., 1999)
– Single instances extracted from TAVERNA editor
– Workflow abstractions: dropping enactment information
– Schemata: relational skeletons, link/reference slot uncertainty
• Applied Machine Learning
– General problem: knowledge acquisition and capture
– Schemata: designed with grid portal builder
– Distributions learned from data: link, reference slot
– Clusters: workflows, components, users
– Relations from clusters to one another
Emergent Relational Structure
• “Google Approach”
– PageRank and hubs/authorities (Brin & Page 1998, Kleinberg 1998)
– Using existing structure: Netscape Open Directory Project (ODP)
– Minimal annotation: meta tags (keywords, description)
• “CiteSeer/ResearchIndex Approach”
– Citation indexing (Lawrence et al., 1998, Giles et al., 2002)
– Web of influence (Koller, 2001)
• Where Is the Relational Structure?
– “Does inherent relational structure exist?” (Russell, SRL-2003)
– Sources of rich info: “link structure”
– Richer sources? Procedural context and beyond!
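The hubs/authorities idea (Kleinberg 1998) extracts structure from links alone. A minimal sketch of the iteration, assuming a small directed graph given as an adjacency dictionary; the function name `hits`, the node names, and the fixed iteration count are choices made for this example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) scores for a directed graph.

    links: {source: [targets]}. A good hub points to good authorities;
    a good authority is pointed to by good hubs.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum the authority scores of all pages linked to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical link graph: pages a and b both cite c; a also cites d.
links = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph the most-cited page scores as the top authority and the page with the most outgoing links to cited pages scores as the top hub.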
Identity Uncertainty
• How to Tell When Two Descriptors Refer to Same Entity?
• Problem
– Coalesced databases
– Multiple sources
• Errors and Inconsistencies
– Spatial, temporal error
– Inconsistent descriptors
• Clues
– Proximity in space, time
– Similarities in values of key variables (attributes, features)
• Applications
– Fraud detection and information security (intrusion detection)
– Data cleaning
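The two clues listed above, proximity and attribute similarity, can be combined into a simple match score. This is a heuristic sketch only, not a probabilistic identity-uncertainty model: the `Record` fields, the equal weighting, the 100-unit distance scale, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import hypot


@dataclass
class Record:
    source: str  # which database the descriptor came from
    name: str    # free-text identifier, possibly inconsistent across sources
    x: float     # spatial coordinates in some shared projection
    y: float


def match_score(a, b, max_dist=100.0):
    """Average of spatial proximity and name similarity, each in [0, 1]."""
    dist = hypot(a.x - b.x, a.y - b.y)
    proximity = max(0.0, 1.0 - dist / max_dist)
    similarity = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return 0.5 * proximity + 0.5 * similarity


def likely_same(a, b, threshold=0.8):
    """Flag two descriptors as probably referring to the same entity."""
    return match_score(a, b) >= threshold


# Hypothetical well records from two coalesced databases.
w1 = Record("db1", "Well 12-A", 1000.0, 2000.0)
w2 = Record("db2", "well 12A", 1010.0, 2005.0)
w3 = Record("db2", "Well 97-C", 5000.0, 9000.0)
print(likely_same(w1, w2), likely_same(w1, w3))
```

A hard threshold like this ignores uncertainty in the coordinates themselves, which is one reason to model identity probabilistically instead.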
Spatial Data Cleaning:
STARWARD
[Map: Groundwater irrigation lifetime estimates in the Ogallala region of the Kansas High Plains aquifer. [Wilson et al. 2002] http://snurl.com/39kz]
– Darkest: already depleted
– Next darkest: 25-50 years
• Problems
– Water well location (identity uncertainty in coalesced spatial databases)
– Descriptive statistics (paraconsistency)
– Spatial outlier detection
BNJ Graphical User Interface [1]:
Editor
[Screenshot: ALARM Network]
© 2005 KSU Bayesian Network tools in Java (BNJ) Development Team
BNJ Graphical User Interface [2]:
Graph Visualization and Algorithm Animation
[Screenshot: CPCS-54 Network]
© 2004 KSU Bayesian Network tools in Java (BNJ) Development Team
Genetic Algorithm for BN Structure Learning
Results: ALARM-13
[Figure: Inferential RMSE for Forward Simulation. Y-axis: RMSE (0 to 0.25); x-axis: Samples (1 to 13461). Series: Gold Standard Network; K2 Output on Optimal Ordering; K2 Output on GA Ordering. K2: 20K, FS: 1500.]
(Hsu, Guo, Perry & Stilson, 2002)
Software Packages for Building
Graphical Models: BNJ, etc.
• Commercial Tools: Ergo, Netica, TETRAD, Hugin
• Open Source Tools: BNT (Murphy, 2001), gR (Lauritzen et al., 2002)
• Bayesian Network tools in Java (BNJ) – Hsu et al. (2002-present)
– Distribution page: http://bnj.sourceforge.net
– Development group: http://groups.yahoo.com/group/bndev
– Current (re)implementation projects for KSU KDD Lab
• Structure learning and parameter estimation – Hsu, Barber
• Fast Adaptive Importance Sampling, other sampling – King, Guo
• Statistical Machine Translation / Information Extraction (IE) toolkit –
Al-Jandal, Meyer, Pydimarri
• Continuous time representations – Barber, Hsu
• Formats: XML BNIF (MSBN), Netica – Guo, Barber, Hsu
• Space-efficient DBN inference – Hsu, Barber
Acknowledgements
• Kansas State University Lab for Knowledge Discovery in Databases
– Alumni: Guo (HKUST), Perry (Delaware), Thornton (Kansas State)
– Graduate students: Ph.D. – Al-Jandal, Li; M.S. – Barber (Math), Meyer, Pydimarri
– Undergraduate programmers: King (CIS); Bell, Figueroa (2005 summer interns)
• Joint Work with
– KSU Bioinformatics Group (EECE: Das; Agronomy: Welch, Roe; Weather: Knapp)
– NSF FIBR (Brown: Schmitt; NCSU: Purugganan; Wisconsin: Amasino)
www.egad.ksu.edu
• Thanks to Collaborators and Other Research Groups
– IJCAI-2001, AAAI/UAI/KDD-2002, IJCAI-2003 (UMBC: Kargupta, ASU: Liu; Iowa:
Street; MSR: Horvitz; UConn: Santos; HKUST: Guo)
www.kddresearch.org/Workshops
– BNJ/CSR (CMU: Glymour, Scheines; IA State: Honavar, Margaritis, Tian)
– myGrid/TAVERNA (Manchester: Goble, Stevens; EBI: Oinn; Southampton: Addis)
– The Institute for Genomic Research (Quackenbush, Saeed)
– Kansas Geological Survey (Bohling), Kansas Biological Survey, KU EECS
– NSF ITR (KSU Physics: Rahman, Kara; KSU CIS: Wallentine)
http://www.phys.ksu.edu/~a0kara01/ITR/