Transcript Document

BeeSpace:
An Interactive Environment
for Functional Analysis of Social Behavior
Bruce Schatz, Principal Investigator
Graduate School of Library & Information Science (GSLIS)
Department of Computer Science, Program in Neuroscience
www.canis.uiuc.edu
Theme for Genomics of Neural and Behavioral Plasticity
www.beespace.uiuc.edu
IGB Thematic Research Seminar, November 2, 2004
INSTITUTE FOR GENOMIC BIOLOGY
University of Illinois at Urbana-Champaign
Bee Counted – Vote Today!
BeeSpace FIBR Project
BeeSpace project is NSF FIBR flagship
Frontiers in Integrative Biological Research,
$5M for 5 years at University of Illinois
Nature-Nurture using honey bee as model
Genome technologies in wet lab and dry lab biology
Localized Gene Expression for Normal Social Behavior
Gene Robinson, Entomology (behavioral expressions)
Susan Fahrbach, Entomology (anatomical localization)
Sandra Rodriguez-Zas, Animal Sciences (data analysis)
Interactive Information System for Functional Analysis
Bruce Schatz, Library & Information Science (info systems)
ChengXiang Zhai, Computer Science (text analysis)
Chip Bruce, Library & Information Science (user support)
Post-Genome Informatics
Classical Organisms have extensive Genetic Descriptions
There will be NO more classical organisms beyond
Mice and Men, Worms and Flies, Yeasts and Weeds.
So we must use comparative genomics with classical organisms,
via sequence homologies and literature analysis.
Automatic annotation of genes to standard classifications,
Such as Gene Ontology via sequence homology.
Automatic analysis of functions to scientific literature,
Such as concept spaces via text mining.
Descriptions in Literature MUST be used for future
interactive environments for functional analysis!
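Annotation by sequence homology can be sketched minimally as annotation transfer: each bee gene inherits the GO terms of its best-scoring model-organism homolog. The `Hit` record, the E-value cutoff of 1e-10, and all gene identifiers are hypothetical, and the GO identifiers are used only as example IDs. The slides do not say how BeeSpace or BeeBase choose homologs. Real pipelines usually add coverage and identity filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str     # bee gene identifier (hypothetical)
    subject: str   # model-organism gene identifier (hypothetical)
    evalue: float  # E-value from a precomputed homology search, e.g. BLAST


def transfer_go_terms(hits, go_annotations, max_evalue=1e-10):
    """Give each bee gene the GO terms of its best homolog under max_evalue."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too weak to treat as a homolog
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    # Genes whose best homolog has no annotation get an empty set.
    return {q: set(go_annotations.get(h.subject, ())) for q, h in best.items()}


if __name__ == "__main__":
    hits = [
        Hit("bee_gene_1", "fly_gene_A", 1e-40),
        Hit("bee_gene_1", "worm_gene_B", 1e-12),
        Hit("bee_gene_2", "fly_gene_C", 0.5),
    ]
    go = {"fly_gene_A": {"GO:0007610"}, "worm_gene_B": {"GO:0008152"}}
    print(transfer_go_terms(hits, go))  # {'bee_gene_1': {'GO:0007610'}}
```

Keeping only the single best hit is the simplest choice. Keeping the union over several strong hits raises coverage but also raises the risk of spurious annotations.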
Informational Science
Computational Science is widely accepted as the
Third Branch of Science (beyond Experimental and Theoretical)
Genes are Computed, Proteins are Computed,
Sequence “equivalences” are Computed.
Informational Science is coming to be accepted as the
Fourth Branch of Science
Based on Information Science technologies for
Functional Mining of Information Sources
Comparative Analysis within the
Dry Lab of Biological Knowledge
Conceptual Navigation in BeeSpace
[Figure: conceptual navigation linking three users (Behavioral Biologist, Molecular Biologist, Neuroscientist) to Bee Literature, Molecular Biology Literature, Neuroscience Literature, Brain Gene Expression Profiles, Brain Region Localization, Bee Genome, FlyBase, and WormBase]
Biology: The Model Organism
The Western Honey Bee, Apis mellifera
has become a primary model for social behavior
Complex social behavior in controllable urban environment
Normal Behavior – honey bees live in the wild
Controllable Environment – hives can be modified
Small size manageable with current genomic technology
Capture bees on-the-fly during normal behavior
Record gene expression for the whole brain or for individual brain regions
Informatics: From Bases to Spaces
data Bases support genome data
e.g. FlyBase has sequences and maps
Genes annotated by Gene Ontology and linked to literature
BeeBase (Christine Elsik, Texas A&M)
Uses computed homologies to annotate genes
information Spaces support biomedical literature
e.g. BeeSpace uses automatically generated
conceptual relationships to navigate functions
BeeSpace Software Environment
Will build a Concept Space of Biomedical Literature for
Functional Analysis of Bee Genes
-Partition Literature into Community Collections
-Extract and Index Concepts within Collections
-Navigate Concepts within Documents
-Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures then
follow links into Genome Databases
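The four steps above can be sketched on toy records. The sketch assumes that each document already carries three things: its community labels, its extracted concepts, and gene identifiers that would link into a genome database. The field names (`communities`, `concepts`, `genes`) and the helper functions are hypothetical and do not describe BeeSpace's actual design.

```python
from collections import defaultdict


def partition(docs):
    """Group documents into community collections (a document may join several)."""
    collections = defaultdict(list)
    for doc in docs:
        for community in doc["communities"]:
            collections[community].append(doc)
    return collections


def index_concepts(collection):
    """Inverted index from each extracted concept to the documents mentioning it."""
    index = defaultdict(set)
    for doc in collection:
        for concept in doc["concepts"]:
            index[concept].add(doc["id"])
    return index


def candidate_genes(collection, index, concept):
    """Follow a concept to its documents, then to the gene links they carry."""
    hit_ids = index.get(concept, set())
    genes = set()
    for doc in collection:
        if doc["id"] in hit_ids:
            genes.update(doc["genes"])
    return genes


if __name__ == "__main__":
    docs = [  # hypothetical toy records
        {"id": "d1", "communities": ["foraging"],
         "concepts": ["division of labor", "juvenile hormone"], "genes": ["gene_x"]},
        {"id": "d2", "communities": ["foraging", "learning"],
         "concepts": ["juvenile hormone"], "genes": ["gene_y"]},
        {"id": "d3", "communities": ["learning"],
         "concepts": ["olfactory learning"], "genes": ["gene_z"]},
    ]
    forage = partition(docs)["foraging"]
    print(sorted(candidate_genes(forage, index_concepts(forage), "juvenile hormone")))
    # ['gene_x', 'gene_y']
```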
BeeSpace Software Implementation
Natural Language Processing
Identify noun phrases
Recognize biological entities
Statistical Information Retrieval
Compute statistical contexts
Support conceptual navigation
Network Information System
Concept switch across community collections
Semantic Links into biological databases
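The first two layers can be sketched minimally, under several stated assumptions:

- Noun phrases are approximated by splitting text at stopwords and punctuation. This is a crude stand-in for a part-of-speech chunker.
- Biological entities are recognized by lookup in a hypothetical lexicon.
- "Statistical context" is modeled as an asymmetric co-occurrence weight: the fraction of documents mentioning concept a that also mention concept b.

The slides do not specify the weighting BeeSpace actually uses.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a part-of-speech tagger.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "of", "on", "or", "the", "to", "was", "were", "with"}


def candidate_phrases(text):
    """Split text at stopwords and punctuation (a crude noun-phrase stand-in)."""
    tokens = re.findall(r"[A-Za-z0-9-]+|[.,;:()]", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in STOPWORDS or not tok[0].isalnum():
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases


def tag_entities(phrases, lexicon):
    """Dictionary lookup: keep phrases found in a (hypothetical) entity lexicon."""
    return [(p, lexicon[p]) for p in phrases if p in lexicon]


def concept_space(doc_concepts):
    """Asymmetric weights: share of documents with concept a that also have b."""
    df, co = Counter(), Counter()
    for concepts in doc_concepts:
        concepts = set(concepts)
        df.update(concepts)
        for a, b in combinations(sorted(concepts), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return {(a, b): n / df[a] for (a, b), n in co.items()}


def related(space, concept, k=3):
    """Top-k neighbours of a concept, highest weight first, ties by name."""
    scored = [(b, w) for (a, b), w in space.items() if a == concept]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    text = "The foraging gene is expressed in the mushroom bodies of forager bees."
    print(candidate_phrases(text))
    # ['foraging gene', 'expressed', 'mushroom bodies', 'forager bees']
    space = concept_space([{"foraging", "juvenile hormone"},
                           {"foraging", "juvenile hormone", "octopamine"},
                           {"foraging", "octopamine"}])
    print(related(space, "juvenile hormone"))  # [('foraging', 1.0), ('octopamine', 0.5)]
```

The stray "expressed" in the output shows why a real tagger is needed: stopword splitting cannot tell verbs from nouns.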
BeeSpace Information Sources
Biomedical Literature
- Medline (medicine)
- Biosis (biology)
- Agricola, CAB Abstracts, Agris (agriculture)
Model Organisms (heredity)
-Gene Descriptions (FlyBase, WormBase)
Natural Histories (environment)
-BeeKeeping Books (Cornell Library, Harvard Press)
Worm Community System (1991)
WCS Information Sources
Literature: Biosis, Medline, newsletters, meetings
Data: genes, maps, sequences, strains, cells
WCS Interactive Environment
Browsing: search, navigation
Filtering: selection, analysis
Sharing: linking, publishing
WCS: 250 users at 50 labs across the Internet (1991)
Flagship in NSF National Collaboratory program
WCS Molecular
WCS Cellular
WCS PPCS demo
Medical Concept Spaces (1998)
Obtain discipline-scale collection
Medline from NLM, 10M bibliographic abstracts
human classification: Medical Subject Headings
Partition discipline into Community Repositories
4 core terms per abstract for MeSH classification
32K nodes with core terms (classification tree)
Community is all abstracts classified by core term
40M abstracts containing 280M concepts
computation took 2 days on NCSA Origin 2000
Simulating the World of Medical Communities
10K repositories with > 1K abstracts (1K with > 10K)
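The partition step can be sketched by reading "4 core terms per abstract" as one repository membership per core term. Under that reading, 10M abstracts times 4 core terms gives 40M abstract-repository memberships. This is one plausible interpretation of the slide's "40M abstracts", not a confirmed one. The input format and the toy MeSH headings are illustrative assumptions.

```python
from collections import defaultdict


def community_repositories(abstracts):
    """Map each MeSH core term to the set of abstracts classified under it."""
    repos = defaultdict(set)
    for abstract_id, core_terms in abstracts:
        for term in core_terms:
            repos[term].add(abstract_id)
    return repos


def total_memberships(repos):
    """Abstract-repository pairs; an abstract counts once per core term."""
    return sum(len(ids) for ids in repos.values())


def size_profile(repos, thresholds=(1_000, 10_000)):
    """Number of repositories holding more than each threshold of abstracts."""
    return {t: sum(1 for ids in repos.values() if len(ids) > t) for t in thresholds}


# Slide arithmetic under the reading above: 10M abstracts x 4 core terms each.
SLIDE_MEMBERSHIPS = 10_000_000 * 4  # 40M

if __name__ == "__main__":
    toy = [("a1", ["Arthritis, Rheumatoid", "Analgesics"]),
           ("a2", ["Analgesics", "Gastrointestinal Hemorrhage"]),
           ("a3", ["Arthritis, Rheumatoid"])]
    repos = community_repositories(toy)
    print(total_memberships(repos), size_profile(repos, thresholds=(1,)))  # 5 {1: 2}
```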
Navigation in MedSpace
For a patient with Rheumatoid Arthritis
Find a drug that reduces the pain (analgesic)
but does not cause stomach (gastrointestinal) bleeding
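In MedSpace this question was answered by interactive navigation. As a non-interactive stand-in, the same question can be phrased as a filter over one community's concept associations. The association table, the 0.1 threshold, and the drug names are hypothetical placeholders, and the sketch makes no medical claim.

```python
def candidate_drugs(associations, wanted, unwanted, min_weight=0.1):
    """Drugs associated with `wanted` but not with `unwanted`.

    associations: drug -> {concept: weight}, computed within one community
    (here, hypothetically, the Rheumatoid Arthritis repository).
    """
    hits = []
    for drug, related in associations.items():
        helps = related.get(wanted, 0.0) >= min_weight
        harms = related.get(unwanted, 0.0) >= min_weight
        if helps and not harms:
            hits.append(drug)
    return sorted(hits)


if __name__ == "__main__":
    table = {  # made-up weights for illustration only
        "drug_A": {"analgesia": 0.6, "gastrointestinal hemorrhage": 0.4},
        "drug_B": {"analgesia": 0.5},
        "drug_C": {"gastrointestinal hemorrhage": 0.3},
    }
    print(candidate_drugs(table, "analgesia", "gastrointestinal hemorrhage"))  # ['drug_B']
```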
Choose Domain
Concept Search
Concept Navigation
Retrieve Document
Biomedical Session
Categories and Concepts
Concept Switching
Document Retrieval
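The demo's Concept Switching step moves a concept from one community's space into another's. The slides do not give the algorithm. One simple possibility, sketched here, is a two-hop walk:

1. Take the term's neighbours in the source community.
2. Score each target-community concept by summing the products of the weights along every path that reaches it.

Spaces are modeled as hypothetical dict-of-dict tables with made-up weights.

```python
from collections import defaultdict


def concept_switch(term, source, target, k=3):
    """Rank target-community concepts reachable via the term's source neighbours.

    source, target: concept -> {neighbour: weight}, one table per community.
    """
    scores = defaultdict(float)
    for bridge, w_src in source.get(term, {}).items():
        for concept, w_tgt in target.get(bridge, {}).items():
            if concept != term:
                scores[concept] += w_src * w_tgt  # sum over all two-hop paths
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    bee = {"foraging": {"juvenile hormone": 0.75, "octopamine": 0.5}}
    fly = {"juvenile hormone": {"reproduction": 0.5},
           "octopamine": {"locomotion": 0.5, "reproduction": 0.25}}
    print(concept_switch("foraging", bee, fly))
    # [('reproduction', 0.5), ('locomotion', 0.25)]
```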
Biological Concept Spaces (2005)
Compute concept spaces for All of Biology
BioSpace across entire biomedical literature
50M abstracts across 50K repositories
Use Gene Ontology to partition literature into
biological communities for functional analysis
GO is on the same scale as MeSH, but is its coverage adequate?
GO is light on social behavior (biological process)
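Whether GO partitions are adequate is an empirical question. A minimal check, assuming a precomputed mapping from GO terms to abstracts, measures two things: how much of the literature lands in any community, and which watched processes (such as social behavior) get too few abstracts. The 1000-abstract floor echoes MedSpace's "> 1K abstracts" cut but is otherwise an assumption.

```python
def coverage_report(assignments, n_abstracts, watch_terms, min_size=1000):
    """Share of abstracts in at least one GO community, plus sparse watched terms.

    assignments: GO term -> set of abstract ids (hypothetical input format).
    """
    covered = set().union(*assignments.values())
    sparse = {}
    for term in watch_terms:
        size = len(assignments.get(term, ()))
        if size < min_size:
            sparse[term] = size  # community too small to support navigation
    return len(covered) / n_abstracts, sparse


if __name__ == "__main__":
    toy = {"behavior": {"a1", "a2", "a3"}, "social behavior": {"a3"}}
    print(coverage_report(toy, 5, ["social behavior"], min_size=2))
    # (0.6, {'social behavior': 1})
```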
Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of diverse
databases and literature sources for hypothesis development
and testing, with a software system that goes beyond a
searchable database, using statistical literature analyses to
discover functional relationships between genes and behavior.
Genes to Behaviors
Behaviors to Genes
Concepts to Concepts
Clusters to Clusters
Navigation across Sources
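"Genes to Behaviors" and "Behaviors to Genes" can be sketched as one co-mention index read in both directions. The sketch assumes that abstracts are already reduced to concept sets and that the gene and behavior vocabularies are given. Co-mention is only evidence for a hypothesis, which fits the slide's framing of hypothesis development. All names and data are hypothetical.

```python
from collections import defaultdict


def link_index(abstracts, genes, behaviors):
    """Build gene -> behaviors and behavior -> genes maps from co-mention.

    abstracts: iterable of concept sets; genes, behaviors: vocabulary sets.
    """
    gene_to_behaviors = defaultdict(set)
    behavior_to_genes = defaultdict(set)
    for concepts in abstracts:
        for gene in concepts & genes:
            for behavior in concepts & behaviors:
                gene_to_behaviors[gene].add(behavior)
                behavior_to_genes[behavior].add(gene)
    return gene_to_behaviors, behavior_to_genes


if __name__ == "__main__":
    abstracts = [{"gene_x", "foraging", "brain"}, {"gene_y", "foraging"},
                 {"gene_x", "nursing"}]
    g2b, b2g = link_index(abstracts, {"gene_x", "gene_y"}, {"foraging", "nursing"})
    print(sorted(g2b["gene_x"]), sorted(b2g["foraging"]))
    # ['foraging', 'nursing'] ['gene_x', 'gene_y']
```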
BeeSpace Information Sources
General for All Spaces:
Scientific Literature
-Medline, Biosis, Agricola, Agris, CAB Abstracts
-partitioned by organisms and by functions
Model Organisms
-Gene Descriptions (FlyBase, WormBase, MGI, SGD, TAIR)
Special Sources for BeeSpace:
-Natural History Books (Cornell Library, Harvard Press)
XSpace Information Sources
Organize Genome Databases (XBase)
Compute Gene Descriptions from Model Organisms
Partition Scientific Literature for Organism X
Compute XSpace using Semantic Indexing Technology
Boost the Functional Analysis from Special Sources
Collecting Useful Data about Natural Histories
e.g. CowSpace leveraging the AIPL Databases
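The XSpace recipe is organism-agnostic, which can be sketched as a configuration object plus a literature filter. The field names are hypothetical. The BeeSpace values come from earlier slides. A real partition would use curated organism terms (for example, taxonomy identifiers) rather than the naive substring match shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class XSpaceConfig:
    """Sources for one organism-specific space (all field names hypothetical)."""
    organism: str
    genome_db: Optional[str] = None  # the "XBase", if one exists
    model_organism_dbs: list = field(default_factory=list)
    special_sources: list = field(default_factory=list)


def partition_for_organism(records, config):
    """Keep literature records that mention the organism (naive substring match)."""
    name = config.organism.lower()
    return [r for r in records if name in r.lower()]


BEESPACE = XSpaceConfig("Apis mellifera", "BeeBase", ["FlyBase", "WormBase"],
                        ["Natural History Books"])

if __name__ == "__main__":
    records = ["Foraging in Apis mellifera colonies", "Drosophila courtship",
               "apis mellifera brain"]
    print(partition_for_organism(records, BEESPACE))
    # ['Foraging in Apis mellifera colonies', 'apis mellifera brain']
```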
Beyond BeeSpace
The Analysis Environment technology is GENERAL!
BirdSpace? BehaviorSpace? BrainSpace?
SoySpace? CowSpace? IGBSpace?
BioSpace
Internet will evolve into Interspace…