PPT - BeeSpace - University of Illinois at Urbana

BeeSpace:
An Interactive Environment
for Functional Analysis of Social Behavior
Bruce Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
www.beespace.uiuc.edu
First Annual BeeSpace Workshop
University of Illinois
June 6, 2005
BeeSpace FIBR Project
BeeSpace project is NSF FIBR flagship
Frontiers in Integrative Biological Research,
$5M for 5 years at University of Illinois
Analyzing Nature and Nurture in Societal Roles
using honey bee as model
(Functional Analysis of Social Behavior)
Genomic technologies in wet lab and dry lab
Bee [Biology] gene expressions
Space [Informatics] concept navigations
for Social Beehavior
Complex Systems I
Understanding Social Behavior




Honey Bees have only 1 million neurons
Yet…
A Worker Bee exhibits Social Behavior!
She forages when she is not hungry
but the Hive is
She fights when she is not threatened
but the Hive is
for Functional Analysis
Complex Systems II
Understanding Functional Analysis




Molecular Mechanisms of Social Behavior
Can only be Discovered via the
Interactive Navigations of Distributed Systems
The Interspace is the next generation
of the Net (beyond the Web)
Where Concept Navigation across
Distributed Communities is routine
System Architecture
Post-Genome Informatics
Classical Organisms have extensive Genetic Descriptions!
There will be NO more classical organisms beyond
Mice and Men other than Worms and Flies, Yeasts and Weeds.
So must use comparative genomics to classical organisms,
Via sequence homologies and literature analysis.
Automatic annotation of genes to standard classifications,
Such as Gene Ontology via sequence homology.
 Automatic analysis of functions to scientific literature,
Such as concept spaces via text mining.

Descriptions in Literature MUST be used for future
interactive environments for functional analysis!
Informational Science
Computational Science is the Third Branch of Science
(beyond Experimental and Theoretical)
Genes are Computed, Proteins are Computed,
Sequence “equivalences” are Computed.
Informational Science is coming to be accepted as
The Fourth Branch of Science
Based on Information Science technologies for
Functional Mining of Information Sources
Comparative Analysis within the
Dry Lab of Biological Knowledge
Biology: The Model Organism
The Western Honey Bee, Apis mellifera
has become a primary model for social behavior
Complex social behavior in controllable urban environment
-Normal Behavior – honey bees live in the wild
-Controllable Environment – hives can be modified
Small size manageable with current genomic technology
-Capture bees on-the-fly during normal behavior
-Record gene expressions for whole-brain or brain-region
(Note logistical limitations with bees and expressions)
Informatics: From Bases to Spaces
data Bases support genome data
e.g. FlyBase has sequences and maps
Genes annotated by Gene Ontology and
linked to biological literature
BeeBase (Christine Elsik, Texas A&M)
Uses computed homologies to annotate genes
information Spaces support biological literature
e.g. BeeSpace uses automatically generated
conceptual relationships to navigate functions
Project Investigators
BeeSpace project is NSF FIBR flagship
Frontiers in Integrative Biological Research,
$5M for 5 years at University of Illinois
Biology
Gene Robinson, Entomology (behavioral expression)
Susan Fahrbach, Wake Forest (anatomical localization)
Sandra Rodriguez-Zas, Animal Sciences (data analysis)
Informatics
Bruce Schatz, Library & Information Science (systems)
ChengXiang Zhai, Computer Science (text analysis)
Chip Bruce, Library & Information Science (users)
Education and Outreach
Explaining Social Behavior at all Levels

Graduate Students and Postdocs as System Users
5 early adopter labs then 15 international labs

Undergraduates to plan Bioinformatics Course
through Susan Fahrbach at Wake Forest
Run Workshop for Middle School Minorities
through UIUC SummerMath (George Reese)



University High School Biology Courses (David Stone)
Home Hi Middle School for Girls Science (Jim Buell)
BeeSpace GOALS
Analyze the relative contributions of
Nature and Nurture in
Societal Roles in Honey Bees
Experimentally measure differential gene expression for
important societal roles during normal behavior
varying heredity (nature) and environment (nurture)
Interactively annotate gene functions for important gene
clusters using concept navigation across biological
literature representing community knowledge
Concept Navigation in BeeSpace
[Diagram labels. Users: Behavioral Biologist, Neuroscientist, Molecular Biologist. Sources: Bee Literature, Molecular Biology Literature, Neuroscience Literature, Brain Gene Expression Profiles, Brain Region Localization, Bee Genome, FlyBase, WormBase]
BeeSpace Software Environment

Will build a Concept Space of Biomedical Literature
for Functional Analysis of Bee Genes
-Partition Literature into Community Collections
-Extract and Index Concepts within Collections
-Navigate Concepts within Documents
-Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures then
follow links into Genome Databases
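
The first two steps above (partition literature into collections, then index concepts within a collection) can be sketched roughly as follows. This is a minimal illustration, not the BeeSpace implementation: the toy abstracts, the community labels, the fixed `VOCABULARY` list, and the function names `partition` and `index_concepts` are all assumptions made for the example, and plain substring matching stands in for real concept extraction.

```python
from collections import defaultdict

# Hypothetical toy abstracts: (doc_id, community label, text).
ABSTRACTS = [
    ("d1", "bee", "foraging behavior in honey bee workers"),
    ("d2", "fly", "foraging gene expression in fruit fly larvae"),
    ("d3", "bee", "gene expression in honey bee brain"),
]

# Hypothetical concept vocabulary; a real system would extract these phrases.
VOCABULARY = ["foraging", "gene expression", "honey bee"]


def partition(abstracts):
    """Group abstracts into community collections by their label."""
    collections = defaultdict(list)
    for doc_id, community, text in abstracts:
        collections[community].append((doc_id, text))
    return dict(collections)


def index_concepts(collection, vocabulary):
    """Build an inverted index: concept -> sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in collection:
        for concept in vocabulary:
            if concept in text:  # substring match as a stand-in for extraction
                index[concept].add(doc_id)
    return {concept: sorted(ids) for concept, ids in index.items()}


collections = partition(ABSTRACTS)
bee_index = index_concepts(collections["bee"], VOCABULARY)
print(bee_index)
```

Indexing each collection separately keeps every community's concepts in its own space, which is what allows later navigation from one community's literature into another's.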
BeeSpace Software Implementation

Natural Language Processing
Identify noun phrases
Recognize biological entities

Statistical Information Retrieval
Compute statistical contexts
Support conceptual navigation

Network Information System
Concept switch across community collections
Semantic Links into biological databases
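
As a rough sketch of what "compute statistical contexts" could look like, the example below weights pairs of concepts by how often they co-occur in the same document and then ranks the neighbors of a chosen concept. The Jaccard coefficient used here is an assumed stand-in, since the slides do not name a weighting function, and the input concept sets are hypothetical; noun-phrase identification is taken as already done.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_space(docs):
    """docs: list of concept sets, one per document.
    Returns {concept: {other_concept: Jaccard co-occurrence weight}}."""
    doc_freq = defaultdict(int)
    pair_freq = defaultdict(int)
    for concepts in docs:
        for concept in concepts:
            doc_freq[concept] += 1
        for a, b in combinations(sorted(concepts), 2):
            pair_freq[(a, b)] += 1
    space = defaultdict(dict)
    for (a, b), both in pair_freq.items():
        # Jaccard: documents with both / documents with either.
        weight = both / (doc_freq[a] + doc_freq[b] - both)
        space[a][b] = weight
        space[b][a] = weight
    return dict(space)


def related(space, concept, k=3):
    """Top-k neighbors of a concept, strongest first, ties broken by name."""
    neighbors = space.get(concept, {})
    return sorted(neighbors, key=lambda c: (-neighbors[c], c))[:k]


# Hypothetical concept sets, as if already extracted from three abstracts.
DOCS = [
    {"foraging", "honey bee", "gene expression"},
    {"foraging", "honey bee"},
    {"gene expression", "brain"},
]
space = cooccurrence_space(DOCS)
print(related(space, "foraging"))
```

Navigation then amounts to repeatedly calling `related` on whichever concept the user selects next.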
BeeSpace Information Sources

Biomedical Literature
-Medline (medicine)
-Biosis (biology)
-Agricola, CAB Abstracts, Agris (agriculture)

Model Organisms (heredity)
-Gene Descriptions (FlyBase, WormBase)

Natural Histories (environment)
-BeeKeeping Books (Cornell Library, Harvard Press)
Worm Community System (1991)
WCS Information Sources
Literature: Biosis, Medline, newsletters, meetings
Data: Genes, Maps, Sequences, strains, cells

WCS Interactive Environment
Browsing: search, navigation
Filtering: selection, analysis
Sharing: linking, publishing


WCS: 250 users at 50 labs across Internet (1991)
NSF National Collaboratories Flagship
[Screenshots: WCS Molecular, WCS Cellular]
Medical Concept Spaces (1998)




Medical Literature (Medline, 10M abstracts)
Partition with Medical Subject Headings (MeSH)
Community is all abstracts classified by core term
-40M abstracts containing 280M concepts
-computation is 2 days on NCSA Origin 2000
Simulating World of Medical Communities
-10K repositories with > 1K abstracts
-(1K with > 10K)
Navigation in MedSpace
For a patient with Rheumatoid Arthritis


Find a drug that reduces the pain (analgesic)
but does not cause stomach (gastrointestinal) bleeding
Choose Domain
Concept Search
Concept Navigation
Retrieve Document
CONCEPT SWITCHING

“Concept” versus “Term”


set of “semantically” equivalent terms
Concept switching

region to region (set to set) match
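
One way to read "region to region (set to set) match" is sketched below: take the semantic region of a term in one concept space (the term plus its related terms) and rank terms in another space by how much their regions overlap it. This is an illustrative interpretation under stated assumptions, not the method from the talk; the two toy spaces, their terms, and the names `region` and `concept_switch` are hypothetical.

```python
def region(space, term):
    """Semantic region: the term together with its directly related terms."""
    return {term} | set(space.get(term, ()))


def concept_switch(source, target, term):
    """Rank target-space terms by overlap between their region and the
    source-space region of `term` (a set-to-set match)."""
    source_region = region(source, term)
    scores = {}
    for candidate in target:
        overlap = len(source_region & region(target, candidate))
        if overlap:
            scores[candidate] = overlap
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical concept spaces: term -> related terms.
SPACE_A = {
    "foraging": ["food search", "locomotion"],
}
SPACE_B = {
    "feeding behavior": ["food search", "locomotion"],
    "food intake": ["food search"],
    "eye color": ["pigment"],
}

print(concept_switch(SPACE_A, SPACE_B, "foraging"))
```

Matching whole regions rather than single terms lets "foraging" reach "feeding behavior" even though the two spaces share no identical term for the concept.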
[Diagram labels: semantic region, term, Concept Space, Concept Space]
Biomedical Session
Categories and Concepts
Concept Switching
Document Retrieval
Biological Concept Spaces (2006)
Compute concept spaces for All of Biology
BioSpace across entire biomedical literature
50M abstracts across 50K repositories
Use Gene Ontology to partition literature into
biological communities for functional analysis
GO same scale as MeSH but adequate coverage?
GO light on social behavior (biological process)
Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space
of diverse databases and literature sources for
hypothesis development and testing, with a software
system that goes beyond a searchable database, using
statistical literature analyses to discover functional
relationships between genes and behavior.
Genes to Behaviors
Behaviors to Genes
Concepts to Concepts
Clusters to Clusters
Navigation across Sources
BeeSpace Information Sources
General for All Spaces:
Scientific Literature
-Medline, Biosis, Agricola, Agris, CAB Abstracts
-partitioned by organisms and by functions

Model Organisms
-Gene Descriptions (FlyBase, WormBase, MGI, OMIM,
SCD, TAIR)

Special Sources for BeeSpace:
-Natural History Books (Cornell Library, Harvard Press)
XSpace Information Sources
Organize Genome Databases (XBase)
-Compute Gene Descriptions from Model Organisms
-Partition Scientific Literature for Organism X
-Compute XSpace using Semantic Indexing

Boost the Functional Analysis from Special Sources
-Collecting Useful Data about Natural Histories
-e.g. CowSpace Leverage in AIPL Databases
Towards the Interspace
The Analysis Environment technology is
GENERAL!
BirdSpace? BeeSpace?
PigSpace? CowSpace?
BehaviorSpace? BrainSpace?
BioSpace
… Interspace