
Analysis Environments for Scientific Communities:
From Bases to Spaces
Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
[email protected], www.beespace.uiuc.edu
Baker Center for Bioinformatics
Iowa State University
October 6, 2006
What are Analysis Environments?
Functional Analysis
-Find the underlying mechanisms of genes, behaviors, diseases
Comparative Analysis
-Top-down data mining (vs. bottom-up)
-Multiple sources, especially literature
Building Analysis Environments
Manual by Humans
-Interaction: user navigation
-Classification: collection indexing
Automatic by Computers
-Federation: search bridges
-Integration: results links
Trends in Analysis Environments
Central versus Distributed Viewpoints
The 90s Pre-Genome
-Entrez (NIH NCBI) versus WCS (NSF Arizona)
The 00s Post-Genome
-GO (NIH curators) versus BeeSpace (NSF Illinois)
Pre-Genome Environments
Focused on Syntax pre-Web
WCS (Worm Community System)
-Search words across sources
-Follow links across sources
-Words automatic, Links manual
Towards Integrated Searching
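The WCS split between automatic words and manual links can be illustrated with a minimal sketch. Everything here is hypothetical: the two sources, their records, and the function names are invented for illustration and are not taken from WCS itself. The word index is built by program; the links table is entered by hand.

```python
from collections import defaultdict

# Hypothetical miniature sources; the records are invented for illustration.
SOURCES = {
    "literature": {
        "lit1": "unc-22 mutants show twitching muscle",
        "lit2": "cell lineage of the pharynx",
    },
    "genes": {
        "unc-22": "unc-22 encodes twitchin, a muscle protein",
    },
}

# Links entered by hand (the manual part): item -> related items in other sources.
MANUAL_LINKS = {("literature", "lit1"): [("genes", "unc-22")]}


def build_word_index(sources):
    """Automatically index every word of every record in every source."""
    index = defaultdict(set)
    for source, records in sources.items():
        for rec_id, text in records.items():
            for word in text.lower().replace(",", " ").split():
                index[word].add((source, rec_id))
    return index


def search(index, word):
    """Return all (source, record) pairs containing the word, across sources."""
    return sorted(index.get(word.lower(), set()))


def follow_links(item):
    """Return the hand-curated links leaving one item."""
    return MANUAL_LINKS.get(item, [])


index = build_word_index(SOURCES)
hits = search(index, "muscle")
print(hits)
print(follow_links(("literature", "lit1")))
```

A single word search returns hits from both sources, while crossing from a paper to a gene record only works where someone has recorded the link.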
Post-Genome Environments
Focused on Semantics post-Web
BeeSpace (Honey Bee Inter Space)
-Navigate concepts across sources
-Integrate data across sources
-Concepts automatic, Links automatic
Towards Conceptual Navigation
Worm Community System
WCS Information:
-Literature: BIOSIS, MEDLINE, newsletters, meetings
-Data: genes, maps, sequences, strains, cells
WCS Functionality:
-Browsing: search, navigation
-Filtering: selection, analysis
-Sharing: linking, publishing
WCS: 250 users at 50 labs across Internet (1991)
WCS: Molecular
WCS: Cellular
WCS invokes gm
WCS vis-à-vis acedb
Towards the Interspace
from Objects to Concepts
from Syntax to Semantics
Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
THE THIRD WAVE OF NET EVOLUTION
[Figure labels: CONCEPTS, OBJECTS, PACKETS]
LEVELS OF INDEXES
[Figure labels: FORMAL (manual): Technology, Engineering, Electrical, IEEE; INFORMAL (automatic): communities, groups, individuals]
Post-Genome Informatics I
Comparative Analysis within the
Dry Lab of Biological Knowledge
Classical Organisms have Genetic Descriptions.
There will be NO more classical organisms beyond
Mice and Men, Worms and Flies, Yeasts and Weeds.

Must use comparative genomics on classical organisms
via sequence homologies and literature analysis.
Post-Genome Informatics II
Functional Analysis within the
Dry Lab of Biological Knowledge
Automatic annotation of genes to standard
classifications, e.g. Gene Ontology via homology on
computed protein sequences.

Automatic analysis of functions to scientific
literature, e.g. concept spaces via text extractions.
Thus must use functions in literature descriptions.
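As a rough illustration of a concept space built by text extraction, the sketch below counts how often extracted terms co-occur in the same abstract and ranks related terms by that count. This is a simple stand-in technique chosen for the example, not the method used by BeeSpace; the toy abstracts, term lists, and function names are all invented.

```python
from collections import Counter
from itertools import combinations

# Invented toy abstracts, each already reduced to its extracted terms.
DOCS = [
    {"foraging", "gene expression", "brain"},
    {"foraging", "brain", "juvenile hormone"},
    {"nesting", "juvenile hormone"},
]


def build_concept_space(docs):
    """Count, for every ordered pair of terms, the documents they share."""
    cooc = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1
    return cooc


def related_terms(cooc, term, top=3):
    """Rank terms by co-occurrence with `term`; ties break alphabetically."""
    scores = {b: n for (a, b), n in cooc.items() if a == term}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


space = build_concept_space(DOCS)
print(related_terms(space, "foraging"))
```

Raw counts favor frequent terms; a real system would likely weight them, but the navigation idea is the same: start from one term and move to the terms the literature associates with it.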

Informatics: From Bases to Spaces
data Bases support genome data
e.g. FlyBase has sequences and maps
Genes annotated by Gene Ontology and
linked to biological literature
information Spaces support biological literature
e.g. BeeSpace uses automatically generated
conceptual relationships to navigate functions
BeeSpace FIBR Project
The BeeSpace project is an NSF FIBR flagship
(Frontiers in Integrative Biological Research),
$5M for 5 years at University of Illinois
Analyzing Nature and Nurture in Societal Roles
using honey bee as model
(Functional Analysis of Social Behavior)
Genomic technologies in wet lab and dry lab
Bee [Biology] gene expressions
Space [Informatics] concept navigations
System Architecture
Concept Navigation in BeeSpace
[Figure labels: Behavioral Biologist; Neuroscientist; Molecular Biologist; Bee Literature; Molecular Biology Literature; Neuroscience Literature; Brain Gene Expression Profiles; Brain Region Localization; Bee Genome; FlyBase, WormBase]
V1 BeeSpace Community Collections
Organism
-Honey Bee / Fruit Fly
-Song Bird / Soy Bean
Behavior
-Social / Territorial
-Foraging / Nesting
Development
-Behavioral Maturation
-Insect Development
-Insect Communication
Structure
-Fly Genetics / Fly Biochemistry
-Fly Physiology / Insect Neurophysiology
CONCEPT SWITCHING
“Concept” versus “Term”
-set of “semantically” equivalent terms
Concept switching
-region to region (set to set) match
[Figure labels: Semantic region; term; Concept Space; Concept Space]
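The set-to-set match can be sketched as follows, under stated assumptions: each concept is modeled as a set of equivalent terms, and regions are compared by Jaccard overlap. The overlap measure, the two toy concept spaces, and all names are choices made for this illustration; the slides do not specify how the match is computed.

```python
# Hypothetical concept spaces: concept name -> set of equivalent terms.
BEE_SPACE = {
    "foraging": {"foraging", "food gathering", "nectar collection"},
    "nesting": {"nesting", "nest building"},
}
FLY_SPACE = {
    "feeding behavior": {"feeding", "food gathering", "food search"},
    "courtship": {"courtship", "mating song"},
}


def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (equal)."""
    return len(a & b) / len(a | b)


def switch_concept(term, source_space, target_space):
    """Expand a term to its region, then find the best-matching target region."""
    region = set()
    for terms in source_space.values():
        if term in terms:
            region |= terms
    if not region:
        return None  # term is unknown in the source space
    best = max(target_space, key=lambda name: jaccard(region, target_space[name]))
    if jaccard(region, target_space[best]) == 0:
        return None  # no target region shares any term
    return best


print(switch_concept("nectar collection", BEE_SPACE, FLY_SPACE))
```

Matching whole regions instead of single terms is what lets "nectar collection", which never appears in the second space, still land on a related concept there.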
BeeSpace Analysis Environment

Build Concept Space of Biomedical Literature
for Functional Analysis of Bee Genes
-Partition Literature into Community Collections
-Extract and Index Concepts within Collections
-Navigate Concepts within Documents
-Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures
then follow links into Genome Databases
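The four steps listed above (partition, extract and index, navigate, follow links) can be strung together in a minimal sketch. It assumes keyword matching for partitioning and substring matching for concept extraction, which are simplifications chosen for the example; the documents, keywords, gene identifiers, and database entries are all placeholders.

```python
from collections import defaultdict

# Invented records; texts and gene IDs are placeholders.
DOCUMENTS = [
    {"id": "d1", "text": "foraging behavior depends on geneA", "genes": ["geneA"]},
    {"id": "d2", "text": "nest building in social insects", "genes": []},
    {"id": "d3", "text": "foraging onset and brain gene expression",
     "genes": ["geneA", "geneB"]},
]
COLLECTION_KEYWORDS = {"Behavior": {"foraging", "nest"}, "Development": {"onset"}}
CONCEPT_TERMS = {"foraging", "nest building", "gene expression"}
GENE_DATABASE = {"geneA": "placeholder genome database record for geneA"}


def partition(docs, keywords):
    """Step 1: assign documents to community collections by keyword."""
    collections = defaultdict(list)
    for doc in docs:
        for name, words in keywords.items():
            if any(word in doc["text"] for word in words):
                collections[name].append(doc)
    return collections


def index_concepts(docs, concept_terms):
    """Step 2: map each concept found in a collection to its document IDs."""
    index = defaultdict(set)
    for doc in docs:
        for concept in concept_terms:
            if concept in doc["text"]:
                index[concept].add(doc["id"])
    return index


def navigate(index, concept):
    """Step 3: list the documents reachable from a concept."""
    return sorted(index.get(concept, set()))


def follow_links(doc_ids, docs, gene_db):
    """Step 4: follow gene mentions from documents into the database."""
    by_id = {doc["id"]: doc for doc in docs}
    genes = {g for doc_id in doc_ids for g in by_id[doc_id]["genes"]}
    return {g: gene_db[g] for g in sorted(genes) if g in gene_db}


collections = partition(DOCUMENTS, COLLECTION_KEYWORDS)
behavior_index = index_concepts(collections["Behavior"], CONCEPT_TERMS)
doc_ids = navigate(behavior_index, "foraging")
print(doc_ids, follow_links(doc_ids, DOCUMENTS, GENE_DATABASE))
```

Genes mentioned in the literature but absent from the database (geneB here) simply drop out at the last step, which is where candidate genes in related literatures would need a link into another organism's database.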
Well Characterized Gene
Poorly Characterized Gene
Gene Summarization, BeeSpace V2
Collaboration across Users
Category Browse (Collection)
Category Browse (Search)
PlantSpace Examples
Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of
diverse databases and literature sources for hypothesis
development and testing, with a software system beyond a
searchable database, using literature analyses to discover
functional relationships between genes and behavior.
Genes to Behaviors
Behaviors to Genes
Concepts to Concepts
Clusters to Clusters
Navigation across Sources
BeeSpace Information Sources
General for All Spaces:
Scientific Literature
-MEDLINE, BIOSIS, CAB Abstracts
Genome Databases
-GenBank, Protein Data Bank, ArrayExpress

Special for BeeSpace:
Model Organisms (heredity)
-Gene Descriptions (FlyBase, WormBase)
Natural Histories (environment)
-Beekeeping Books (Cornell, Harvard)

XSpace Information Sources
Organize Genome Databases (XBase)
Compute Gene Descriptions from Model Organisms
Partition Scientific Literature for Organism X
Compute XSpace using Semantic Indexing

Boost the Functional Analysis from Special Sources
Collecting Useful Data about Natural Histories
e.g. CowSpace Leverage in AIPL Databases
Towards SoySpace
Organize Genome Databases (SoyBase)
Partition Scientific Literature for SoyBean
Gene Descriptions from Models (TAIR)
Natural Histories from Population Databases
Key to Functional Analysis is Special Sources
Collecting Appropriate Text about Genes
Extracting Adequate Data about Histories
Leverage is National Archives of germplasm
and Historical Records for soybean crops
Towards the Interspace
The Analysis Environment technology is GENERAL!
BirdSpace? BeeSpace?
PigSpace? CowSpace?
BehaviorSpace? BrainSpace?
SoySpace? PlantSpace?
BioSpace
… Interspace