Fungal Semantic Web - UNL Office of Research and Economic

Fungal Semantic Web
Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE)
Etsuko Moriyama, Ken Nickerson, Audrey Atkin
(Biological Sciences)
Steve Harris (Plant Pathology)
Motivation
Many fungal genomes being sequenced
– 100s in the next few years
Important fungal genetics work done by
large and strong UNL group
– Have prototype fungal genome database
Numerous groups around the world are
developing disparate fungal genome
databases
– Databases dissimilar and widely distributed
– Difficult to unify others’ results with one’s own
– Unnecessarily complicates research
But it’s impractical to unify the databases!
Semantic Web
Develop an ontology to describe the fungal
genome data
– An ontology is a formal, explicit specification of shared
concepts
– Allows both human and machine processing
– Concepts shared between ontology files on the WWW
– Ontology describes properties of genes, relations
between genes, and operations useful in analyzing
them
Participants keep their own data locally, but
represent it in a way consistent with this
framework
Captures the semantic meaning of the data,
facilitating automatic processing
– This is where the fun starts
Semantic Web Architecture
What we can do with it
Can do transitive reasoning on genes
– E.g. if genes A and B are related via property 1 and B
and C are related via property 2, then perhaps A and C
are related as well
Inverse relationships to reduce data entry
– E.g. entering “EvolvedFrom” data automatically
implies the inverse “EvolvedTo” relation
Consistency checking
– E.g. verify that UNL’s assertions about fungal
genomes don’t contradict those by others on the
same genomes
Hypothesis building and testing
– E.g. identification of genes that function in specific
cellular processes
Knowledge discovery and data mining
– Ontology includes appropriate techniques for users to
apply to extract new knowledge from the data
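As a rough illustration of the reasoning bullets above, here is a minimal Python sketch that stores facts as (subject, property, object) triples and applies inverse, transitive, and consistency rules. Every name in it (the genes, `evolvedFrom`, `evolvedTo`, `onChromosome`, and the rule tables) is a hypothetical placeholder, not part of any existing ontology or of the proposed system, and a real deployment would more likely rely on an ontology language and reasoner than on hand-written loops.

```python
# Facts are (subject, property, object) triples. All names are hypothetical.
INVERSE_OF = {"evolvedFrom": "evolvedTo"}   # assumed inverse pair
TRANSITIVE = {"evolvedFrom", "evolvedTo"}   # assumed transitive properties
FUNCTIONAL = {"onChromosome"}               # assumed: at most one value per gene


def infer(triples):
    """Return the triples plus everything implied by the inverse and transitive rules."""
    facts = set(triples)
    inverses = dict(INVERSE_OF)
    inverses.update({v: k for k, v in INVERSE_OF.items()})
    changed = True
    while changed:  # repeat until no new fact can be derived
        new = set()
        for s, p, o in facts:
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in TRANSITIVE:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= facts
        facts |= new
    return facts


def conflicts(triples):
    """Map (subject, property) to its values when a functional property has several."""
    seen = {}
    for s, p, o in triples:
        if p in FUNCTIONAL:
            seen.setdefault((s, p), set()).add(o)
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


# One group's local assertions and another group's assertions on the same genes.
local = {("geneC", "evolvedFrom", "geneB"), ("geneA", "onChromosome", "chr1")}
remote = {("geneB", "evolvedFrom", "geneA"), ("geneA", "onChromosome", "chr2")}

facts = infer(local | remote)
disagreements = conflicts(local | remote)
```

Here `facts` gains the derived triples that geneC evolved from geneA and that geneA evolved to geneC, neither of which was entered by hand, while `disagreements` flags that the two sources place geneA on different chromosomes.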
What we can do with it (cont’d)
Uniform interface to the world’s collection of
genomic resources
– Visualization, query & search
– Instructional tool: Train postdocs/students as bi-/trilingual
scientists who can understand molecular (fungal)
biology/genetics, bioinformatics, and computer science
Can add active machine learning component to
facilitate querying of database to classify new
sequences
– Computer learns how to classify biological sequences
via labeled examples, interaction with the user, and
interaction with other experts
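The active learning idea above can be sketched with uncertainty sampling, one common strategy: the learner asks for a label on the sequence it is least sure about. The Python sketch below is only an illustration under strong assumptions. It uses GC content as a single crude feature, a midpoint threshold as the classifier, two made-up classes "A" and "B", and an `oracle` function standing in for the user or other experts. None of these choices comes from the proposal itself.

```python
def gc_fraction(seq):
    """Fraction of G/C bases; a deliberately crude stand-in feature."""
    return sum(base in "GC" for base in seq) / len(seq)


def fit_threshold(labeled):
    """Return the midpoint between the two class means, plus the means.

    Assumes at least one labeled example of each hypothetical class "A" and "B".
    """
    means = {}
    for label in ("A", "B"):
        values = [gc_fraction(s) for s, y in labeled if y == label]
        means[label] = sum(values) / len(values)
    return (means["A"] + means["B"]) / 2, means


def predict(seq, threshold, means):
    """Assign the class whose mean lies on the same side of the threshold."""
    high, low = ("A", "B") if means["A"] > means["B"] else ("B", "A")
    return high if gc_fraction(seq) >= threshold else low


def active_learn(labeled, pool, oracle, rounds):
    """Repeatedly query the pool sequence closest to the decision threshold."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(rounds, len(pool))):
        threshold, _ = fit_threshold(labeled)
        query = min(pool, key=lambda s: abs(gc_fraction(s) - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # the expert supplies the label
    return labeled, pool


# Toy data; the oracle below is a hypothetical expert, not real annotation.
seed = [("GGCC", "A"), ("ATAT", "B")]
unlabeled = ["GGCA", "ATTA", "GCAT", "GGGC"]
labeled, remaining = active_learn(
    seed, unlabeled, oracle=lambda s: "A" if gc_fraction(s) >= 0.5 else "B", rounds=2
)
threshold, means = fit_threshold(labeled)
```

The first query is "GCAT", which sits exactly on the initial threshold, so the expert's effort goes to the most ambiguous sequence rather than to ones the classifier already handles confidently.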
Prior Work
Application of semantic web technology to
bioinformatics is not new
Gene Ontology (http://www.geneontology.org)
– Collection of ontologies related to molecular
functions, biological processes, and cellular
components
– Takes a rather limited view of ontologies: little (if
any) use of quantifiers, shared concepts, etc.
Prior Work (cont’d)
Fungal Web (http://www.cs.concordia.ca/~baker/)
– Built a fungal gene ontology based on GO
– Developing technologies to parse on-line scientific
literature to add data to database
– Tools to query databases and perform analysis
Similar to what we propose, but:
– Their extensions to GO do not suit the needs of UNL
scientists or the broader fungal community: they focus
on fungi that degrade cellulose, and their annotations
are too limited to represent the entire fungal kingdom
– They support machine learning, but not active learning
Extending Other Repositories
Use existing ontologies (e.g. GO) and data
stores as a basis for fungal ontologies
– Utilize existing concepts in other gene
ontologies
– Extend to meet needs of fungal genomes
– Extensions can in turn be used by other
researchers, whether they work on fungi or on
other kingdoms, because we use common concepts
where applicable
Funding Opportunities
NSF
– Frontiers in Integrative Biological Research (FIBR): Oct
preproposal, Feb full
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf05597
– Science and Engineering Information Integration and
Informatics (SEIII): December
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf04528
NIH
– Innovations in Biomedical Computational
Science and Technology: Sept LOI, Oct full
http://grants1.nih.gov/grants/guide/pa-files/PAR-03-106.html
Nebraska Research Initiative: November
Conclusions
Semantic web now popular within bioinformatics,
but no support for the work of UNL’s fungal
research community
We plan to build the necessary infrastructure to
unify disparate data sources and provide an
interface conducive to knowledge discovery,
hypothesis testing, and collaboration
– Will build on existing fungal database here at UNL
– Contributions: distributed infrastructure, means for
querying, drawing inferences
We should send someone to the Knowledge-Based
Bioinformatics Workshop to learn more about the
state of the art