MGED Ontology Workshop

MGED7
September 8-10, 2004
Toronto, Canada
TIGR
THE INSTITUTE FOR GENOMIC RESEARCH
MAGE Workshop
May 24th - May 28th, 2004
The Institute for Genomic Research
and
University of Maryland, Shady Grove
Rockville, MD
MAGE Workshop Goals
• MAGE -> MGED Ontology API
– Main goal for this meeting
– Build a mechanism for people to use the MO as part of
MAGE
• Ontology Tools
– In order to use MO we need tools for manipulating and
managing the ontology
• MAGE v2 Model
– Continue model discussions toward MAGE v2
– Document Model and changes as the model is developed
– Begin work on Mapping MAGE v1 to MAGE v2
• Standalone ADF Converter
– Continue work on simplified Array Design
Format and a reader and writer for it
– Integrate this into MAGE v2
• Documentation
– MO policies and usage
– MOE class or methods
– MAGE v2
• Code, Code, Code …
Ontology Tools
• DAML file-parsing scripts for extracting the
classes, instances and properties from MO
• ANSI SQL scripts for creating MO in a
relational database like MySQL or Sybase
• Script-based methods for updating a
database implementation of the MO
• Perl and Java methods for searching the MO
for classes, instances, and properties
• Others ?
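As a rough illustration of the relational approach listed above, the sketch below loads two terms into an in-memory SQLite database and searches them by name. The table and column names (mo_class, mo_instance) are assumptions made for this example and are not the schema used by the workshop scripts. The terms are taken from the examples in this report, and 'lung' is treated as an instance purely for illustration.

```python
import sqlite3

# Hypothetical two-table layout; the actual MO database scripts may differ.
SCHEMA = """
CREATE TABLE mo_class (
    name TEXT PRIMARY KEY
);
CREATE TABLE mo_instance (
    name TEXT PRIMARY KEY,
    class_name TEXT NOT NULL REFERENCES mo_class(name)
);
"""


def build_db():
    """Create an in-memory database holding a tiny sample of terms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mo_class (name) VALUES (?)",
        [("Scale",), ("OrganismPart",)],
    )
    conn.executemany(
        "INSERT INTO mo_instance (name, class_name) VALUES (?, ?)",
        [("linear_scale", "Scale"), ("lung", "OrganismPart")],
    )
    conn.commit()
    return conn


def find_instances(conn, pattern):
    """Return (instance, class) pairs whose instance name contains pattern."""
    rows = conn.execute(
        "SELECT name, class_name FROM mo_instance "
        "WHERE name LIKE ? ORDER BY name",
        (f"%{pattern}%",),
    )
    return rows.fetchall()


conn = build_db()
print(find_instances(conn, "scale"))
```

Because the SQL is kept to plain CREATE, INSERT and SELECT statements, the same idea should carry over to MySQL or Sybase with little change.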
OWL Reader:
Adam Witney developed a reader based on
the Redland RDF reader that will parse the
MGED Ontology from .OWL as well as
.DAML files. This reader proved useful for
the Ontology Helper API. It is located in cvs:
lib/Perl/script/MGEDOntology_parser.pl
Ontology Helper API:
• Eric Deutsch and Kjell Petersen developed ‘MGEDOntologyHelper.pm’
which will create OntologyEntry objects based on ‘leaf node’ data. Both
applications (the Perl and Java helpers) follow MGEDOntology policies.
• The MO traversal application written by Kjell Petersen and Stathis Sideris was
extended to use ‘leaf node’ data to instantiate an OE MAGE-ML object.
• Eric Deutsch ported Stathis’ code to a set of Perl modules, which return nested
OE objects. The Perl helper is nearly complete. It works well in simple
cases, but in nested cases the final ‘value’ doesn’t get inserted.
• These ‘prototype’ modules and applications are available in cvs:
• Perl: MAGE-Perl/MAGE/Tools. Java: MAGE-Java/MGEDOntologyEntry
• The code is also a working prototype.
• Example scripts are located in ‘lib/Perl/MGEDOntology’
• The Java Helper has been completed and is fully functional.
Perl modules
MGEDOntologyClassEntry.pm
MGEDOntologyEntry.pm
MGEDOntologyHelper.pm
MGEDOntologyPropertyEntry.pm
Java classes
MGEDOntologyEntry.java
MGEDOntologyClassEntry.java
MGEDOntologyPropertyEntry.java
OntologyHelper.java
StringManipHelpers.java
Example Perl for MGEDOntologyHelper
my $ontologyEntry1 =
  Bio::MAGE::Tools::MGEDOntologyClassEntry->new(
    parentObject => $qt,          ## ref to a QT object
    className    => 'QuantitationType',
    association  => 'Scale',
    values       => {
      Scale => 'linear_scale',
    },
    ontology     => $ontology,    ## ref to MO Helper obj
  );
Resulting OE
<OntologyEntry value="linear_scale"
               category="Scale">
  <OntologyReference_assn>
    <DatabaseEntry
        URI="http://mged.sourceforge.net/ontologies/MGEDOntology.php#linear_scale"
        accession="linear_scale">
      <Database_assnref>
        <Database_ref identifier="MO"/>
      </Database_assnref>
    </DatabaseEntry>
  </OntologyReference_assn>
</OntologyEntry>
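The structure of the resulting OE above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the Perl or Java helper itself: the function name make_ontology_entry is hypothetical, and the sketch assumes that the URI is always the MO base URI followed by '#' and the term, and that the accession equals the term, as in the example shown.

```python
import xml.etree.ElementTree as ET

# Base URI as it appears in the example output above.
MO_BASE_URI = "http://mged.sourceforge.net/ontologies/MGEDOntology.php"


def make_ontology_entry(category, value):
    """Build a flat OntologyEntry element with its MO DatabaseEntry reference.

    Hypothetical helper for illustration; it only mirrors the element
    layout of the example and performs no validation against the MO.
    """
    oe = ET.Element("OntologyEntry", {"value": value, "category": category})
    ref = ET.SubElement(oe, "OntologyReference_assn")
    db_entry = ET.SubElement(
        ref,
        "DatabaseEntry",
        {"URI": f"{MO_BASE_URI}#{value}", "accession": value},
    )
    db_assn = ET.SubElement(db_entry, "Database_assnref")
    ET.SubElement(db_assn, "Database_ref", {"identifier": "MO"})
    return oe


entry = make_ontology_entry("Scale", "linear_scale")
print(ET.tostring(entry, encoding="unicode"))
```

The printed element carries the same attributes and nesting as the example, although attribute order and whitespace may differ.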
Another example
my $ontologyEntry1 =
  Bio::MAGE::Tools::MGEDOntologyClassEntry->new(
    parentObject => $BioSource,   ## ref to parent
    className    => 'BioMaterial',
    association  => 'Characteristics',
    values       => {
      OrganismPart => 'lung',
    },
    ontology     => $ontology,    ## ref to MO obj.
  );
Missing value in resulting OE
<OntologyEntry value="BioMaterialCharacteristics"
               category="BioMaterialCharacteristics">
  <OntologyReference_assn>
    <DatabaseEntry
        URI="http://mged.sourceforge.net/ontologies/MGEDOntology.php#BioMaterialCharacteristics"
        accession="BioMaterialCharacteristics">
      <Database_assnref>
        <Database_ref identifier="MO"/>
      </Database_assnref>
    </DatabaseEntry>
  </OntologyReference_assn>
  <Associations_assnlist>
    <OntologyEntry value="OrganismPart"
                   category="OrganismPart">
      <OntologyReference_assn>
        <DatabaseEntry
            URI="http://mged.sourceforge.net/ontologies/MGEDOntology.php#OrganismPart"
            accession="OrganismPart">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
      </OntologyReference_assn>
    </OntologyEntry>
  </Associations_assnlist>
</OntologyEntry>
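One way to catch the missing-value case automatically is to check that every supplied leaf value shows up somewhere in the generated OE tree. The sketch below is an assumption-laden illustration in Python: the function name missing_leaf_values is hypothetical, the XML is an abbreviated copy of the output above with the DatabaseEntry parts omitted, and the check deliberately takes no position on where in the tree the value should be placed.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the nested output above (reference elements omitted).
NESTED_OE = """
<OntologyEntry value="BioMaterialCharacteristics"
               category="BioMaterialCharacteristics">
  <Associations_assnlist>
    <OntologyEntry value="OrganismPart" category="OrganismPart"/>
  </Associations_assnlist>
</OntologyEntry>
"""


def missing_leaf_values(xml_text, expected_values):
    """Return the expected values that no OntologyEntry carries as 'value'."""
    root = ET.fromstring(xml_text)
    # iter() visits the root element as well as all nested entries.
    found = {oe.get("value") for oe in root.iter("OntologyEntry")}
    return [value for value in expected_values if value not in found]


# 'lung' was passed in as the leaf value but never reached the output.
print(missing_leaf_values(NESTED_OE, ["lung"]))
```

A check of this kind could be run against the test scripts' output to confirm when the nested case has been fixed.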
Autogenerated MO classes:
Scott Gustofsen has devised a method of
generating Java classes from the MO. He
calls this Java Ontology Bindings. The
code isn’t yet implemented, but will be in
the near term. The classes would have to be
regenerated with each release of MO. He
has agreed to push it into cvs.
BioMoby Web service:
Tina Boussard and Derek Fowler began work to establish a
BioMoby service to do the following:
1) search MO for terms and definitions,
2) return instantiated classes [objects],
3) provide command-line client register service [for batch
processes].
The service would likely be hosted at CBIL (UPenn). A
namespace has been defined; datatype and output/input
formats for searches still need to be defined. Currently, the
service uses GO as the test case database. Derek Fowler
has set up a BioMoby test service: test_GetMoTerm.
• The Moby namespace was problematic because another
group at Cornell has registered their namespace
incorrectly. They've been asked to fix it.
MO Term Tracker/Validator:
Trish Whetzel implemented a term tracker for the RAD
database.
The use cases for the Tracker are:
1) return new terms proposed by date and/or submitter,
2) return all terms for any MO class,
3) consistency check the MO.
The Tracker would be hosted at CBIL. Helen Parkinson
and Trish discussed methods and use cases for
managing the MO in RAD.
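The first two tracker use cases amount to simple filters over a list of proposed terms. The sketch below shows them in Python over in-memory records rather than the RAD database. The record fields, function names, term names, submitters and dates are all placeholders invented for illustration and do not describe the actual tracker.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProposedTerm:
    term: str
    mo_class: str
    submitter: str
    proposed_on: date


# Placeholder records, invented for illustration only.
TERMS = [
    ProposedTerm("example_term_1", "Scale", "submitter_a", date(2004, 5, 24)),
    ProposedTerm("example_term_2", "Scale", "submitter_b", date(2004, 5, 26)),
    ProposedTerm("example_term_3", "OrganismPart", "submitter_a", date(2004, 5, 28)),
]


def proposed_since(terms, since, submitter=None):
    """Use case 1: terms proposed on or after a date, optionally by submitter."""
    return [
        t.term
        for t in terms
        if t.proposed_on >= since
        and (submitter is None or t.submitter == submitter)
    ]


def terms_for_class(terms, mo_class):
    """Use case 2: all terms recorded for a given MO class."""
    return [t.term for t in terms if t.mo_class == mo_class]


print(proposed_since(TERMS, date(2004, 5, 26)))
print(terms_for_class(TERMS, "Scale"))
```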
ADF Converter:
Philippe Rocca-Serra continued work on the
ADF format with Michael Miller and Pierre
Bushel. In order to incorporate ChIP/CGH
data he had to remove Reporter and
CompositeSequence identifiers. The format
now is in an Excel workbook with 3
worksheets: headers, Reporters and
CompositeSequences. The code will be
placed in cvs when ready.
MAGEv2 Progress:
Ugis Sarkans, Paul Spellman, Michael Miller and Angel Pizarro continued remodeling discussions. The
following changes are being considered:
1. Use Channel to link FactorValues to particular BioMaterials.
2. BioAssays are now generalized to (hopefully) include all types of experimental protocols. As such the
BioMaterial Treatment object is now a type of BioAssay.
3. ArrayDesign and DesignElement will mostly be left as-is in the model, and the reference
implementation of the model will have native support for a simpler format, probably ADF. It is not
yet clear whether the default serialization (XML schema) will have both formats.
4. Protocols have been changed to be multistep protocols, i.e. a set of ordered steps, and not just simple
protocol descriptions. This has allowed the changes in BioAssay to take place.
5. A new abstract class, ‘Referenceable’, was devised to separate the concept of internal MAGE references
(Identifiable) from objects that exist in other resources (e.g. they have associations to DatabaseEntry and
BibliographicReference).
6. OntologyEntry will be redefined to allow for representation of frame-based and description logic
ontologies (MO), as well as simpler node-based ontologies (GO).
7. HigherLevelAnalysis will be extended to represent other types of analytical results. The current cluster
representation will remain the same, modulo some bug fixes.
8. The MAGE submitters' notes were searched for best-practice issues that could be solved by the model.
The conclusion was that most of the best-practice recommendations stem from semantic checks, not
syntactic ones, so model changes alone would not suffice.
MO Problems:
Kjell and Eric found a couple of
inconsistencies in the Ontology which are
being investigated - these have been posted
into the MGED Ontology tracker.
There was an issue about how we handle
deprecated instances and classes; this will
also be investigated at the next VOW.
Ontology Helper API:
• Eric Deutsch and Kjell Petersen developed an API which will create OntologyEntry
objects based on ‘leaf node’ data, i.e. the instance values for the OE. Both
applications follow MGEDOntology policies.
• Kjell Petersen extended a Java application written together with Stathis Sideris after
the EBI jamboree last December. The application traverses the MO to generate
uninitialised data structures according to the policies, with the possible choices from
MO available at each node. The extended code will now take a minimum set of
'leaf node' data as input, validate them against MO and instantiate the full data
structure of MAGE OntologyEntry. The code is a working prototype. The classes
are in cvs: MAGE-Java/MGEDOntologyEntry. Kjell completed the Java Helper
such that it produces complete and filled OE objects.
• Eric Deutsch ported Stathis’ code to a set of Perl modules that are used by a further
module, ‘MGEDOntologyHelper.pm’, which returns nested OE objects. The helper
is nearly complete. It works well in simple cases, but in nested cases the final
‘value’ doesn’t get inserted. These modules are available in cvs:
MAGE-Perl/MAGE/Tools. The code is also a working prototype. Example scripts
that use it are located in ‘lib/Perl/MGEDOntology’ as test1.pl, test2.pl, and
test3.pl.
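The validation step described above, in which 'leaf node' data is checked against the MO before the full OE structure is built, can be pictured as a lookup of each supplied value under its class. The Python sketch below is only an illustration under stated assumptions: the dictionary MO_INSTANCES is a hypothetical stand-in for the parsed ontology, the function name validate_leaf_data is invented, and the real helpers also apply the MO policies, which this sketch does not attempt.

```python
# Hypothetical stand-in for the parsed MO: class name -> allowed instance names.
MO_INSTANCES = {
    "Scale": {"linear_scale"},
}


def validate_leaf_data(leaf_data, ontology=MO_INSTANCES):
    """Return a list of problems found in {class_name: value} leaf data.

    An empty list means every value is a known instance of its class.
    """
    errors = []
    for class_name, value in leaf_data.items():
        allowed = ontology.get(class_name)
        if allowed is None:
            errors.append(f"unknown class: {class_name}")
        elif value not in allowed:
            errors.append(f"{value!r} is not an instance of {class_name}")
    return errors


print(validate_leaf_data({"Scale": "linear_scale"}))
print(validate_leaf_data({"Scale": "not_a_scale"}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid value in one pass.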