AMRiN - Atlas of Living Australia

Towards a Data Model
for the
Australian Microbial Resources
Information Network
(AMRiN)
Version: 0.03
17/09/2010
Lynette Woodburn
Atlas of Living Australia
TIP
Each slide in this presentation comes with accompanying Notes.
You can’t see them if you display this presentation in ‘Slide Show’ mode.
If you’d like to see the Notes
• view the presentation in ‘Normal’ mode, and
• expand the pane below the slide (the Notes pane) to see extra text.
Only then will you have a chance of understanding all the crazy diagrams.

Towards a data model for AMRiN
Requirement
a standard set of data fields for all micro-organisms
. to support the sharing and integration of data through AMRiN
. to pre-configure BioloMICS
Options
. choose an existing set
. develop something new
Recommendation
. surprise!
1. Requirements
2. Options
3. Recommendation
AMRiN community
AMRiN
1. Requirements
2. Options
- existing
CABRI
MCL
3. Recommendation
Common Access to Biological Resources and Information
CABRI
a European organization of partner collections
who contribute data to searchable ‘catalogues’ covering
• bacteria & archaea
• fungi & yeasts
• animal & human cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries
http://www.cabri.org/
CABRI’s sets of data elements
set                       elements per set
• bacteria & archaea      26
• fungi & yeasts          23
• animal cell lines       29
• plant cell lines        17
• hybridomas              15
• phages                  33
• plasmids                30
• plant cell viruses      12
• genomic libraries       7
Example elements: Isolated_from, Doubling_time, Morphology, Lysogenicity, Original_host_plant
Common Access to Biological Resources and Information
CABRI
For each different kind of biological resource,
CABRI defines nested sets of data elements
Mandatory
Recommended
Full
CABRI : bacteria & archaea
Mandatory
Strain_number
Other_collection_numbers
Restrictions
Organism_type
Name
Infrasubspecific_names
Status
History
Conditions_for_growth
Form_of_supply
Recommended
Serovar
Other_names
Isolated_from
Geographic_origin
Mutant
Genotype
Literature
Full
Sexual_state
Pathogenicity
Enzyme_production
Metabolite_production
Applications
Catalogue_entry
Remarks
Price_code
Plasmids
CABRI : fungi & yeasts
Mandatory
Strain_number
Other_collection_numbers
Name
Status
Organism_type
History
Restrictions
Form_of_supply
Conditions_for_growth
Recommended
Misapplied_names
Race
Substrate
Geographic_origin
Literature
Applications
Mutant
Sexual_state
Full
Price_code
Remarks
Pathogenicity
Metabolite_production
Enzyme_production
Genotype
CABRI : animal & human cell lines
Mandatory
Accession_number
Cell_line_name
Brief_description
Description
Depositor
Bibliographic_references
Morphology
Culture_conditions
Viruses
Properties
Release_conditions
Hazard
Recommended
Passage_number
Species_validation
Full
Tumorigenicity
Karyology
Freezing_medium
Sterility
Validation_assays
Further_bibliography
Comments
Storage
Doubling_time
Mycoplasma
Fingerprint
Cytogenetics
Karyotype
Comments
Research_council_deposit
BIOMED_1
CABRI’s sets of data elements
set                       elements per set
• bacteria & archaea      26
• fungi & yeasts          23
• animal cell lines       29
• plant cell lines        17
• hybridomas              15
• phages                  33
• plasmids                30
• plant cell viruses      12
• genomic libraries       7
Total                     192
Sharing data about one kind of biological resource is easy
e.g. phages
e.g. plasmids
Sharing data about multiple kinds of biological resources is hard
Other_culture_collection_numbers, Other_collection_numbers
What is the prospect of deriving a common model from CABRI
for describing several different kinds of biological resources ?
133 distinct data elements …
… distributed across 9 sets
CABRI as a common model ?
each of 92 elements is found in only one set
only 41 elements are found in more than one set
CABRI as a common model ?
27 data elements are found in two sets
10 data elements are found in three sets
4 data elements are found in four sets
No elements are found in more than 4 sets
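The figures on the preceding slides can be cross-checked against each other. The short sketch below uses only numbers quoted in this presentation (elements per set, and the count of distinct elements found in one, two, three or four sets); the variable names are illustrative and not part of CABRI.

```python
# Number of data elements in each CABRI set, as quoted on the slides.
elements_per_set = {
    "bacteria & archaea": 26,
    "fungi & yeasts": 23,
    "animal cell lines": 29,
    "plant cell lines": 17,
    "hybridomas": 15,
    "phages": 33,
    "plasmids": 30,
    "plant cell viruses": 12,
    "genomic libraries": 7,
}

# Number of distinct elements that occur in exactly k sets, as quoted on the slides.
distinct_by_set_count = {1: 92, 2: 27, 3: 10, 4: 4}

# Total element "slots" across all nine sets.
total_slots = sum(elements_per_set.values())

# Distinct elements overall, and those shared by more than one set.
distinct_elements = sum(distinct_by_set_count.values())
shared_elements = distinct_elements - distinct_by_set_count[1]

# An element found in k sets fills k slots, so this should equal total_slots.
slots_from_distribution = sum(k * n for k, n in distinct_by_set_count.items())

print(total_slots, distinct_elements, shared_elements, slots_from_distribution)
# prints: 192 133 41 192
```

The two totals agree: 192 element slots, filled by 133 distinct elements, of which only 41 appear in more than one set.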
Distribution of data elements across CABRI sets
[Chart: count of data elements in each of the nine CABRI sets (bacteria & archaea, fungi & yeasts, animal cell lines, hybridomas, phages, plant cell lines, plant cell viruses, plasmids, genomic libraries), split by whether the element is found in one, two, three or four sets]
CABRI data element ‘themes’
….
• bacteria & archaea
• fungi & yeasts
• animal cell lines
• plant cell lines
• hybridomas
• phages
• plasmids
• plant cell viruses
• genomic libraries
CABRI : comparison of elements across sets
• different names, same meaning (definition)
Morphology, Morphology_and_growth
History, History_of_deposit
Accession_number, Strain_number
Bibliographic_references, Reference_paper,
Literature, Reference, Further_bibliography
Restricted_distribution, Release_conditions,
Restrictions, Distribution
….
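To illustrate what reconciling ‘different names, same meaning’ would involve, here is a minimal sketch of a synonym table built from the groups listed above. The canonical label chosen for each group is an arbitrary assumption made for illustration, and the function name `normalise` is hypothetical; CABRI itself defines no such mapping.

```python
# Synonym groups taken from the slide. The dictionary keys (canonical labels)
# are an arbitrary choice for illustration, not something CABRI prescribes.
SYNONYM_GROUPS = {
    "Morphology": ["Morphology", "Morphology_and_growth"],
    "History": ["History", "History_of_deposit"],
    "Accession_number": ["Accession_number", "Strain_number"],
    "Literature": [
        "Bibliographic_references", "Reference_paper",
        "Literature", "Reference", "Further_bibliography",
    ],
    "Restrictions": [
        "Restricted_distribution", "Release_conditions",
        "Restrictions", "Distribution",
    ],
}

# Reverse lookup: element name -> canonical label.
CANONICAL = {
    name: canon for canon, names in SYNONYM_GROUPS.items() for name in names
}


def normalise(record):
    """Rename known synonyms to their canonical label.

    Values are collected in lists because two synonyms in one record
    (e.g. Reference and Literature) collapse onto the same label.
    Unknown element names are passed through unchanged.
    """
    out = {}
    for key, value in record.items():
        out.setdefault(CANONICAL.get(key, key), []).append(value)
    return out
```

A table like this only handles renaming. It cannot help where one name carries different meanings in different sets, which needs a per-set definition for each element.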
CABRI : comparison of elements across sets
• same name, different meanings
Brief_description
   hybridomas : listing of species, strain, antibody specificity
   animal cell lines : listing of species, strain, tissue, tumour, pathology, transformed/transfected
Type
   phages : type of element (phage, transposon, minitransposon, IS element, …)
   plasmids : type of element (plasmid, phasmid, cosmid, shuttle vector, transposon, minitransposon, IS element, …)
   genomic libraries : type of library (PAC, BAC, YAC, PI, cDNA, …)
CABRI : comparison of data element sets
• varying levels of scope
Conditions_for_growth (bacteria & archaea, fungi & yeasts)
Medium (plasmids, phages)
Medium_1 (plant cell lines)
Light_regime (plant cell lines)
Light_conditions (plant cell lines)
Temperature (plant cell lines)
Humidity (plant cell lines)
culture medium
atmospheric and light conditions
temperature conditions
additional remarks on cultivation
CABRI : fitness for our purpose
• 9 sets of data elements (but does not cover algae)
good for sharing information about one kind of organism
• few elements common to several sets
hard to share information about more than one kind of organism
• does not lend itself to the derivation of a common set
elements of ‘different names, same meaning’
elements of ‘same name, different meanings’
elements with meanings of varying scope
• has international acceptance / presence (but no longer funded?)
1. Requirements
2. Options
- existing
CABRI
MCL
3. Recommendation
Microbiological Common Language
MCL
• a new data exchange standard for microbiological information
Research in Microbiology, 161(6), 439-445
http://www.straininfo.net/projects/mcl
• a pluggable framework, easily extended
• has the same ancestor as CABRI (MINE)
• underpins StrainInfo (www.straininfo.net)
“a world-wide, virtual catalog integrating the information from BRC [Biological Resource Centres] catalogs with related information”
CABRI compared with MCL
CABRI : partitioned by kind of biological resource
MCL : partitioned by workflow step
The abstract model of Microbiological Common Language (MCL)
Strain
Deposit
Sample
Isolation
Culture
Medium
Publication
… follows the logical flow from sampling to subsequent deposits
mcl : Sample
Sample
sampleDate
sampleCollector
sampleCollectorInstitute
sampleCulture
sampleCultureStrainNumber
sampleDescription
sampleLocationDescription
sampleLocationCountry
sampleLocationPlace
sampleHabitat
sampleHabitatEnvoTerm
sampleAlt
sampleLat
sampleLong
comments
mcl : Culture
Culture
id
history
isolationDate
isolator
isolatorInstitute
isolationMethod
speciesName
typeStrainOf
typeStrainOfSpecies
typeStrainOfGenus
strainNumber
otherStrainNumber
[otherStrainNumbers]
catalogURL
oxygenRelationship
[growthTemperature]
minimalGrowthTemperature
optimalGrowthTemperature
maximalGrowthTemperature
hasSample
recommendMedium
publication
nomenclaturalPublication
environmentPublication
historyPublication
taxonomicPublication
cultureLastUpdateDate
comments
some Object Properties
Culture -> Sample : hasSample
Culture -> Medium : recommendMedium
Culture -> Publication : publication, nomenclaturalPublication, environmentPublication, historyPublication, taxonomicPublication
mcl : Medium
Medium
mediumName
mediumNumber
mediumURL
mediumDescription
comments
mcl : Publication
Publication
dcterms: bibliographicCitation
dc: title
dc: creator
prism: publicationName
prism: volume
prism: number
prism: startingPage
prism: pageRange
dcterms: issued
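As a rough illustration of how the MCL classes hang together through Object Properties, the sketch below models a small subset of the fields listed on these slides as Python dataclasses. It is not the MCL schema: the field types, the optional/required split and the cardinalities (one sample, many media, many publications) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # Subset of the Sample fields listed on the slide; types are assumed.
    sampleDate: Optional[str] = None
    sampleLocationCountry: Optional[str] = None
    sampleHabitat: Optional[str] = None


@dataclass
class Medium:
    mediumName: Optional[str] = None
    mediumNumber: Optional[str] = None


@dataclass
class Publication:
    title: Optional[str] = None    # corresponds to dc:title on the slide
    issued: Optional[str] = None   # corresponds to dcterms:issued on the slide


@dataclass
class Culture:
    strainNumber: str
    speciesName: Optional[str] = None
    # Object Properties: links from a Culture to other classes.
    # Cardinalities below are assumptions, not taken from MCL.
    hasSample: Optional[Sample] = None
    recommendMedium: List[Medium] = field(default_factory=list)
    publication: List[Publication] = field(default_factory=list)


# Hypothetical example values, for illustration only.
culture = Culture(
    strainNumber="EXAMPLE 0001",
    speciesName="Example species",
    hasSample=Sample(sampleLocationCountry="Australia"),
    recommendMedium=[Medium(mediumName="Example medium")],
)
```

The point of the sketch is the shape: descriptive fields live on each class, and a Culture reaches its Sample, Medium and Publication records through named links rather than by repeating their fields.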
MCL : fitness for our purpose
• MCL offers a broadly-applicable suite of data elements
. data elements are grouped according to workflow steps, not organism type
. applicable to algae and cyanobacteria
. the Strain concept supports the logical linking of related cultures
• the model is modular and easily extensible
. model cohesion is achieved through Object Properties
. links easily with genomic standards (see StrainInfo)
• born and raised in Europe (StrainInfo), but now going global
. Asian biorepositories network is considering adoption
. we’re invited to contribute to ongoing development
• primarily devised (custom-built) as a data exchange standard
1. Requirements
2. Options
3. Recommendation
Recommendation : dip a toe into the water
• MCL, custom-built for describing microbiological data, deserves consideration
Proposal
undertake a pilot, involving a small group of AMRiN participants,
to assess the suitability of MCL for AMRiN’s purpose.
AMRiN community
AMRiN
AMRiN participants’ input
• map local elements to MCL elements
   Note: some MCL elements may not have a local equivalent
• identify local elements to be kept ‘private’
• identify other local elements to be shared; provide English definitions to enable reconciliation with other participants’ elements
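The three kinds of input above could be checked mechanically before the pilot compares participants. The sketch below is one possible way to do so, assuming each participant supplies a list of local element names, a mapping to MCL elements, a set of private elements, and a dictionary of shared elements with English definitions. All local field names here are hypothetical, and the function `classify_local_elements` is not part of MCL or AMRiN.

```python
def classify_local_elements(local_elements, mapped, private, shared):
    """Return a list of problems found in one participant's input.

    local_elements: all local element names
    mapped:  dict of local element name -> MCL element name
    private: set of local element names to be kept private
    shared:  dict of local element name -> English definition
    An empty result means every element is classified exactly once
    and every shared element has a definition.
    """
    problems = []
    for name in local_elements:
        hits = sum(name in group for group in (mapped, private, shared))
        if hits == 0:
            problems.append(f"{name}: not classified")
        elif hits > 1:
            problems.append(f"{name}: classified {hits} times")
    for name, definition in shared.items():
        if not definition.strip():
            problems.append(f"{name}: shared element needs an English definition")
    return problems


# Hypothetical input from one participant.
local = ["strain_no", "species", "freezer_slot", "toxin_profile", "notes"]
mapped = {"strain_no": "strainNumber", "species": "speciesName"}
private = {"freezer_slot"}
shared = {"toxin_profile": ""}

for problem in classify_local_elements(local, mapped, private, shared):
    print(problem)
# prints:
# notes: not classified
# toxin_profile: shared element needs an English definition
```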
Pilot assessment
• Coverage?
   How much orange overlaps purple?
• What additional common elements exist amongst the set to be shared?
   How much purple overlaps purple?
• Other assessment criteria?
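One way to put numbers on the first two questions is sketched below. It reads ‘coverage’ as the fraction of a participant's local elements that map to an MCL element, and ‘additional common elements’ as shared element names offered by at least two participants. Both readings, the function names and the example data are assumptions; in practice, matching shared elements would rely on their English definitions rather than on identical names.

```python
from collections import Counter


def mcl_coverage(local_elements, mapped_to_mcl):
    """Fraction of local elements that have an MCL equivalent."""
    local = set(local_elements)
    if not local:
        return 0.0
    return len(local & set(mapped_to_mcl)) / len(local)


def common_shared_elements(shared_by_participant, min_participants=2):
    """Shared element names offered by at least min_participants participants.

    Simplification: elements are matched on identical names only.
    """
    counts = Counter(
        name
        for names in shared_by_participant.values()
        for name in set(names)
    )
    return sorted(name for name, n in counts.items() if n >= min_participants)


# Hypothetical pilot data.
print(mcl_coverage(["a", "b", "c", "d"], ["a", "b", "c"]))
# prints: 0.75
print(common_shared_elements({
    "collection_1": ["toxin_profile", "pigment"],
    "collection_2": ["toxin_profile"],
    "collection_3": ["pigment", "salinity"],
}))
# prints: ['pigment', 'toxin_profile']
```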
Pulling the pieces together
Please consider the foregoing proposal.
Does it seem reasonable to you?
Do you think there’s a better way?