The Web-Enabled Research Commons

The Web-Enabled Research Commons: Applications, Goals, and Trends
Thinh Nguyen
October 2009
Use Case #1
NeuroCommons Project:
Science Commons project using
Semantic Web to link massive amounts
of data
[Figure: paper counts shown on the slide: 27,266; 128,437; 41,985; 4,563; 10,365]
[Figure: linked data sources: PDSPki, Reactome, Gene Ontology, BAMS, NeuronDB, Entrez Gene, Antibodies, Allen Brain Atlas, Literature, BrainPharm, SWAN, Homologene, PubChem, AlzGene, Mammalian Phenotype, MESH (credit: W3C HCLS)]
making computers understand linkages
(the WWW)
Web page --links to--> Web page
directed, contextual links
receptor --is located in--> Cell membrane
“URI”
(unique names for things on the web)
is located in: http://ontology.foo.org/is_located_in
receptor: http://ontology.foo.org/receptor
Cell membrane: http://ontology.foo.org/compartment
neuron --has--> Cell membrane
receptor --is located in--> Cell membrane
channel --is located in--> Cell membrane
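The labeled links above can be sketched as subject-predicate-object triples in which every part is a URI. This is a minimal illustration using only the Python standard library, not the NeuroCommons implementation. The receptor, compartment, and is_located_in URIs come from the slides; the neuron, channel, and has URIs and the helper `subjects_located_in` are hypothetical names added for the example.

```python
from typing import NamedTuple

BASE = "http://ontology.foo.org/"  # placeholder namespace used on the slides


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# receptor, compartment, and is_located_in appear on the slides;
# neuron, channel, and has are hypothetical URIs in the same namespace.
RECEPTOR = BASE + "receptor"
CHANNEL = BASE + "channel"
NEURON = BASE + "neuron"
CELL_MEMBRANE = BASE + "compartment"
IS_LOCATED_IN = BASE + "is_located_in"
HAS = BASE + "has"

GRAPH = [
    Triple(NEURON, HAS, CELL_MEMBRANE),
    Triple(RECEPTOR, IS_LOCATED_IN, CELL_MEMBRANE),
    Triple(CHANNEL, IS_LOCATED_IN, CELL_MEMBRANE),
]


def subjects_located_in(graph, place):
    """Return every subject linked to `place` by the is_located_in predicate."""
    return [t.subject for t in graph
            if t.predicate == IS_LOCATED_IN and t.obj == place]


if __name__ == "__main__":
    for uri in subjects_located_in(GRAPH, CELL_MEMBRANE):
        print(uri)
```

Because the link type is itself a URI, a program can tell "is located in" apart from "has" without parsing any prose.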
using the web to integrate
data and databases
“compartment” / “container” / “doohickey”
Cell membrane: http://ontology.foo.org/compartment
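A small sketch of why a shared URI helps with integration: records that use different local names for the same thing can be grouped once each name is mapped to one identifier. The three labels and the URI are from the slide; the source names, record values, and the `integrate` helper are hypothetical.

```python
CANONICAL_URI = "http://ontology.foo.org/compartment"

# Local labels from the slide, all naming the same thing.
LOCAL_NAMES = {
    "compartment": CANONICAL_URI,
    "container": CANONICAL_URI,
    "doohickey": CANONICAL_URI,
}

# Hypothetical records: (source database, local label, payload).
RECORDS = [
    ("database_a", "compartment", "record 1"),
    ("database_b", "container", "record 2"),
    ("database_c", "doohickey", "record 3"),
]


def integrate(records):
    """Group records from different sources under their shared URI."""
    merged = {}
    for source, local_name, payload in records:
        uri = LOCAL_NAMES[local_name]
        merged.setdefault(uri, []).append((source, payload))
    return merged


if __name__ == "__main__":
    for uri, items in integrate(RECORDS).items():
        print(uri, items)
```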
better answers through better formats:
# Prefixes for the vocabularies used in the query
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{
  # Papers tagged with the MeSH term, and the genes their articles mention
  graph <http://purl.org/commons/hcls/pubmesh>
  { ?paper ?p mesh:D017966 .
    ?article sc:identified_by_pmid ?paper .
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article .
  }
  # Gene products whose function is realized as a process under the GO term
  graph <http://purl.org/commons/hcls/goa>
  { ?protein rdfs:subClassOf ?res .
    ?res owl:onProperty ro:has_function .
    ?res owl:someValuesFrom ?res2 .
    ?res2 owl:onProperty ro:realized_as .
    ?res2 owl:someValuesFrom ?process .
    # The process is either a part of, or a subclass of, go:GO_0007166
    graph <http://purl.org/commons/hcls/20070416/classrelations>
    { { ?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166 }
      union
      { ?process rdfs:subClassOf go:GO_0007166 } }
    # Tie the protein class back to the gene found in the literature
    ?protein rdfs:subClassOf ?parent .
    ?parent owl:equivalentClass ?res3 .
    ?res3 owl:hasValue ?gene .
  }
  # Human-readable labels for the results
  graph <http://purl.org/commons/hcls/gene>
  { ?gene rdfs:label ?genename }
  graph <http://purl.org/commons/hcls/20070416>
  { ?process rdfs:label ?processname }
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
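The query chains the four sources listed above: a MeSH term leads to papers, papers to genes, and genes to processes that fall under a Gene Ontology term. A rough Python analogue of that join is sketched below, assuming toy lookup tables. Every identifier and label in the tables is a made-up placeholder, not real MeSH, PubMed, Entrez Gene, or GO data, and `genes_and_processes` is a hypothetical helper.

```python
# Toy stand-ins for the four sources; all identifiers are placeholders.
MESH_TO_PAPERS = {"mesh:example_term": ["pmid:P1", "pmid:P2"]}
PAPER_TO_GENES = {"pmid:P1": ["gene:G1"], "pmid:P2": ["gene:G1", "gene:G2"]}
GENE_TO_PROCESSES = {"gene:G1": ["go:proc_A"], "gene:G2": ["go:proc_B"]}
# Processes that sit under the target GO term (part_of or subclass).
UNDER_TARGET_TERM = {"go:proc_A"}

GENE_LABEL = {"gene:G1": "example gene 1", "gene:G2": "example gene 2"}
PROCESS_LABEL = {"go:proc_A": "example process A",
                 "go:proc_B": "example process B"}


def genes_and_processes(mesh_term):
    """Follow term -> papers -> genes -> processes, keeping matching processes."""
    results = set()
    for paper in MESH_TO_PAPERS.get(mesh_term, []):
        for gene in PAPER_TO_GENES.get(paper, []):
            for process in GENE_TO_PROCESSES.get(gene, []):
                if process in UNDER_TARGET_TERM:
                    results.add((GENE_LABEL[gene], PROCESS_LABEL[process]))
    return sorted(results)


if __name__ == "__main__":
    for gene_name, process_name in genes_and_processes("mesh:example_term"):
        print(gene_name, "|", process_name)
```

In the SPARQL version the same join works across independently maintained databases because they share URIs, so no hand-built lookup tables are needed.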
• reformat what we already have
• reformat into a commons, not a closed system
• get the materials into the emerging research web
What data sharing protocol (legal and
policy) best enables use of Web
technology?
“Licensing” Archetypes
• Public Domain: No restrictions on use or
distribution, no contracts, copyright waived.
• Community Licenses: standard “open
access” licenses, a range of rights, some
rights reserved, available to all
• Private Licenses: custom agreements,
varies by institution, privately negotiated, may
be offered only to some
Goals
• Interoperable: data from many sources can
be combined without restriction
• Reusable: data can be repurposed into new
and interesting contexts
• Administrative Burden: low transaction
costs and administrative costs over time
• Legal Certainty: users can rely on legal
usability of the data
• Community Norms: consistent with
community expectations and usages
Interoperability
• Public Domain ****
– Can be combined with other data sources with
ease
• Community Licenses *** / **
– Depends on type of license: share-alike or copyleft
licenses are unsuitable, but attribution-only licenses are
less problematic
• Private Licenses * / **
– Depends on restrictions, but not scalable;
the number of permutations is too large
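One way to read the scalability point: if every privately licensed source has to be checked against every other source it is combined with, the number of checks grows quadratically. The sketch below assumes one review per pair of distinct sources, which is an illustrative simplification and not a figure from the talk; `pairwise_reviews` is a hypothetical helper.

```python
from math import comb


def pairwise_reviews(n_sources: int) -> int:
    """Number of distinct pairs among n_sources (assumes one review per pair)."""
    return comb(n_sources, 2)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, "sources ->", pairwise_reviews(n), "pairwise reviews")
```

Under this assumption, 10 sources need 45 reviews and 100 sources need 4,950, while public domain sources need none.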
Reusable
• Public Domain ****
– No restrictions on subsequent use
• Community Licenses ***
– Depends on license, but some license terms, such as
NonCommercial (NC) or NoDerivatives (ND), can be restrictive
• Private Licenses **
– Depends on license, but typically restrictive
Administrative Burden
• Public Domain ****
– No paperwork or legal review needed
• Community Licenses ***
– Little paperwork, but some legal review
needed (attribution stacking issues)
• Private Licenses *
– Large amounts of paperwork, frequent
legal review needed
Legal Certainty
• Public Domain **** / ***
– Clear rights; generally irrevocable (copyright
should be addressed)
• Community Licenses ***
– Generally credible, good track record with open
access and open source licenses
• Private Licenses **
– Must be considered individually; few private
licenses tested by time
Community Norms
• Public Domain ***
– Traditional method for scientific data sharing
(citation)
• Community Licenses ***
– Relatively new, but familiar to computer scientists
and open source community (attribution)
• Private Licenses **
– tendency to emphasize private / individual
interests rather than community norms
Overall Grade
• Public Domain ***
– Easiest and least restrictive form of sharing
• Community Licenses **
– Can be used to implement community
expectations, but can be burdensome / restrictive
• Private Licenses *
– High transaction costs, burdensome,
unpredictable
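As a rough cross-check of the ranking, the per-goal star ratings from the preceding slides can be averaged. The averaging, and taking the midpoint where a slide gives a range such as "*** / **", are assumptions made for this sketch; the talk does not say how the overall grades were derived.

```python
from statistics import mean

# Star ratings copied from the slides, in the order: Interoperability,
# Reusable, Administrative Burden, Legal Certainty, Community Norms.
# A tuple with two values is a range shown on the slide.
RATINGS = {
    "Public Domain": [(4,), (4,), (4,), (4, 3), (3,)],
    "Community Licenses": [(3, 2), (3,), (3,), (3,), (3,)],
    "Private Licenses": [(1, 2), (2,), (1,), (2,), (2,)],
}


def average_stars(ratings):
    """Average the ratings, using the midpoint of any range (an assumption)."""
    return mean(mean(r) for r in ratings)


def ranking(all_ratings):
    """Archetypes sorted from highest to lowest average."""
    return sorted(all_ratings,
                  key=lambda name: average_stars(all_ratings[name]),
                  reverse=True)


if __name__ == "__main__":
    for name in ranking(RATINGS):
        print(f"{name}: {average_stars(RATINGS[name]):.1f}")
```

The resulting order (Public Domain, Community Licenses, Private Licenses) matches the order of the overall grades on the slide.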
Convergence
CC0
• Released by Creative Commons in
2009
• Result of a 3-year policy exploration
process
• Not a license but a waiver of copyright
Why is it needed
• “Borderline” copyright
• European sui generis database rights
• Varying legal standards for copyright
protection in different countries
CC0
• [deed]
CC0
• Waiver of copyright
• Waiver of sui generis database rights
• Waiver of “neighboring rights”
• Does not affect trademarks or patents
• Only affects rights of person making
assertion
Use Case #2
• Coordination and Sustainability of
International Mouse Informatics Resources
(CASIMIR) (EU Project)
• Commentary in Letter to Nature (Sept 2009)
recommends public domain (PD) and use of CC0 for sharing
mouse genomic data
• Recommendations endorsed by scientists,
NIH representatives, Jackson Labs, and
editors of top scientific journals
Use Case #3
• Personal Genome Project: personalized medicine project from
the George Church lab
• Adopted CC0 to release sequence and
medical data collected from volunteers
Summary
• Solving some bioinformatics problems
requires the ability to integrate massive
quantities of data from diverse sources
• Public Domain sharing best fits this
need
• CC0 waiver can be used to enrich
public domain and provide clarity
Thank You
• Thinh Nguyen
• On the Web:
http://www.sciencecommons.org