Company Briefing - Community Capability Model

The journal as index and incentive for data publication
Myles Axton
Editor,
Nature Genetics
Cambridge
Oct 23rd 2011
Nature-quality primary research in genetics and genomics
- Professional editors choose which papers to publish
- Peer referees provide technical guidance to improve the work
- Editorial standards and decision criteria are constantly revised in light of referee advice,
author comments, conference presentations and lab visits
- Standards of the journal constantly get higher
All Nature Research Journals use two basic criteria for decisions:
Novelty: new data, new resource
Conceptual advance: new ideas, new strategies
Plus:
Nature: Is this work of general interest to all scientists, to decision makers or to the
public?
Nature Genetics: How many other researchers will do their research differently as a
result of this work?
Scope of Nature Genetics
Common/complex diseases
Gene networks
Cancer
Human disease genetics
Pharmacological genomics
Epigenetics
Developmental genetics
Functional genomics
Stem cell genetics
Genetic technology
Genome evolution
- Risk
- Wiring
Results, data, funding and contributors
Footnotes and XML tags for contributor roles
Publisher metadata transmitted to popular sites
Better information and metrics for authors and funders
Citable data management plans by data producers
- Citable by unique digital object identifier (DOI) that points to current version.
- Previous versions stored and independently citable.
- Project descriptions (“marker papers”) grouped in a Collection together with funder policy
documents and community standards papers.
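The versioning rule above (one DOI that always points to the current version, with earlier versions still citable on their own) can be sketched as follows. This is only an illustration under stated assumptions: the `VersionedRecord` class, the `/vN` suffix scheme and the DOI string are hypothetical and are not taken from any publisher's system.

```python
class VersionedRecord:
    """Toy registry for one citable item with several versions."""

    def __init__(self, doi):
        self.doi = doi          # hypothetical base identifier
        self.versions = []      # oldest first

    def publish(self, content):
        """Store a new version and return its own citable identifier."""
        self.versions.append(content)
        return f"{self.doi}/v{len(self.versions)}"

    def resolve(self, identifier):
        """The base DOI gives the current version; a /vN suffix gives version N."""
        if identifier == self.doi and self.versions:
            return self.versions[-1]
        prefix = self.doi + "/v"
        if identifier.startswith(prefix) and identifier[len(prefix):].isdigit():
            n = int(identifier[len(prefix):])
            if 1 <= n <= len(self.versions):
                return self.versions[n - 1]
        raise KeyError(identifier)


plan = VersionedRecord("10.9999/example-plan")   # hypothetical DOI
v1 = plan.publish("data management plan, first version")
v2 = plan.publish("data management plan, revised")
```

Here `plan.resolve(plan.doi)` returns the revised plan, while `plan.resolve(v1)` still returns the first version.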
What conventions and accession codes do 8 journals enforce?
Annals of Human Genetics, Genetics in Medicine, Genome Research, Human Mutation,
Journal of Human Genetics, Journal of Medical Genetics, Nature Genetics, PLoS Biology
Grant and funder information
Competing financial interests
OMIM accession
HGNC gene names
Model organism database: genes
DNA sequence
Primer sequences
Expression array
LOVD deposition of human variants
Prepublication data sharing
Postpublication data sharing
NCBI/EBI deposition of human variants
Author contribution statement
HGVS allele nomenclature
6 by initials, 3 optional
5 publish, 1 if positive, 2 optional
4 mandate, 4 do not
4 throughout, 3 first use, 2 no obligation
4 mandate mouse, 3 all models
4 public deposit, 2 encourage, 2 no obligation
4 mandate publication, 4 no obligation
3 mandate, 2 encourage, 3 no obligation
3 endorse, 5 not policy
2 for review, 1 if referees, 1 no enforcement, 4 no policy
2 enforce, 1 policy, 5 no policy
2 mandate, 3 endorse, 3 not policy
2 by initials, 4 optional, 1 in supplementary information
1 throughout, 1 first use, 2 optional, 4 no policy
Pilot study in data publishing and citation
Erasmus Med. Cent.: Bharat Singh
Free U. Amsterdam: Paul Groth
Leiden U. Med. Cent.: Herman van Haagen, Erik Schultes, Barend Mons, Ivo Fokkema, Johan den Dunnen
Thomson Reuters: Joel Hammond, Bruce Kiesel
Penn. State U.: Belinda Giardine
NPG: Myles Axton, Tony Hammond
Microattribution
- Journals provide citation links from paper to paper. Quantitative citation provides a
rough measure of usefulness.
- Database accessions should be traceable to their original source.
- Citations to database accessions in peer-reviewed papers and in database
entries should be counted and the count made public.
- Microcitation provides incentive for collaborative genome annotation and a more
accurate picture of individual and group research activity.
- Microattribution can be extended to web traffic analysis.
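The counting step in the points above can be sketched in a few lines. This is a minimal illustration, assuming citations are already available as (citing source, accession, contributor) records; the record layout, the contributor labels and the accession `HbVar.240` are hypothetical (`HbVar.239` is borrowed from the linkout in the nanopublication example).

```python
from collections import Counter

# Hypothetical citation records:
# (citing paper or database entry, cited accession, contributor credited for it)
citations = [
    ("paper:A", "HbVar.239", "contributor-1"),
    ("paper:B", "HbVar.239", "contributor-1"),
    ("db-entry:C", "HbVar.240", "contributor-2"),
    ("paper:A", "HbVar.239", "contributor-1"),  # repeat of the first record
]


def microattribution_counts(records):
    """Count each (source, accession) citation once, per accession and per contributor."""
    unique = set(records)
    per_accession = Counter(accession for _, accession, _ in unique)
    per_contributor = Counter(contributor for _, _, contributor in unique)
    return per_accession, per_contributor


per_accession, per_contributor = microattribution_counts(citations)
```

Publishing both counters would give a citation count per accession and a credit count per depositor.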
Microattribution is an incentive to database deposition
Microattribution started
Human Variome Microattribution Review (Hemoglobin)
http://www.bx.psu.edu/~giardine/
RDF/XML triplet with provenance metadata
Supplementary Table 1 Nanopublications of the form [HGVS gene variant name][has][Variant frequency]
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:n="http://example/schema/">
  <n:nanopublication rdf:about="#2000">
    <n:triplet>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://example/id/NG_000007.3:g.70628G>A"/>
        <rdf:predicate rdf:resource="http://example/genetics/has-variant-frequency"/>
        <rdf:object>0.25%</rdf:object>
      </rdf:Statement>
    </n:triplet>
    <n:condition>Sardinian</n:condition>
    <n:provenance>
      <rdf:Description>
        <n:researcherID>HbVar (A-2391-2010)</n:researcherID>
        <n:DOI rdf:resource="http://dx.doi.org/10.1038/ng.785"/>
        <n:PMID rdf:resource="info:pmid/6695908"/>
        <n:PMID rdf:resource="info:pmid/1428944"/>
        <n:PMID rdf:resource="info:pmid/1610915"/>
        <n:linkout rdf:resource="http://globin.bx.psu.edu/cgi-bin/hbvar/query_vars3?mode=output&amp;display_format=page&amp;i=239;http://phencode.bx.psu.edu/cgi-bin/phencode/phencode?build=hg18&amp;id=HbVar.239"/>
      </rdf:Description>
    </n:provenance>
  </n:nanopublication>
</rdf:RDF>
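As a rough illustration of how such a record could be consumed, the sketch below reads the assertion, its condition and its PubMed provenance from a shortened copy of the listing. It assumes only Python's standard `xml.etree.ElementTree`; the function name `read_nanopublications` and the returned dictionary layout are choices made for this example, not part of the nanopublication schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
N = "http://example/schema/"

# Shortened copy of the nanopublication shown above.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:n="http://example/schema/">
  <n:nanopublication rdf:about="#2000">
    <n:triplet>
      <rdf:Statement>
        <rdf:subject rdf:resource="http://example/id/NG_000007.3:g.70628G>A"/>
        <rdf:predicate rdf:resource="http://example/genetics/has-variant-frequency"/>
        <rdf:object>0.25%</rdf:object>
      </rdf:Statement>
    </n:triplet>
    <n:condition>Sardinian</n:condition>
    <n:provenance>
      <rdf:Description>
        <n:DOI rdf:resource="http://dx.doi.org/10.1038/ng.785"/>
        <n:PMID rdf:resource="info:pmid/6695908"/>
      </rdf:Description>
    </n:provenance>
  </n:nanopublication>
</rdf:RDF>"""


def read_nanopublications(xml_text):
    """Return one dict per nanopublication: the triple, its condition, its PMIDs."""
    root = ET.fromstring(xml_text)
    resource = f"{{{RDF}}}resource"  # qualified name of the rdf:resource attribute
    records = []
    for nanopub in root.iter(f"{{{N}}}nanopublication"):
        statement = nanopub.find(f".//{{{RDF}}}Statement")
        records.append({
            "subject": statement.find(f"{{{RDF}}}subject").get(resource),
            "predicate": statement.find(f"{{{RDF}}}predicate").get(resource),
            "object": statement.find(f"{{{RDF}}}object").text,
            "condition": nanopub.findtext(f"{{{N}}}condition"),
            "pmids": [e.get(resource) for e in nanopub.iter(f"{{{N}}}PMID")],
        })
    return records


nanopubs = read_nanopublications(doc)
```

Each returned record keeps the assertion together with the sources that support it, which is the property microattribution relies on.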
Our experiments suggest articles are a poor source of data
- Leiden Open Variation Database (LOVD): 196,000 variants in 3,267 genes from 84,937 individuals
- Giardine et al. (ng.785): 1,901 unique variants in 37 genes affecting hemoglobin levels
1) Text search of 10 million PubMed abstracts for 4,940 gene variants of 11 genes in LOVD.
16 out of 10,000,000 abstracts contained gene variants from the search set
2) Search for 2,545 DMD variants in the full text of 109 articles on Duchenne muscular dystrophy
selected by LOVD curator Johan den Dunnen.
625 out of 2,545 variants were found in 22 out of 109 articles (2,565 total mentions)
3) Search all LOVD entries with PubMed PMIDs indicating Nature Genetics articles.
LOVD database linked a gene variant to 19 of 36 articles. Only 3 articles mention the variant in
unambiguous HGVS nomenclature in the text.
4) Text mining ng.785 for nanopublications concerning all biomedical concepts.
a) From the article, 13 out of a total of 698 nanopublications assert genetic variation
b) From the Supplementary Tables, 1,734 nanopublications of the form [HGVS variant][has][OMIM VarID]
and 121 of the form [HGVS variant][has][variant frequency]
c) >40,000 nanopublications from the HbVarDB described by the article
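The full-text search in experiment 2 can be approximated with a simple exact-match count. The sketch below is an assumption about how such a tally might be made, not the method actually used; the variant names and article texts are invented toy inputs, and plain substring matching will miss variants written in non-standard nomenclature.

```python
def variant_recall(variants, articles):
    """Return (distinct variants found, articles with a hit, total mentions)."""
    found = set()
    articles_with_hit = 0
    total_mentions = 0
    for text in articles:
        # Exact substring counts for every variant name in this article.
        counts = {variant: text.count(variant) for variant in variants}
        mentions_here = sum(counts.values())
        if mentions_here:
            articles_with_hit += 1
        total_mentions += mentions_here
        found.update(v for v, c in counts.items() if c)
    return len(found), articles_with_hit, total_mentions


# Hypothetical toy inputs, for illustration only.
variants = ["c.9G>A", "c.10108G>T", "c.3276+1G>A"]
articles = [
    "We report c.9G>A in two families; c.9G>A segregated with disease.",
    "No variant nomenclature is used in this text.",
    "The change c.10108G>T was found once.",
]

result = variant_recall(variants, articles)

# Proportions implied by the figures reported for experiment 2.
variant_share = 625 / 2545   # variants found in the full text
article_share = 22 / 109     # articles containing at least one variant
```

With the reported figures, roughly 24.6% of the variants were found, in roughly 20.2% of the articles.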
Scholarly communication
Current
Nature Genetics 43, 281–283 (2011) doi:10.1038/ng0411-281
Proposed
Thank you!