EuroCRIS OpenAIRE2020 Task 8.3 Elly Dijk


Data Archiving and Networked Services
Measurement of research impact in OpenAIRE2020: via text mining or the CRISs?
Elly Dijk
Policy Advisor DANS
Project leader OpenAIRE2020 at DANS
EuroCRIS membership meeting
AMUE, Paris, 12 May 2015
DANS is an institute of KNAW and NWO
Outline
• What is DANS?
• Portal NARCIS
• EU initiative OpenAIRE2020
• Task 8.3: Research Impact Services
• Text mining open access content
• Preferred solution?
Data Archiving and Networked Services
• Institute of the Dutch Academy and the Research Funding Organisation (KNAW & NWO) since 2005
• Mission: promote and provide permanent access to digital research information
• First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive (1989)
• Services: DataverseNL, EASY, NARCIS
Content NARCIS
NARCIS is the national aggregator, making Dutch research visible:
• CRIS information: research projects, researchers, and scholarly institutes (including 8,400 projects financed by the national research funder NWO);
• (Open access) publications from the repositories of Dutch universities, the Netherlands Academy, NWO, and a number of research institutes;
• Research data from data archives, including EASY.
OpenAIRE2020
• Open Access Infrastructure for Research in Europe: promotes Open Science
• Funded by Horizon 2020 to develop and maintain the infrastructure that supports the EU's OA policy
• Network of over 500 repositories and open access journals
• Access to 11 million open access publications, 7,000 data sets, 58,000 organizations, and 30,000 projects from two research funders
OpenAIRE2020 ambition
• From a repository network to a Europe-wide research information system
• Enhance the interoperability of all resources related to the research cycle: link content together (following a subset of CERIF) through its main entities: publications, research data, projects, people, and organizations
• Support H2020 OA mandates
– 100% OA on scientific publications
– Research Data Pilot
• Implement Gold OA pilot
• Establish OpenAIRE legal entity
Task 8.3 Research Impact Services
• Athena Research and Innovation Center (ARC, task leader), CNR (Italy), and DANS
• Develop services that measure research impact with respect to a research initiative
• These services will identify relationships between publications/datasets and a research initiative by text mining
• Goal: visualize statistics and measure research impact over time
• Use case: pilots with selected national funding agencies, e.g. the Dutch NWO
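At its simplest, visualizing statistics and measuring impact over time comes down to counting linked publications per period. A minimal Python sketch, assuming each text-mining result has already been reduced to a (funding stream, publication, year) triple; the function name, the input shape, and the toy data are hypothetical, not part of OpenAIRE:

```python
from collections import Counter, defaultdict


def publications_per_year(links):
    """Count linked publications per funding stream and year.

    links: iterable of (funding_stream, publication_id, year) triples.
    A publication is counted once per stream, even if it matches
    several projects in that stream.
    """
    seen = set()
    counts = defaultdict(Counter)
    for stream, pub, year in links:
        if (stream, pub) in seen:
            continue  # already counted for this stream
        seen.add((stream, pub))
        counts[stream][year] += 1
    # Return plain dicts with years in ascending order.
    return {stream: dict(sorted(c.items())) for stream, c in counts.items()}


links = [
    ("stream-1", "pub-A", 2013),
    ("stream-1", "pub-B", 2014),
    ("stream-1", "pub-A", 2013),  # same publication matched by a second project
    ("stream-2", "pub-C", 2014),
]
print(publications_per_year(links))
# {'stream-1': {2013: 1, 2014: 1}, 'stream-2': {2014: 1}}
```

Counting each publication once per stream keeps a paper that acknowledges several projects from inflating the yearly totals.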
Text mining
• To find NWO funding information in the publications that are already in OpenAIRE
• This is done with text mining algorithms
• The publications in OpenAIRE come from arXiv, the Europe PMC open set, and the OpenAIRE-compliant institutional repositories
• Example: FCT (Fundação para a Ciência e a Tecnologia), the major Portuguese science funder
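The slides do not describe ARC's text mining algorithms. Purely to illustrate the underlying idea, the sketch below looks for known project identifiers in a publication's text and accepts them only when the text also names the funder. It is a naive pattern match, not ARC's method; the identifiers and funder keywords are hypothetical examples, and the real format of NWO identifiers may differ.

```python
import re

# Hypothetical funder keywords; a real pipeline would need a curated list.
FUNDER_KEYWORDS = ("NWO", "Netherlands Organisation for Scientific Research")


def build_pattern(identifiers):
    """Compile one regex matching any known identifier as a whole token."""
    # Longest first, so an identifier that is a prefix of another does not win.
    alternatives = sorted((re.escape(i) for i in identifiers), key=len, reverse=True)
    # Not preceded by a word char or dot; not followed by a word char,
    # a hyphen, or a dot that continues the token (a sentence-final dot is fine).
    return re.compile(r"(?<![\w.])(" + "|".join(alternatives) + r")(?![\w-]|\.\w)")


def find_project_mentions(text, pattern):
    """Return identifiers found in text, in order of first appearance."""
    if not any(k in text for k in FUNDER_KEYWORDS):
        return []  # funder not mentioned: treat identifier hits as noise
    found = []
    for match in pattern.finditer(text):
        if match.group(1) not in found:
            found.append(match.group(1))
    return found


pattern = build_pattern(["639.021.123", "016.104.606"])  # made-up identifiers
ack = "This work was supported by NWO grants 639.021.123 and 016.104.606."
print(find_project_mentions(ack, pattern))  # ['639.021.123', '016.104.606']
```

Requiring the funder name cuts false positives from numbers that merely look like identifiers, at the cost of missing acknowledgements that cite a grant number without naming NWO.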
Information about the projects
• PROJECT IDENTIFIER (MANDATORY)
• PROJECT TITLE or ACRONYM (MANDATORY)
• FUNDER NAME (MANDATORY): NWO
• START DATE (MANDATORY)
• END DATE (MANDATORY)
• FUNDING STREAM(S) (OPTIONAL): funding categories for more detailed statistics
• ORGANIZATION(S) INVOLVED (OPTIONAL)
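The list above is essentially a record schema with five mandatory and two optional fields. One way to make it explicit is a small Python data class that rejects records missing a mandatory value. The class and field names, the end-date check, and the example values are hypothetical illustrations, not the actual OpenAIRE exchange format:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ProjectRecord:
    # Mandatory fields
    identifier: str
    title_or_acronym: str
    funder_name: str
    start_date: date
    end_date: date
    # Optional fields: funding categories and participating organizations
    funding_streams: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject empty strings for the mandatory text fields.
        for name in ("identifier", "title_or_acronym", "funder_name"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is mandatory")
        # Assumed sanity check, not stated in the slides.
        if self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")


record = ProjectRecord(
    identifier="639.021.123",  # made-up identifier
    title_or_acronym="EXAMPLE",
    funder_name="NWO",
    start_date=date(2010, 1, 1),
    end_date=date(2014, 12, 31),
)
print(record.funding_streams)  # []
```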
First text mining
• DANS sent 8,451 research projects financed by NWO from NARCIS to ARC in Athens
• ARC text-mined the arXiv and Europe PMC publications
• ARC will later text-mine the Dutch repositories, starting with the University of Amsterdam
Results
• 353 matches in Europe PMC (in 327 unique publications, i.e. some publications matched more than one NWO project)
• 323 matches in arXiv.org (in 286 unique publications)
• Each match links a project identifier to the URL of the publication in Europe PMC or arXiv
• But: Europe PMC contains 900+ extra links to NWO that were not in the NARCIS list and appear to be valid matches
• What to do next?
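The difference between 353 matches and 327 unique publications arises because one publication can acknowledge several NWO projects. The sketch below shows how the two counts relate, assuming each match is a distinct (project identifier, publication) pair; the slides do not say whether repeated pairs were collapsed, and the toy data are invented:

```python
from collections import Counter


def summarize_matches(matches):
    """Summarize (project_id, publication) pairs from a text-mining run.

    Returns (number of distinct matches, number of unique publications,
    sorted list of publications linked to more than one project).
    """
    distinct = set(matches)  # drop exact duplicate pairs
    per_publication = Counter(pub for _, pub in distinct)
    multi = sorted(pub for pub, n in per_publication.items() if n > 1)
    return len(distinct), len(per_publication), multi


toy = [
    ("P1", "pub-A"),
    ("P2", "pub-A"),  # pub-A acknowledges two projects
    ("P1", "pub-B"),
    ("P1", "pub-B"),  # duplicate hit, counted once
    ("P3", "pub-C"),
]
print(summarize_matches(toy))  # (4, 3, ['pub-A'])
```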
Second text mining
• The list partly contained NWO database identifiers instead of the “dossier numbers” known to researchers
• From NWO: a list of 5,000 research projects since 2006 with both database identifiers and NWO dossier numbers
• We matched this list with our list of 8,451 projects and sent the result to ARC for another round of text mining
• Outcome so far: only about 100 extra publications found in Europe PMC and 14 extra publications in arXiv!
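The matching step can be pictured as a join on the NWO database identifier: wherever a NARCIS project carries a database identifier that also appears in NWO's list, the corresponding dossier number is added as an extra search term for the next text-mining run. A minimal sketch under that assumption; the data layout, function name, and all identifiers are hypothetical:

```python
def add_dossier_numbers(narcis_projects, nwo_pairs):
    """Add NWO dossier numbers to NARCIS projects via the database identifier.

    narcis_projects: dict mapping a project key to a set of known identifiers.
    nwo_pairs: iterable of (database_id, dossier_number) pairs from NWO.
    Returns a new dict; the input is left unchanged.
    """
    dossier_by_db_id = dict(nwo_pairs)
    enriched = {}
    for project, ids in narcis_projects.items():
        extra = {dossier_by_db_id[i] for i in ids if i in dossier_by_db_id}
        enriched[project] = set(ids) | extra
    return enriched


narcis = {"proj-1": {"DB-001"}, "proj-2": {"DB-002"}, "proj-3": {"DB-999"}}
nwo = [("DB-001", "111.222.333"), ("DB-002", "444.555.666")]
result = add_dossier_numbers(narcis, nwo)
print(sorted(result["proj-1"]))  # ['111.222.333', 'DB-001']
```

Projects whose database identifier is absent from NWO's list, such as proj-3 here, keep only their original identifiers.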
What to do next?
• Find out what kinds of identifiers appear in the 900+ extra links in Europe PMC
• Repeat the text mining with other identifiers?
• Text-mine the Dutch repositories
And for the future: Use the CRISs!
Advantages of this pilot text mining
• The connection between funders' research projects and open access publications becomes clear
• It becomes possible to measure research impact and produce statistics and graphics
• Improve NARCIS by adding NWO dossier numbers and the URLs of the publications to the project descriptions
• NWO might improve its guidance to researchers on using the “dossier number”
Disadvantages of text mining
• You have to do it again and again and again
• What are the right identifiers?
• Where can you find the publications?
DANS and NWO want:
• To contribute to strengthening the national research infrastructure
• To make it easier for researchers to enter their research information only once
Future solution: Use the CRISs
• NWO requires that the “dossier number” be stored in the CRIS, for both projects and publications
• Describe an exchange format using CERIF
• Deliver all CRIS information to NARCIS
• Make a technical connection in NARCIS
• Make it easier for researchers to give an overview of their NWO-funded publications
• Deliver the information to OpenAIRE2020, or…
• Once this is done, the information can be used to measure research impact in OpenAIRE
Easy to realize?
• No!
• Necessary: communication with NWO and the universities
• Agreements on national exchange formats
• Problem or right moment: most universities are busy switching from Metis to Converis or Pure
• But in the end, everyone will benefit:
– DANS/NARCIS + NWO + universities/researchers + international community
Thank you for your attention
For more information please contact
[email protected]