Populating the infrastructure: the case of the Netherlands
Hans Bennis
executive board of CLARIN-NL
Meertens Institute (KNAW)
CLARIN COORDINATORS
BUDAPEST, June 29-30
1
The start in 2009
• €9 million for CLARIN-NL for the period 2009-2015 (amount requested: €25 million)
• concentration on text (language data for humanities research)
• audio and video are left out, in contrast to the original proposal
• social sciences are not included, in contrast to the original proposal
• organizational structure: director, executive board, board, advisory panels (national and international)
• a substantial part of the money will be spent in programmatic form through Calls
• important goal / ambition: create broad support for CLARIN in humanities research in the Netherlands
2
Projects 2009
• technical projects (centers, metadata, web services, workflow, etc.)
• centers: Max Planck Institute for Psycholinguistics (MPI, Nijmegen), Meertens Institute (Amsterdam), DANS (Den Haag) and Institute for Dutch Lexicology (INL, Leiden)
• user survey
• Call-1 (demonstrator projects or resource curation projects)
• 12 projects (approx. €60,000 each)
– demonstrator projects
– data curation projects
3
Call-1 Projects
1) AAM-LR [UNijmegen/MPI] – Automatic annotation of language resources
2) Adelheid [UNijmegen/MPI] – Lemmatizer for historical Dutch
3) Adept [UGroningen/Meertens] – Dialect analysis
4) Duelme-LMF [UUtrecht/INL] – Multi-word expressions
5) INTER-VIEWS [UNijmegen/DANS] – Life-history interviews with veterans
6) MIMORE [UUtrecht/Meertens] – Dialect morphosyntax
7) SignLinC [UNijmegen/MPI] – Sign language
4
Call-1 (more)
8) TDS Curator [UUtrecht/DANS] – Typological database
9) TICCLops [UTilburg/INL] – Text clean-up
10) TQE [UNijmegen/MPI] – Transcription evaluation
11) WFT-GTB [Fryske Akademy/INL] – Integration of Dutch and Frisian dictionaries
12) CKCC [UUtrecht/Huygens Institute/DANS] – Correspondence of scholars in the 17th century
5
Demonstration of the Microcomparative Morphosyntactic Research Tool (MIMORE)
Sjef Barbiers, Matthijs Brouwer, Jan Pieter Kunst, Folkert de Vriend
Meertens Institute, 2011
6
MIMORE opening screen
7
Research question
• The Standard Dutch [non-neuter] relative pronoun and distal demonstrative both have the form ‘die’ (‘that’, ‘those’).
• We know that some dialects have ‘dien’ as a relative pronoun and/or as a distal demonstrative.
• We would like to know whether there is a correlation between ‘dien’ as a relative pronoun, ‘dien’ as a demonstrative preceding a noun, and ‘dien’ as a demonstrative in elliptical constructions.
• The linguistic question behind this search is the status of the ‘-n’ on ‘die’: is it a case ending, is it phonologically determined, or is it something else?
8
Optional restrictions on the search
9
Search 1: DynaSAND with text string and tag constructor: ‘dien’ as relative pronoun
10
Elements of search result
11
Specification of data resource
12
Corresponding sound fragment
13
Search 2: GTRP with demonstrative + N in test item
14
Elements of search result
15
Result of Search 3: demonstrative ‘dien’ in elliptical nominal groups in DIDDD
16
Available operations on search results
17
Map combining three search results
18
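The slides do not describe how MIMORE combines results internally. As a minimal Python sketch, assuming each search can be reduced to the set of locations where it returned at least one hit, combining three searches amounts to recording, for each location, which of the three uses of ‘dien’ are attested there. Tallying these combinations is one way to look for the correlation raised in the research question. The search names and location identifiers (`loc_a` etc.) are hypothetical placeholders, not DynaSAND, GTRP or DIDDD data.

```python
from collections import Counter

# Hypothetical input: per search, the set of location identifiers with at
# least one hit. Placeholder values only, not real database content.
searches = {
    "relative": {"loc_a", "loc_b", "loc_c"},
    "demonstrative+N": {"loc_a", "loc_b", "loc_d"},
    "elliptical": {"loc_a", "loc_d"},
}


def combine(searches):
    """Map each location to the (alphabetically sorted) search names with a hit there."""
    all_locations = set().union(*searches.values())
    return {
        loc: tuple(name for name in sorted(searches) if loc in searches[name])
        for loc in sorted(all_locations)
    }


def pattern_counts(combined):
    """Count how many locations share each combination of searches."""
    return Counter(combined.values())


combined = combine(searches)
for loc, names in combined.items():
    print(loc, names)
print(pattern_counts(combined))
```

On a combined map, each distinct tuple would correspond to one symbol, so locations where all three uses co-occur stand out from those with only one or two.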
Map combining two search results
19
Frequency maps
20
Creating the intersection of two sets of search results
21
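Under the same assumption (each search result reduced to a set of location identifiers), intersecting two result sets is a plain set operation. The sketch below, with a hypothetical helper `intersect_results` and placeholder locations, also keeps the locations found by only one of the two searches, since those are what a map of the intersection leaves out.

```python
def intersect_results(first, second):
    """Split two sets of location identifiers into shared and exclusive parts."""
    return {
        "both": first & second,          # intersection
        "only_first": first - second,    # hits in the first search only
        "only_second": second - first,   # hits in the second search only
    }


# Placeholder locations, not real search output.
relative = {"loc_a", "loc_b", "loc_c"}
demonstrative = {"loc_a", "loc_b", "loc_d"}
parts = intersect_results(relative, demonstrative)
print(sorted(parts["both"]))  # ['loc_a', 'loc_b']
```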
Export as Excel file
22
Data exported
23
Complex search: more than one database, string of tags
24
Call-2 (2011)
1) Arthurian Fiction [UUtrecht] – Curation of two databases for literary research
2) C-DSD [UUtrecht/Meertens] – Curation of a folksong database
3) COAVA [Meertens] – Bringing together five linguistic databases (language variation/acquisition)
4) INPOLDER [UNijmegen/Meertens] – Syntactic analysis of historical Dutch
5) IPROSLA [UNijmegen/UAmsterdam/MPI] – Sign language databases
25
Call-2 (more)
6) NEHOL [UNijmegen] – Curation of the Negerhollands database
7) VU-DNC [VU-Amsterdam] – Corpus of Dutch newspapers
8) WAHSP [UUtrecht] – Text mining in large historical databases
9) WIP [NIOD] – Data curation of a Dutch Second World War database
26
Developments
• collaboration with the CATCH programme (a programme financing projects carried out by teams of ICT developers, humanities scholars and cultural heritage institutions)
– CLAVAS – vocabularies
– Persistent Identifiers
• Data Curation Service (>2011)
• Call-3 (call currently open; projects in 2012)
• Agreement with the Netherlands Organisation for Scientific Research (NWO) and the Royal Netherlands Academy of Arts and Sciences (KNAW) on a CLARIN norm for databases and tools in the humanities
• CLARIN-NL + DARIAH-NL => CLARIAH – Dutch Roadmap
27