Ontotext Experience in Cultural Heritage
Bulgariana Collections in Europeana
Vladimir Alexiev, PhD, PMP
Mariana Damova, PhD
Conference "Europeana and the Bulgarian Institutions", Plovdiv, 3-4 Apr 2012
Semantic Technology Applicability to CH
• Best way to interconnect data. If the Web (1.0) is a giant hyperlinked document,
the Semantic Web (3.0) is a giant linked database
• Unified, globalized and abstracted representation (RDF, RDFS, OWL2, RIF). Schema info
(metadata) is represented the same way as data
• Ontologies and schemas ensure metadata interoperability
(ESE, EDM, LIDO, CIDOC CRM, EAD, MODS…)
• Linked Open Data provides additional context
(DBpedia, GeoNames, FreeBase, WordNet, …)
• Thesauri ensure consistent vocabulary
(Getty ULAN, AAT, TGN; IconClass, VIAF, etc)
• Europeana adopts semtech for all future development (EDM). Its first White Paper "Knowledge =
Information in Context" looks at the key role of LOD
– "Linked data gives machines the ability to make associations and put search terms into context.
Without linked data, Europeana could be seen as a simple collection of digital objects. With linked
data, the potential is far greater"
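As a rough illustration of the point that schema information is represented the same way as data, the following sketch stores both as plain subject-predicate-object triples and queries them with one function. It is a toy model, not OWLIM or any RDF library: all `ex:` identifiers, the `dc:creator` statement, and the helper names are hypothetical.

```python
# Toy triple store: schema statements and instance data share one representation.
# All "ex:" names below are hypothetical illustrations, not real vocabulary terms.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"

triples = {
    # Schema (metadata)
    ("ex:Manuscript", RDFS_SUBCLASS, "ex:CulturalObject"),
    ("ex:CulturalObject", RDFS_SUBCLASS, "ex:Thing"),
    # Data
    ("ex:song42", RDF_TYPE, "ex:Manuscript"),
    ("ex:song42", "dc:creator", "ex:MiladinovBrothers"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def superclasses(cls):
    """Follow rdfs:subClassOf links transitively, using the same match() as for data."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for _, _, parent in match(s=current, p=RDFS_SUBCLASS):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def types_of(resource):
    """Asserted types plus the types implied by the class hierarchy."""
    asserted = {o for _, _, o in match(s=resource, p=RDF_TYPE)}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls)
    return asserted | inferred
```

Calling `types_of("ex:song42")` walks the schema triples with the same `match()` used for the data triples, which is the property the slide refers to.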
Ontotext experience in CH; Bulgariana collections in Europeana
3-4 Apr 2012
#2
Ontotext
• Ontotext is a Bulgarian company with 65 staff: Sofia, Varna, Ruse,
Asenovgrad, Innsbruck (AT), London (UK), Connecticut (US), Wellington (NZ)
• Started in 2000 as a research lab in Sirma Group. Spun off in 2008 with
investment from NEVEQ
• World-leader in semantic technologies. 360-degree semtech: repository
(OWLIM), text mining (KIM, GATE), web mining (WMF), Ontology and
Linked Data Management
• Most successful Bulgarian participant in EU FP 5,6,7 research projects (16
completed, 7 in execution). Received the prestigious Pythagoras award
• Revenue growth in the last 3 years: 210%. 5M BGN in 2011, over 7M
expected in 2012
Commercial Projects
• Commercial revenue grew 10x in last 3 years and is close to 2/3 of
total
• Data providers 27% (jobs, food, cars), Media/Publishing 26%,
Government 18%, Life Sciences 11%, Cultural Heritage 10%, Telecom
4%
• Technical topics range from core semtech to ontology design, master
data management, web services, SOA, business processes, eGov, etc.
• UK 59%, US 18%, Global 9%, BG 7%, IT 3%, KR 2%, MX 2%, DE, NL
• Regular SemTech training courses in London
• Great potential in Cultural Heritage so we want to focus on that
Clients Related to Media and Cultural Heritage
• Project clients: UK, KR, JP, SE, NL, BG
• Research projects executed by Ontotext
• Projects using OWLIM: EU, PL, JP, UK
Projects Related to Media and Cultural Heritage (1)
• British Broadcasting Corporation (BBC): Dynamic Semantic Publishing. The
World Cup (2010), BBC Sports (2011) and Olympics (2012) multi-sites run on top
of OWLIM. KIM-based Concept Extraction
• Press Association (UK): commercial image annotation and search, Concept
Extraction
• The National Archives (UK): Semantic KB and search for Government Web
Archive. 780M documents (150M after de-duplication), 10B facts
• British Museum (UK): ResearchSpace project funded by Mellon Foundation
(US): Collaborative web-based research for the cultural heritage scholarly
community. Based on the CIDOC CRM ontology
• de Bibliothek (NL): data aggregation from 150 national/local sources to
semantic format, unified search (40M objects)
• National Institute of Informatics (JP): Linked Open Data in Academia (LODAC):
aggregates museum and other data across multiple Japanese resources
Projects Related to Media and Cultural Heritage (2)
• Polish Digital National Museum (PL): aggregates artifacts from 70 contributing
cultural institutions
• PrestoSpace (FP6): Preservation towards storage and access. Standardized
Practices for Audiovisual Contents in Europe. Continuation: PrestoCenter.org
• MOLTO (FP7) : Multilingual Online Translation. Knowledge infrastructure,
interoperability between natural language and structured queries, museum
object descriptions in 15 languages. Based on the CIDOC CRM ontology
• Gothenburg City Museum (SE): 9K museum objects for use case of CH
knowledge representation that allows querying and presenting semantic search
results in natural language.
• Bulgariana (BG, KR): a Bulgarian aggregator for Europeana, including digital
repository for CH objects, semantic conversion (ESE, EDM), submission to
Europeana, and community building
Bulgariana
• A Bulgarian aggregator to Europeana that includes
– A public website for sharing information
– A wiki (Confluence) for discussion, technical materials, coordination and
collaboration
– A digital repository (DSpace) for storing and presenting digitized cultural
heritage
– Conversion/ingestion tools for converting objects to the required Europeana
formats: ESE and EDM (pilot)
– An OAI-PMH endpoint for serving content to Europeana
– Semantic search using OWLIM (in the future)
• Partners
– BG-KR IT Cooperation Center: initial funding
– Ontotext: initiative, semtech, Europeana contact
– Sirma Media: digital repository
• And we want you!
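To make the OAI-PMH endpoint item above more concrete, here is a minimal sketch of how a harvester could build a `ListRecords` request URL. The endpoint address is a placeholder (the slides do not give the real one), the function name is hypothetical, and `oai_dc` is used as the metadata prefix because the protocol requires every repository to support it; the prefix actually used for ESE records is not stated in the slides.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real Bulgariana OAI-PMH address is not given in the slides.
BASE_URL = "http://example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it must not be
        # combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            # Optional set name, used to harvest a single collection.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)
```

A harvester would fetch the first URL, read the `resumptionToken` element from the response if one is present, and call the function again with that token until no token is returned.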
Collaboration and Networking
• Google Group "Cultural Heritage Digitalisation"
– Jointly created with SU FMI in Oct 2011
– 40 members, 80 messages in 5 months (still not a lot of activity…)
• Meetings
– 20121010: joint MS program in Digitalization (IMI BAS, UNIBIT).
Welcome by Ontotext, proposed to use Bulgariana as a platform
– 20120119: Restart(?) of expert working group (Ministry of Culture)
– 20120130: Europeana1 "Mission Possible" (Ontotext, Sofia University)
– 20120319: Europeana2: "Bulgarian projects for digitalization and presentation of cultural
heritage Europeana" (V.Tarnovo Regional Library)
    • All presentations and contacts are published
– 20120305: “Workshop on Multilingual Digital Repositories and Services” (Sofia: ITD, VirtSOI,
DSLL, ATLAS, Share.TEC)
– 20120918: "Digitalization, Preservation and Presentation of Cultural and Scientific Heritage"
(DiPP 2012, organized by IMI BAS, hosted by V.Tarnovo Library)
– 201211xx: Europeana3 (Varna Regional Library)
Current Proposals
• WSR4Europeana (web science research for Europeana): FP7 People (Marie Curie) Initial
Training Network (Multi-Partner ITN). Doctoral research, exchanges, training
– Partners: Humboldt U (DE), Tampere (FI), Aalto U (FI), FORTH (GR), RSLIS (DK), U Mannheim (DE), VU Brussel (BE),
NTUA (GR), Ontotext (BG), Seme4 (UK), Net7 (IT)
– Associated: Europeana (NL), CNR ISTI (IT), U Carlos III (ES), CCS (DE), Tufts U (US)
– Emerging fields, incl. semantic repositories for Europeana, semantic annotation
• SmartCulture: Regions of Knowledge 2012 cluster of clusters
– International: Madrid, Basque, Birmingham, Siena, Eindhoven, Central Denmark, Sofia
– BG cluster: Sofia Development Organization, Sofia University, UNIBIT, IMI-BAS, Ontotext, Tetracom, DSLL
• Ontology-based Digital Platform for Knowledge Sustainability: ICT Call9
– U Lyon, invited GeoCad93. Tentative
• Balkan Wars: PSP Call6
– Idea by PrimaSoft/SoftLib. Interest: VTU, V.Tarnovo Library, IMI BAS, Plovdiv Library
– Need 6 international partners: Turkey, Serbia, Macedonia, etc. Tentative
• Geographical Regions: PSP Call6. BAS, Austria, Romania, GeoCad93. Tentative
• Slavonic Manuscripts: PSP Call6. BAS, tentative
Bulgariana Wiki
Bulgariana Collections (1)
• Prehistoric and Thracian Civilizations
• Unpublished Thracian archeological objects. Prof. Valeria Fol, Center of Thracology at the
Institute for Balkan Studies at the Bulgarian Academy of Sciences
Bulgariana Collections (2)
• Golden Pages from the Bulgarian Renaissance
– Unique manuscripts of Bulgarian folk songs collected in the 19th century
by the Miladinov Brothers, published in 2008 by Dr Luchia Antonova, Institute of
Bulgarian Language, BAS
МАРКО КРАЛЕВИКИ БОЛЕН СЕ КАИТ И СЕ
ИСПОВЕДВИТ
Поболил се Марко Кралевике,
що си лежал токму три години,
от нищо се иляч (1) не на’ож’ал.
И му рече негва стара майќа:
“Ай ти, Марко, ай ти, синко милий;
не си болен, синко, от господа,
тук си болен, синко, от гре’о’и,
да ти викна попой (2), ду’овници,
лепо да се синко исповедиш,
да си кажиш твоите гре’о’и!”
….
Bulgariana Collections Published to Europeana (1)
Bulgariana Collections Published to Europeana (2)