Multilingual Text Mining

Bc. Jana Čupová, Bc. Jaromír Hloch
Introduction
• Data mining
  • Analyzing data from a data warehouse
• Text mining
  • Analyzing data from unstructured sources (text)
  • An estimated 80 % of all electronic data is in textual form
• Multilingual text mining
  • Text sources in different languages
  • Relevant information from multilingual text sources
A multilingual text mining approach to web cross-lingual text retrieval
• Users want to search for relevant information across different languages
• Cross-lingual text retrieval (CLTR) methods
• Artificial international language
• Manual vs. automatic construction of artificial international language
Current CLTR methods
• Principle: translation of the search query
• Machine translation
• Knowledge-based methods using machine-readable dictionaries
• Corpus-based methods using parallel corpora
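The knowledge-based approach above can be illustrated with a minimal sketch: each query term is looked up in a machine-readable dictionary and every combination of candidate translations is produced. The dictionary `TOY_DICTIONARY` and the function `translate_query` are hypothetical names invented for this illustration, and the English-to-German entries are toy data, not taken from the slides.

```python
from itertools import product

# Hypothetical toy English-to-German dictionary (illustrative entries only).
TOY_DICTIONARY = {
    "text": ["Text"],
    "mining": ["Bergbau", "Mining"],
    "power": ["Strom", "Macht"],
}


def translate_query(query, dictionary):
    """Return every candidate translation of a query, term by term.

    Terms missing from the dictionary are kept unchanged.
    """
    options = [dictionary.get(term, [term]) for term in query.lower().split()]
    return [" ".join(choice) for choice in product(*options)]


candidates = translate_query("Text mining", TOY_DICTIONARY)
```

Here `candidates` holds two translations because "mining" has two dictionary entries. This kind of ambiguity is a known weakness of purely dictionary-based query translation.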
Internet Users by Language
• English
• Chinese
• Spanish
• Japanese, Portuguese, German, Arabic, French, Russian, Korean
• Total Internet Users – 2 099 926 965
Multilingual text mining with self-organizing maps
• Process for text mining with self-organizing maps
  • Document preprocessing
  • Self-organizing maps
  • The word cluster map and document cluster map
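The process above can be sketched in a few lines of plain Python: documents are preprocessed into term-count vectors, a small self-organizing map is trained on them, and each document is assigned to its closest map node. This is a minimal monolingual sketch under stated assumptions, not the method from the slides: the map is a one-dimensional chain of nodes, the sample documents are invented, and all function names and parameter values (`bag_of_words`, `train_som`, `n_nodes`, `lr0`, `radius0`) are hypothetical.

```python
import math
import random
from collections import Counter


def bag_of_words(doc, vocabulary):
    """Turn a document into a term-count vector over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [float(counts[term]) for term in vocabulary]


def best_matching_unit(weights, vector):
    """Return the index of the node whose weights are closest to the vector."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, vector))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def train_som(vectors, n_nodes=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a one-dimensional SOM (a chain of n_nodes) on the vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 1e-3)
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move more.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (vector[d] - w[d])
    return weights


# Invented sample documents (assumption, for illustration only).
docs = ["text mining", "text mining", "power outage"]
vocabulary = sorted({term for doc in docs for term in doc.lower().split()})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]
weights = train_som(vectors)
clusters = [best_matching_unit(weights, v) for v in vectors]
```

Each entry of `clusters` is the map node assigned to the corresponding document, so identical documents always land on the same node. A document cluster map is read off in this way; a word cluster map would apply the same training to vectors that describe terms rather than documents.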
Software for multilingual text mining
• IBM SPSS Modeler
• IBM SPSS Text Analytics
• Lexalytics Text Analytics
Sentence translation example
• Sentence: “There is no electricity, what happened?”
Thank you for your attention