Text Mining Application Programming
Chapter 1 Introduction
Manu Konchady, 2006
Definition: Text Mining
(informal) all types of text processing that deal with finding,
organizing, and analyzing information.
(formal) the creation of new information that is not
obvious in a collection of documents.
New information is defined as a pattern, trend, or
relationship that can’t be easily gleaned by reading
individual documents.
The term document refers to any unit of text, such as a
Web page, an e-mail, a formatted article, a set of slides, or
a plain text file.
Data Mining vs. Text Mining
Data mining deals with structured numeric
data; text mining deals with unstructured text.
Data used for data mining is extracted,
transformed, and loaded into a data warehouse.
Text mining attempts to build a model from
data that is assumed to be imprecise.
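To make the contrast concrete, here is a minimal Python sketch, not taken from the slides. It sets a structured record, whose fields are known in advance, next to a free-text note that must first be reduced to term counts before any model can be built. The record fields, the sample note, and the function name term_counts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Structured data: every field has a known name and type (hypothetical record).
order = {"customer_id": 1042, "quantity": 3, "status": "late"}

def term_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

# Unstructured data: the same kind of fact buried in free text (hypothetical note).
note = "Customer says the shipment arrived late, very late."
counts = term_counts(note)

print(order["status"])        # direct field lookup on structured data
print(counts.most_common(1))  # most frequent term recovered from raw text
```

The counts are only a rough stand-in for meaning, which is one reason a text-mining model treats its input as imprecise.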
Origins of Text Mining
Information Retrieval
Natural Language Processing
Understanding Text
“Alice saw the rabbit with glasses.”
Polysemy
“In what state would you find Lincoln?”
“free software”
Synonymy
More than one word can express the same meaning.
Exuberant: lush, luxuriant, profuse, and riotous.
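One common way to cope with synonymy in search is to expand a query with known synonyms. The sketch below is illustrative only: the SYNONYMS table is a hand-built assumption seeded with the words from this slide, and expand_query is a hypothetical helper, not part of any library named in these slides.

```python
# Hypothetical, hand-built synonym table; a real system would load a thesaurus.
SYNONYMS = {
    "exuberant": {"lush", "luxuriant", "profuse", "riotous"},
}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the lowercased query terms plus any synonyms listed for them."""
    expanded = set()
    for term in terms:
        key = term.lower()
        expanded.add(key)
        expanded |= synonyms.get(key, set())
    return expanded

print(sorted(expand_query(["Exuberant", "growth"])))
```

Terms without an entry in the table, such as "growth" here, pass through unchanged.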
An Architecture for Text Mining
Applications
Text Mining Functions
Searching
Information Extraction
Clustering
Categorization
Summarization
Information Monitor
Question and Answer
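As an illustration of the first function in the list, searching, the following Python sketch builds a small inverted index and answers boolean AND queries. It is a sketch under stated assumptions rather than the method used by any tool named in these slides: the sample documents and the names tokenize, build_index, and search are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
docs = {
    "d1": "Alice saw the rabbit",
    "d2": "free software licenses",
    "d3": "the rabbit wore glasses",
}
index = build_index(docs)
print(sorted(search(index, "the rabbit")))
```

Because the index maps terms to document ids, a query only touches the entries for its own terms instead of scanning every document.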
A Layered Model
Text Mining Installation
Text Mine (http://textmine.sf.net) is a
collection of Perl modules and code on
SourceForge to index, cluster, classify, and
summarize text.
Usage
Command line
Web-based interface
Web Interface