Text Categorization

CSC 594 Topics in AI –
Text Mining and Analytics
Fall 2015/16
9. Text Categorization
Text Categorization
• Text categorization is the task of classifying texts into predefined
target categories.
• The techniques for text categorization are the same as those used in
data mining, such as decision trees and KNN.
• With texts, however, the data transformation step is the key: selecting
features that are relevant to the target category.
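The points above can be sketched in a few lines: once each text is transformed into a feature vector, a standard data mining technique such as KNN applies unchanged. This is only an illustration, assuming bag-of-words counts as features and cosine similarity as the distance measure; the function names and the toy training set are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Data transformation step: raw text -> sparse term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(text, training, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = bag_of_words(text)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled data (illustrative only).
TRAINING = [
    ("the team won the match with a late goal", "sports"),
    ("the striker scored twice in the final game", "sports"),
    ("the coach praised the players after the match", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stocks fell as investors feared higher rates", "finance"),
    ("the company reported strong quarterly profits", "finance"),
]

print(knn_classify("the goal decided the match", TRAINING))
print(knn_classify("investors worry about interest rates", TRAINING))
```

Note that raw counts let frequent words such as "the" dominate the similarity, which is one reason feature selection or term weighting matters for text.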
Coursera, Text Mining and Analytics, ChengXiang Zhai
Naïve Bayes Classifier
• Naïve Bayes classification (a generative classifier) has been shown to
work well for text categorization.
• Naïve Bayes also scales very well to large data sets.
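A minimal sketch of a Naïve Bayes text classifier, assuming the multinomial variant with add-one (Laplace) smoothing. The function names and the toy examples are hypothetical. Training is a single counting pass over the data, which is why the method scales well.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(text, model):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            p = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled data (illustrative only).
EXAMPLES = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]

model = train_naive_bayes(EXAMPLES)
print(classify("free cash prize", model))            # spam
print(classify("project meeting on monday", model))  # ham
```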
Machine Learning, by Tom Mitchell
Parameter Estimation for Logistic Regression
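A minimal sketch of parameter estimation for logistic regression, assuming the parameters are chosen to maximize the conditional log-likelihood of the training labels and that this is done by batch gradient ascent. The learning rate, epoch count, function names, and toy data are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """P(y = 1 | features) under the logistic model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Estimate weights and bias by gradient ascent on the log-likelihood."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            # Gradient of the log-likelihood: (label - predicted probability) * x
            error = label - predict_proba(features, weights, bias)
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w + learning_rate * g for w, g in zip(weights, grad_w)]
        bias += learning_rate * grad_b
    return weights, bias

# Toy data: counts of two hypothetical terms per document; label 1 = target category.
X = [[3, 0], [2, 0], [2, 1], [0, 2], [0, 3], [1, 2]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic_regression(X, y)
print(predict_proba([3, 0], weights, bias))
print(predict_proba([0, 3], weights, bias))
```

Because this toy set is linearly separable, the weights keep growing with more epochs; in practice a regularization term is usually added to the objective.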
SVM (cont.)
More information on SVM, including kernel functions, is in my old
CSC 578 notes:
http://condor.depaul.edu/ntomuro/courses/578/notes/SVMoverview.pdf