After the preprocessing
College of Science & Technology
Dept. of Computer Science & IT
BSc of Information Technology
Data Mining
Chapter 2_1: Data Preparation and Preprocessing
Case Study
2013
Prepared by: Mahmoud Rafeek Al-Farra
www.cst.ps/staff/mfarra
Course Outline
Introduction
Data Preparation and Preprocessing
Association Rules
Classification Methods
Evaluation
Clustering Methods
Mid Exam
Knowledge Representation
Special case study: Document clustering
Discussion of Case studies by students
Consider the following instances.
The documents before preprocessing are:
Document 1:
Document 2:
Palestine freedom requires all Muslims.
All Muslims must pray five times every day.
Palestinians and Muslims are persecuted by United Nations.
Freedom for Palestine.
Palestine is a holy land for all Muslims.
The legal right of Palestine for Muslims.
I am proud to be Muslim.
Document 3:
Support our legal rights to Palestine.
I am proud to be from Palestine.
After the preprocessing
After the documents pass through the preprocessing steps, many words are removed
(e.g., our, to, am, the, five, and so on).
Other words are stemmed to their roots
(e.g., Muslims is stemmed to Muslim, persecuted to persecut,
and so on).
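The two steps above (stop-word removal, then stemming) can be sketched in Python. This is a minimal sketch, not the method used in the course: the stop-word list is a hypothetical one chosen only so that the output matches the example documents, and `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical stop-word list, chosen to reproduce this example only.
STOP_WORDS = {
    "a", "am", "and", "are", "be", "by", "day", "every", "five", "for",
    "from", "i", "is", "must", "of", "our", "the", "times", "to",
}

# Toy suffix rules, tried in order (NOT the Porter algorithm).
SUFFIXES = ("ians", "es", "ed", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence: str) -> list[str]:
    """Lower-case, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("All Muslims must pray five times every day."))
# ['all', 'muslim', 'pray']
```

The output is lower case, whereas the slide keeps the original capitalization; a real stemmer would also handle many more word forms than these five suffix rules.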
After the preprocessing
After the preprocessing steps, the three documents become:
Document 1:
Document 2:
Palestin freedom requir all Muslim.
All Muslim pray.
Palestin Muslim persecut unit nation.
Freedom Palestin.
Palestin holy land all Muslim.
Legal right Palestin Muslim.
Proud Muslim.
Document 3:
Support legal right Palestin.
Proud Palestin.
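Once every sentence is reduced to stems, the distinct stems can serve as the vocabulary for a document representation. The sketch below pools all the preprocessed sentences above into one vocabulary, counts how often each stem occurs, and turns Document 3 into a 0/1 vector over that vocabulary. The sentences are retyped in lower case, and the variable names are illustrative only.

```python
from collections import Counter

# Preprocessed sentences from the slide, written in lower case.
sentences = [
    "palestin freedom requir all muslim",
    "all muslim pray",
    "palestin muslim persecut unit nation",
    "freedom palestin",
    "palestin holy land all muslim",
    "legal right palestin muslim",
    "proud muslim",
    "support legal right palestin",
    "proud palestin",
]

# Count every stem across the whole collection.
term_counts = Counter(stem for s in sentences for stem in s.split())

# The vocabulary is the sorted set of distinct stems.
vocabulary = sorted(term_counts)

# Document 3 (its two sentences joined) as a 0/1 vector over the vocabulary.
doc3 = set("support legal right palestin proud palestin".split())
doc3_vector = [1 if term in doc3 else 0 for term in vocabulary]

print(len(vocabulary))             # 15
print(term_counts.most_common(2))  # [('palestin', 7), ('muslim', 6)]
```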
Then … representation
One possible way:
        item1  item2  item3  item4
Doc1      0      1      1      1
Doc2      1      1      1      1
Doc3      1      1      0      0
Doc4      0      1      1      0
The application then treats each document (one row of the table) as a vector.
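Reading the table as document vectors, the sketch below rebuilds each row from the set of items a document contains and then compares two rows. Cosine similarity is an assumption here: it is one common way to compare document vectors (for example in the document clustering case study listed in the outline), not necessarily the measure this application uses.

```python
import math

ITEMS = ["item1", "item2", "item3", "item4"]

# Which items each document contains, read off the table above.
docs = {
    "Doc1": {"item2", "item3", "item4"},
    "Doc2": {"item1", "item2", "item3", "item4"},
    "Doc3": {"item1", "item2"},
    "Doc4": {"item2", "item3"},
}

# One binary vector per document: 1 if the item occurs, else 0.
vectors = {
    name: [1 if item in present else 0 for item in ITEMS]
    for name, present in docs.items()
}


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


print(vectors["Doc1"])                                     # [0, 1, 1, 1]
print(round(cosine(vectors["Doc1"], vectors["Doc4"]), 3))  # 0.816
```

Doc1 and Doc4 share two items, so their dot product is 2 and the similarity is 2 / (√3 · √2) ≈ 0.816.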
Thanks