CS 601R: Advanced NLP
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
CS 679: Advanced NLP
Lecture #1: Introduction to Text Mining
Objectives for Today
1. Quick course info.
2. Overview of Text Mining
3. Discuss your applications of Text Mining
4. Elements of Text Mining
5. Introduce course objectives
Course Info.
Office Hours:
Tue & Thu. 3-4pm (without appointment)
OR by appointment
TA: TBD
Web page: https://facwiki.cs.byu.edu/cs679
Syllabus
Regularly updated schedule: Due dates, Reading
assignments, Projects guidelines, Lecture Notes
Google Group “BYU CS 679”
Email: ringger AT cs DOT byu DOT edu
Grades: http://gradebook.byu.edu
Assignments
Readings – with max. one page reports
Mostly research papers (see course web page for all hyperlinks)
Usually one reading report per week
Intro. Projects
Presentation
Report
Semester Project
Proposal
Presentation
Report
Course Policies
Early
Late
Grades
Other
See Syllabus for details
Text Mining
The process of discovering
previously unknown information
in large text collections
Paraphrased from M. Hearst
Other Definitions
Looking for patterns in unstructured text
(Nahm)
Text mining applies the same analytical
functions of data mining to the domain of
textual information (Doore)
“Search” versus “Discover”
                           Structured Data    Unstructured Data (Text)
Search (goal-oriented)     Data Retrieval     Information Retrieval
Discover (opportunistic)   Data Mining        Text Mining
Credit: adapted from slide by Nathan Treloar, AvaQuest
Your Exciting Applications
F2011: Your Exciting Applications
W2011: Exciting Applications
2010: Exciting Applications
2009: Exciting Applications
Additional Applications
News Mining
Sentiment Detection
Summarization
Trend Analysis
Association Detection
Course Objectives
Acquire experience conducting exploratory data
analysis on large collections of text
Gain in-depth experience with and understanding of
approaches to
document classification
sentiment classification
feature engineering
feature selection
document clustering
unsupervised topic identification
visualization, including document summarization
Build a foundation of techniques for approximate
Bayesian reasoning for unsupervised text analysis
Course Objectives (2)
Obtain experience with techniques for
evaluating and visualizing the results of
unsupervised learning processes
Independent investigation of methods of your
choice!
Application of your methods to learn
something important from a significant text
corpus of your choice
Simplistic Text Mining Process
Credit: NCSA
Methods
Feature Engineering
Feature Selection
Information Extraction
Categorization (Supervised)
Clustering (Unsupervised)
Topic Identification / Topic Modeling
Visualization
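To make the first two methods concrete, here is a minimal sketch of feature engineering (bag-of-words counts) and feature selection (a document-frequency cutoff), followed by the cosine similarity that a clustering method might build on. It is an illustration only, not the course's reference implementation: the function names, the `min_df` threshold, and the four toy documents are all assumptions made up for this example, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Feature engineering: map each document to a Counter of term counts."""
    return [Counter(tokenize(d)) for d in docs]


def select_features(bags, min_df=2):
    """Feature selection: keep terms that occur in at least min_df documents."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # each document counts a term at most once
    return {term for term, n in df.items() if n >= min_df}


def cosine(a, b, vocab):
    """Cosine similarity of two bags, restricted to the selected vocabulary."""
    dot = sum(a[t] * b[t] for t in vocab)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in vocab))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in vocab))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: two movie reviews and two news sentences.
docs = [
    "The movie was great and the acting was great",
    "A great movie with a weak plot",
    "The senate passed the budget bill",
    "The budget bill stalled in the senate",
]
bags = bag_of_words(docs)
vocab = select_features(bags, min_df=2)

for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine(bags[i], bags[j], vocab), 3))
```

In this toy corpus the word "the" survives the document-frequency cutoff and inflates the similarity between unrelated documents, which is one reason feature selection usually also involves stop-word removal or term weighting.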
Some Available Data Sets
20 Newsgroups -- Usenet
Reuters (1990s) newswire
Del.icio.us bookmarked web pages
Enron Email
Movie Reviews
Gamespot game reviews
General Conference
State of the Union
Campaign Speeches
…
Yours!
Assignment
Reading for next time:
Course Syllabus
"Tapping the Power of Text Mining" by Fan et al.
(CACM 9/2006)
"Text-Mining the Voice of the People" by
Evangelopoulos et al. (CACM 2/2012)
Skim: Alta Plana Text Analytics Report
Reading Report #1
% Completed
Questions