CSC 466: Knowledge Discovery From Data

CSC 466:
Knowledge Discovery From Data
New Computer Science Elective
Alex Dekhtyar
Department of Computer Science
Cal Poly
Outline
• Why?
• What?
• How?
• Discussion
Why?
Information Retrieval
Why?
Text Classification? Link Analysis?
Why?
Recommender Systems
Why?
Market Basket Analysis. Purchasing trends analysis.
Why?
Data Warehouse… and so much more…
Why?
Link Analysis
Why?
Cluster Analysis
Buzzwords
Data warehousing
Data mining
Market basket analysis
Web mining
Information filtering
Recommender Systems
Information retrieval
Text classification
OLAP
Cluster Analysis
Why?
As professionals, hobbyists, and consumers, students constantly interact with
intelligent information management technologies.
These technologies are moving into the realm of undergraduate-level knowledge.
@Calstate.edu
CSU Fullerton: CPSC 483 Data Mining and Pattern Recognition
CSU LA: CS 461 Machine Learning
CS 560 Advanced Topics in Artificial Intelligence
CSU Northridge: 595DM Data Mining
CSU Sacramento: CSC 177. Data Warehousing and Data Mining
CSU SF: CSC 869 - Data Mining
CSU San Marcos: CS475 Machine Learning
CS574 Intelligent Information Retrieval
What?
• Undergraduate course
Informed consumers
Professionals
OLAP/Data Warehousing
Data Mining
Collaborative Filtering
Information Retrieval
Knowledge Discovery from Data
1 quarter = 10 weeks
What? (goals)
• Understand KDD technologies at a consumer level
• Understand basic types of data mining, information filtering, and information retrieval techniques
• Use KDD to analyze information
• Implement KDD algorithms
• Understand and appreciate societal impacts
What? (syllabus in a nutshell)
• Intro (data collections, measurement): 2 lectures
• Data Warehousing/OLAP: 2 lectures
• Data Mining:
  • Association Rule Mining: 3 lectures
  • Classification: 3 lectures
  • Clustering: 3 lectures
• Collaborative Filtering/Recommendations: 2 lectures
• Information Retrieval: 4 lectures
CSC 466, Spring 2009 quarter
19 lectures (= spring quarter)
How? (Alex’s ideas)
• Learn-by-doing...
  • Labs: work with existing software, analyze data, interpret results
  • Labs: small groups, implement simple KDD techniques
  • Project: groups, find interesting data, analyze it…
• Need to incorporate “societal issues”: privacy vs. data access, etc…
  • Students to make informed choices
• Lectures
  • Breadth over depth
  • Follow up in CSC 560 (grad. DB topics class)
How?
TODO List:
• Find data for labs and projects
• Investigate open source mining/retrieval software
• Figure out the textbook
  (Web Data Mining by Bing Liu is promising)
How?
This slide intentionally left blank