PPT - CCSC - Consortium for Computing Sciences in Colleges
A Data Mining Course for Computer Science and Non-Computer Science Students
Jamil Saquer
Computer Science Department
Missouri State University
Springfield, MO
Outline
Introduction
Motivation
Challenges
Design of the Course
Topics Covered
Assignments
Examination Format
Conclusion
Introduction
What is data mining (DM)?
The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns in large volumes of data.
DM is an interdisciplinary topic
Has many things in common with machine learning and pattern recognition
Motivation for the Course
Introducing more electives
Introducing graduate level CS courses
Informatics Program
Of interest to faculty members and students from other departments
Author’s main area of research
Challenges in Designing the Course
Diverse student population
CS vs. non-CS
undergrad vs. grad
Solution
Informatics program in design stages
MNAS CS option is new
• Therefore, emphasis on undergrad CS students
Accommodating other students
Minimize prerequisites
CS 2 (or even CS 1)
Able to use a DM software package
Scientific background/mentality
• One from business, another from GGP
For grad CS students:
• project requires more research
• Tests could be a little different
Emphasize understanding basic DM concepts and using software for mining data
Design of the Course
Used the book by Dunham
Book divided into 3 parts
About 1 week spent on definitions, applications, motivations, challenges, …
Core of the course spent on core DM subjects: classification, clustering, mining association rules
Last week for project presentations
Classification
Assigning objects to classes
supervised learning
Example: classify a military vehicle as a friendly or an enemy vehicle
Methods covered include: decision trees, Naïve Bayes, k-nearest neighbor, backpropagation
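To make the k-nearest neighbor idea concrete, here is a minimal Python sketch. It is not taken from the course materials; the function name `knn_classify`, the two numeric features, and the toy vehicle data are all hypothetical and chosen only to echo the friendly/enemy example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two made-up numeric features per vehicle.
train = [
    ((1.0, 1.0), "friendly"),
    ((1.2, 0.8), "friendly"),
    ((0.9, 1.1), "friendly"),
    ((5.0, 5.0), "enemy"),
    ((5.2, 4.8), "enemy"),
    ((4.9, 5.1), "enemy"),
]

prediction = knn_classify(train, (1.1, 1.0))
```

Because the labels of the training points are known in advance, this is supervised learning: the classifier only generalizes from labeled examples.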
Clustering
Grouping objects into different classes
unsupervised learning
Example: cluster Weblog data to discover groups of similar access patterns
Techniques covered include: link algorithms, nearest neighbor, k-means, PAM, BIRCH, DBSCAN, CURE, ROCK
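As an illustration of k-means, the following is a minimal Python sketch rather than the course's own code. The function `kmeans`, the choice of the first k points as initial centroids, and the toy `sessions` data (two made-up features per Web session) are assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means: alternate between assigning points and moving centroids.

    Uses the first k points as initial centroids (a simplifying assumption).
    Returns (centroids, clusters) from the last assignment step.
    """
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data standing in for per-session Web log features.
sessions = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(sessions, k=2)
```

No labels are supplied here; the groups emerge from the distances alone, which is what makes clustering unsupervised.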
Association Rules
Finding patterns that occur together
Example: diapers and beer are usually bought together
Techniques covered: Apriori, sampling, partitioning, FP-growth
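The level-wise idea behind Apriori can be sketched in a few lines of Python. This is an illustrative sketch, not the textbook's or the course's implementation; the function `apriori`, the `min_count` threshold, and the five toy `baskets` are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and count(c) >= min_count
        }
    return frequent

# Hypothetical market baskets echoing the diapers-and-beer example.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "beer", "bread"},
    {"milk", "bread"},
    {"diapers", "milk"},
]
frequent = apriori(baskets, min_count=3)

# Confidence of the rule beer -> diapers: count(beer and diapers) / count(beer).
pair = frozenset({"beer", "diapers"})
confidence = frequent[pair] / frequent[frozenset({"beer"})]
```

In this toy data every basket with beer also contains diapers, so the rule beer -> diapers has confidence 1.0, while diapers -> beer would be lower because one diapers basket has no beer.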
Assignments
Students need to learn how to mine data
One assignment on each core DM topic
Apply two different algorithms to at least two data sets, one of which must be relatively large
Can use any DM package (e.g., Weka)
Students write a report
Students learn how to run an experiment
Term Project
Group projects
Either provide a non-trivial implementation of a DM algorithm
Or, learn about a DM topic not discussed in class
Graduate students required to read at least three research papers and to write a report
All students present their project in class
Examination Format
Open book
Two types of questions
First type requires basic knowledge of the material
definitions, T/F, short answers
Second type: apply certain algorithms to small data sets
Conclusion
DM is an interesting course for CS and non-CS students
DM can be taught to non-CS students
A DM course can be taught to students with minimal CS background
Questions