Lecture 1

CS 657/790
Machine Learning and
Data Mining
Course Introduction
Student Survey
• Please hand in sheet of paper with:
• Your name and email address
• Your classification (e.g., 2nd-year computer science PhD student)
• Your experience with MATLAB (none, some or
much)
• Your undergraduate degree (when, what, where)
• Your AI experience (courses at UWM or
elsewhere)
• Your programming experience
Course Information
• Course Instructor: Joe Bockhorst
• email: [email protected]
• office: 1155 EMS
• Course webpage:
http://www.uwm.edu/~joebock/790.html
• office hours: ??? Possible times:
  • before class on Monday (3:30-5:30)
  • Monday morning
  • Wednesday morning
  • after class on Monday (7:00-9:00)
Textbook & Reading
Assignment
• Machine Learning (Tom Mitchell)
• Bookstore in union, $140 new
• Amazon.com hard cover: $125 new, $80 used
• Amazon.com soft cover: < $30
• Read (posted on class web page):
  • Preface
  • Chapter 1
  • Sections 6.1, 6.2, 6.9, 6.10
  • Sections 8.1, 8.2
PowerPoint Vs
Whiteboard
• PowerPoint encourages words over
pictures (not good)
• But PowerPoint can be saved,
tweaked, easily shared, …
• Notes posted on course website following
lecture
• Your thoughts?
Full Disclosure
• Slides are a combination of:
  1) Jude Shavlik's notes from the UW-Madison machine learning course (a professor I had)
  2) Textbook slides (Google "machine learning textbook")
  3) My notes
Class Email List
• Is there one?
Course Outline
• 1st half covers supervised learning
• Algorithms: support vector machines,
neural networks, probabilistic models …
• Methodology
• 2nd half covers graphical probability
models
• Powerful statistical models very useful for
learning in complex and/or noisy settings
Course "Style"
• Primarily algorithmic & experimental
• Some theory, both mathematical & conceptual
(much of it statistics)
• "Hands on" experience, interactive
lectures/discussions
• Broad survey of many ML subfields
  • "symbolic" (rules, decision trees)
  • "connectionist" (neural nets)
  • support vector machines
  • statistical ("Bayes rule")
  • genetic algorithms (if time)
Two Major Goals
• to understand what a learning system
should do
• to understand how (and how well)
existing systems work
Background Assumed
• Programming
  • Data structures and algorithms
  • CS 535
• Math
  • Calculus (partial derivatives)
  • Simple probability & statistics
Programming
Assignments in MATLAB
• Why MATLAB?
  • Fast prototyping
  • Integrated plotting
  • Widely used in academia (industry too?)
  • Will save you time in the long run
• Why not MATLAB?
• Proprietary software
• Harder to work from home
• Optional Assignment: familiarize yourself
with MATLAB, use MATLAB help system
Student Computer Labs
• E256, E280, E285, E384, E270
• All have MATLAB installed under
Windows XP
Requirements
• Bi-weekly programming plus perhaps some
“paper & pencil” homework
  • "hands on" experience valuable
  • HW0 – build a dataset
  • HW1 & HW2 – supervised learning algorithms
  • HW3 & HW4 – graphical probability models
• Midterm exam (after about 8-10 weeks)
• Final exam
• Find a project of your choosing
  • during the last 4-5 weeks of class
Grading
• HW's: 25%
• Project: 20%
• Midterm: 20%
• Final: 30%
• Quality discussion: 5%
Late HW's Policy
• HW's due @ 4pm
• you have 5 late days to use over the semester
  • (Fri 4pm → Mon 4pm is 1 late "day")
• SAVE UP late days!
  • extensions only for extreme cases
• Penalty points after late days exhausted
  • 10% per day
• Can't be more than one week late
Machine Learning Vs
Data Mining
• Machine Learning: computer
algorithms that improve automatically
through experience [Mitchell].
• Data Mining: Extracting knowledge
from large amounts of data. [Han &
Kamber] (synonym: knowledge
discovery in databases (KDD))
What’s the difference?
Topics in ML and DM texts
(Mitchell Vs Han & Kamber)
• In both ML and DM texts: supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), …
• ML only: reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, …
• DM only: data warehouses, OLAP, query languages, association rules, presentation, …
We'll try to cover the topics shown in red on the slide.
The learning problem
• Learning = improving with experience
Improve over task T,
with respect to performance
measure P,
based on experience E
• Example: learn to play checkers
T: Play Checkers
P: % of games won
E: games played against self
Famous Example:
Discovering Genes
• T: find genes in DNA sequences
• ACGTGCATGTGTGAACGTGTGGGTCTGATGATGT…
• P: % of genes found
• E: experimentally verified genes
* Prediction of Complete Gene Structures in Human Genomic DNA,
Burge & Karlin, J. Molecular Biology, 1997, 268:78-94
Famous Example 2:
Autonomous Vehicles Driving
• T: drive vehicle
• P: reach destination
• E: machine observation of human
driver
ML key to winning DARPA
Grand Challenge
Stanford team won 2005 driverless vehicle race
across Mojave Desert
“The robot's software
system relied predominately
on state-of-the-art AI
technologies, such as
machine learning and
probabilistic
reasoning.”
[Winning the DARPA Grand
Challenge, Thrun et al., Journal
of Field Robotics, 2006]
Why Study Machine Learning
(Data Mining)?
• Data is plentiful
  • Retail, video, images, speech, text, DNA, bio-medical measurements, …
• Computational power is available
• Budding industry
• ML has great applications
• ML still relatively immature
Next Time: HW0 – Create
Your Own Dataset
• Think about this
  • will need to create it by week after next
• Google to find:
  • UCI archive (or UCI KDD archive)
  • UCI ML archive (UCI machine learning repository)
HW0 – Your “Personal Concept”
• Step 1: Choose a Boolean (true/false) concept
• Subjective Judgement
• Books I like/dislike
• Movies I like/dislike
• Web pages I like/dislike
• “Time will tell” concepts
• Stocks to buy
• Medical outcomes
• Sensory interpretation
• Face recognition (See text)
• Handwritten digit recognition
• Sound recognition
HW0 – Your “Personal Concept”
• Step 2: Choose a feature space
  • We will use fixed-length feature vectors
    • Choose N features; this defines the space
    • Each feature has Vi possible values
  • Each example is represented by a vector of N feature values
    (i.e., is a point in the feature space)
    e.g., <red, 50, round> for features (color, weight, shape) – see the sketch below
  • Feature types:
    • Boolean
    • Nominal
    • Ordered
    • Hierarchical
    (in HW0 we will use a subset; see next slide)
• Step 3: Collect examples (“I/O” pairs)
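A rough Python sketch of this representation (the feature names and value counts here are made up for illustration, not from HW0):

    # Fixed-length feature vector: N = 3 features (color, weight, shape).
    # Each example is one point in the feature space.
    feature_names = ["color", "weight", "shape"]
    example = ("red", 50, "round")            # corresponds to <red, 50, round>

    # For discrete features, the space has prod(V_i) possible vectors.
    values_per_feature = [3, 10, 4]           # hypothetical V_i for each feature
    space_size = 1
    for v in values_per_feature:
        space_size *= v
    print(space_size)                         # 3 * 10 * 4 = 120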
Standard Feature Types
for representing training examples
– source of “domain knowledge”
• Nominal
  • No relationship among possible values
    e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
  • Possible values of the feature are totally ordered
    e.g., size ∈ {small, medium, large} ← discrete
          weight ∈ [0…500] ← continuous
• Hierarchical
  • Possible values are partially ordered in an ISA hierarchy
    e.g., for shape: closed → {polygon, continuous};
          polygon → {square, triangle}; continuous → {circle, ellipse}
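One possible way to store such an ISA hierarchy, sketched in Python (a parent map plus a small "is-a" test; the encoding is illustrative, not prescribed by the homework):

    # Each value maps to its parent in the ISA hierarchy (None marks the root).
    parent = {
        "closed": None,
        "polygon": "closed", "continuous": "closed",
        "square": "polygon", "triangle": "polygon",
        "circle": "continuous", "ellipse": "continuous",
    }

    def is_a(value, ancestor):
        # True if value equals ancestor or lies below it in the hierarchy.
        while value is not None:
            if value == ancestor:
                return True
            value = parent[value]
        return False

    print(is_a("square", "closed"))    # True
    print(is_a("circle", "polygon"))   # False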
Example Hierarchy
(KDD* Journal, Vol 5, No. 1-2, 2001, page 17)
• Structure of one feature! A product hierarchy:
  Product → 99 Product Classes (e.g., Pet Foods, Tea)
          → 2302 Product Subclasses (e.g., Dried Cat Food, Canned Cat Food)
          → ~30k Products (e.g., Friskies Liver, 250g)
• "the need to be able to incorporate hierarchical (knowledge
about data types) is shown in every paper."
  - From the editors' intro to the special issue (on applications) of the KDD journal, Vol 5, 2001
* Officially, "Data Mining and Knowledge Discovery", Kluwer Publishers
Our Feature Types
(for homeworks)
• Discrete
• tokens (char strings, w/o quote marks and
spaces)
• Continuous
• numbers (int’s or float’s)
• If only a few possible values (e.g., 0 & 1), use discrete
  • i.e., merge nominal and discrete-ordered
    (or convert discrete-ordered into 1, 2, …; see the sketch below)
• We will ignore hierarchy info and
only use the leaf values (it is rare anyway)
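For instance, converting a discrete-ordered feature into 1, 2, … could look like this small Python sketch (the value names are just an example):

    # Map an ordered discrete feature onto integers, preserving the ordering.
    size_order = ["small", "medium", "large"]
    size_to_int = {value: i + 1 for i, value in enumerate(size_order)}

    examples = ["medium", "small", "large"]
    print([size_to_int[v] for v in examples])   # [2, 1, 3]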
Today’s Topics
• Creating a dataset of
fixed length feature vectors
• HW0 out on-line
• Due next Monday
Some Famous Examples
• Car Steering (Pomerleau)
  digitized camera image → learned function → steering angle
• Medical Diagnosis (Quinlan)
  medical record (e.g., age = 13, sex = M, wgt = 18) → learned function → ill vs. healthy
• DNA categorization
• TV-pilot rating
• Chemical-plant control
• Backgammon playing
• WWW page scoring
• Credit application scoring
HW0: Creating your dataset
1. Choose a dataset
  • based on interest/familiarity
  • meets basic requirements (a quick sanity-check sketch follows below):
    • >1000 examples
    • category (function) learned should be binary-valued
    • ~500 examples labeled class A, other ~500 labeled class B
  → Internet Movie Database (IMDb)
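A tiny Python sanity check for those basic requirements (the helper name and exact balance rule are my own, invented for illustration):

    from collections import Counter

    def meets_requirements(labels, min_total=1000):
        # labels: one class label per example, e.g. "A" or "B".
        counts = Counter(labels)
        return (sum(counts.values()) >= min_total
                and len(counts) == 2
                and all(n >= min_total // 2 for n in counts.values()))

    print(meets_requirements(["A"] * 510 + ["B"] * 500))   # True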
HW0: Creating your dataset
2. IMDb has a lot of data that are not discrete or continuous or
binary-valued for a target function. Roughly, the schema is
(one possible record layout is sketched below):
  • Studio: Name, Country, List of movies — Made → Movie
  • Director/Producer: Name, Year of birth, List of movies — Directed / Produced → Movie
  • Actor: Name, Year of birth, Gender, Oscar nominations, List of movies — Acted in → Movie
  • Movie: Title, Genre, Year, Opening Wkend BO receipts,
    List of actors/actresses, Release season
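A sketch (in Python; this is not the actual IMDb export format, just one way to hold a raw record before turning it into a fixed-length feature vector):

    from dataclasses import dataclass, field

    @dataclass
    class Movie:
        # Raw attributes roughly matching the schema sketched above.
        title: str
        genre: str
        year: int
        opening_weekend_receipts: float               # box-office receipts, in dollars
        release_season: str
        actors: list = field(default_factory=list)    # actor/actress records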
HW0: Creating your dataset
3. Choose a Boolean or binary-valued target function (category), e.g.:
  • Opening weekend box office receipts > $2 million (labeling sketch below)
  • Movie is drama? (vs. action, sci-fi, …)
  • Movies I like/dislike (e.g., TiVo)
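A minimal sketch of turning the first candidate target function into a Boolean label (the field name is hypothetical):

    def label_opening_weekend(movie, threshold=2_000_000):
        # True iff opening-weekend box-office receipts exceed the threshold ($2M by default).
        return movie["opening_weekend_receipts"] > threshold

    print(label_opening_weekend({"opening_weekend_receipts": 3_500_000}))   # True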
HW0: Creating your dataset
4. How to turn available attributes into example attributes (select predictive features), e.g.:
  • Movie
    • Average age of actors
    • Number of producers
    • Percent female actors
  • Studio
    • Number of movies made
    • Average movie gross
    • Percent of movies released in US
HW0: Creating your dataset
• Director/Producer
  • Years of experience
  • Most prevalent genre
  • Number of award-winning movies
  • Average movie gross
• Actor
  • Gender
  • Has previous Oscar award or nominations
  • Most prevalent genre
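A sketch of computing a couple of these derived features for one movie (the field names are invented for illustration, not the real IMDb schema):

    def derived_features(movie, release_year):
        # Turn raw actor records attached to a movie into fixed-length numeric features.
        actors = movie["actors"]   # each actor: {"year_of_birth": ..., "gender": ...}
        ages = [release_year - a["year_of_birth"] for a in actors]
        return {
            "avg_actor_age": sum(ages) / len(ages) if ages else 0.0,
            "pct_female_actors": (100.0 * sum(a["gender"] == "F" for a in actors) / len(actors)
                                  if actors else 0.0),
        }

    movie = {"actors": [{"year_of_birth": 1970, "gender": "F"},
                        {"year_of_birth": 1980, "gender": "M"}]}
    print(derived_features(movie, 2005))   # {'avg_actor_age': 30.0, 'pct_female_actors': 50.0}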
HW0: Creating your dataset
David Jensen’s group at UMass used Naïve Bayes (NB) to
predict the following based on attributes they selected and a
novel way of sampling from the data:
• Opening weekend box office receipts > $2
million
• 25 attributes
• Accuracy = 83.3%
• Default accuracy = 56%
• Movie is drama?
• 12 attributes
• Accuracy = 71.9%
• Default accuracy = 51%
• http://kdl.cs.umass.edu/proximity/about.html
What Do You Think
Machine Learning Means?
What is Learning?
Learning denotes changes in the system that
… enable the system to do the same task …
more effectively the next time.
- Herbert Simon
Learning is making useful changes in our minds.
- Marvin Minsky
Major Paradigms of
Machine Learning
• Inducing Functions from I/O Pairs
  • Decision trees (e.g., Quinlan's C4.5 [1993])
  • Connectionism / neural networks (e.g., backprop)
  • Nearest-neighbor methods
  • Genetic algorithms
  • SVMs
• Learning without a Teacher
  • Conceptual clustering
  • Self-organizing systems
  • Discovery systems
  (not in Mitchell's textbook; will spend 0-2 lectures on this – but also covered in CS776)
Major Paradigms of
Machine Learning
• Improving a Multi-Step Problem Solver (will be covered briefly)
  • Explanation-based learning
  • Reinforcement learning
• Using Preexisting Domain
Knowledge Inductively
• Analogical learning
• Case-based reasoning
• Inductive/explanatory hybrids