Introduction to Machine Learning

4.30 Machine Learning
Pádraig Cunningham
Machine Learning Group
University College Dublin
Outline

Week 1
- Introduction & General Overview of Matrix Decomposition
- Nearest Neighbour Classifiers
- Tutorial

Week 2: Neural Networks
- Simple Perceptron, Backpropagation
- Other Architectures: Hopfield, Self-Organising Maps
- Tutorial

Week 3
- Support Vector Machines
- Kernel Methods & Evaluation
- Tutorial

Week 4
- Decision Trees
- Naïve Bayes
- Tutorial
Intro to ML

Week 5: Ensemble Techniques
- Bagging
- Boosting
- Tutorial

Coursework
- 3-4 pieces, 15 hours, Weka & Java

Week 6: Unsupervised Learning
- Hierarchical Clustering
- Other Clustering Algorithms: k-Means, Spectral Clustering
- Tutorial

Week 7: Dimension Reduction
- Principal Component Analysis, LSI, SVD
- Feature Selection
- Tutorial

Later
- 2 revision tutorials
Why Machine Learning?
- Recent progress in algorithms and theory
- Computational power is available: loads of processing power
- Growing flood of online data (e.g. Amazon, Google)
3 niches for ML
- Data mining: using historical data to improve decisions
  - medical records → medical knowledge
- Software applications that cannot be programmed by hand, i.e. weak-theory domains
  - autonomous driving
  - speech recognition
- Self-customising programs
  - personalised newspaper
  - e-mail filtering
Data-mining in medical records
Quality Assurance in Maternity Care.
http://svr-www.eng.cam.ac.uk/projects/qamc/qamc.html
Rule Learning
The QAMC system uses decision trees (I think!).
It is also possible to extract rules from data:
  If   No previous normal delivery, and
       Abnormal 2nd Trimester Ultrasound, and
       Malpresentation at admission
  Then Probability of Emergency C-Section is 0.6
Over training data: 26/41 = 0.63
Over test data: 12/20 = 0.6
<Rule taken from Machine Learning by Tom Mitchell>
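The rule's probability is the fraction of the cases it covers that had the outcome. On the training data that is 26 of the 41 records satisfying all three conditions, 26/41 ≈ 0.63. A minimal Python sketch of that calculation follows. The attribute names and the toy records are hypothetical, invented for illustration; they are not QAMC data.

```python
from typing import Dict, List, Tuple

# Hypothetical record format: attribute name -> value (not the QAMC schema).
Record = Dict[str, object]


def rule_applies(record: Record, conditions: Dict[str, object]) -> bool:
    """True if every 'attribute == value' test in the rule's IF part holds."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def rule_confidence(records: List[Record], conditions: Dict[str, object],
                    target: str) -> Tuple[int, int, float]:
    """Return (positives, covered, positives / covered) for IF conditions THEN target."""
    covered = [r for r in records if rule_applies(r, conditions)]
    positives = sum(1 for r in covered if r.get(target))
    return positives, len(covered), (positives / len(covered) if covered else 0.0)


# The IF part of the rule above, using hypothetical attribute names.
rule = {
    "previous_normal_delivery": False,
    "abnormal_2nd_trimester_ultrasound": True,
    "malpresentation_at_admission": True,
}

# Toy data only: the first three records match the rule, the last one does not.
toy_records = [
    {**rule, "emergency_c_section": True},
    {**rule, "emergency_c_section": False},
    {**rule, "emergency_c_section": True},
    {**rule, "previous_normal_delivery": True, "emergency_c_section": True},
]

print(rule_confidence(toy_records, rule, "emergency_c_section"))  # prints (2, 3, 0.6666666666666666)
```

Running the same function on held-out records gives the test-set estimate (12/20 = 0.6 on the slide). Comparing the two figures is a quick check that the rule is not just fitting the training data.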
Spam Filtering

From a machine learning perspective…
- Lots of training data
- High-dimensional data (lots of features)
- Email is a diverse concept
  - Spam: porn, mortgage, religion, cheap drugs…
  - Legitimate email: work, family, play…

Spam filtering is a challenge because…
- Arms race: spammers vs. filters
- False positives are unacceptable
- Spam is a changing concept
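To make "high-dimensional data" concrete, here is one common representation. The slide does not say which representation was used; this is an assumption. Each email becomes a bag-of-words vector with one feature (a word count) per vocabulary word. The example emails below are made up.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails: List[str]) -> Dict[str, int]:
    """Map every distinct word in the training emails to a feature index."""
    vocab: Dict[str, int] = {}
    for email in emails:
        for word in tokenize(email):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_feature_vector(email: str, vocab: Dict[str, int]) -> List[int]:
    """Word-count vector with one dimension per vocabulary word; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word, count in Counter(tokenize(email)).items():
        if word in vocab:
            vector[vocab[word]] = count
    return vector


training = [
    "Cheap drugs! Buy cheap drugs now",
    "Lunch with the family on Sunday?",
    "Your mortgage rate is approved",
]
vocab = build_vocabulary(training)
print(len(vocab), "features")  # prints 15 features
print(to_feature_vector("cheap mortgage, cheap drugs", vocab))
```

Even three short emails give 15 dimensions. A realistic training corpus yields a vocabulary of many thousands of words, and most of them are zero in any single email, so such vectors are usually stored sparsely.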
ALVINN
Problems too difficult to program by hand
ALVINN drives at 70 mph on motorways
Autonomous Vehicles

DARPA Grand Challenge 2005

Winner: Stanley from Stanford

Various modules use ML
SmartRadio

- Internet-based music radio
- Personalised
  - Collaborative recommendation, supported by knowledge discovery from log data
  - Content-based recommendation, supported by feature extraction from sound files
    - feature selection
    - refinement
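The slides do not say which collaborative-recommendation algorithm SmartRadio used. As an assumption, here is a generic user-based collaborative filtering sketch. Listeners are compared by the cosine similarity of their play counts, which stand in for the log data. The listener then gets recommendations for tracks that similar listeners played. The log format and the user and track names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical listening log: user -> {track: play count}.
Profile = Dict[str, int]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse play-count profiles."""
    dot = sum(count * b[track] for track, count in a.items() if track in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user: str, logs: Dict[str, Profile], k: int = 2, n: int = 3) -> List[str]:
    """Recommend up to n tracks the user has not played, using the k most similar listeners."""
    target = logs[user]
    neighbours = sorted(
        ((cosine(target, profile), other) for other, profile in logs.items() if other != user),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # listeners with nothing in common contribute nothing
        for track, count in logs[other].items():
            if track not in target:
                # Weight each neighbour's play count by how similar they are to the user.
                scores[track] = scores.get(track, 0.0) + sim * count
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:n]


logs = {
    "ann": {"track_a": 5, "track_b": 3},
    "bob": {"track_a": 4, "track_b": 2, "track_c": 6},
    "cat": {"track_d": 7},
}
print(recommend("ann", logs))  # prints ['track_c']
```

Content-based recommendation works differently. It compares tracks by features extracted from the audio, so it can also suggest tracks that nobody has played yet.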
Smart Radio

Smart Radio is a web-based client-server music application that allows listeners to build, manage and share music programmes.
The project was set up to look at a possible model for:
- the regulated distribution of music on the web
- a personalised music-stream service
It was also set up to provide an architecture and data to test our data mining and collaborative filtering algorithms.
ML Dimensions
- Lazy vs. eager: k-NN vs. rule learning
- Supervised vs. unsupervised
- Symbolic vs. sub-symbolic
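The lazy/eager distinction is about when the work is done. A lazy learner such as k-NN only stores the examples at training time and defers all computation to query time. An eager learner such as a rule learner builds its model up front. The minimal k-NN sketch below makes three assumptions: numeric feature vectors, Euclidean distance and a simple majority vote. The class and method names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


class KNNClassifier:
    """Lazy learner: 'training' just stores the examples; all work happens at query time."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.examples: List[Example] = []

    def fit(self, examples: List[Example]) -> "KNNClassifier":
        # No model is built (lazy): we only remember the training data.
        self.examples = list(examples)
        return self

    def predict(self, x: Sequence[float]) -> str:
        # Rank the stored examples by Euclidean distance to the query...
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], x))[: self.k]
        # ...and return the majority class among the k nearest.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
model = KNNClassifier(k=3).fit(train)
print(model.predict((1.1, 1.0)), model.predict((5.0, 4.8)))  # prints A B
```

An eager rule learner would do its search in `fit` and could then discard the training data. Here `fit` is trivial and every `predict` call scans all stored examples.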