Introduction to Machine Learning
4.30 Machine Learning
Pádraig Cunningham
Machine Learning Group
University College Dublin
Outline
Week 1
Introduction & General Overview of Matrix Decomposition
Nearest Neighbour Classifiers
Tutorial
Week 2: Neural Networks
Simple Perceptron, Backpropagation
Other Architectures: Hopfield, Self-Organising Maps
Tutorial
Week 3
Support Vector Machines
Kernel Methods & Evaluation
Tutorial
Week 4
Decision Trees
Naïve Bayes
Tutorial
Intro to ML
Outline
Week 5: Ensemble Techniques
Bagging
Boosting
Tutorial
Coursework
3-4 pieces, 15 hours, Weka & Java
Week 6: Unsupervised Learning
Hierarchical Clustering
Other Clustering Algorithms: k-Means, Spectral Clustering
Tutorial
Week 7: Dimension Reduction
Principal Components Analysis, LSI, SVD
Feature Selection
Tutorial
Later
2 revision tutorials
Why Machine Learning
Recent progress in algorithms and theory
Loads of processing power
Computational power is available
Growing flood of online data
Amazon
Google
3 niches for ML
Data mining: using historical data to improve decisions
medical records → medical knowledge
Software applications that cannot be programmed by hand
autonomous driving
speech recognition
i.e. weak theory domains.
Self-customising programs
Personalised Newspaper
E-mail filtering
Data-mining in medical records
Quality Assurance in Maternity Care.
http://svr-www.eng.cam.ac.uk/projects/qamc/qamc.html
Rule Learning
The QAMC system uses Decision Trees (I think!)
It is also possible to extract rules from data:
If
No previous normal delivery, and
Abnormal 2nd Trimester Ultrasound, and
Malpresentation at admission
Then
Probability of Emergency C-Section is 0.6
Over training data: 26/41 = 0.63
Over test data: 12/20 = 0.6
<Rule taken from Machine Learning by Tom Mitchell>
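The rule on this slide is a conjunction of three conditions, and its confidence is estimated by counting how many of the cases it covers had the predicted outcome. The following is a minimal Python sketch of that counting, not the QAMC implementation. The `Record` field names and the `make_toy` helper are hypothetical, and the toy data is synthetic, built only to reproduce the 26/41 and 12/20 counts quoted above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names chosen for illustration only.
    previous_normal_delivery: bool
    abnormal_2nd_trimester_ultrasound: bool
    malpresentation_at_admission: bool
    emergency_c_section: bool


def rule_fires(r):
    """True when all three conditions of the rule hold for record r."""
    return (not r.previous_normal_delivery
            and r.abnormal_2nd_trimester_ultrasound
            and r.malpresentation_at_admission)


def rule_confidence(records):
    """Return (positives, covered, positives / covered) for the rule."""
    covered = [r for r in records if rule_fires(r)]
    if not covered:
        return 0, 0, None
    positives = sum(r.emergency_c_section for r in covered)
    return positives, len(covered), positives / len(covered)


def make_toy(positives, covered, uncovered):
    """Synthetic data: `covered` records match the rule, `positives` of
    them are emergency C-sections, `uncovered` records do not match."""
    data = [Record(False, True, True, i < positives) for i in range(covered)]
    data += [Record(True, False, False, False) for _ in range(uncovered)]
    return data


train = make_toy(positives=26, covered=41, uncovered=100)
test = make_toy(positives=12, covered=20, uncovered=50)

for name, data in (("training", train), ("test", test)):
    pos, cov, conf = rule_confidence(data)
    print(f"Over {name} data: {pos}/{cov} = {conf:.2f}")
```

Comparing the training and test figures is the point of the exercise: a rule whose confidence holds up on unseen data is more likely to reflect a real regularity than one that only fits the training set.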
Spam Filtering
For Machine Learning…
Lots of training data
High dimensionality data (lots of features)
Email is a diverse concept
Porn, mortgage, religion, cheap drugs…
Work, family, play…
Spam Filtering is a challenge because…
Arms race: spammers vs filters
False Positives are unacceptable
Spam is a changing concept
ALVINN
Problems too difficult to program by hand
ALVINN drives at 70 mph on motorways
Autonomous Vehicles
DARPA Grand Challenge 2005
Winner: Stanley from Stanford
Various modules use ML
Smart Radio
Internet-based music radio
Personalised
Collaborative Recommendation
supported by knowledge discovery from log data
Content-Based Recommendation
supported by feature extraction from sound files
feature selection
refinement
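As an illustration of what collaborative recommendation from log data can look like, here is a minimal user-based sketch in Python. It is not the Smart Radio algorithm: the cosine similarity measure, the `recommend` function, the listener names, and the ratings are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (track -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target, ratings, top_n=3):
    """Rank tracks the target has not rated by similarity-weighted ratings
    from other listeners."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their_ratings)
        if sim <= 0:
            continue  # ignore listeners with no overlapping taste
        for track, rating in their_ratings.items():
            if track not in ratings[target]:
                scores[track] = scores.get(track, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical log-derived ratings (listener -> {track: rating}).
ratings = {
    "ann": {"track_a": 5, "track_b": 4},
    "bob": {"track_a": 5, "track_b": 5, "track_c": 4},
    "cat": {"track_d": 5, "track_e": 3},
}

print(recommend("ann", ratings))
```

Here only "bob" shares tracks with "ann", so only his unseen track is suggested. A content-based recommender would instead compare features extracted from the sound files themselves, which lets it suggest tracks that nobody has rated yet.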
Smart Radio
Smart Radio is a web-based client-server music application which allows listeners to build, manage and share music programmes.
The project was set up to look at a possible model for:
The regulated distribution of music on the web
A personalised stream of music service
To provide an architecture and data to test our data mining and collaborative filtering algorithms
ML Dimensions
Lazy vs Eager
k-NN vs rule learning
Supervised vs Unsupervised
Symbolic vs Sub-symbolic
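To make the lazy side of the lazy vs eager distinction concrete, here is a minimal k-NN sketch in Python. The class name `LazyKNN`, the Euclidean distance, and the toy points are assumptions for illustration. The thing to notice is that `fit` only stores the examples, and all the work happens when a query arrives. An eager learner such as a rule learner would instead build its model at training time and could then discard the data.

```python
import math
from collections import Counter


class LazyKNN:
    """k-nearest-neighbour classifier: no model is built at training time."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, X, y):
        # Lazy learning: "training" is just remembering the labelled examples.
        self.examples = list(zip(X, y))
        return self

    def predict(self, x):
        # All computation is deferred to query time: find the k closest
        # stored examples (Euclidean distance) and take a majority vote.
        nearest = sorted(self.examples,
                         key=lambda ex: math.dist(ex[0], x))[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy two-class data, made up for the example.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]

knn = LazyKNN(k=3).fit(X, y)
print(knn.predict((2, 2)), knn.predict((7, 7)))
```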