Learning from Data


Theme Introduction:
Learning from Data
Dr Gavin Brown
Machine Learning and Optimization Research Group
Learning from Data
The world is drowning in data.
Book sales: Amazon makes 250,000 sales/deliveries per day
Genetics: 100,000 genes sequenced while-u-wait (almost)
Search: ~10 billion Google Images / 48 hours of video uploaded to YouTube per minute
Mobile phone market: 900 billion calls per year (location, duration, etc.)
Health records: NHS plans to have 60m electronic records in place by 2015
This theme explores algorithms that enable us to extract meaning from data.
Learning from Data
Data is recorded from some real-world phenomenon.
What might we want to do with that data?
Prediction
- what can we predict about this phenomenon?
Description
- how can we describe/understand this phenomenon in a new way?
Optimization
- how can we control and optimize this phenomenon for our own objectives?
COMP61011
Machine Learning
& Data Mining
COMP61021
Modeling & Visualization
of High Dimensional Data
COMP61032
Optimization for Learning,
Planning & Problem Solving
Period 1
Oct/Nov
Period 2
Nov/Dec
Period 3
Feb/Mar
Prediction
Lecturer:
Dr Gavin Brown
Machine Learning and Data Mining
Spam emails
How can we predict if something is spam/genuine?
Machine Learning and Data Mining
Medical Records / Novel Drugs
What characteristics of a patient indicate they may react well/badly to a new drug?
How can we predict whether it will potentially hurt rather than help them?
Machine Learning and Data Mining
Handwriting Recognition
Google Books is currently digitizing millions of books.
Smartphones need to process non-European
handwriting to tap into the Asian market.
How can we recognize handwritten digits in a
huge variety of handwriting styles, in real-time?
Description
Lecturer:
Dr Ke Chen
Modeling and Visualization of High Dimensional Data
Gene Maps
The human body has about 24,000 active genes – soon you will be
able to buy your own gene map for a few hundred pounds.
How can we visualize this?
Modeling and Visualization of High Dimensional Data
Image processing
Gesture recognition – how can we represent the motion
of a human with so many complex joints and angles?
Optimization
Lecturer:
Dr Joshua Knowles
Optimization for Learning, Planning and Problem Solving
Packing Problems
How can we pack as many parcels as possible, when all parcels
are of different sizes, wasting the least amount of space?
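One simple way to get a feel for this problem is a greedy heuristic. The sketch below is only an illustration, not a method taken from the module: it assumes each parcel is described by a single number (its size) and that every container has the same capacity, and it uses the "first-fit decreasing" rule (largest parcels first, each into the first container with room). The function name and the example sizes are made up for this illustration, and the heuristic does not guarantee the best possible packing.

```python
def first_fit_decreasing(sizes, capacity):
    """Greedy 1-D packing: returns a list of bins, each a list of item sizes."""
    bins = []
    # Place the largest items first; they are the hardest to fit later.
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds capacity {capacity}")
        for current in bins:
            # Put the item in the first bin that still has room for it.
            if sum(current) + size <= capacity:
                current.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins


# Hypothetical parcel sizes and container capacity.
parcels = [4, 8, 1, 4, 2, 1]
packed = first_fit_decreasing(parcels, capacity=10)
print(len(packed), "containers:", packed)
```

Real parcels have three dimensions and awkward shapes, which is what makes the full problem so much harder than this one-dimensional version.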
Optimization for Learning, Planning and Problem Solving
Outsourcing of tasks to contractors
You have 25 tasks on a project, and 35 contractors.
Each contractor charges a different amount for each of the jobs.
Some contractors refuse to work with each other.
How can you match up task/contractors, getting as many
of your tasks done for as little cost as possible?
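To make the question concrete, here is a minimal brute-force sketch on a toy instance (3 tasks, 4 contractors). It rests on assumptions that are not on the slide: every task must be done, each contractor takes at most one task, and a "refusal" means two contractors cannot both be hired. The function name, costs and conflict pair are invented for illustration.

```python
from itertools import permutations


def cheapest_assignment(cost, conflicts):
    """cost[c][t] is the price contractor c charges for task t.

    Returns (total_cost, choice), where choice[t] is the contractor
    assigned to task t, or (None, None) if no valid assignment exists.
    """
    n_contractors = len(cost)
    n_tasks = len(cost[0])
    best_total, best_choice = None, None
    # Try every way of giving each task to a different contractor.
    for choice in permutations(range(n_contractors), n_tasks):
        hired = set(choice)
        # Skip any plan that hires two contractors who refuse to work together.
        if any(a in hired and b in hired for a, b in conflicts):
            continue
        total = sum(cost[c][t] for t, c in enumerate(choice))
        if best_total is None or total < best_total:
            best_total, best_choice = total, choice
    return best_total, best_choice


# Hypothetical prices: one row per contractor, one column per task.
cost = [
    [4, 9, 7],
    [6, 3, 8],
    [5, 7, 2],
    [3, 8, 6],
]
conflicts = [(1, 3)]  # contractors 1 and 3 refuse to work together
total, choice = cheapest_assignment(cost, conflicts)
print(total, choice)
```

Checking every possibility is fine for 3 tasks, but the number of candidate plans for 25 tasks and 35 contractors is astronomically large, which is why smarter optimization techniques are needed.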
Optimization for Learning, Planning and Problem Solving
Online Dating
The eHarmony online dating site stores 4 TB of data on 20m users.
The company uses proprietary algorithms to score that data against
29 'dimensions of compatibility' to match up customers with the
best possible prospects for a long-term relationship.
Learning from Data
These modules are a THEME because they complement each other.
Prediction techniques need optimization to find the best parameter settings.
Optimization techniques can use prediction to figure out good ways to solve a very difficult problem.
Description techniques use optimization to set parameters, and are at the heart of some predictors.
(Diagram: Optimization, Prediction, Description)
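As a small illustration of the first point (prediction needing optimization), the sketch below fits a straight-line predictor y = w*x + b by gradient descent on the mean squared error. It is an assumed, simplified example rather than material from the modules: the function name, the learning rate, the number of steps and the data are all made up for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Find w and b minimising the mean squared error of w*x + b against ys."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill: this is the optimization part.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))

# The prediction part: use the fitted parameters on a new input.
print(round(w * 5.0 + b, 3))
```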
Learning from Data
Where does all this fit?
(Diagram relating Learning from Data to: Statistics / Mathematics, Artificial Intelligence, Data Mining, Computer Vision, Robotics)
(No definition of a field is perfect – the diagram above is just one interpretation, mine ;-)
Learning from Data: Prerequisites
MATHEMATICS
This is a mathematical subject.
You must be comfortable with probabilities and algebra.
PROGRAMMING
You must be able to program, and pick up a new language relatively easily.
We use Matlab for the first 2 modules.
In the 3rd module, you may use any language.
http://www.cs.manchester.ac.uk/pgt/COMP61011
Module codes in this theme:
61011 (prediction)
61021 (description)
61032 (optimization)
Learning from Data: Why pick this theme?
July 11, 2007
http://tinyurl.com/12skills
12 IT skills that employers can't say no to
“Job hunters with these IT skills are assured of employment, now and in the future”
1. Machine Learning
2. etc…
…
12. C++, C#
August 19, 2010
http://tinyurl.com/3techmajors
“The following is my list of the Top Three hottest academic areas
for a future career in tech:”
1. Data Mining/Machine Learning/AI/Natural Language Processing
2. Business Intelligence/Competitive Intelligence
3. Analytics/Statistics – Web Analytics
Matlab
MATrix LABoratory
• Interactive scripting language
• Interpreted (i.e. no compiling)
• Objects possible, not compulsory
• Dynamically typed
• Flexible GUI / plotting framework
• Large libraries of tools
• Highly optimized for maths
Available free from Uni, but usable
only when connected to our network
(e.g. via VPN)
Module-specific software supported
on school machines only.
Textbooks
Not compulsory purchases. Notes will be provided in class.
“Introduction to Machine Learning”
By Ethem Alpaydin
“Combinatorial Optimization:
Algorithms and Complexity”
By Christos Papadimitriou and Kenneth Steiglitz