Transcript 1_Intro
Intro to Machine Learning
Mark Stamp
Intro
1
What is Machine Learning?
Lots of different definitions
o Statistical discrimination, where…
o “Machine” does the hard work (“learns”),
so we don’t have to think too much
Often applied to AI problems
o But actually very, very widely applicable
Usually based on a binary classifier
Often said to be “data driven”
What Can it do for Me?
Machine learning has proven very powerful
o Practical and useful
Successfully applied to problems in…
o Speech recognition and NLP
o Bioinformatics
o Stock market analysis
o AI (robotics, computer vision, etc.)
o More and more and more apps all the time
Black Box Approach
Machine learning (ML) algorithm often treated as a black box
o This may be its main selling point!
And often works surprisingly well
o You can get good results even if you know
nothing about underlying algorithms
But, this can be limiting
o Especially wrt new and novel applications
Doctor Analogy
NP (nurse practitioner) is a nurse with advanced training
Physician has much more education
Both diagnose, treat, and manage
patients’ problems
Studies show that NPs can do about
80% to 90% of what physicians do
But for the most challenging 10% to
20% of cases, a physician is required
Interesting, But What Does
That Have to Do with ML?
ML version of NP would have knowledge beyond black box, but not too much
ML version of physician would really
understand how/why things work
Goal is for you to become an ML physician
For doctors, most challenging 10% to
20% of cases are most interesting…
o …and the most lucrative!
Auto Mechanic Analogy
The majority of diagnostic work done by auto mechanics is routine
o Easy to see what the problem is
o Separate from skill needed to fix it
But, there are some difficult cases
o Where no “cookbook” diagnosis will work
o Skill needed to analyze problem
o Requires understanding of inner workings
Machine Learning from
10,000 Feet
Usually focus on binary classification
First, we train a model on a set of samples, all of which are “type A”
Then, given a sample of unknown type
o Score the sample against the model
o If it scores high, classify it as “type A”
o Otherwise, classify it as “not type A”
Key ideas are training and scoring
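The train-and-score pattern above can be sketched in a few lines of Python. The toy “model” here (the mean of the “type A” training samples, scored by distance to it) is purely an illustrative assumption, a stand-in for the real techniques (HMM, SVM, etc.) covered later:

```python
# Minimal sketch of the train/score/classify pattern described above.
# The model is just the per-feature mean of the "type A" samples, and
# the score is the negative distance to that mean (higher = more A-like).

def train(samples):
    """Train a toy model: the per-feature mean of the type A samples."""
    n, dim = len(samples), len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]

def score(model, sample):
    """Score a sample against the model: higher means more 'type A'-like."""
    dist = sum((a - b) ** 2 for a, b in zip(model, sample)) ** 0.5
    return -dist

def classify(model, sample, threshold):
    """Classify as 'type A' if the score exceeds the threshold."""
    return "type A" if score(model, sample) > threshold else "not type A"

# Note: all training samples are type A; no "not type A" data is needed.
type_a = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
model = train(type_a)
print(classify(model, [1.0, 1.0], threshold=-0.5))  # near the training data
print(classify(model, [5.0, 5.0], threshold=-0.5))  # far from it
```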
Topics Covered in Detail
Hidden Markov Models (HMM)
Profile Hidden Markov Models (PHMM)
Principal Component Analysis (PCA)
o With Singular Value Decomposition (SVD)
Support Vector Machines (SVM)
Clustering Basics
o Focus on K-means and EM clustering
Data analysis
Many Mini-Topics
k-nearest neighbor
Neural networks
Boosting (AdaBoost)
Random Forests
Linear Discriminant Analysis (LDA)
Vector quantization
Naïve Bayes and regression analysis
Conditional Random Fields (CRF)
HMM
We cover HMMs in great detail
o More detail than any other ML technique
o You must implement HMM from scratch
o You must understand it very well
We contrast other techniques to HMM
HMMs useful in many applications
And the models often tell us something
o Not always the case for other ML
PHMM
Like HMM with positional information
Conceptually simple, but can be complex
and/or problem specific in practice
Widely used in bioinformatics
o And other applications where position
within sequence is critical information
Have been used successfully in security research (IDS, malware detection)
PCA
PCA reduces dimensionality
Training may seem complex
o But scoring is fast and efficient
o So, when the dust settles, PCA is actually
easy to apply in practice
Singular Value Decomposition (SVD) is (almost) synonymous with PCA
o SVD is one way to train a model in PCA
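As a sketch of the SVD-as-training idea, the following centers a small data matrix, takes its SVD, and keeps the top right-singular vector as the principal component; the data and the choice of one component are illustrative assumptions:

```python
# A minimal sketch of PCA training via SVD: center the data, compute the
# SVD, and keep the dominant right-singular vector(s) as the principal
# component(s). Projecting onto them reduces the dimensionality, and
# scoring is just a fast matrix multiply.
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

mean = X.mean(axis=0)
centered = X - mean

# Rows of Vt are the principal directions of the centered data.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

k = 1                                # keep the single dominant component
components = Vt[:k]                  # "training" amounts to storing these
projected = centered @ components.T  # scoring: one matrix multiply

print(projected.shape)               # (6, 1): 2-d data reduced to 1-d
```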
SVM
SVM has a nice geometric interpretation
o So, we can draw lots of pretty pictures
In SVM, we increase the dimension
o May seem strange, in contrast to PCA
SVM can be used like other ML
o But also ideal as a “meta” score to
combine multiple other scores
Good combination of theory and practice
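The dimension-increasing idea can be illustrated with a toy feature map, x → (x, x²); this map and its data are assumptions for illustration, not the actual kernel machinery covered later:

```python
# Toy illustration of the "increase the dimension" idea behind SVM.
# In one dimension, class A (|x| small) and class B (|x| large) cannot
# be split by any single threshold, but after mapping x -> (x, x**2) the
# horizontal line y = 2.0 in the lifted 2-d space separates them.

def feature_map(x):
    """Lift a 1-d point into 2-d, where a linear separator exists."""
    return (x, x * x)

class_a = [-1.0, 0.0, 1.0]        # inner points
class_b = [-3.0, -2.0, 2.0, 3.0]  # outer points: no 1-d threshold works

# In the lifted space, compare the second coordinate against 2.0:
separator = lambda p: "A" if feature_map(p)[1] < 2.0 else "B"

print([separator(x) for x in class_a])  # all "A"
print([separator(x) for x in class_b])  # all "B"
```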
Clustering
Usually used for “data exploration”
o We cluster hoping to discern structure
from data that we know little about
o Observed structure may or may not be
meaningful --- can cluster anything
We consider 2 clustering techniques
o K-means
o EM (expectation maximization)
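A minimal K-means sketch, with made-up 1-d data and starting centroids chosen purely for illustration:

```python
# K-means in miniature: alternately assign each point to its nearest
# centroid, then move each centroid to the mean of its cluster, until
# the centroids stop changing.

def kmeans(points, centroids, iters=100):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = kmeans(data, centroids=[0.0, 10.0])
print(centroids)  # roughly [1.0, 9.0]
```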
Mini Topics
Neural nets and random forests
o Both are very popular and useful
o Won’t spend too much time on either
LDA
o Discuss this in some detail
o Close connections to PCA and SVM
CRF
o Generalization of HMM
What about the Math?
We keep math to a minimum
o Course completely self-contained wrt math
But some math is unavoidable...
o HMM --- discrete probability
o PHMM --- comparable to HMM
o PCA --- linear algebra (eigenvectors)
o SVM --- calculus (Lagrange multipliers)
o Clustering --- statistics (Gaussian dist.)
Data Analysis
Critical to analyze data carefully
Especially true in research, as we
want to compare to previous/other
work
Often a major weakness in research
We’ll discuss…
o Experimental design, cross validation,
accuracy, ROC curves, PR curves, the
imbalance problem, and so on
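As a sketch of one of these topics, the following runs k-fold cross validation on toy labeled data; the trivial threshold “classifier” and the data are assumptions just to keep the example self-contained:

```python
# k-fold cross validation in miniature: split the labeled data into k
# folds, hold each fold out once for testing, and average the per-fold
# accuracy. The "classifier" is a trivial threshold rule.

def threshold_classifier(train_set):
    """Fit a threshold halfway between the two class means."""
    pos = [x for x, y in train_set if y == 1]
    neg = [x for x, y in train_set if y == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > t else 0

def cross_validate(data, k=5):
    folds = [data[i::k] for i in range(k)]  # simple interleaved split
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [d for j, f in enumerate(folds) if j != i for d in f]
        predict = threshold_classifier(train)
        correct = sum(1 for x, y in test if predict(x) == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Toy data: class 0 clusters near 1, class 1 near 9.
data = ([(x, 0) for x in (0.5, 1.0, 1.5, 0.8, 1.2)]
        + [(x, 1) for x in (8.5, 9.0, 9.5, 8.8, 9.2)])
print(cross_validate(data, k=5))  # 1.0 on this cleanly separated data
```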
Applications
Applications mostly from security
o Malware detection or analysis --- HMM,
PHMM, PCA, SVM, and clustering
o Masquerade detection --- PHMM
o Image spam --- PCA and SVM
o Classic cryptanalysis --- HMM
o Facial recognition --- PCA
o Text analysis --- HMM
o Old Faithful geyser --- clustering
Bottom Line
We cover selected machine learning techniques in considerable detail
We discuss many applications, mostly
related to information security
Goal is for students to gain a deep
understanding of the techniques
o And be able to apply them in security
o Even in new and novel applications