Transcript 1_Intro

Intro to Machine Learning
Mark Stamp
Intro
1
What is Machine Learning?
 Lots of different definitions
o Statistical discrimination, where…
o “Machine” does the hard work (“learns”), so we don’t have to think too much
 Often applied to AI problems
o But actually very, very widely applicable
 Usually based on a binary classifier
 Often said to be “data driven”
What Can it do for Me?
 Machine learning has proven very powerful
o Practical and useful
 Successfully applied to problems in…
o Speech recognition and NLP
o Bioinformatics
o Stock market analysis
o AI (robotics, computer vision, etc.)
o More and more applications all the time
Black Box Approach
 Machine learning (ML) algorithms are often treated as black boxes
o This may be their main selling point!
 And they often work surprisingly well
o You can get good results even if you know nothing about the underlying algorithms
 But this can be limiting
o Especially wrt new and novel applications
Doctor Analogy
 An NP (nurse practitioner) is a nurse with advanced training
 A physician has much more education
 Both diagnose, treat, and manage patients’ problems
 Studies show that NPs can do about 80% to 90% of what physicians do
 But for the most challenging 10% to 20% of cases, a physician is required
Interesting, But What Does That Have to Do with ML?
 The ML version of an NP would have knowledge beyond the black box, but not too much
 The ML version of a physician would really understand how and why things work
 The goal is for you to become an ML physician
 For doctors, the most challenging 10% to 20% of cases are the most interesting…
o …and the most lucrative!
Auto Mechanic Analogy
 The majority of diagnosis work done by auto mechanics is routine
o Easy to see what the problem is
o Separate from the skill needed to fix it
 But there are some difficult cases
o Where no “cookbook” diagnosis will work
o Skill needed to analyze the problem
o Requires understanding of inner workings
Machine Learning from 10,000 Feet
 Usually focus on binary classification
 First, we train a model on a set of samples, all of which are “type A”
 Then, given a sample of unknown type…
o Score the sample against the model
o If it scores high, classify it as “type A”
o Otherwise, classify it as “not type A”
 Key ideas are training and scoring
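The train-then-score loop above can be sketched in a few lines. This is a minimal illustrative one-class classifier, not any specific technique from the course: the "model" here is just the mean and standard deviation of the type-A training samples, and the threshold and data are hypothetical.

```python
import statistics

def train(samples):
    """Train a trivial "type A" model: the mean and standard
    deviation of the training samples (all assumed to be type A)."""
    return statistics.mean(samples), statistics.stdev(samples)

def score(model, x):
    """Score a sample against the model: higher means more A-like.
    The score is the negative distance from the mean, in standard
    deviations."""
    mu, sigma = model
    return -abs(x - mu) / sigma

def classify(model, x, threshold=-2.0):
    """If the sample scores high, classify it as type A;
    otherwise, classify it as not type A."""
    return "type A" if score(model, x) > threshold else "not type A"

# Train on type-A samples only, then score samples of unknown type
model = train([9.8, 10.1, 10.0, 9.9, 10.2])
print(classify(model, 10.05))  # near the training data -> "type A"
print(classify(model, 25.0))   # far from the training data -> "not type A"
```

Every technique in the course fills in this same skeleton with a more sophisticated model and scoring function.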
Topics Covered in Detail
 Hidden Markov Models (HMM)
 Profile Hidden Markov Models (PHMM)
 Principal Component Analysis (PCA)
o With Singular Value Decomposition (SVD)
 Support Vector Machines (SVM)
 Clustering Basics
o Focus on K-means and EM clustering
 Data analysis
Many Mini-Topics
 k-nearest neighbor
 Neural networks
 Boosting (AdaBoost)
 Random Forests
 Linear Discriminant Analysis (LDA)
 Vector quantization
 Naïve Bayes and regression analysis
 Conditional Random Fields (CRF)
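The first mini-topic, k-nearest neighbor, is simple enough to sketch here: classify a query point by majority vote among the k closest labeled training points. The 2-D data below is hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

def knn_classify(train_data, query, k=3):
    """Classify query by majority vote among its k nearest labeled
    training points (Euclidean distance in the plane)."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    neighbors = sorted(train_data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled 2-D points: one cluster per class
train_data = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
              ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(train_data, (0.5, 0.5)))  # "A"
print(knn_classify(train_data, (5.5, 5.5)))  # "B"
```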
HMM
 We cover HMMs in great detail
o More detail than any other ML technique
o You must implement an HMM from scratch
o You must understand it very well
 We contrast other techniques with HMMs
 HMMs are useful in many applications
 And the models often tell us something
o Not always the case for other ML techniques
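To give a taste of what "scoring a sample against an HMM" means, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under a given model. The tiny 2-state, 2-symbol model below is hypothetical; the course develops HMMs (including training) in full detail.

```python
def forward(pi, A, B, obs):
    """Forward algorithm: probability of the observation sequence obs
    under an HMM with initial distribution pi, state transition
    matrix A, and observation probability matrix B."""
    n = len(pi)
    # alpha[i] = probability of the observations so far, ending in state i
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical 2-state HMM over 2 observation symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]   # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities per state
print(forward(pi, A, B, [0, 1, 0]))
```

The higher this probability, the better the sequence matches the model, which is exactly the score used in the classification scheme described earlier.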
PHMM
 Like an HMM with positional information
 Conceptually simple, but can be complex and/or problem specific in practice
 Widely used in bioinformatics
o And other applications where position within a sequence is critical information
 Have been used successfully in security research (IDS, malware detection)
PCA
 PCA reduces dimensionality
 Training may seem complex
o But scoring is fast and efficient
o So, when the dust settles, PCA is actually easy to apply in practice
 Singular Value Decomposition (SVD) is (almost) synonymous with PCA
o SVD is one way to train a PCA model
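The core idea of PCA can be sketched in 2-D: the first principal component is the dominant eigenvector of the sample covariance matrix, i.e., the direction of maximum variance. The sketch below finds it by power iteration on hypothetical data; in practice SVD is the usual way to train the model, as noted above.

```python
def principal_component(points, iters=100):
    """First principal component of 2-D data: the dominant
    eigenvector of the sample covariance matrix, found by
    power iteration."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    v = (1.0, 0.0)
    for _ in range(iters):
        # Multiply by the covariance matrix, then normalize
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Hypothetical points spread along the line y = x
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9)]
v = principal_component(pts)
print(v)  # roughly (0.707, 0.707): the data varies along y = x
```

Projecting onto the top few such directions is what "reduces dimensionality."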
SVM
 SVM has a nice geometric interpretation
o So, we can draw lots of pretty pictures
 In SVM, we increase the dimension
o May seem strange, in contrast to PCA
 SVM can be used like other ML techniques
o But also ideal as a “meta” score to combine multiple other scores
 Good combination of theory and practice
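Why would increasing the dimension help? A tiny sketch: the hypothetical 1-D data below cannot be split by any single threshold on x, but after mapping x to (x, x²), as an SVM kernel does implicitly, a straight line separates the classes. The actual SVM training machinery (Lagrange multipliers, support vectors) is not shown here.

```python
def feature_map(x):
    """Map a 1-D point into 2-D, as an SVM kernel does implicitly."""
    return (x, x * x)

# Hypothetical 1-D data: the "inner" class sits between the "outer"
# points, so no single threshold on x separates the two classes
inner = [-1, 0, 1]
outer = [-3, 3]

# After the map, the horizontal line x*x = 2 separates the classes
assert all(feature_map(x)[1] < 2 for x in inner)
assert all(feature_map(x)[1] > 2 for x in outer)
print("linearly separable after mapping to a higher dimension")
```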
Clustering
 Usually used for “data exploration”
o We cluster hoping to discern structure in data that we know little about
o Observed structure may or may not be meaningful --- we can cluster anything
 We consider two clustering techniques
o K-means
o EM (expectation maximization)
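K-means is simple enough to sketch here: alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster. The 1-D data and initial centroids below are hypothetical.

```python
def kmeans(points, centroids, iters=20):
    """Basic K-means on 1-D data: repeatedly assign points to the
    nearest centroid, then recompute each centroid as the mean of
    its assigned cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical 1-D data with two obvious groups
points = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # roughly [1.0, 10.0]
```

EM clustering follows the same alternating pattern, but with soft (probabilistic) assignments instead of hard ones.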
Mini Topics
 Neural nets and random forests
o Both are very popular and useful
o Won’t spend too much time on either
 LDA
o Discuss this in some detail
o Close connections to PCA and SVM
 CRF
o Generalization of HMM
What about the Math?
 We keep math to a minimum
o Course completely self-contained wrt math
 But some math is unavoidable...
o HMM --- discrete probability
o PHMM --- comparable to HMM
o PCA --- linear algebra (eigenvectors)
o SVM --- calculus (Lagrange multipliers)
o Clustering --- statistics (Gaussian dist.)
Data Analysis
 Critical to analyze data carefully
 Especially true in research, as we want to compare to previous/other work
 Often a major weakness in research
 We’ll discuss…
o Experimental design, cross validation, accuracy, ROC curves, PR curves, the imbalance problem, and so on
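A small sketch of why this matters, using the imbalance problem mentioned above: on an imbalanced test set, a useless classifier that always predicts the majority class can still report high accuracy. The labels below are hypothetical.

```python
def confusion(actual, predicted, positive="A"):
    """Count true positives, true negatives, false positives,
    and false negatives for the given positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

# Hypothetical imbalanced test set: 2 positives, 8 negatives,
# and a lazy classifier that always predicts "not A"
actual = ["A", "A"] + ["not A"] * 8
predicted = ["not A"] * 10

tp, tn, fp, fn = confusion(actual, predicted)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
print(accuracy)  # 0.8 --- looks respectable...
print(recall)    # 0.0 --- ...yet every positive case is missed
```

This is why ROC and PR curves, rather than raw accuracy, are emphasized when comparing to other work.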
Applications
 Applications mostly from security
o Malware detection or analysis --- HMM, PHMM, PCA, SVM, and clustering
o Masquerade detection --- PHMM
o Image spam --- PCA and SVM
o Classic cryptanalysis --- HMM
o Facial recognition --- PCA
o Text analysis --- HMM
o Old Faithful geyser --- clustering
Bottom Line
 We cover selected machine learning techniques in considerable detail
 We discuss many applications, mostly related to information security
 The goal is for students to gain a deep understanding of the techniques
o And be able to apply them in security
o Even in new and novel applications