Transcript Test

Interpreting: MDS, DR, SVM, Factor Analysis, and Boosting
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600
Week 11a, April 12, 2016
1
This?
library(EDR) # effective dimension reduction
library(dr)
library(clustrd)
##### install.packages("edrGraphicalTools")
##### ? library(edrGraphicalTools)
demo(edr_ex1)
demo(edr_ex2)
demo(edr_ex3)
demo(edr_ex4)
2
Some examples
• Lab8b_dr1_2016.R
• Lab8b_dr2_2016.R
• Lab8b_dr3_2016.R
• Lab8b_dr4_2016.R
3
Spellman
4
MDS
• Lab8b_mds1_2016.R
• Lab8b_mds2_2016.R
• Lab8b_mds3_2016.R
• http://www.statmethods.net/advstats/mds.html
• http://gastonsanchez.com/blog/howto/2013/01/23/MDS-in-R.html
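As a quick hands-on illustration (not one of the lab scripts), classical metric MDS on the built-in eurodist road-distance matrix takes a couple of lines with cmdscale(); this is presumably the kind of computation behind the Eurodist figure on the next slide:
# Classical MDS on the eurodist distances between 21 European cities -- a sketch
loc <- cmdscale(eurodist, k = 2)          # embed the cities in 2 dimensions
plot(loc[,1], -loc[,2], type = "n", xlab = "", ylab = "", main = "cmdscale(eurodist)")
text(loc[,1], -loc[,2], labels = rownames(loc), cex = 0.7)  # axis 2 flipped for a map-like view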
5
Eurodist
6
You worked on these…
• Lab9b_svm1_2015.R –> Lab9b_svm11_2015.R
• Lab9b_svm_rpart1_2016.R
• Karatzoglou et al. 2006 http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf
• Who worked on this, starting from page 9 (bottom)?
7
Ozone
> library(e1071)
> library(rpart)
> data(Ozone, package="mlbench")
> # http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/Ozone.html # for field codes
> ## split data into a train and test set
> index <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- na.omit(Ozone[testindex,-3])
> trainset <- na.omit(Ozone[-testindex,-3])
> ## regression on the daily maximum ozone reading (V4), as in the e1071 vignette
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred <- predict(svm.model, testset[,-3])
> ## mean squared prediction error
> crossprod(svm.pred - testset[,3]) / length(testindex)
See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
8
Glass
library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex,]
trainset <- Glass[-testindex,]
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[,-10])
9
> table(pred = svm.pred, true = testset[,10])
     true
pred   1  2  3  5  6  7
   1  12  9  1  0  0  0
   2   6 19  6  5  2  2
   3   1  0  2  0  0  0
   5   0  0  0  0  0  0
   6   0  0  0  0  1  0
   7   0  1  0  0  0  4
10
Example Lab9b_svm1_2016.R
n <- 150 # number of data points
p <- 2 # dimension
sigma <- 1 # variance of the distribution
meanpos <- 0 # centre of the distribution of positive examples
meanneg <- 3 # centre of the distribution of negative examples
npos <- round(n/2) # number of positive examples
nneg <- n-npos # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p,mean=meanpos,sd=sigma),npos,p)
xneg <- matrix(rnorm(nneg*p,mean=meanneg,sd=sigma),nneg,p) # nneg rows (not npos)
x <- rbind(xpos,xneg)
# Generate the labels
y <- matrix(c(rep(1,npos),rep(-1,nneg)))
# Visualize the data
plot(x,col=ifelse(y>0,1,2))
legend("topleft",c('Positive','Negative'),col=seq(2),pch=1,text.col=seq(2))
11
Example 1
12
Train/test
ntrain <- round(n*0.8) # number of training examples
tindex <- sample(n,ntrain) # indices of training samples
xtrain <- x[tindex,]
xtest <- x[-tindex,]
ytrain <- y[tindex]
ytest <- y[-tindex]
istrain=rep(0,n)
istrain[tindex]=1
# Visualize
plot(x,col=ifelse(y>0,1,2),pch=ifelse(istrain==1,1,2))
legend("topleft",c('Positive Train','Positive Test','Negative
Train','Negative Test'),col=c(1,1,2,2), pch=c(1,2,1,2),
text.col=c(1,1,2,2))
13
Comparison of test classifier
14
Example ctd
library(kernlab)
# Train a linear (vanilladot) C-SVC
svp <- ksvm(xtrain,ytrain,type="C-svc", kernel='vanilladot', C=100, scaled=c())
# General summary
svp
# Attributes that you can access
attributes(svp) # did you look?
# For example, the support vector coefficients and their indices
> alpha(svp)
[[1]]
[1] 71.05875 28.94125 100.00000
> alphaindex(svp)
[[1]]
[1] 10 74 93
> b(svp) # remember b (the intercept)?
[1] -17.3651
# Use the built-in function to pretty-plot the classifier
plot(svp,data=xtrain)
15
16
SVM for iris
[pairs plot of the iris data: "Anderson's Iris Data −− 3 species"; panels Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
17
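The panel above is the pairs() view of the iris measurements; a minimal ksvm() classifier for the same data follows as a sketch (the RBF kernel and its parameters are assumptions, not necessarily what the lab scripts use):
# SVM on iris with kernlab -- a sketch; kernel choice and parameters are assumptions
library(kernlab)
data(iris)
irismodel <- ksvm(Species ~ ., data = iris, type = "C-svc",
                  kernel = "rbfdot", kpar = list(sigma = 0.1),
                  C = 10, cross = 5)
irismodel                                      # training and cross-validation error
table(predict(irismodel, iris[,-5]), iris$Species)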
SVM for Swiss
18
e.g. Probabilities…
library(kernlab)
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1],20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]
## train a support vector machine
gene <- ksvm(Class~.,data=genetrain,kernel="rbfdot",
             kpar=list(sigma=0.015),C=70,cross=4,prob.model=TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene,genetest,type="probabilities")
19
Result
> genetype
                +           -
[1,] 0.205576217 0.794423783
[2,] 0.150094660 0.849905340
[3,] 0.262062226 0.737937774
[4,] 0.939660586 0.060339414
[5,] 0.003164823 0.996835177
[6,] 0.502406898 0.497593102
[7,] 0.812503448 0.187496552
[8,] 0.996382257 0.003617743
[9,] 0.265187582 0.734812418
[10,] 0.998832291 0.001167709
[11,] 0.576491204 0.423508796
[12,] 0.973798660 0.026201340
[13,] 0.098598411 0.901401589
[14,] 0.900670101 0.099329899
[15,] 0.012571774 0.987428226
[16,] 0.977704079 0.022295921
[17,] 0.137304637 0.862695363
[18,] 0.972861575 0.027138425
[19,] 0.224470227 0.775529773
[20,] 0.004691973 0.995308027
20
kernlab
• http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf
• Some scripts: Lab9b_svm12_2016.R, Lab9b_svm13_2016.R
21
These
• example_exploratoryFactorAnalysis.R on dataset_exploratoryFactorAnalysis.csv (on website)
– http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html (this was the example skipped over in lecture 10a)
• http://www.statmethods.net/advstats/factor.html
• http://stats.stackexchange.com/questions/1576/what-are-the-differences-between-factor-analysis-and-principal-component-analysis
22
Factor Analysis
• Do these – Lab10b_fa{1,2,4,5}_2016.R
library(psych)   # irt.fa(), score.irt(), and the iqitems/ability data sets
data(iqitems)
#
data(ability)
ability.irt <- irt.fa(ability)
ability.scores <- score.irt(ability.irt, ability)
data(attitude)
cor(attitude)
# Compute eigenvalues and eigenvectors of the correlation matrix.
pfa.eigen <- eigen(cor(attitude))
pfa.eigen$values
# set a value for the number of factors (for clarity)
factors <- 2
# Extract and transform two components.
pfa.eigen$vectors[, 1:factors] %*% diag(sqrt(pfa.eigen$values[1:factors]), factors, factors)
23
Glass
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex,]
trainset <- Glass[-testindex,]
cor(testset[,-10]) # correlations of the numeric predictors (Type, column 10, is a factor)
Factor Analysis?
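As a sketch of the answer, the same eigen-decomposition approach used for attitude on the previous slide can be applied to the numeric Glass columns (illustration only, not part of the lab script):
# Principal-axis style extraction for Glass -- mirrors the attitude example above
pfa.eigen <- eigen(cor(trainset[,-10]))      # Type (column 10) is a factor, so drop it
pfa.eigen$values                             # scree: how many factors look useful?
factors <- 2
loadings <- pfa.eigen$vectors[, 1:factors] %*%
  diag(sqrt(pfa.eigen$values[1:factors]), factors, factors)
rownames(loadings) <- names(trainset)[-10]
round(loadings, 2)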
24
Bootstrap aggregation (bagging)
• Improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
• Also reduces variance and helps to avoid overfitting.
• Usually applied to decision tree methods, but can be used with any type of method.
– Bagging is a special case of the model averaging approach.
• Harder to interpret – why? (See the sketch below for the mechanism.)
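A minimal hand-rolled sketch of that mechanism: resample the data with replacement, refit the same unstable learner on each resample, and average. This uses rpart on the built-in cars data (illustration only; the adabag examples on the following slides do this via bagging()):
# Manual bagging sketch: bootstrap the data, fit a tree to each resample, average
library(rpart)
data(cars)                                  # speed vs stopping distance
set.seed(1)
B <- 100                                    # number of bootstrap samples
preds <- matrix(NA, nrow(cars), B)
for (b in 1:B) {
  idx <- sample(nrow(cars), replace = TRUE) # bootstrap resample
  fit <- rpart(dist ~ speed, data = cars[idx, ])
  preds[, b] <- predict(fit, newdata = cars)
}
bagged <- rowMeans(preds)                   # aggregate by averaging
plot(dist ~ speed, data = cars)
lines(cars$speed, bagged, col = "blue")     # the bagged (averaged) fit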
25
Ozone
[figure: 10 of 100 bootstrap samples and their average]
26
• Shows improvements for unstable procedures (Breiman, 1996): e.g. neural nets, classification and regression trees, and subset selection in linear regression.
• … can mildly degrade the performance of stable methods such as K-nearest neighbors.
27
Bagging (bootstrapping aggregation)*
library(adabag)   # provides bagging() and predict.bagging()
library(rpart)
library(mlbench)
data(BreastCancer)
l <- length(BreastCancer[,1])
sub <- sample(1:l,2*l/3)
BC.bagging <- bagging(Class ~ ., data=BreastCancer[,-1], mfinal=20,
                      control=rpart.control(maxdepth=3))
BC.bagging.pred <- predict.bagging(BC.bagging, newdata=BreastCancer[-sub,-1])
BC.bagging.pred$confusion
                Observed Class
Predicted Class benign malignant
      benign       142         2
      malignant      8        81
BC.bagging.pred$error
[1] 0.04291845
28
A little later
> data(BreastCancer)
> l <- length(BreastCancer[,1])
> sub <- sample(1:l,2*l/3)
> BC.bagging <- bagging(Class ~ ., data=BreastCancer[,-1], mfinal=20,
+                       control=rpart.control(maxdepth=3))
> BC.bagging.pred <- predict.bagging(BC.bagging, newdata=BreastCancer[-sub,-1])
> BC.bagging.pred$confusion
                Observed Class
Predicted Class benign malignant
      benign       147         1
      malignant      7        78
> BC.bagging.pred$error
[1] 0.03433476
29
Bagging (Vehicle)
> data(Vehicle)
> l <- length(Vehicle[,1])
> sub <- sample(1:l,2*l/3)
> Vehicle.bagging <- bagging(Class ~ ., data=Vehicle[sub, ], mfinal=40,
+                            control=rpart.control(maxdepth=5))
> Vehicle.bagging.pred <- predict.bagging(Vehicle.bagging, newdata=Vehicle[-sub, ])
> Vehicle.bagging.pred$confusion
               Observed Class
Predicted Class bus opel saab van
           bus   63   10    8   0
           opel   1   42   27   0
           saab   0   18   30   0
           van    5    7    9  62
> Vehicle.bagging.pred$error
[1] 0.3014184
[1] 0.3014184
30
Weak models …
• A weak learner: a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing).
• A strong learner: a classifier that is arbitrarily well-correlated with the true classification.
• Can a set of weak learners create a single strong learner?
31
Boosting
• … reducing bias in supervised learning.
• Most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier.
– typically weighted in some way that is usually related to the weak learners' accuracy.
• After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight.
• Thus, future weak learners focus more on the examples that previous weak learners misclassified (a minimal sketch follows).
32
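A minimal sketch of that loop in R, using AdaBoost-style boosting from the adabag package in the same setup as the earlier bagging slides (mfinal and maxdepth are arbitrary choices for illustration):
library(adabag)     # boosting() and predict.boosting()
library(rpart)
library(mlbench)
data(BreastCancer)
sub <- sample(1:nrow(BreastCancer), 2*nrow(BreastCancer)/3)
BC.adaboost <- boosting(Class ~ ., data = BreastCancer[sub, -1], mfinal = 20,
                        control = rpart.control(maxdepth = 3))
BC.adaboost.pred <- predict.boosting(BC.adaboost, newdata = BreastCancer[-sub, -1])
BC.adaboost.pred$confusion
BC.adaboost.pred$error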
33
Using diamonds… boost (glm)
> library(mboost)   # glmboost()
> # diamonds comes with ggplot2; 'Expensive' is assumed to be a binary indicator added to it earlier (not in the stock data set)
> mglmboost <- glmboost(as.factor(Expensive) ~ ., data=diamonds,
                        family=Binomial(link="logit"))
> summary(mglmboost)
Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = as.factor(Expensive) ~ ., data = diamonds,
    family = Binomial(link = "logit"))

Negative Binomial Likelihood

Loss function: {
    f <- pmin(abs(f), 36) * sign(f)
    p <- exp(f)/(exp(f) + exp(-f))
    y <- (y + 1)/2
    -y * log(p) - (1 - y) * log(1 - p)
}
34
Using diamonds… boost (glm)
> summary(mglmboost) # continued
Number of boosting iterations: mstop = 100
Step size: 0.1
Offset: -1.339537

Coefficients:
NOTE: Coefficients from a Binomial model are half the size of coefficients
from a model fitted via glm(... , family = 'binomial').
See Warning section in ?coef.mboost
(Intercept)       carat   clarity.L
 -1.5156330   1.5388715   0.1823241
attr(,"offset")
[1] -1.339537

Selection frequencies:
      carat (Intercept)   clarity.L
       0.50        0.42        0.08
35
Cluster boosting
• Assessment of the clusterwise stability of a clustering of data, which can be cases x variables or dissimilarity data.
• The data is resampled using several schemes (bootstrap, subsetting, jittering, replacement of points by noise) and the Jaccard similarities of the original clusters to the most similar clusters in the resampled data are computed.
• The mean over these similarities is used as an index of the stability of a cluster (other statistics can be computed as well).
36
Cluster boosting
• Quite general clustering methods are possible, i.e. methods estimating or fixing the number of clusters, methods producing overlapping clusters or not assigning all cases to clusters (but declaring them as "noise").
• In R – clustermethod = X is used to select the method, e.g. kmeans (see the sketch below).
• Lab on Friday… (iris, etc..)
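A minimal sketch, assuming the function behind these slides is clusterboot() from the fpc package (its clustermethod argument takes an interface function such as kmeansCBI for k-means):
# Cluster-wise stability via bootstrap resampling -- fpc::clusterboot sketch
library(fpc)
cb <- clusterboot(iris[, -5], B = 50, bootmethod = "boot",
                  clustermethod = kmeansCBI, krange = 3, seed = 123)
print(cb)          # per-cluster mean Jaccard similarity, dissolved/recovered counts
cb$bootmean        # the stability index for each of the 3 clusters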
37
Example - bodyfat
• The response variable is the body fat measured by DXA (DEXfat), which can be seen as the gold standard for measuring body fat.
• However, DXA measurements are too expensive and complicated for broad use.
• Anthropometric measurements such as waist or hip circumference are, by comparison, very easy to obtain in a standard screening.
• A prediction formula based only on these measures could therefore be a valuable alternative with high clinical relevance for daily use.
38
39
bodyfat
## bodyfat ships with the TH.data package (older mboost versions included it directly)
library(mboost)
data("bodyfat", package = "TH.data")
## regular linear model using three variables
lm1 <- lm(DEXfat ~ hipcirc + kneebreadth + anthro3a, data = bodyfat)
## Estimate the same model by glmboost
glm1 <- glmboost(DEXfat ~ hipcirc + kneebreadth + anthro3a, data = bodyfat)
# We consider all available variables as potential predictors.
glm2 <- glmboost(DEXfat ~ ., data = bodyfat)
# or one could essentially call:
preds <- names(bodyfat[, names(bodyfat) != "DEXfat"]) ## names of predictors
fm <- as.formula(paste("DEXfat ~", paste(preds, collapse = "+"))) ## build formula
40
Compare linear models
> coef(lm1)
(Intercept)     hipcirc kneebreadth    anthro3a
-75.2347840   0.5115264   1.9019904   8.9096375
> coef(glm1, off2int=TRUE) ## off2int adds the offset to the intercept
(Intercept)     hipcirc kneebreadth    anthro3a
-75.2073365   0.5114861   1.9005386   8.9071301
Conclusion?
41
> fm
DEXfat ~ age + waistcirc + hipcirc + elbowbreadth + kneebreadth +
    anthro3a + anthro3b + anthro3c + anthro4
> coef(glm2, which = "") ## select all
 (Intercept)          age    waistcirc      hipcirc elbowbreadth
 -98.8166077    0.0136017    0.1897156    0.3516258    0.3841399
 kneebreadth     anthro3a     anthro3b     anthro3c      anthro4
   1.7365888    3.3268603    3.6565240    0.5953626    0.0000000
attr(,"offset")
[1] 30.78282
42
plot(glm2, off2int = TRUE)
43
plot(glm2, ylim = range(coef(glm2, which = preds)))
44
45
Other forms of boosting
• Gamboost = Generalized Additive Model gradient boosting for optimizing arbitrary loss functions, where component-wise smoothing procedures are utilized as (univariate) base-learners.
46
> gam1 <- gamboost(DEXfat ~ bbs(hipcirc) + bbs(kneebreadth) + bbs(anthro3a),
                   data = bodyfat)
> # Using plot() on a gamboost object automatically delivers the partial
> # effects of the different base-learners:
> par(mfrow = c(1,3)) ## 3 plots in one device
> plot(gam1) ## get the partial effects
# base-learners in mboost: bbs (smooth/spline), bols (linear), btree (tree)..
47
48
Compare to rpart
> fattree<-rpart(DEXfat ~ ., data=bodyfat)
> plot(fattree)
> text(fattree)
> labels(fattree)
[1] "root"
"waistcirc< 88.4" "anthro3c<
3.42" "anthro3c>=3.42" "hipcirc< 101.3"
"hipcirc>=101.3"
[7] "waistcirc>=88.4" "hipcirc< 109.9"
"hipcirc>=109.9"
49
50
cars
51
iris
52
cars
53
54
55
Sparse matrix example
> coef(mod, which = which(beta > 0))
     V306     V1052     V1090     V3501     V4808     V5473     V7929     V8333     V8799     V9191
2.1657532 0.0000000 4.8756163 4.7068006 0.4429911 5.4029763 3.6435648 0.0000000 3.7843504 0.4038770
attr(,"offset")
[1] 2.90198
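The script that produced this output is not on the slide; a hypothetical reconstruction along these lines (the names n, p, X, beta, y, mod are assumptions) fits glmboost to a simulated sparse high-dimensional regression and then inspects the truly active coefficients:
# Hypothetical setup sketch (not the original code; may take a minute to run)
library(mboost)
set.seed(1)
n <- 100; p <- 10000
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
beta <- numeric(p)
beta[sample(p, 10)] <- rnorm(10, mean = 4)   # only 10 truly non-zero coefficients
y <- drop(X %*% beta + rnorm(n))
mod <- glmboost(x = X, y = y, control = boost_control(mstop = 200))
coef(mod, which = which(beta > 0))           # how well were the active variables picked up?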
56
57
Aside: Boosting and SVM…
• Remember “margins” from the SVM? Partitioning the “linear” or transformed space?
• In boosting we are effectively (not explicitly) attempting to maximize the minimum margin of any training example.
58
Variants on boosting – loss fn
cars.gb <- blackboost(dist ~ speed, data = cars,
control = boost_control(mstop = 50))
### plot fit
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.gb), col = "red")
59
Blackboosting (cf. brown)
Gradient boosting for optimizing arbitrary loss functions, where regression trees are utilized as base-learners.
> cars.gb
Model-based Boosting

Call:
blackboost(formula = dist ~ speed, data = cars, control = boost_control(mstop = 50))

Squared Error (Regression)
Loss function: (y - f)^2

Number of boosting iterations: mstop = 50
Step size: 0.1
Offset: 42.98
Number of baselearners: 1
60