Logit Lab material borrowed from tutorial by William B. King Coastal

Material borrowed from the tutorial by
David Meyer
FH Technikum Wien, Austria
see: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
# Start by loading the relevant libraries:
library(e1071)    # provides svm() and tune.svm()
library(mlbench)  # provides the Glass data set
# If mlbench isn't available, install it first with install.packages("mlbench")
Glass Dataset
# Retrieve/Access "Glass" data from mlbench package
data(Glass, package="mlbench")
# The description of the Glass data set follows
# Number of Attributes: 10 (including an Id#) plus the class
# attribute -- all attributes are continuously valued
# Note: the mlbench copy of Glass omits the Id column, so it has
# 9 predictors and "Type" is column 10
Attribute Information:
1. Id number: 1 to 214
2. RI: refractive index
3. Na: Sodium (unit measurement: weight percent in corresponding oxide,
as are attributes 4-10)
4. Mg: Magnesium
5. Al: Aluminum
6. Si: Silicon
7. K: Potassium
8. Ca: Calcium
9. Ba: Barium
10. Fe: Iron
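Class Information:
Type of glass (class attribute):
1. building_windows_float_processed
2. building_windows_non_float_processed
3. vehicle_windows_float_processed
4. vehicle_windows_non_float_processed (none in this database)
5. containers
6. tableware
7. headlamps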
Create Training and Test Sets
# Create a row index
index <- 1:nrow(Glass)
# Create an index of test samples by randomly selecting 1/3 of the samples
testindex <- sample(index, trunc(length(index)/3))
# Create test set
testset <- Glass[testindex,]
# Create training set
trainset <- Glass[-testindex,]
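Because sample() draws at random, the split above changes on every run. A minimal sketch of a reproducible split follows. It assumes the 214 rows given by the Id range in the attribute list, and it uses a plain row index in place of Glass so that it runs without mlbench. The seed value 42 and the name trainindex are illustrative choices, not part of the original tutorial.

```r
# Fix the random number generator so the same rows are drawn on every run
# (the seed value 42 is an arbitrary, illustrative choice)
set.seed(42)

n <- 214                  # number of rows in Glass, per the Id range 1 to 214
index <- seq_len(n)       # stands in for 1:nrow(Glass)

# Hold out one third of the rows (rounded down) for testing
testindex <- sample(index, trunc(length(index) / 3))

# Everything that was not drawn stays in the training set
trainindex <- setdiff(index, testindex)

length(testindex)   # 71 rows held out for testing
length(trainindex)  # 143 rows left for training
```

With the real data, calling set.seed() once before the sample() line above has the same effect.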
Train the SVM model
# Train the SVM model using:
#   "Type" (column 10) as the dependent variable
#   cost = 100 as the penalty cost for C-classification; this is the
#     'C' constant of the regularization term in the Lagrange formulation
#   gamma = 1 as the parameter of the radial basis kernel function
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
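To make the role of gamma concrete, here is a small sketch of the radial basis kernel in the form given in the e1071 documentation, exp(-gamma * |u - v|^2). The function name rbf_kernel and the two sample vectors are hypothetical and serve only as an illustration; svm() computes the kernel internally.

```r
# Radial basis kernel: similarity of two points decays with their squared distance
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

u <- c(1, 2)
v <- c(2, 4)   # squared distance to u is 1 + 4 = 5

rbf_kernel(u, u, gamma = 1)      # identical points: similarity of 1
rbf_kernel(u, v, gamma = 1)      # exp(-5), close to 0
rbf_kernel(u, v, gamma = 2^-4)   # smaller gamma: similarity decays more slowly
```

A large gamma makes each training point influence only its close neighbours, while a small gamma gives smoother decision boundaries.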
Apply SVM Model
# Use the SVM to predict the classification for the testset
svm.pred <- predict(svm.model, testset[,-10])
# Compute the SVM confusion matrix
table(pred = svm.pred, true = testset[,10])
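# Determine accuracy: the correct predictions lie on the diagonal
# of the confusion matrix, so divide their sum by the total count
t = table(pred = svm.pred, true = testset[,10])
sum(diag(t)) / sum(t)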
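The accuracy calculation can be checked by hand on a tiny example. The sketch below uses six made-up labels (not taken from Glass); the names lv, true, pred, and cm are hypothetical.

```r
# Hypothetical labels for six test samples (not taken from Glass)
lv   <- c("1", "2", "7")
true <- factor(c("1", "1", "2", "2", "7", "7"), levels = lv)
pred <- factor(c("1", "2", "2", "2", "7", "1"), levels = lv)

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(pred = pred, true = true)

# Correct predictions sit on the diagonal: 4 of the 6 samples here
accuracy <- sum(diag(cm)) / sum(cm)
error    <- 1 - accuracy

accuracy
error
```

Giving both factors the same levels keeps the table square, which is what makes diag() pick out the correct predictions.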
Optimize Parameters
# Approach: Grid search with 10-fold cross validation
# Note: a random mixing precedes the partitioning of the data
# Optimize parameters to the svm with RBF kernel
# The grid search iterates with gamma = 2^-4 through 2
# and cost = 2 through 2^7
# The returned object reports the best gamma & cost
# and the corresponding classification error
obj = tune.svm(Type~., data = Glass, gamma = 2^(-4:1), cost = 2^(1:7))
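The size of the search is easy to underestimate. As a rough sketch, the same two parameter vectors can be expanded into an explicit grid with base R; the name grid is hypothetical, and this only enumerates the combinations rather than reproducing what tune.svm() does internally.

```r
# Every combination of the gamma and cost values passed to tune.svm()
grid <- expand.grid(gamma = 2^(-4:1), cost = 2^(1:7))

nrow(grid)          # 6 gamma values x 7 cost values = 42 combinations
range(grid$gamma)   # 0.0625 to 2
range(grid$cost)    # 2 to 128

# Each combination is evaluated on 10 folds
nrow(grid) * 10     # 420 model fits during cross validation
```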
# Inspect the results
# Note: the results will vary unless you set the seed for the
# random number generator, which is used to mix the data
# before the partitioning
> obj
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
gamma cost
0.0625 128
- best performance: 0.2898268
Note: The performance is reported as the classification error.
The accuracy is 1 - error; in this case, 1 - 0.2898268 = 0.7101732
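The dependence on the random seed comes from the mixing step that precedes the partitioning. The sketch below imitates that step in base R to show why a fixed seed gives repeatable folds. It is an illustration only, not the code e1071 uses internally; the helper make_folds and the seed values are hypothetical, and n = 214 is taken from the Id range of the data set.

```r
n <- 214   # rows in Glass, per the Id range 1 to 214
k <- 10    # number of cross-validation folds

# Shuffle the row numbers, then deal them out into k folds
make_folds <- function(seed) {
  set.seed(seed)
  shuffled <- sample(n)                              # random mixing
  split(shuffled, rep(seq_len(k), length.out = n))   # partitioning
}

folds_a <- make_folds(1)
folds_b <- make_folds(1)
folds_c <- make_folds(2)

identical(folds_a, folds_b)   # same seed, same partition
identical(folds_a, folds_c)   # different seed, different partition
lengths(folds_a)              # four folds of 22 rows, six folds of 21 rows
```

In practice, calling set.seed() immediately before tune.svm() should make the reported best parameters and error repeatable from run to run.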
