bcs513_lecture_week4_class2

What does a classifier do?
Just like linear regression: find weights such that the
weighted sum of inputs matches a desired output
• The inputs are multivoxel activation patterns
• The desired outputs are the stimulus conditions which elicited
that activation
Then, apply a threshold to that output:
• If weighted sum is less than zero, then prediction is Class A
• If weighted sum is greater than zero, then prediction is Class B
Differences between classifiers are mostly a function of
how the weights are calculated
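The weighted-sum-then-threshold rule can be sketched in a few lines of Python. This is only an illustration: the weight values, the constant offset `bias`, and the two-voxel input patterns are made-up assumptions, and treating a sum of exactly zero as Class B is an arbitrary choice the slide does not specify.

```python
def weighted_sum(weights, bias, pattern):
    """Weighted sum of the input features plus a constant offset (bias)."""
    return sum(w * x for w, x in zip(weights, pattern)) + bias


def predict(weights, bias, pattern):
    """Threshold the weighted sum at zero: below zero is Class A, otherwise Class B."""
    return "A" if weighted_sum(weights, bias, pattern) < 0 else "B"


# Hypothetical weights for a two-voxel activation pattern (illustrative values only).
weights = [0.8, -0.5]
bias = -0.1

print(predict(weights, bias, [0.2, 0.9]))  # 0.16 - 0.45 - 0.1 = -0.39, so Class A
print(predict(weights, bias, [1.0, 0.3]))  # 0.80 - 0.15 - 0.1 =  0.55, so Class B
```

Every linear classifier predicts this way; what differs from one to the next is the procedure used to choose `weights` and `bias` from the training data.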
A toy example of a classifier, using two feature dimensions
[Figure: sumo wrestlers and basketball players plotted by height and weight, separated by a classifier boundary]
A linear classifier draws a straight line
[Figure: height and weight plot of sumo wrestlers and basketball players, with a straight classifier boundary]
Some dimensions can be more important than others
[Figure: height and weight plot of sumo wrestlers and basketball players, with a classifier boundary]
A nonlinear classifier draws a not-straight line!
[Figure: height and weight plot of sumo wrestlers and basketball players, with a non-straight classifier boundary]
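The difference between a straight and a not-straight boundary can be sketched in code. One common way to get a nonlinear boundary is to feed a linear rule an extra nonlinear feature, here a squared height term. All coefficients and the two example athletes below are hypothetical values chosen for illustration, not numbers from the lecture.

```python
def linear_score(height_cm, weight_kg):
    """Weighted sum of the raw features: the boundary (score = 0) is a straight line."""
    return 1.0 * weight_kg - 1.0 * height_cm + 60.0


def nonlinear_score(height_cm, weight_kg):
    """Adds a squared-height feature: the boundary (score = 0) is now a parabola."""
    return 1.0 * weight_kg - 0.01 * (height_cm - 150.0) ** 2 - 100.0


def label(score):
    """Apply the threshold at zero to turn a score into a class label."""
    return "sumo wrestler" if score > 0 else "basketball player"


# Two made-up athletes: (height in cm, weight in kg).
athletes = {"heavier, shorter": (180.0, 150.0), "lighter, taller": (200.0, 100.0)}
for description, (height_cm, weight_kg) in athletes.items():
    print(description,
          "| linear:", label(linear_score(height_cm, weight_kg)),
          "| nonlinear:", label(nonlinear_score(height_cm, weight_kg)))
```

Both rules agree on these two easy cases. They would disagree only for points lying between the straight line and the parabola.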
Some patterns are more separable than others,
given a particular set of measurements
[Figure: two height and weight plots, one of sumo wrestlers and basketball players, one of faculty and students]
Neural decoding, as it is usually done
From: Norman, Polyn, Detre & Haxby (2006), Trends in Cognitive Sciences, 10(9), 424-430
Cross-validation:
Training and testing sets
Cross-validation in everyday life
Replication in science (sort of)
Original study: training set
Replication attempt: testing set
Generalisation,
by classifiers and by the brain
Overfitting in everyday life
http://content.time.com/time/specials/packages/article/0,28804,1856094_1856096_1856102,00.html
Problem:
With so many feature dimensions,
you can fit anything you want
Solution: Cross-validation
Divide the data into a training set
and a testing set
Classifier succeeds only if it is
able to find an underlying
regularity that is shared across
training and testing sets
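A small simulation can make the problem and the solution concrete. The sketch below is built on assumptions: it uses pure-noise data (labels unrelated to the features), a simple perceptron as the classifier, and arbitrary sizes (500 features, 20 training patterns, 200 testing patterns). None of these choices come from the lecture. With far more features than training patterns, the classifier can fit the training labels perfectly even though there is nothing to learn, and only the testing set reveals this.

```python
import random


def predict(weights, pattern):
    """Threshold the weighted sum at zero: returns +1 or -1."""
    total = sum(w * x for w, x in zip(weights, pattern))
    return 1 if total > 0 else -1


def train_perceptron(patterns, labels, max_epochs=1000):
    """Perceptron rule: nudge the weights toward every misclassified pattern."""
    weights = [0.0] * len(patterns[0])
    for _ in range(max_epochs):
        mistakes = 0
        for pattern, label in zip(patterns, labels):
            if predict(weights, pattern) != label:
                weights = [w + label * x for w, x in zip(weights, pattern)]
                mistakes += 1
        if mistakes == 0:  # every training pattern is classified correctly
            break
    return weights


def accuracy(weights, patterns, labels):
    """Proportion of patterns whose predicted label matches the true label."""
    hits = sum(predict(weights, p) == y for p, y in zip(patterns, labels))
    return hits / len(labels)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_features, n_train, n_test = 500, 20, 200


def random_dataset(n):
    """Pure noise: Gaussian features and coin-flip labels with no relationship."""
    patterns = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    labels = [rng.choice([-1, 1]) for _ in range(n)]
    return patterns, labels


train_x, train_y = random_dataset(n_train)
test_x, test_y = random_dataset(n_test)

weights = train_perceptron(train_x, train_y)
train_acc = accuracy(weights, train_x, train_y)
test_acc = accuracy(weights, test_x, test_y)
print(f"training accuracy: {train_acc:.2f}")  # perfect fit to noise
print(f"testing accuracy:  {test_acc:.2f}")   # expected to hover near chance (0.5)
```

Because the data contain no regularity shared across the two sets, the perfect training accuracy reflects overfitting only. A classifier that had found a real regularity would also score above chance on the testing set.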
Overfitting in 2-D
http://commons.wikimedia.org/wiki/File:Overfitting.svg
The multivariate approach in genetics
What’s the gene for disease X?
Unless X is a monogenic disease, e.g.
Huntington’s, there is no single gene for X
You need to screen multiple genes
simultaneously, and try to predict disease
occurrence from that multivariate dataset
• E.g. Clarke R, Ressom HW, Wang A, Xuan J, Liu MC,
Gehan EA, Wang Y. (2008) The properties of high-dimensional
data spaces: implications for exploring gene and protein
expression data. Nature Reviews Cancer. 8(1), 37-49.