Training/testing/using networks on the Unix system - CBS

Neural Network training
Morten Nielsen,
CBS, BioCentrum,
DTU
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS TECHNICAL UNIVERSITY OF DENMARK DTU
Neural network programs
• how
  – Classification neural network
• howlin
  – Real-value neural network
• nnlinplayer
  – Neural network player, i.e. no training
How2doit
• how and howlin are clumsy but very fast and efficient Fortran programs
• Three important files
  – Parameter file: howlin.dat
  – Data file
  – Synaps (weight) file
Output
• Format of output file
• Plotting training and test performance
  – howlinplot fileout
Neural networks
• Neural networks can learn higher-order correlations!
  – What does this mean?
    0 0 => 0
    0 1 => 1
    1 0 => 1
    1 1 => 0
  No linear function can learn this pattern (exclusive OR, XOR)
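The claim about the four patterns above can be checked numerically. The following is a minimal sketch in Python with NumPy (an assumption; the programs in these slides are Fortran). It fits the best least-squares linear function to the XOR patterns and then evaluates a small network with two hidden threshold neurons. The weights in `xor_net` are hand-picked for illustration and are not taken from the slides.

```python
import numpy as np

# The four XOR patterns from the slide: inputs and target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear fit y ~ a*x1 + b*x2 + c in the least-squares sense.
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef  # the linear model cannot separate the patterns

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def xor_net(x):
    """Hand-built 2-2-1 network (illustrative weights, not from the slides)."""
    h1 = step(x[:, 0] + x[:, 1] - 0.5)  # fires if at least one input is on (OR)
    h2 = step(x[:, 0] + x[:, 1] - 1.5)  # fires only if both inputs are on (AND)
    return step(h1 - h2 - 0.5)          # OR but not AND, i.e. XOR

print(linear_pred)
print(xor_net(X))
```

The linear fit returns the same value for all four patterns, while the network with a hidden layer reproduces the targets.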
Neural networks
[Figure: network diagram with weights labelled w11, w12, w21, w22, v1, v2]
w11 = 1, w12 = -1, w21 = 1, w22 = -1
v1 = 0.5, v2 = -0.5
nnlinplayer
• Uses weight file(s) to generate neural network predictions
• Format
  – nnlinplayer synapsfilelist inputfile
• Makes a consensus prediction over N neural networks
• Input file must be generated separately
  – seq2inp data
• Using pipes
  – seq2inp data | nnlinplayer synlist --
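The idea of a consensus prediction over N networks can be sketched as follows. This is a hedged illustration in Python, not the nnlinplayer implementation: the slides do not say how the individual predictions are combined, so a plain average is assumed here. The network shape, the missing bias terms, and the random weights are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(weights, x):
    """Forward pass of one network with a single hidden layer (no bias terms)."""
    w_hidden, w_out = weights
    return sigmoid(sigmoid(x @ w_hidden) @ w_out)

def consensus(networks, x):
    """Average the predictions of all networks (assumed combination rule)."""
    return np.mean([predict(w, x) for w in networks], axis=0)

rng = np.random.default_rng(0)
# Three hypothetical networks: 4 inputs, 2 hidden neurons, 1 output.
networks = [(rng.normal(size=(4, 2)), rng.normal(size=2)) for _ in range(3)]
x = rng.random((5, 4))         # five made-up input vectors
print(consensus(networks, x))  # one consensus score per input vector
```

Each entry of `networks` plays the role of one synaps file in the list passed to the player.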
how
• Classification network
• Generates input data directly from sequence
  – RIISSIEQKEENKGGEDKLKMIREYRQMVE
• Input is how files
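One common way to turn an amino acid sequence into network input is sparse (one-hot) encoding over a sliding window. Whether `how` uses exactly this scheme is not stated in the slides, so the sketch below is an illustration under that assumption. The window size of 9 is hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def sparse_encode(peptide):
    """One-hot encode a peptide: 20 inputs per residue, concatenated."""
    encoded = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        encoded[pos, INDEX[aa]] = 1.0
    return encoded.ravel()

def windows(sequence, size):
    """Yield every window of `size` consecutive residues."""
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]

sequence = "RIISSIEQKEENKGGEDKLKMIREYRQMVE"  # sequence shown on the slide
inputs = np.array([sparse_encode(w) for w in windows(sequence, 9)])
print(inputs.shape)  # (number of windows, 9 * 20)
```

Every row of `inputs` contains exactly one 1 per window position, which is what makes the encoding "sparse".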
Useful programs
• fasta2pep
• seq2inp
• ranlines
• splitfile
• balanceset
• xycorr
• Examples
  fasta2pep ex.fsa | grep -v '#' | seq2inp -- | grep -v '#' | ranlines -- | grep -v '#' | splitfile -nc 4 --
  seq2inp data | nnlinplayer synlist -- | grep -v '#' | args 1,3 | xycorr
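The slides only list the tool names, so their exact behaviour is not documented here. Judging by the names and the example pipelines, ranlines presumably puts lines in random order, splitfile -nc 4 splits the data into four parts, and xycorr reports a correlation between two columns. The Python sketch below mimics those assumed roles with made-up data. The function names and all values are hypothetical.

```python
import random
import numpy as np

def ran_lines(lines, seed=1):
    """Return the lines in random order (assumed role of ranlines)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def split_lines(lines, n_chunks):
    """Deal lines into n_chunks near-equal parts (assumed role of splitfile -nc)."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

def xy_corr(x, y):
    """Pearson correlation between two columns (assumed role of xycorr)."""
    return float(np.corrcoef(x, y)[0, 1])

lines = [f"pep{i} {i / 10:.1f}" for i in range(10)]  # made-up data lines
parts = split_lines(ran_lines(lines), 4)
print([len(p) for p in parts])

measured = [0.1, 0.4, 0.35, 0.8]   # made-up target values
predicted = [0.2, 0.5, 0.30, 0.7]  # made-up network predictions
print(xy_corr(measured, predicted))
```

Shuffling before splitting keeps each part a random sample of the data, which matters when the parts are used as training and test sets.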
Exercises
• Copy all files from
  – /usr/opt/www/pub/CBS/researchgroups/immunology/intro/NeuralNetworks/exercise/* to some directory
• Open the file doit
  – What does the program do?
  – Run the program and save the output to a file named datafile
  – Make a howlin neural network training
    • Set the number of hidden neurons in the howlin2002.dat file to 0
    • Run the training by typing
      – howlin2002 < howlin2002.dat > output
    • Plot the training/test performance using the howlinplot program
    • Redo the training using 2 hidden neurons
      – Check the synaps file. What are the weight values?
• Do the prediction of T cell epitopes exercise
  www.cbs.dtu.dk/courses/27485.imm/exercise5/index.php