Computational Analysis of Gene Expression

Computational Analysis of Gene Expression (CAGE)
Kristin Blitsch (CS/BB)
Ben Lucas (CS)
Sarah Towey (BB)
Advisors:
Liz Ryder (BB)
Carolina Ruiz (CS)
Motivation
To predict gene expression based on DNA sequences.
[Diagram: Gene 1, Gene 2, and Gene 3 shown as On or Off in a Muscle Cell, a Neural Cell, and Seam Cells; labeled CAGE.]
Goals

To improve upon previous work:
- Incorporate more data.
- Obtain more accurate predictions.
- Allow for more complex predictions.
- Create an automated and user-friendly system.
Overview
Motif Discovery: Sequences -> MEME & MAST -> Motifs
Model Building: Motifs -> ARMiner -> Association Rules -> Builder/Tester -> Model
Model Testing: Model, Sequences* -> Builder/Tester -> Results
*Test Sequences
Acknowledgments



Chris Shoemaker – Modifications to ARMiner
Julia Mullen – Super Computer Liaison
Previous MQP Group
Thank you!

Questions?
Model Building
- Association Rules
  - Motif1, Motif5, Motif8 => Neural
  - Motif18, Motif30 => Not Muscle
  - Confidence & Support
- ARMiner*
- Builder/Tester
  - Sorts rules.
  - Adds rules to the model until accuracy decreases.
*L. Cristofor, C. Shoemaker
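
The two ideas on this slide, scoring rules by confidence and support and then adding sorted rules until accuracy drops, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual ARMiner or Builder/Tester code. The names `Rule`, `support_and_confidence`, `predict`, `accuracy`, and `build_model` are hypothetical. The sort order (confidence, then support), the first-matching-rule prediction, the default label, and the toy data are all assumptions. Negated consequents such as "Not Muscle" are left out for simplicity.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    motifs: FrozenSet[str]  # antecedent: motifs that must all be present
    label: str              # consequent: predicted cell type


# One example = (motifs found in the sequence, cell type where the gene is expressed).
Example = Tuple[FrozenSet[str], str]


def support_and_confidence(rule: Rule, data: List[Example]) -> Tuple[float, float]:
    """Support = share of all examples with the motifs and the label;
    confidence = share of examples with the motifs that also have the label."""
    covered = [label for motifs, label in data if rule.motifs <= motifs]
    hits = sum(1 for label in covered if label == rule.label)
    support = hits / len(data) if data else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def predict(model: List[Rule], motifs: FrozenSet[str], default: str) -> str:
    """Assumed prediction scheme: the first rule whose motifs all occur wins."""
    for rule in model:
        if rule.motifs <= motifs:
            return rule.label
    return default


def accuracy(model: List[Rule], data: List[Example], default: str) -> float:
    if not data:
        return 0.0
    correct = sum(predict(model, motifs, default) == label for motifs, label in data)
    return correct / len(data)


def build_model(rules: List[Rule], data: List[Example], default: str) -> List[Rule]:
    """Sort rules (assumed: by confidence, then support), then add them
    one at a time and stop as soon as accuracy decreases."""
    ranked = sorted(rules, key=lambda r: support_and_confidence(r, data)[::-1], reverse=True)
    model: List[Rule] = []
    best = accuracy(model, data, default)
    for rule in ranked:
        candidate = model + [rule]
        score = accuracy(candidate, data, default)
        if score < best:
            break
        model, best = candidate, score
    return model


# Toy data, invented for illustration only.
toy_data: List[Example] = [
    (frozenset({"Motif1", "Motif5", "Motif8"}), "Neural"),
    (frozenset({"Motif1", "Motif2", "Motif5", "Motif8"}), "Neural"),
    (frozenset({"Motif18", "Motif30"}), "Muscle"),
    (frozenset({"Motif1", "Motif5"}), "Muscle"),
]
neural_rule = Rule(frozenset({"Motif1", "Motif5", "Motif8"}), "Neural")
broad_rule = Rule(frozenset({"Motif1", "Motif5"}), "Neural")
model = build_model([broad_rule, neural_rule], toy_data, default="Muscle")
```

On the toy data, `neural_rule` covers two sequences and is right on both, so it ranks first and is kept. Adding `broad_rule` would misclassify the fourth sequence, so the builder stops there.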
Model Testing
- Choosing Test Sequences
- Apply Rules
- Predicted Expression equal to Actual Expression?

Sequence  Actual    Predicted  Matches
nhr-76    SeamCell  M          No
nhr-76    SeamCell  SeamCell   Yes
gpa-7     M         M          Yes
gpa-7     M         SeamCell   No
nhr-77    SeamCell  SeamCell   Yes
nhr-77    SeamCell  M          No
B0272.2   SeamCell  SeamCell   Yes
B0272.2   M         M          Yes

Cell Types           All Genes  Pan-Neural  ASK  ASE  OLL  Body Wall  Seam  M    Somatic
Total Sequences      70         28          42   36   31   20         14    10   4
Removed for Testing  15         5           9    6    6    5          3     2    1
% Removed            21%        18%         21%  17%  19%  25%        21%   20%  25%
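
The arithmetic behind this slide can be checked with a short Python sketch. It recomputes the "% Removed" row as removed / total for each cell-type column, and derives the "Matches" column by comparing actual with predicted expression. The numbers and rows are copied from the slide. The variable names are hypothetical, and rounding to the nearest whole percent is an assumption that happens to reproduce the values shown.

```python
# Counts per cell-type column, in the order they appear on the slide.
total_sequences = [70, 28, 42, 36, 31, 20, 14, 10, 4]
removed_for_testing = [15, 5, 9, 6, 6, 5, 3, 2, 1]

# Assumed rounding: nearest whole percent.
percent_removed = [
    round(100 * removed / total)
    for removed, total in zip(removed_for_testing, total_sequences)
]

# (sequence, actual expression, predicted expression) rows from the slide.
test_rows = [
    ("nhr-76", "SeamCell", "M"),
    ("nhr-76", "SeamCell", "SeamCell"),
    ("gpa-7", "M", "M"),
    ("gpa-7", "M", "SeamCell"),
    ("nhr-77", "SeamCell", "SeamCell"),
    ("nhr-77", "SeamCell", "M"),
    ("B0272.2", "SeamCell", "SeamCell"),
    ("B0272.2", "M", "M"),
]

# A prediction matches when predicted expression equals actual expression.
matches = ["Yes" if actual == predicted else "No" for _, actual, predicted in test_rows]

print(percent_removed)
print(matches)
```

Each held-out share falls between 17% and 25% of the sequences available for that cell type.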
Results

[Chart: Accuracy for each model. Axis scale from 0 to 1; axis label "Sequence"; categories PanNeu, ASK, ASE, OLL, M, ASE-ASK-OLL, Seam, BodyWall.]

Comparison of Model Builders (MEBCS, CAGE)
Model              M-Seam  M-Seam     ASE-ASK-OLL  ASE-ASK-OLL
Builder Used       CBA     1st Class  CBA          1st Class
Repressors         Yes     Yes        No           Yes
Accuracy           0.7     0.75       0.521        0.818
Expected Randomly  0.5     0.5        0.33         0.33
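
The "Expected Randomly" values are consistent with guessing uniformly among the classes a model distinguishes: 1/2 for a two-class model and 1/3 for a three-class model. The Python sketch below makes that comparison explicit. It is an interpretation, not something the slide states: the uniform-guess baseline, the class counts per model, the assignment of the fourth column to ASE-ASK-OLL, and all names are assumptions. The accuracies are copied from the slide.

```python
# Assumed number of classes per model: M-Seam separates 2 cell types,
# ASE-ASK-OLL separates 3.
class_counts = {"M-Seam": 2, "ASE-ASK-OLL": 3}

# (model, builder used, reported accuracy), copied from the slide.
reported = [
    ("M-Seam", "CBA", 0.7),
    ("M-Seam", "1st Class", 0.75),
    ("ASE-ASK-OLL", "CBA", 0.521),
    ("ASE-ASK-OLL", "1st Class", 0.818),
]

# Baseline under the assumption of a uniform random guess: 1 / number of classes.
baselines = [round(1 / class_counts[model], 2) for model, _, _ in reported]

# Accuracy gained over the unrounded chance level, to three decimals.
gains = [round(acc - 1 / class_counts[model], 3) for model, _, acc in reported]

for (model, builder, acc), base, gain in zip(reported, baselines, gains):
    print(f"{model:12s} {builder:10s} accuracy={acc:.3f} chance={base:.2f} gain={gain:+.3f}")
```

Under these assumptions every reported accuracy is above chance, with the largest margin for the three-class model built with the 1st Class builder.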
Conclusion


Goals Accomplished:
✓ Allow for classification among multiple cell types.
✓ Incorporate more data.
✓ Obtain more accurate predictions.
✓ Create an automated and user-friendly system.
Biological Significance