- Cal State LA - Instructional Web Server

Bioinformatics Tools
for Genotyping
Frances Tong
Dr. Garry Larson, Ph.D
City of Hope
Department of Molecular Medicine
Southern California Bioinformatics Institute
Summer 2003
Funded by the National Science Foundation and the National Institutes of Health
1
Overview of Summer Program


Learn ASP and VBScript
Learn the biology

Programming Project I : writing code for
mining of online genetic data

Programming Project II : writing a program to
graph linkage disequilibrium data
2
Intro to ASP & VBScript

ASP : Microsoft Active Server Pages
* server generated web pages
* similar to CGI but easier
* works well with databases

VBScript : Microsoft Visual Basic Scripting
* scripting language to enhance
HTML web pages
* default language of ASP
3
Hello World!

Sample ASP file (one line only!)
<% Response.Write("Hello, World!") %>
4
Genetic Mapping of ASPs


ASPs : affected sibling pairs
Identification of genes associated with cancer in
patients and siblings who both have cancer
(breast, prostate, lung or colon)


Determine allele sharing statistics of
susceptibility genes
Look at gene-gene interactions
=> Provide information on a person’s genetic risk
of developing cancer
5
DNA Marker Genotyping


Genetic marker : polymorphic gene or
section of DNA that has identifiable
physical location on a chromosome used to
trace inheritance
Ex. Microsatellite and SNP markers
CDC2d Microsatellite Amplicon ~ 230bp
Forward primer (>>>): AGTCCCAAAGAAGATGAGAC
agactaaaccatcaactggaagtgaaaaaaatatagtcattgaaattaaaa
Microsatellite repeat (ca)15: cacacacacacacacacacacacacacaca
ctaggtgaaacaactttatagatggaacaactc
tacagaaaaagaattcatgaattggaaaattatagtggggaattcacatagaatgcatcacaaagagcaaaatgaatt
Reverse primer (<<<): TCTAAGGGGCAAAGCAAAGCA
6
Programming Project I:
Tag Selection For Markers

Need unique way to identify markers (like social
security numbers for people)

Chromosome locations are relative and change
frequently (UCSC)

Use ASP to automate data mining to ease the
generation of these unique 50 base-pair tags for
each marker in database

Tags will be used to locate markers in genome
7
UCSC Genome Browser
8
Marker Tag Selection
Submit accession number for microsatellite
Submit accession number for SNP
Submit sequence surrounding simple repeat
9
Output
Chromosome
Sequence start position
Sequence end position
Link to UCSC browser
Inputted sequence with repeats highlighted in blue
10
Choosing a 50bp tag
Copy and paste here
Send sequence to UCSC
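
A minimal sketch of the tag-selection idea, written in Python rather than the ASP/VBScript used in the project. It locates a simple dinucleotide repeat in a pasted sequence and takes the 50 bases directly upstream as a candidate tag. The function names, the minimum of 5 repeat units, and the choice of the upstream flank are illustrative assumptions; on the slides the tag is copied and pasted by hand.

```python
import re

TAG_LENGTH = 50  # tag size stated on the slides


def find_dinucleotide_repeat(sequence, min_units=5):
    """Return (start, end) of the first run of a 2-base unit (two different
    bases) repeated at least min_units times, or None if there is none."""
    # Group 1 is the repeat unit; (?!\2) rejects homopolymers such as "aa".
    pattern = re.compile(r"(([acgt])(?!\2)[acgt])\1{%d,}" % (min_units - 1))
    match = pattern.search(sequence.lower())
    return (match.start(), match.end()) if match else None


def upstream_tag(sequence, tag_length=TAG_LENGTH):
    """Return the tag_length bases directly before the repeat, or None when
    there is no repeat or not enough upstream sequence."""
    span = find_dinucleotide_repeat(sequence)
    if span is None or span[0] < tag_length:
        return None
    return sequence[span[0] - tag_length:span[0]]
```

A candidate produced this way would still need to be checked for uniqueness in the genome, which is what the BLAT step is for.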
11
UCSC Blat Results
BLAT is similar to BLAST :
searches the genome for alignments to a query sequence
12
List of markers and their tags
13
Convert to FASTA format
FASTA format:
>name
sequence
program converts marker tag file
into fasta format automatically
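
The conversion can be sketched as follows, in Python rather than the project's ASP/VBScript. The input shape (a list of name and tag pairs) and the 60-character line width are assumptions; the slides do not show the layout of the marker tag file.

```python
def to_fasta(markers, line_width=60):
    """Format (name, tag) pairs as FASTA text: a '>name' header line
    followed by the sequence, wrapped at line_width characters."""
    lines = []
    for name, tag in markers:
        lines.append(">" + name)
        for i in range(0, len(tag), line_width):
            lines.append(tag[i:i + line_width])
    return "".join(line + "\n" for line in lines)


# Hypothetical marker names and shortened tags, for illustration only.
example = to_fasta([("marker1", "ACGT"), ("marker2", "GGCC")])
```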
14
Check tag selection
Program sends fasta file to UCSC Blat
15
Linkage Disequilibrium
A condition where two
polymorphisms are found
together on the same
chromosome at a greater
frequency than that predicted
from the product of their
individual frequencies.
16
Two SNPs and their base frequencies
SNP 1 (G/A)   G : 0.88   A : 0.12
SNP 2 (T/C)   T : 0.75   C : 0.25

Haplotype (5’ to 3’) and product of base frequencies
G-T   (0.88)(0.75) = 0.66
G-C   (0.88)(0.25) = 0.22
A-T   (0.12)(0.75) = 0.09
A-C   (0.12)(0.25) = 0.03
17
Expected frequencies
IF observed frequencies of 2 variants
together > expected frequencies
=> LINKAGE DISEQUILIBRIUM
Haplotype   Expected Frequencies   Observed Frequencies
G&T         0.66                   0.54
G&C         0.22                   0.20
A&T         0.09                   0.24
A&C         0.03                   0.02
A and T together are in linkage disequilibrium
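
The comparison on this slide can be reproduced with a short Python sketch (Python is used here instead of the project's ASP/VBScript). The frequencies are the ones shown on the slides; the function and variable names are illustrative.

```python
from itertools import product

# Base frequencies and observed haplotype frequencies from the slides.
snp1 = {"G": 0.88, "A": 0.12}
snp2 = {"T": 0.75, "C": 0.25}
observed = {"G&T": 0.54, "G&C": 0.20, "A&T": 0.24, "A&C": 0.02}


def expected_frequencies(first, second):
    """Expected haplotype frequency = product of the two base frequencies."""
    return {f"{a}&{b}": first[a] * second[b] for a, b in product(first, second)}


def over_represented(expected, observed):
    """Haplotypes observed more often than the product rule predicts."""
    return [h for h in expected if observed[h] > expected[h]]


expected = expected_frequencies(snp1, snp2)
flagged = over_represented(expected, observed)
```

With the slide's numbers, only the A&T haplotype is flagged.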
18
A Quantitative Measure of LD



One of the most common measures of
linkage disequilibrium is r²
It is a squared correlation coefficient =>
the correlation of alleles at two sites.

Special case: r² = 1
(“perfect LD”)
~ Exactly two out of the four possible
haplotypes are observed.
~ Markers NOT separated by recombination
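
The squared correlation coefficient is commonly written r² and computed as D² / (pA(1-pA) pB(1-pB)), where D = pAB - pA·pB is the difference between the observed and expected haplotype frequency. The Python sketch below (not the project's ASP/VBScript) uses that standard definition; the parameter names and the example frequencies are made up for illustration and are not taken from the slides.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at two sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (site 1) and allele B (site 2)
    """
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denominator


# Hypothetical perfect-LD case: only haplotypes A-B (0.6) and a-b (0.4) occur.
perfect = r_squared(0.6, 0.6, 0.6)
# Hypothetical independent case: haplotype frequency equals the product.
independent = r_squared(0.25, 0.5, 0.5)
```

When only two of the four haplotypes occur, as in the first example, the value comes out at 1 (up to floating-point rounding).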
19
Programming Project II


Program that helps visualize linkage
disequilibrium by graphing scores such as r²
Each pair of markers has such a score =>
pairwise comparisons

[Table: 3 x 3 grid of pairwise r² scores for Marker 1, Marker 2 and Marker 3. The off-diagonal values 0.7, 1 and 0.2 each appear twice, mirrored across the diagonal.]
Symmetric!
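
A small Python sketch (not the project's ASP/VBScript) of why one score per pair is enough to fill the whole grid. The marker names and the assignment of scores to particular pairs are illustrative assumptions.

```python
def symmetric_matrix(markers, pair_scores):
    """Fill a square grid from one score per marker pair.

    Each score is written to both (i, j) and (j, i); the diagonal
    (a marker compared with itself) is left as None.
    """
    index = {name: i for i, name in enumerate(markers)}
    size = len(markers)
    matrix = [[None] * size for _ in range(size)]
    for (first, second), score in pair_scores.items():
        i, j = index[first], index[second]
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# Hypothetical scores for three markers.
grid = symmetric_matrix(
    ["Marker 1", "Marker 2", "Marker 3"],
    {("Marker 1", "Marker 2"): 0.7,
     ("Marker 1", "Marker 3"): 1.0,
     ("Marker 2", "Marker 3"): 0.2},
)
```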
20
Sample data for graphing
Read data by row:
Pairwise comparison of
marker 1 and marker 7
results in two different kinds of
measurements
21
GOLD – Graphical Overview of
Linkage Disequilibrium

Existing program from the Univ. of Michigan
to graph linkage disequilibrium
http://www.sph.umich.edu/csg/abecasis/GOLD/

Graphs based on a chromosomal position
scale

Works very well for long range pattern
analysis, but hard to distinguish each specific
measurement.
22
Comparison of Program Output
Same input file
Output from GOLD
Difficult to see individual points on graph
Output from LD Color
(my program)
Easier to distinguish individual points
23
LD Color Program

Program written in ASP to graphically depict
linkage disequilibrium in human genetic data

Color coded for specific numerical ranges of
different measures of each pair-wise
comparison of markers

Complete program:
4 files ; >1,000 lines of code
24
Program Features







Data input : file uploading or text pasting
Allows for variable file formats for input
User defined colors and ranges
Switch between different measures of LD
View actual data on graph or just the colors
Change size of graph
Option to select specific rows of data
25
Upload your file
Paste data
26
Specify marker columns
27
Choose label for numerical data inputted
28
Choose measure of linkage disequilibrium
Specify which column the data is located in
29
Same as before => used to specify data
for other side of diagonal
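
One way to picture the two-sided display is the Python sketch below (not the project's ASP/VBScript): one measure fills the cells above the diagonal and a second measure fills the cells below it. The row layout assumed here (two 1-based marker numbers followed by measurement columns, separated by whitespace) is hypothetical; the actual program lets the user specify the columns.

```python
def parse_rows(text):
    """Split pasted text into rows of whitespace-separated fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def two_measure_matrix(rows, size, upper_col, lower_col):
    """Build a size x size grid from pairwise rows.

    The first two fields of each row are 1-based marker numbers;
    upper_col and lower_col are 0-based positions of the two measures.
    """
    matrix = [[None] * size for _ in range(size)]
    for fields in rows:
        i, j = sorted((int(fields[0]) - 1, int(fields[1]) - 1))
        matrix[i][j] = float(fields[upper_col])  # above the diagonal
        matrix[j][i] = float(fields[lower_col])  # below the diagonal
    return matrix


# Hypothetical data: marker, marker, first measure, second measure.
sample = "1 2 0.9 0.7\n1 3 0.4 0.1\n2 3 1.0 1.0\n"
asymmetric = two_measure_matrix(parse_rows(sample), 3, upper_col=2, lower_col=3)
```

Passing the same column for both sides gives the symmetric display.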
30
Choose to display data on graph
31
Choose different sizes for the graph
32
Select only the markers you want
graphed by choosing rows
Default : all are graphed
33
Specify the ranges for the
colors you want graphed.
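
Mapping a score to a color by range can be sketched in Python as below (the project itself is in ASP/VBScript). The ranges, color names, and fallback color are placeholders, since the slides leave them to the user.

```python
# Hypothetical (low, high, color) ranges; both ends are inclusive.
DEFAULT_RANGES = [
    (0.0, 0.2, "white"),
    (0.2, 0.5, "yellow"),
    (0.5, 0.8, "orange"),
    (0.8, 1.0, "red"),
]


def color_for(value, ranges=DEFAULT_RANGES, fallback="gray"):
    """Return the color of the first range containing value.

    A value on a shared boundary goes to the earlier range; values
    outside every range get the fallback color.
    """
    for low, high, color in ranges:
        if low <= value <= high:
            return color
    return fallback
```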
34
Manual
35
Color Legend
36
Sample: Symmetric
37
Sample: Big Size!
38
Sample: Data On, Asymmetric
39
Sample: Row Select
40
Future Directions

LD Color




Mouseover tag on each cell of the graph to show
marker ID (JavaScript)
Ability to accept more kinds of file formats
Better form validation and error checking
More functionality and linking to outside sources
41
Acknowledgements








Dr. Garry Larson, Ph.D.
Dave Ko, City of Hope Senior Programmer Analyst
Louis Geller, City of Hope Senior Research Associate
Dr. Ted Krontiris, M.D., Ph.D., Principal Investigator
The rest of the Krontiris Lab
Southern California Bioinformatics Institute:
Dr. Jamil Momand, Dr. Nancy Warter-Perez,
Dr. Sandra Sharp & Dr. Wendie Johnston,
Jackie Leung & rest of SoCalBSI staff
Fellow interns
NSF & NIH
42