Bioinformatics and the Language of DNA A. Tozeren


Bioinformatics and the
Language of DNA
Aydin Tozeren
[email protected]
Center for Integrated Bioinformatics
Drexel University, Philadelphia,
PA, USA
www.gpba-bio.com
Yeni Hayat / New Life by Orhan Pamuk
• Bir kitap okudum butun hayatim degisti
• I read a book and my whole life changed
Living systems have the same building blocks:
C, N, H, O, P, Su, minerals.
Five different macromolecules: DNA, RNA,
Protein, Carbohydrates, Fat (lipid)
Information Flow: DNA to RNA (template) to
Protein back to DNA
DNA has four basic building blocks, arranged
in a sequence.
Proteins have 20 building blocks, arranged in
various orders in a linear chain.
Proteins: Molecular Machines
• There are about 40,000 types of proteins in a cell.
• Proteins change configuration upon activation,
resulting in movement and motion. They are
responsible for heartbeat, neuron firing,
muscular movement, etc.
DNA can be viewed as a long string of
four letters in various combinations:
ACGTTACCGCGCTCA.....
Billions of letters with no comma or period,
just arranged serially.
Genome: Collection of DNA in the
nucleus of a cell.
Next-generation sequencers can sequence a human
genome in 6 weeks.
BACTERIA have a single (circular)
DNA molecule organized into OPERONS
EUKARYOTES (plants, yeast,
mammals) have DNA organized
into chapters (single linear DNA
molecules or chromosomes).
DNA : Book of Life
Each and every cell in the body has
the same book of life
DNA is the hard drive and the
information storage unit of the living.
Cells from different tissue types may
use (read) different sections (pages) of
the DNA (book of life).
DNA varies only slightly between
individuals in a species.
The sequence of letters along DNA is
similar among species such as dogs,
humans, monkeys, and even mice.
A gene is a segment of DNA that
provides a recipe for a protein.
Typically it is 300 letters (nucleotides)
long, but it can be much longer. A three-letter
word along a gene (CODON) represents
an amino acid, one of the 20 building
blocks of proteins.
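The codon rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and reads from the first letter of the input; the function name `translate` and the short example sequence are made up for the sketch and are not taken from the slides.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each of the 64 three-letter codons to a one-letter amino acid ("*" = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Read DNA three letters at a time and stop at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative input only: four codons followed by a stop codon.
example_protein = translate("atg tct gat gct taa")
```

Here `example_protein` holds one letter per codon, so 12 coding nucleotides give 4 amino acids.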
gatcaggtcc ttatgatgac agattggggc ccactttgtt gtgctttttc
ttattggttg ctgtcattat caactttata ttaagattga agtacaatga
cgctaacact aagttatgaa attgtaattc caatatcgta agcgtgggtt
acgcacaaac tgtattttca agatgctcac aaataattta gtttcatata
tacgcatata tagaaagtat ccatctatag gtaatcatga acaataaaaa
tattcacgtt tcaggagcta ttgtttgtac tcattacgtt
tttggatatc aagttgaaaa tcagcccctt tcactagata tcaagcgcta
taaaaaaatt ttaatttcga tgaggcatct ttcttttctc ttgtggctat
gtaagcctaa gaagccgttt acacatcaat gataaataag
tatacaaaaa gggttccatt ttttttttgg ccgctaccgg
actagcaagg gcctaatggt acgctgagcg tagtacaacc
aagcgcttgt
translation="MSDAVTIRTRKVISNPLLARKQFV
VDVLHPNRANVSKDELREKLAEVYKAEKDA
VSVFGFRTQFGGGKSVGFGLVYNSVAEAKKF
EPTYRLVRYGLAEKVEKASRQQRKQKKNRD
KKIFGTGKRLAKKVARRNAD"
/
EUKARYOTE GENE
SEQUENCES ARE HIGHLY
CONSERVED BUT VIRAL
SEQUENCES VARY WITH TIME
Sequence measurement is fast
and accurate with next generation
sequencing
Will Dampier: SNP Islands on HIV-1 Genome
Mutation density varies from one HIV-1 protein to another. The lower the
mutation density, the more important the protein is for viral survival.
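One way to make "mutation density" concrete is to score each column of a multiple sequence alignment by how often it departs from the consensus letter. The sketch below is an assumption about how such a score could be computed, not the method used for the SNP island figure; the toy alignment and the function names are hypothetical.

```python
from collections import Counter


def mutation_density(aligned):
    """Per-column fraction of sequences that differ from the column consensus."""
    n = len(aligned)
    density = []
    for column in zip(*aligned):
        consensus_count = Counter(column).most_common(1)[0][1]
        density.append(1 - consensus_count / n)
    return density


def region_density(density, start, end):
    """Average mutation density over alignment columns start..end-1."""
    return sum(density[start:end]) / (end - start)


# Toy alignment (hypothetical data): four equal-length sequences.
toy_alignment = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTGC"]
toy_density = mutation_density(toy_alignment)
first_half = region_density(toy_density, 0, 3)
second_half = region_density(toy_density, 3, 6)
```

Averaging the per-column score over the columns that encode each protein would give one density value per protein, which can then be compared across proteins.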
Will Dampier: Homology Islands Along HIV-1
Genome
Will Dampier, Invariant Sequences
on HIV-1
• Original Sequence
• AAGGAGAGAGATGGGTGCGAGAGCGTC
AATAAAATAGTAAGAATGTA TAGAAGAAATGATGACAGCATG
CAGGCTAATTTTTTAGGGAA ACAGGAGCAGATGATACAGT
ATGGAAACCAAAAATGATAGG ATTGGAGGAAATGAACAAGT
TTAGCAGGAAGATGGCCAGT ATTCCCTACAATCCCCAAAG
CACAATTTTAAAAGAAAAGGGGGGATTGGGGGG
TACAGTGCAGGGGAAAGAATA AAAATTCAAAATTTTCGGGT
TTCAAAATTTTCGGGTTTATT AATTTTCGGGTTTATTACAG
CTCTGGAAAGGTGAAGGGGCAGTAGTAAT
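Invariant stretches like the ones listed above can be found by scanning an alignment for runs of columns in which every sequence carries the same letter. The following is a minimal sketch under that assumption; the function name, the `min_length` cutoff, and the toy alignment are hypothetical and not taken from the slides.

```python
def invariant_segments(aligned, min_length=3):
    """Return (start, segment) for each run of fully conserved columns."""
    segments = []
    run_start = None
    columns = list(zip(*aligned))
    # A trailing None acts as a sentinel that closes a run ending at the last column.
    for i, column in enumerate(columns + [None]):
        conserved = column is not None and len(set(column)) == 1
        if conserved and run_start is None:
            run_start = i
        elif not conserved and run_start is not None:
            if i - run_start >= min_length:
                segments.append((run_start, aligned[0][run_start:i]))
            run_start = None
    return segments


# Toy alignment (hypothetical data): one variable column at position 6.
toy_alignment = ["AAGGAGTTC", "AAGGAGATC", "AAGGAGTTC"]
long_segments = invariant_segments(toy_alignment, min_length=3)
all_segments = invariant_segments(toy_alignment, min_length=2)
```

Raising `min_length` keeps only the longer invariant stretches, which are the more useful ones as targets.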
Eukaryotic linear binding motifs
(ELMs): ISNP, LLARKQ, FVVDV
ELMs are 4 - 7 amino acid long sequence segments
of a protein. They have been shown to be
involved in binding interactions with other
proteins in the same species.
There are about 130 such motifs conserved
across eukaryotes.
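Locating such segments in a protein is a plain pattern search. The sketch below treats the three example segments as literal patterns and scans the first part of the translation shown earlier in the deck. In practice motif definitions are usually regular expressions rather than fixed strings, so the literal patterns here are a simplifying assumption, and the function name `find_motifs` is hypothetical.

```python
import re


def find_motifs(protein, patterns):
    """Return the 0-based start positions of each pattern in the protein."""
    hits = {}
    for name, pattern in patterns.items():
        # A lookahead lets overlapping occurrences be reported as well.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", protein)]
    return hits


# First part of the translation shown on the gene slide.
protein = "MSDAVTIRTRKVISNPLLARKQFVVDVLHPNRANVSKDELREKLAEVYKAEKDA"
# Literal segments from the slide, used here as stand-ins for motif patterns.
patterns = {"ISNP": "ISNP", "LLARKQ": "LLARKQ", "FVVDV": "FVVDV"}
motif_hits = find_motifs(protein, patterns)
```

The three segments sit next to each other in this protein, starting at positions 12, 16, and 22 (counting from 0).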
Yichuan Liu: Domain/Motif Stability
Across Eukaryotes
Perry Evans: Conserved ELM locations
on HIV-1 ENV
Will Dampier: Patient Progression
Will Dampier: Predictive Ability
Will Dampier: ELM Locations
HIV1 Sequence Motifs
Conserved regions can be targeted with microRNA
to prevent HIV-1 from multiplying.
Presence of Linear Binding motifs at certain
locations in the alignment is correlated with the
severity of the disease.
Given the sequence of a virus, we can predict
motifs on the sequence relevant to clinical
outcome
CROSSTALK
• VIRUS/HOST CELL
• CELL-CELL
• CELL EXTRACELLULAR MATRIX
Kar / Snow by Orhan Pamuk
• Kadife, Lacivert (Blue), and the theatre troupe in Kars
• People in Kars become obsessed with a visiting
theatre group, anticipating and watching a new
show every night. It is the talk of the town.
Virus proteins
• Virus proteins hijack binding motifs found in the
host cells.
• One mode of protein interaction is due to binding
of an ELM on one protein to a domain on the
other.
Evans: Conserved ELM locations on
HIV-1 NEF
HIV Host Protein Interactions
Evans: Host-pathogen KEGG proj.
Given Viral Sequence
• We can determine with good accuracy the host
proteins targeted by the virus.
• One can then search for optimized therapies for
the virus.
Yichuan Liu: Predicting Protein Interactions
Yichuan Liu: Heat Graph of PPI separated by BP GO
terms in 5 different Cell Compartments
Machine learning
What are the sequence motifs that are enriched in
proteins known to interact with other proteins?
Answer: ELMs and Counter Domains explain only
20% of known protein interactions. Therefore the
language and grammar of crosstalk between two
proteins are yet to be discovered.
Microarray Chips
A slide with thousands of dots, each dot sticky for the
product of one gene. The higher the copy number of the
gene product, the shinier the dot. The question is: which
subset of genes should we use to differentiate between
disease subtypes?
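A common first answer to the gene-subset question is to rank genes by how well each one separates two groups of samples. The sketch below uses Welch's t-statistic for that ranking; it is one possible choice and not necessarily the one used in this work, and the gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expression, labels, top_n):
    """Return the top_n genes with the largest absolute t-statistic."""
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: three genes measured in six samples (two subtypes).
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "geneC": [3.0, 2.5, 3.5, 4.0, 3.6, 4.4],
}
labels = [0, 0, 0, 1, 1, 1]
top_genes = rank_genes(expression, labels, top_n=2)
```

With thousands of genes and few samples, a ranking like this needs a correction for multiple testing before any gene is called significant.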
Noor: Examples of Gene Enrichment
Each microarray experiment provides
thousands of values for the activities
of genes in a specific tissue at a given
time.
Adam Ertel: Switch-like gene expression
Adam Ertel: Switch-like genes involved in cell
communication pathways
Adam Ertel: Clustering tissue by switch high/low
state
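Assigning a switch-like gene to a high or low state in each sample can be sketched as a one-dimensional two-cluster split. The code below uses a simple two-means iteration to find the threshold; this is an illustrative assumption and not the procedure behind the figures, and the function names and expression values are hypothetical.

```python
def two_state_threshold(values, iterations=50):
    """Find a cut between the low and high modes with 1-D two-means."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        threshold = (low + high) / 2
        low_group = [v for v in values if v <= threshold]
        high_group = [v for v in values if v > threshold]
        if not high_group:  # all values identical: nothing to split
            break
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return (low + high) / 2


def switch_states(values):
    """Label each sample 'high' or 'low' relative to the learned threshold."""
    threshold = two_state_threshold(values)
    return ["high" if v > threshold else "low" for v in values]


# Hypothetical expression of one gene across seven tissue samples.
gene_values = [1.0, 1.2, 0.8, 1.1, 6.0, 6.3, 5.8]
gene_threshold = two_state_threshold(gene_values)
gene_states = switch_states(gene_values)
```

Each tissue sample then reduces to a vector of high/low states, one per switch-like gene, which can be fed to a clustering method.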
Michael Gormley: Model-based clustering
of tissue type
[Figure panels: KMeans, Hierarchical, Model-Based; gene sets: 1265 Bimodal Genes, ECM-MEM Bimodal Genes]
Michael Gormley: Bimodal genes expressed
in the “on” mode in specific tissues
Michael Gormley: Bimodal genes expressed
in the “on” mode in HIV infection
Michael Gormley: Model-based clustering
of infectious disease
Michael Gormley: Simulation of supervised
classification with bimodal genes
Parameters
• Effect size ((μ1 - μ2)/σ2)
• Regression coefficients (β)
• Number of samples (n)
• Number of genes (p)
• Number of significant genes (M)
• Number of selected features (N)
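The role of these parameters can be illustrated with a small simulation of the feature-selection step. The sketch below is an assumption about how such a study could be set up, not the simulation behind the slide: it draws unit-variance Gaussian data, shifts the first M genes by the effect size in one class, selects N genes by t-statistic, and counts how many truly significant genes are recovered. The regression coefficients (β) and the classifier itself are left out.

```python
import random
from math import sqrt
from statistics import mean, variance


def simulate(n=40, p=200, M=10, N=10, effect_size=3.0, seed=0):
    """Return how many of the M truly shifted genes are among the N selected."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # two balanced classes
    data = []
    for gene in range(p):
        shift = effect_size if gene < M else 0.0  # first M genes carry signal
        data.append([rng.gauss(shift * label, 1.0) for label in labels])

    def t_stat(values):
        a = [v for v, label in zip(values, labels) if label == 0]
        b = [v for v, label in zip(values, labels) if label == 1]
        return abs(mean(a) - mean(b)) / sqrt(
            variance(a) / len(a) + variance(b) / len(b)
        )

    ranked = sorted(range(p), key=lambda g: t_stat(data[g]), reverse=True)
    selected = ranked[:N]
    return sum(1 for g in selected if g < M)


recovered_strong = simulate(effect_size=5.0)
recovered_none = simulate(effect_size=0.0)
```

Sweeping `effect_size`, `n`, and `N` in a loop shows how quickly recovery degrades as the signal weakens or the sample size shrinks.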
Mahdi Sarmady: Sample Model
1. Species and interaction initial value: inactive
2. Effect of TLR1/TLR2 activation by binding of a bacterial
lipoprotein
Noor: Model and Data Description
• Main Model Description:
o Comprises 78 flux equations used to simulate the kinetics of 98
variables found in blood, cytoplasm, mitochondrial intermembrane
space and matrix
o Simulations take into consideration significant bindings of reactants
to the cations H+, K+ and Mg2+, and are therefore pH sensitive
• Data Description:
o Microarray data were collected for each of the 4 tissues, and
significantly altered genes representing enzymes from the
aforementioned metabolic pathways were identified
o Fold change values from the SAM test were used to adjust enzymatic
Vmax values, assuming Vmax is directly proportional to enzyme
concentration
Vmax = k[E]
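Because Vmax = k[E], a fold change f in enzyme level scales Vmax by the same factor, provided the expression fold change is taken as a proxy for the change in enzyme concentration. A minimal sketch of that adjustment follows; the enzyme names, baseline values, and fold changes are hypothetical.

```python
def adjust_vmax(baseline_vmax, fold_change):
    """Scale each enzyme's Vmax by its fold change (1.0 if none is reported)."""
    return {
        enzyme: vmax * fold_change.get(enzyme, 1.0)
        for enzyme, vmax in baseline_vmax.items()
    }


# Hypothetical baseline Vmax values (arbitrary units) and fold changes.
baseline_vmax = {"hexokinase": 2.0, "GLUT4": 1.5}
fold_change = {"hexokinase": 0.5}
adjusted_vmax = adjust_vmax(baseline_vmax, fold_change)
```

Enzymes without a significant fold change keep their baseline Vmax, so only the altered steps of the pathway change in the simulation.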
Noor: Examples of ODEs Used
Glycolysis: Glucose
• Hexokinase: A = Mg2+-bound ATP, B = GLC_c, P = G6P_c, Q = Mg2+-bound
ADP
• GLUT4: A = GLC_b, P = GLC_c
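The rate equations themselves did not survive in this transcript. As an illustration only, the sketch below integrates a single transport step of the GLUT4 kind (A = GLC_b, P = GLC_c) with a generic symmetric carrier rate law and forward Euler steps. The rate law, the parameter values, and the function names are assumptions and not the equations used in the model.

```python
def carrier_flux(a, p, vmax, km):
    """Assumed symmetric carrier rate law: net flux from A (blood) to P (cytoplasm)."""
    return vmax * (a / (km + a) - p / (km + p))


def simulate_uptake(glc_b, glc_c0, vmax, km, dt=0.01, steps=20000):
    """Forward-Euler integration of d[GLC_c]/dt = flux, with blood glucose held fixed."""
    glc_c = glc_c0
    for _ in range(steps):
        glc_c += dt * carrier_flux(glc_b, glc_c, vmax, km)
    return glc_c


# Hypothetical values: blood glucose 5.0, empty cytoplasm, Vmax 1.0, Km 5.0.
initial_flux = carrier_flux(5.0, 0.0, vmax=1.0, km=5.0)
final_glc_c = simulate_uptake(glc_b=5.0, glc_c0=0.0, vmax=1.0, km=5.0)
```

With blood glucose held constant and no downstream consumption, the cytoplasmic concentration rises until the net flux vanishes. In a full model the hexokinase step would drain GLC_c and keep the transporter running.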
Noor: Examples of ODEs Used
TCA: Fumarate
• Succinate Dehydrogenase: A = SUC_x, B = COQ_x, P = QH2_x, Q = FUM_x
Mitochondria Energy Flux
Pathway for Diabetes
Genomics Summary
• Given: Sequence Data
• Microarray Data
• A priori information (curated literature,
pathways)
• Show: Host proteins targeted by virus
• Discover new protein-protein interactions
• Investigate side effects and treatment potential of
drugs
Sevgi Soysal
• Kadinin Adi Yok!
• A Woman Has No Name!
• More and more women’s names are embedded
onto the marble stones of science.