COMPARATIVE GENOMICS


What is bioinformatics?
Long Definition: The study of the application
of computer and statistical techniques to the
management of biological information,
including development of methods to search
databases quickly, to analyze DNA sequence
information, and to predict protein sequence
and structure from DNA sequence data.
Short Definition: The management, analysis,
and visualization of molecular, cellular, and
genomic information.
Molecular Biology
Computational Biology
Bioinformatics
Computer Science
Genomics
Genomics: what is it?

Development and application of genetic mapping, sequencing, and
computation (bioinformatics) to analyze the genomes of organisms.
Sub-fields of genomics:
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and non-genes).
3. Comparative genomics: comparison of genomes across species.
   Includes structural and functional genomics.
   Evolutionary genomics.
COMPARATIVE
GENOMICS
Brief Review
Definition
A comparison of gene numbers, gene
locations, and the biological functions of genes
in the genomes of different organisms, one
objective being to identify groups of genes
that play a unique biological role in a
particular organism.
Some Terminology
Homology: Homology is the relationship of any
two characters ( such as two proteins that have
similar sequences ) that have descended,
usually through divergence, from a common
ancestral character. Homologues are thus
components or characters (such as
genes/proteins with similar sequences) that can
be attributed to a common ancestor of the two
organisms during evolution.
Homologues can be orthologues,
paralogues, or xenologues.
Orthologues are homologues that have evolved
from a common ancestral gene by speciation.
They usually have similar functions.
Paralogues are homologues that are related or
produced by duplication within a genome
followed by subsequent divergence. They often
have different functions.
Xenologues are homologues that are related by
an interspecies (horizontal) transfer of the
genetic material for one of the homologues. The
functions of xenologues are quite often
similar.
Analogues
Analogues are non-homologous
genes/proteins that have arisen
convergently from unrelated ancestors.
They have similar functions although they
are unrelated in either sequence or
structure.
Comparative Genomics
Two very large problems are immediately apparent in undertaking the
sequencing of entire genomes.
First, the vast number of species and the much larger size of some genomes
make sequencing every genome in full an impractical approach for
understanding genome structure.
Second, within a given species most individuals are genetically distinct in a
number of ways. What does it actually mean, for example, to "sequence a
human genome"? The genomes of two individuals who are genetically distinct
differ with respect to DNA sequence by definition.
These two problems, and the potential for other novel applications, have given
rise to new approaches which, taken together, constitute the field of
comparative genomics.
Because all modern genomes have arisen from common ancestral genomes,
the relationships between genomes can be studied with this fact in mind. This
commonality means that information gained in one organism can have
application in other, even distantly related, organisms. Comparative genomics
enables the application of information gained from tractable model systems to
agricultural and medical problems. The nature and significance of differences
between genomes also provide a powerful tool for determining the
relationship between genotype and phenotype through comparative genomics
and morphological and physiological studies.
The Role of Bioinformatics in Identification of Drug Targets from
Bacterial and Fungal Genomes
Dr. Andrew E. DePristo, Director of Bioinformatics, Genome
Therapeutics Corporation
Bacterial genomes are appearing at an ever-increasing rate, with a
September 1999 listing by NCBI indicating 16 completed, 10 being
annotated, and 55 being sequenced. Fungal genomes and
proteomes are less prevalent with one complete, a few nearly
complete, and large collections of cDNA sequences available for
about five organisms. This presentation will discuss use of this
bacterial and fungal genomic diversity, along with high-throughput
bioinformatics tools, to attach confidence to certain functional
predictions and to allow identification and targeting of essential genes
that are unique to specific organisms.
Methods (WET)
Introduction
A DNA walk of a genome represents how the frequency of each
nucleotide within a pairing couple (G versus C, T versus A) changes
locally. The analysis measures the local distribution of Gs within the
G+C content and of Ts within the T+A content. Lobry was the first to
propose this analysis (1996, 1999). Two complementary representations
can be derived from the DNA walk: the cumulative TA-skew and GC-skew
analyses.
Aim: After reading this description of the algorithms, a reader not trained
in genomics should be able to redraw our graphs, using the basic genometric
data file that is posted on our web resource for each organism as a zip
file (.zip).
1) DNA walk
1.1) Drawing a DNA walk by reading a sequence file nucleotide by
nucleotide.
A simple algorithm draws a DNA walk by assigning a direction to
each nucleotide. We propose the following assignment, slightly
different from Lobry's (Lobry, 1999): T, C, A, and G correspond to
the E(ast), S(outh), W(est), and N(orth) directions, respectively.
Reading the sequence nucleotide by nucleotide and following this
rule, a path emerges on the graph (Figure 1).
Figure 1: DNA walk of the sequence
GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACC
CACCAGGGACCCAGGACCC
Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
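The rule in 1.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' program: the names `STEPS`, `dna_walk`, and `FIGURE_1_SEQUENCE` are made up here, and leaving the pointer in place for any symbol other than A, C, G, or T is an assumption.

```python
# Direction assigned to each nucleotide, as stated in 1.1:
# T = East, C = South, A = West, G = North.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        # Assumption: unknown symbols (e.g. N) do not move the pointer.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


# The 60-nucleotide sequence quoted in the Figure 1 caption.
FIGURE_1_SEQUENCE = (
    "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACC"
    "CACCAGGGACCCAGGACCC"
)

path = dna_walk(FIGURE_1_SEQUENCE)
print(len(path) - 1, path[-1])  # number of steps, final position
```

For this sequence the walk takes 60 steps and returns to its starting point, because the sequence contains as many Ts as As (11 each) and as many Gs as Cs (19 each). Plotting `path` with any charting tool redraws the curve.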
1.2) Drawing a DNA walk by slicing a sequence file
into small windows
A quick way to draw this kind of graph, suggested by Lobry (1996),
is to cut the genome into windows of equal length.
Figure 2: DNA walk of the same sequence as the one presented in Figure 1:
GTCTGGTGTCTGGAGTTCCT
GGGTCTTGAGACCACAGGA
CCCACCAGGGACCCAGGAC
CC
The sequence was sliced into 5-nucleotide windows. Only the fifth nucleotide per
window is plotted. We can also work with the mean values of the window…
Comment: this method is less precise than the first one, but it can be
used with spreadsheet software without affecting the final resolution of
the curve at the genome level.
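One possible reading of the windowed method is sketched below in Python: the whole sequence is still walked, but the pointer position is recorded only after the last nucleotide of each window. The function name `windowed_walk` is hypothetical, and dropping a trailing incomplete window is an assumption of this sketch.

```python
# Same direction rule as in 1.1: T = East, C = South, A = West, G = North.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def windowed_walk(sequence, window=5):
    """Walk the sequence, keeping the pointer position only at window ends."""
    x, y = 0, 0
    points = [(x, y)]
    for i, base in enumerate(sequence.upper(), start=1):
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        if i % window == 0:  # last nucleotide of a complete window
            points.append((x, y))
    return points


sequence = (
    "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACC"
    "CACCAGGGACCCAGGACCC"
)
print(windowed_walk(sequence, window=5))
```

With 5-nucleotide windows the 60-nucleotide example yields 12 plotted points plus the origin, which is why the curve is coarser than the nucleotide-by-nucleotide walk but has the same overall shape.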
2) The cumulative TA- and the GC-skew analyses.
2.1) Drawing a cumulative TA- or a GC-skew analysis by reading a
sequence file nucleotide by nucleotide.
Cumulative TA-skew analysis: Assign a direction to each
nucleotide: A, T, C, and G correspond to S, N, nd (no direction),
and nd, respectively. On the graph, after reading one nucleotide,
the pointer moves one step eastward. If an A or a T is read, a
further step is added, southward or northward, respectively.
Cumulative GC-skew analysis: Assign a direction to each
nucleotide: A, T, C, and G correspond to nd, nd, S, and N,
respectively. On the graph, after reading one nucleotide, the
pointer moves one step eastward. If a C or a G is read, a
further step is added, southward or northward, respectively.
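Both cumulative skew analyses follow the same pattern, so they can be sketched with one helper. The Python below is a minimal illustration of the rules just described; the function names `cumulative_skew`, `ta_skew`, and `gc_skew` are made up for this sketch.

```python
def cumulative_skew(sequence, up, down):
    """Return (x, y) points: x moves one step east per nucleotide,
    y moves north for `up`, south for `down`, and stays put otherwise."""
    y = 0
    points = [(0, 0)]
    for x, base in enumerate(sequence.upper(), start=1):
        if base == up:
            y += 1
        elif base == down:
            y -= 1
        points.append((x, y))
    return points


def ta_skew(sequence):
    # T = North, A = South, C and G = no direction.
    return cumulative_skew(sequence, up="T", down="A")


def gc_skew(sequence):
    # G = North, C = South, A and T = no direction.
    return cumulative_skew(sequence, up="G", down="C")


print(ta_skew("ATTGC"))
print(gc_skew("ATTGC"))
```

The final y value of each curve equals the excess of T over A (or of G over C) in the sequence read so far, which is what makes changes of slope along a genome easy to spot.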
Methods (dry)
Bioinformatics.
Its tools (software)
Computational analysis in drug
target discovery
Shannon entropy is a measure of variation
or change over a time series. Genes that
exhibit significant changes are regarded
as good target candidates.
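As a rough illustration of the idea, the Python sketch below computes the Shannon entropy, H = -sum(p * log2(p)), of an expression time series. The slide does not say how values are discretized, so the equal-width binning into three levels, the function name `shannon_entropy`, and the example profiles are all assumptions.

```python
import math
from collections import Counter


def shannon_entropy(series, n_bins=3):
    """Shannon entropy in bits of a series after equal-width binning."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a flat profile occupies a single level
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    levels = [min(int((v - lo) / width), n_bins - 1) for v in series]
    total = len(levels)
    counts = Counter(levels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical expression profiles over six time points.
profiles = {
    "flat_gene": [5, 5, 5, 5, 5, 5],
    "changing_gene": [0, 1, 2, 0, 1, 2],
}
for name, series in profiles.items():
    print(name, round(shannon_entropy(series), 3))
```

A profile that never changes scores 0 bits, while one that visits all three levels equally often scores log2(3), about 1.585 bits, so ranking genes by this value puts the most variable profiles first.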
Clustering is a method for grouping
patterns by similarities in their shapes.
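Grouping by shape can be illustrated with a very small Python sketch. It is not the method used by any particular package: it assumes Pearson correlation as the similarity measure and a simple greedy grouping, and the names `pearson`, `cluster_by_shape`, the threshold of 0.9, and the example profiles are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat profile has no shape to compare
    return cov / math.sqrt(var_a * var_b)


def cluster_by_shape(profiles, threshold=0.9):
    """Greedy grouping: a profile joins the first cluster whose first
    member it correlates with at or above the threshold."""
    clusters = []
    for name, values in profiles.items():
        for cluster in clusters:
            if pearson(values, profiles[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [10, 20, 30, 40],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2],
}
print(cluster_by_shape(profiles))
```

Here geneA and geneB rise together and geneC and geneD fall together, so they form two groups even though their absolute levels differ; correlation compares shape, not magnitude.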
GCG
History
(tools)
Founded in 1982 as a service of the Department of Genetics at the
University of Wisconsin, GCG became a private company in 1990 and
was acquired by Oxford Molecular Group in 1997. The company was
one of the pioneers of bioinformatics and its Wisconsin Package
sequence analysis tools are widely used and well regarded throughout
the pharmaceutical and biotechnology industries and in academia. To
support enterprise bioinformatics efforts, GCG developed SeqStore, its
Oracle-based data management system. Desktop solutions are
delivered to bench scientists through products such as MacVector and
OMIGA.
GCG Wisconsin Package
Molecular biologists worldwide
use the GCG® Wisconsin
Package® as their software of
choice for comprehensive
sequence analysis. The
Wisconsin Package meets
research needs across
disciplines, project teams, and
labs to provide an enterprise-wide solution. Based on
published algorithms from the
fields of mathematical and
computational biology, the
Package includes tools for:
Comparison
Database Searching and Retrieval
DNA/RNA Secondary Structure
Editing and Publication
Evolution
Fragment Assembly
Gene Finding and Pattern Recognition
Importing and Exporting
Mapping
Primer Selection
Protein Analysis
Translation
PAUP* version 4.0 is a major upgrade and new release of the software
package for inference of evolutionary trees, for use in Macintosh,
Windows, UNIX/VMS, or DOS-based formats. The influence of
high-speed computer analysis of molecular, morphological and/or behavioral
data to infer phylogenetic relationships has expanded well beyond its
central role in evolutionary biology, now encompassing applications in
areas as diverse as conservation biology, ecology, and forensic studies.
The success of previous versions of PAUP (Phylogenetic Analysis Using
Parsimony) has made it the most widely used software package for the
inference of evolutionary trees.
Target Validation
Target validation involves taking steps to prove that a
DNA, RNA, or protein molecule is directly involved in a
disease process and is therefore a suitable target for
development of a new therapeutic compound.
Genes that do not belong to an established family are
critical to many disease processes and also need to be
validated as potential drug targets.
Target validation & identification
Computer-based drug design: Beginning
with the protein engineering and analysis
tools we can identify and evaluate the
target. Then, with that information we may
attack the target with a variety of tools to
identify new and novel drug candidates.
The complete suite of software products
provides for a seamless environment to
work more efficiently & quickly.
Target validation & identification
Computational component analyzes genomic
sequences resulting in 3D and functional
annotations. Once annotated, sequences can be
identified as potential drug targets for
development.
X-ray crystallography has become a central tool
in modern drug and target discovery.
These annotations, made from knowledge of
predicted protein structure, are an important
component in identifying potential targets,
thereby facilitating successful and competitive
drug discovery.
Outcomes/ Benefits
Provides “first pass” information on the function
of the putative protein based on the existence of
conserved protein sequence motifs.
Advancements in computer software
technologies (bioinformatics) have made
comparative analysis of genomes an extremely
powerful approach for functional genomics too.
These studies can also reveal insights into the
recruitment of enzymes in a pathway.
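A "first pass" motif scan of a putative protein can be sketched with a regular expression. The Python example below uses the well-known P-loop (Walker A) ATP/GTP-binding pattern, written in PROSITE style as [AG]-x(4)-G-K-[ST]; the function name `find_motif` and the short protein fragment are illustrative only, and a real annotation pipeline would scan against a full motif database.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] rewritten as a regular expression:
# A or G, any four residues, then G, K, and S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein, pattern=P_LOOP):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Illustrative protein fragment, used only to exercise the scan.
fragment = "MTEYKLVVVGAGGVGKSALTIQ"
print(find_motif(fragment))
```

For this fragment the scan reports one hit, `GAGGVGKS` starting at residue 10, which would suggest a nucleotide-binding function as a first hypothesis to test.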
Outcomes/ Benefits
It will help us to understand the genetic basis of diversity
in organisms, both speciation & variation, events that are
important aspects of evolutionary biology.
Comparative genomics provides a powerful way in which
to analyze sequence data.
Indeed, there is already a long list of 'model' organisms,
which allow comparative analyses in a variety of ways.
Outcomes/ Benefits
The very small vertebrate genome of the pufferfish
provides a simple and economical way of comparing
sequence data from mammals and fish, representing a
large evolutionary divergence and so permitting the
identification of essential elements that are still present
in both species.
These elements include genes and the associated
machinery that controls their expression; elements that,
in many cases, have survived the test of time.