Introduction to Bioinformatics
Spring 2002
Adapted from Irit Orr's course at WIS
What Is Bioinformatics?
A Marriage Between
Biology and Computers!
What Is Bioinformatics?
(My Correction)
A Marriage Between
Molecular Biology and
Computer Science!
What Is Bioinformatics?
• Bioinformatics is the field of science in which
biology, computer science, and information
technology merge into a single discipline.
• Bioinformatics is the science of managing and
analyzing biological data using advanced
computing techniques.
What Is Bioinformatics?
The ultimate goal of bioinformatics, as
described by one expert, is to enable the
discovery of new biological insights and
to create a global perspective
from which unifying principles in biology
can be discerned.
What Is Bioinformatics?
The Definition From Bioplanet
Bioinformatics is the application of computer
technology to the management of biological
information.
Computers are used to gather, store, analyze
and integrate biological and genetic
information which can then be applied to
gene-based drug discovery and
development.
What Is Bioinformatics?
The need for Bioinformatics capabilities has been
precipitated by the explosion of publicly available
genomic information resulting from the Human
Genome Project.
The science of bioinformatics is essential to the use
of genomic information in understanding human
diseases and in the identification of new
molecular targets for drug discovery.
What Is Bioinformatics?
The Definition From Whatis.Com
Bioinformatics is the science of developing
computer databases and algorithms for the
purpose of speeding up and enhancing
biological research.
Bioinformatics is being used most noticeably
in the Human Genome Project, the effort to
identify the estimated 80,000 genes in human DNA
(later analyses put the number of protein-coding
genes closer to 20,000).
What Is Bioinformatics?
The Definition From BitsJournal
Two simple statements suffice to
define bioinformatics and to make the
definition reflect the future
of the life sciences:
• Bioinformatics is “a combination of Computer
Science, Information Technology and
Genetics used to determine and analyze
genetic information.”
What Is Bioinformatics?
The Definition From BitsJournal
• The emphasis is on the words “determine” and
“analyze”. Both are activities that
involve very real human elements.
What Is Bioinformatics?
The Definition From BitsJournal
Bioinformatics is the future of the Life Sciences.
What Is Done in Bioinformatics?
• Analysis and interpretation of various types
of biological data, including nucleotide and
amino acid sequences, protein domains,
and protein structures.
What Is Done in Bioinformatics?
• Development of new algorithms and
statistics with which to assess biological
information, such as relationships among
members of large data sets.
• Development and implementation of tools
that enable efficient access to and
management of different types of
information, such as various databases and
integrated mapping information.
Why Use Bioinformatics?
• The explosive growth in the amount of biological
information necessitates the use of computers for
cataloging and retrieval of this information.
• A more global perspective in experimental design.
As we move from the one-scientist,
one-gene/protein/disease paradigm of the past to a
consideration of whole organisms, we gain
opportunities for new, more general insights into
health and disease.
Why Use Bioinformatics?
• Data mining: the process by which testable
hypotheses are generated regarding the function
or structure of a gene or protein of interest by
identifying similar sequences in better
characterized organisms.
– For example, new insight into the molecular basis of a
disease may come from investigating the function of
orthologs of the disease gene in other organisms.
• Equally exciting is the potential for uncovering
phylogenetic relationships and evolutionary
patterns.
Examples of Biology Data for
Bioinformatics
• Whole Genome Annotations
• Experimental data involving thousands
of genes simultaneously
• DNA Chips, MicroArray, and Expression
Array Analyses
• Comparative Genomics (Analyses between
Species and Strains)
• Proteomics ('Proteome' of an Organism, 2D
gels, Mass Spec, 3D structure)
Examples of Biology Data for
Bioinformatics
• Medical applications: Genetic Disease (SNPs)
• Pharmaceutical and Biotech Industry (Drug design)
• Agricultural applications (Plant resistance to various types of diseases)
Molecular Biology Information
DNA
• Raw DNA Sequence
• Coding or not coding?
• Parse into genes?
• 4 bases: AGCT
• ~1 K bases in a gene, ~2 M bases in a genome
atggcaattaaaattggtatcaat
ggttttggtcgtatcggccgtatc
gtattccgtgcagcacaacaccgt
gatgacattgaagttgtaggtatt
aacgacttaatcgacgttgaatac
atggcttatatgttgaaatatgat
tcaactcacggtcgtttcgacggc
actgttgaagtgaaagatggtaac
ttagtggttaatggtaaaactatc
cgtgtaactgcagaacgtgatcca
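A minimal sketch of the first things one might compute on a raw sequence like the fragment above: base counts, GC content, and a split into codons for a chosen reading frame. The function names (`base_composition`, `gc_content`, `codons`) are illustrative and not taken from the slides, and only the first two lines of the fragment are used.

```python
from collections import Counter

# First two lines of the DNA fragment shown on the slide.
fragment = "atggcaattaaaattggtatcaat" "ggttttggtcgtatcggccgtatc"


def base_composition(seq):
    """Count how often each base occurs (case-insensitive)."""
    return Counter(seq.upper())


def gc_content(seq):
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codons(seq, frame=0):
    """Split a sequence into complete triplets starting at offset `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


if __name__ == "__main__":
    print(base_composition(fragment))
    print(round(gc_content(fragment), 3))
    print(codons(fragment)[:4])
```

Whether a stretch is "coding or not" cannot be decided from composition alone; these numbers are only the raw ingredients that gene-finding methods build on.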
Molecular Biology Information
Protein
• 20-letter alphabet
ACDEFGHIKLMNPQRSTVWY
But not BJOUXZ
• Strings of ~300 aa in an
average protein
(e.g., in bacteria)
• ~200 aa in a protein domain
• LNCIVAVSQNMGIGKNGDL
PWPPLRNEFRYFQRMTTTS
SVEGKQNLVIMGKKTWFSI
LNSIVAVCQNMGIGKDGNL
PWPPLRNEYKYFQRMTSTS
HVEGKQNAVIMGKKTWFSI
ISLIAALAVDRVIGMENAM
PWNLPADLAWFKRNTLDKP
VIMGRHTWESITAFLWAQD
RNGLIGKDGHLPWHLPDDL
HYFRAQTVGKIMVVGRRTY
ESF
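The 20-letter alphabet and the similarity between the fragments above can be checked with a few lines of code. This is a sketch under one assumption: that the first three lines and the next three lines of the slide are two separate fragments of equal length that are already aligned position by position. The helper names are illustrative.

```python
# The 20 standard amino-acid letters listed on the slide.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Assumption: lines 1-3 and lines 4-6 of the slide are two aligned fragments.
seq1 = "LNCIVAVSQNMGIGKNGDL" "PWPPLRNEFRYFQRMTTTS" "SVEGKQNLVIMGKKTWFSI"
seq2 = "LNSIVAVCQNMGIGKDGNL" "PWPPLRNEYKYFQRMTSTS" "HVEGKQNAVIMGKKTWFSI"


def invalid_residues(seq):
    """Return the sorted letters in `seq` that are not standard amino acids."""
    return sorted(set(seq.upper()) - AMINO_ACIDS)


def percent_identity(a, b):
    """Percentage of positions with the same residue in two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    print(invalid_residues(seq1), invalid_residues(seq2))
    print(round(percent_identity(seq1, seq2), 1))
```

Position-by-position identity only makes sense for sequences that are already aligned; sequences of different lengths need an alignment step first.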
Molecular Biology Information
Macromolecular Structure
DNA, RNA, Protein
Molecular Biology Information
Whole Genomes
Viruses, Prokaryotes,
Yeast, Fly, Arabidopsis,
Mouse, Human
Molecular Biology Information
Other Integrative Data
Information to understand whole genomes:
• Metabolic Pathways
(traditional biochemistry)
• Regulatory Networks
• Whole-Organism Phylogeny
• Environments, Habitats, Ecology
• The Literature (MEDLINE)
Molecular Biology Information
Redundancy and Multiplicity
• Different Sequences Have the Same Structure.
• One Organism has many similar genes.
• Single Gene May Have Multiple Functions.
• Genomic Sequence Redundancy due to the
Genetic Code.
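The last point, read here as the degeneracy of the genetic code (64 codons encode 20 amino acids plus stop signals), can be made concrete with a short sketch. It builds the standard codon table and counts how many codons map to each amino acid. The names `CODON_TABLE`, `translate`, and `degeneracy` are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def degeneracy():
    """Number of codons that encode each amino acid (and stop, '*')."""
    return Counter(CODON_TABLE.values())


if __name__ == "__main__":
    # Two different DNA sequences that encode the same peptide.
    print(translate("CTGTCT"), translate("TTAAGC"))
    print(degeneracy())
```

Because several codons encode the same amino acid, two DNA sequences can differ at many positions and still produce an identical protein.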
Exponential Growth of Data Matched by
Development of Computer Technology
• CPU vs Disk Space & Net
As important as the increase in computer
speed has been, the ability to store large
amounts of information on computers is even
more crucial and needs special attention as well.
Bioinformatics Tasks
(Several Suggestions)
• Finding the genes in the DNA sequences
of various organisms.
• Prediction of promoters and other
regulatory regions.
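The simplest computational starting point for finding genes is a scan for open reading frames (ORFs): a start codon followed, in the same frame, by a stop codon. The sketch below is a deliberately naive version under stated assumptions: it looks only at the forward strand, treats ATG as the only start codon, and ignores introns. The function name `find_orfs` and its `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon. `end` is exclusive and includes the stop codon. Only ORFs with
    at least `min_codons` codons before the stop are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(toy, min_codons=3):
        print(frame, toy[start:end])
```

Practical gene finders go well beyond this: they scan both strands, use statistical models of coding sequence, and in eukaryotes must account for introns and splicing.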
Bioinformatics Tasks
(Several Suggestions)
• Developing methods to predict the
structure of structural RNA sequences.
• Developing methods to predict the
structure and/or function of newly
discovered proteins.
• Clustering protein sequences into
families of related sequences and the
development of protein models.
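Clustering sequences into families can be illustrated with a toy example. This sketch assumes a crude, alignment-free similarity measure (the Jaccard index over shared 3-letter words) and single-linkage grouping; neither choice comes from the slides, and the sequences in the usage example are made up for illustration.

```python
def kmers(seq, k=3):
    """Set of all overlapping words of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Shared k-mers divided by total distinct k-mers (0.0 if there are none)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def cluster(seqs, threshold=0.5, k=3):
    """Single-linkage clustering; returns sorted lists of sequence indices."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if jaccard(seqs[i], seqs[j], k) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy = ["MKTAYIAK", "MKTAYIAR", "GGGSSSPP"]
    print(cluster(toy))
```

Real family databases rely on alignment-based scores and statistical models rather than raw word overlap, but the grouping logic is similar in spirit.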
Bioinformatics Tasks
(Several Suggestions)
• Aligning similar proteins and generating
phylogenetic trees to examine evolutionary
relationships.
• The process of evolution has produced DNA
sequences that encode proteins with very
specific functions.
• It is possible to predict the three-dimensional
structure of a protein using algorithms that
have been derived from our knowledge of
physics, chemistry and, most importantly, from
the analysis of other proteins with similar
amino acid sequences.
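The alignment task in the first point above is classically solved with dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration; real protein alignments typically use a substitution matrix such as BLOSUM62 and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of `a` and `b` (Needleman-Wunsch).

    Only the previous row of the dynamic-programming table is kept, so
    memory use is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACDE", "ACDE"))  # identical sequences
    print(global_alignment_score("ACDE", "ACE"))   # one residue deleted
```

Pairwise scores like these can be converted into distances, which is one common starting point for building the phylogenetic trees the slide mentions.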
Top 10 Future Challenges for
Bioinformatics
(At a Bioinformatics Conference Last
Fall)
• Precise, predictive model of transcription
initiation and termination: ability to predict where
and when transcription will occur in a genome.
• Precise, predictive model of RNA
splicing/alternative splicing: ability to predict the splicing
pattern of any primary transcript in any tissue.
• Precise, quantitative models of signal
transduction pathways: ability to predict
cellular responses to external stimuli.
• Determining effective protein:DNA,
protein:RNA and protein:protein
recognition codes.
• Accurate ab initio protein structure
prediction.
• Rational design of small molecule inhibitors
of proteins.
• Mechanistic understanding of protein
evolution: understanding exactly how new
protein functions evolve.
• Mechanistic understanding of speciation:
molecular details of how speciation occurs.
• Continued development of effective gene
ontologies: systematic ways to describe the
functions of any gene or protein.
• Education: development of appropriate
bioinformatics curricula for secondary,
undergraduate and graduate education.