Intro Bioinform 1-19..


Biology/Computer Science 251: Introduction to Bioinformatics
What is bioinformatics?
Definition:
“Bioinformatics is nothing but good, sound, regular
biology appropriately dressed so that it can fit into a
computer” -- Claverie & Notredame, Bioinformatics for Dummies
Class Web Site
http://cs.gettysburg.edu/~leinbach/Bio_CS251/
• This site will contain all important documents related
to the class.
Power points of all lectures
Labs
Exam Answer Keys (Posted after the exam)
Homework Assignments and Answer Keys
• Note the updated syllabus contained at this site. It
supersedes the one distributed on paper.
Where to start: A brief history of Bioinformatics
7 pea traits, or characters, studied by Mendel
Gregor Mendel: 1866 - first described a set of mathematical rules by
which the appearance of an organism (its PHENOTYPE) could be
related to its inherited genetic makeup (its GENOTYPE)
Two Scientists Who In ~900 Words Reshaped the Way In
Which We View Life on Earth
Red blood cells undergo sickling due to
a single base change in the DNA of the
beta-globin gene. This base change in
DNA changes one residue in the protein (glutamic acid becomes valine).
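A minimal Python sketch of how one base change alters one residue. The DNA fragment is an illustrative stand-in for the first eight codons of beta-globin after the start codon, and the codon table holds only the codons this fragment needs; both are assumptions made for the example, not data taken from the slides.

```python
# Only the codons used by this illustrative fragment are listed
# (standard genetic code, one-letter amino acid symbols).
CODON_TABLE = {
    "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "GAG": "E", "AAG": "K",
}

def translate(dna):
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def differences(a, b):
    """List (position, residue_in_a, residue_in_b) where two proteins differ (1-based)."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

normal = "GTGCATCTGACTCCTGAGGAGAAG"  # sixth codon GAG (glutamic acid)
sickle = "GTGCATCTGACTCCTGTGGAGAAG"  # sixth codon GTG (valine): one A -> T change

print(translate(normal))
print(translate(sickle))
print(differences(translate(normal), translate(sickle)))
```

The two DNA strings differ at a single base, and the translated proteins differ at a single position, which is the point made on the slide.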
Bioinformatics allows the study of gene origins and
the evolution of new genes
SCIENCE, 12-22-06
We are now moving into the post-genomic era, as entire genomes
are sequenced and made available with individual genome databases
Bioinformatics was born out of databases, constructed to collect
protein and DNA sequences
1. Proteins were first: 1960s through 1970s. Margaret Dayhoff, National Biomedical
Research Foundation (NBRF), established PIR, the Protein Information Resource,
which grew into the PIR-International Protein Sequence Database:
http://www-nbrf.georgetown.edu/pir
2. DNA came second:
1977 - reliable DNA sequencing developed
1979 - Los Alamos Sequence Database established (it became GenBank in 1982), followed by
1980 - European Molecular Biology Laboratory
(EMBL) Data Library, and…
1984 - DNA Databank of Japan (DDBJ)
Today, GenBank is maintained by the National Center for Biotechnology Information (NCBI).
GenBank, EMBL, and DDBJ formed the International Nucleotide Sequence Database Collaboration;
data are exchanged on a daily basis.
GenBank: http://www.ncbi.nlm.nih.gov
How fast is the GenBank database growing?
http://www.ncbi.nlm.nih.gov/GenBank/genebankstats.html
The Net (no pun intended) Result is an Astounding
Growth of Biological Information
Source: Scrabanek op. cit.
Whole-genome structure/organization can be compared between species
This is a human karyogram, or idiogram
This is a human-mouse synteny map
Bioinformatic and genomic approaches have exploded the
study of evolution (molecular evolution)
SCIENCE, 12-22-06
Bioinformatic and genomic approaches allow discovery of new
and unsuspected species of organisms that cannot be detected
using conventional approaches
Archaeal Richmond Mine Acidophilic
Nanoorganism (ARMAN)
Bioinformatics can be applied to study interactions among proteins
within the cell, i.e., proteomics
For example, take a look at the Saccharomyces genome database (SGD):
http://www.yeastgenome.org/
Protein Structure Data is Growing at a Slower Pace
Protein-protein interaction map for budding yeast
The color of a node indicates the effect of deleting the corresponding protein:
Red: lethal
Green: non-lethal
Orange: slow growth
Yellow: unknown
Jeong et al. 2001. Nature 411: 41-2
Novel insights from the layering of 3-D protein structure with
Protein Interaction Networks
DNA chips (DNA microarrays) can be used to unravel the genetic basis of cancer
Figure panels: Cancer cells, Normal cells
One form of cancer, non-Hodgkin's lymphoma, includes Diffuse Large B-Cell Lymphoma (DLBCL),
which is actually two diseases: two varieties of DLBCL exhibit very different microarray
RNA-expression profiles and very different survival outcomes.
What is bioinformatics?
Definition I:
“Bioinformatics is nothing but good, sound, regular
biology appropriately dressed so that it can fit into a
computer” -- Claverie & Notredame, Bioinformatics for Dummies
Definition II: “A field that involves the building and manipulation of
biological databases. In the context of genomics, this means managing
massive amounts of sequencing data and providing useful access to and
interpretation of the data” -- Weaver, Molecular Biology, 3rd ed.
Definition III: “A field that extracts biological information from large
datasets such as sequences, protein interactions, microarrays, etc. This
field also includes the area of data visualization”
-- Campbell & Heyer, Genomics, Proteomics, and Bioinformatics
Carl’s Definition of Bioinformatics
A study of the algorithms and programs that are used by
Molecular Biologists and others in the Biological and
Medical Sciences in their quest for understanding protein
structure and function in living organisms.
This is just one of many definitions that may be found in
textbooks, scientific papers, and on the web. The simplest
definition is that it is an interdisciplinary subject drawing
on material from Biology, Mathematics, and Computer
Science. To me this is like saying that E = mc² has
something to do with relativity theory.
Some Implications of this Definition
• An individual studying Bioinformatics needs to have some
understanding of the basic ideas of Molecular Biology research.
• They also need to have a familiarity with DNA sequences and how
they contribute to 3D Protein Structure as well as gene
identification and phylogenetics.
• They need to be familiar with the many “in silico” tools that are
used and the parameters that control the output of the programs or
algorithmically controlled devices.
• It is important for them to understand the objectives and
limitations of both Computer Science and Molecular Biology.
• They need to have some experience with collecting biological data
for analysis.
Diagram labels: Computer Science, Biology, Computational Biology, Micro Biology & Medical Science, Bioinformatics (note the two-way arrow)
Diagram series showing Computer Science, Micro Biology, and Bioinformatics at three stages: Early Pre History, Late Pre History, Recent History
As a result, DNA sequencing and Proteomics have had an
increasing number of important applications in the life,
medical and social sciences.
Pick up any scientific journal that deals with the life or
medical sciences, any popular scientific magazine, or, for
that matter, any daily newspaper and you will find an
article where DNA or related issues play an important role.
Why, it even makes the comic section:
For example:
A fungus, Aspergillus nidulans
http://www.broad.mit.edu/annotation/genome/aspergillus_nidulans/Home.html
What can we do with these databases?
What is the purpose of bioinformatics?
Answer: to make sense out of this: AN 1.29 nt 240000 - 270000
Preview a few tools
A.n. genome database: “Browse Regions” - “Feature Map” - “Get DNA sequence”
GenBank: “ORF Finder” - “Blastp search” - “Conserved domain” - “Format Results”
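To give a feel for what an ORF-finding tool does with a raw stretch of DNA, here is a minimal Python sketch. It is not NCBI's ORF Finder: it scans only the three forward reading frames, uses the standard stop codons, and requires an ATG start. The function name find_orfs, the min_codons parameter, and the toy sequence are all hypothetical choices made for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG-to-stop ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the current open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs

toy = "CCATGAAATAGGG"  # made-up sequence with one short ORF in frame 2
print(find_orfs(toy))
```

A real tool would also search the reverse complement, allow alternative genetic codes, and apply a much larger minimum length; the candidate protein from each ORF is what would then be passed to a Blastp search.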