IntroToBioinformatics

Welcome to the Bioinformatics Workshop
July 18, 2002
Introduction
Workshop objectives
Module 1: Retrieval of literature dealing with the molecular life sciences.
Module 2: Sequence databases and similarity searches.
Module 3: Protein structure analysis.

Workshop logistics
Course website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
PowerPoint presentation
In-class workshop

Definition of Bioinformatics
There are many definitions at the moment:

The use of computers to catalog and organize molecular life science information into meaningful entities.

A subset of computational biology.
How can Bioinformatics help make scientific discoveries?
Bioinformatics is not just the storage of data in a computer.
Bioinformatics is the use of computers to test a biological hypothesis before performing the experiment in the laboratory.
Bioinformatics is the design of software programs that analyze data.
Basis of molecular biology
Hierarchy of relationships (not exactly true):
Genome
Genes: Gene 1, Gene 2, Gene 3, ..., Gene X
Proteins: Protein 1, Protein 2, Protein 3, ..., Protein X
Functions: Function 1, Function 2, Function 3, ..., Function X
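The gene-to-protein step of this hierarchy is the one most directly computed from sequence. A minimal Python sketch, assuming the input is the coding strand of a gene written 5' to 3' with no introns, read in the first frame with the standard genetic code; `CODON_TABLE` and `translate` are hypothetical helper names, not part of any library:

```python
from itertools import product

# Standard genetic code. Codons are enumerated in T, C, A, G order
# (first base slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_strand: str) -> str:
    """Translate a DNA coding sequence into a one-letter protein string.

    Assumptions: 5'->3' coding strand, no introns, reading frame starts at
    the first base, translation ends at the first stop codon.
    """
    seq = coding_strand.upper()
    protein = []
    # Step through successive, non-overlapping codons.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons with non-ACGT letters
        if aa == "*":
            break  # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")` returns `"MAIVMGR"`, stopping at the TGA stop codon. Eukaryotic genes contain introns and can give rise to more than one protein, which is one reason the hierarchy above is only approximate.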
Genome Sizes
Organism    Genome size (bp)    Genes
Fern         160,000,000,000
Lungfish     139,000,000,000
Salamander    81,300,000,000
Newt          20,600,000,000
Onion         18,000,000,000
Gorilla        3,523,200,000
Mouse          3,454,200,000
Human          3,400,000,000   31,000
Drosophila       137,000,000   13,500
C. elegans        96,000,000   19,000
Yeast             12,000,000    6,315
E. coli            5,000,000    5,361
Smallest genome: ??????
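A quick calculation with these figures shows that genome size is a poor predictor of gene number. The sketch below assumes the five gene counts on the slide belong, in order, to human, Drosophila, C. elegans, yeast and E. coli; that pairing is read from the slide layout and is an assumption. `GENOMES` and `genes_per_megabase` are illustrative names:

```python
# Genome sizes (bp) and gene counts copied from the slide.
# ASSUMPTION: the gene counts pair with these five organisms in slide order.
GENOMES = {
    "Human": (3_400_000_000, 31_000),
    "Drosophila": (137_000_000, 13_500),
    "C. elegans": (96_000_000, 19_000),
    "Yeast": (12_000_000, 6_315),
    "E. coli": (5_000_000, 5_361),
}


def genes_per_megabase(genome_bp: int, genes: int) -> float:
    """Average number of genes per million base pairs of genome."""
    return genes / (genome_bp / 1_000_000)


if __name__ == "__main__":
    for organism, (size_bp, genes) in GENOMES.items():
        print(f"{organism:<12} {genes_per_megabase(size_bp, genes):8.1f} genes/Mb")
```

With these figures, human comes out near 9 genes per megabase and E. coli near 1,072, even though the human genome is about 680 times larger.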
What is the approach used to sequence genomes?
Divide and conquer:
Split the genome into fragments.
Clone the fragments into vectors that can accept large fragments: yeast artificial chromosomes (YAC library).
Landmarks within the genome can be obtained using Sequence Tagged Sites (STSs).
Sequences of YAC clones are matched with each other.
Sequences that overlap form contigs.
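The last step, joining overlapping sequences into contigs, can be illustrated with a toy greedy merge. This is a minimal sketch under simplifying assumptions, not how production assemblers work: reads are error-free, all come from the same strand, no read is contained inside another, and the 3-base minimum overlap is an arbitrary choice. `overlap` and `greedy_assemble` are hypothetical names:

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no such overlap is at least `min_len` bases long.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy assumptions: error-free reads from one strand, no read contained
    inside another. Returns the resulting contigs (several if gaps remain).
    """
    contigs = list(fragments)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        # Score every ordered pair (a followed by b).
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining sequences stay separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, `greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"])` first joins the 7-base overlap `GCCGGAA`, then the 4-base overlap `CCTG`, giving the single contig `ATTAGACCTGCCGGAATAC`. Fragments with no overlap are returned as separate contigs, which is where landmarks such as STSs help order them.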
History of the Human Genome Project
1953: Watson, Crick: DNA structure
1972: Berg: first recombinant DNA
1977: Maxam, Gilbert, Sanger: sequence DNA
1980: Botstein, Davis, Skolnick, White propose to map the human genome with RFLPs
1982: Wada proposes to build automated sequencing robots
1984: MRC publishes the Epstein-Barr virus genome (170 kb)
1985: Sinsheimer hosts the first large meeting to discuss the HGP (UC Santa Cruz); Kary Mullis develops PCR
1986: DOE begins genome studies with $5.3 million
1987: Gilbert announces plans to start a company to sequence and copyright DNA; Burke, Olson, Carle develop YACs; Donis-Keller et al. publish the first genetic map of the human genome (403 markers)
History of the Human Genome Project (continued)
1987 (cont.): Hood produces the first automated sequencer; DuPont develops fluorescent dideoxynucleotides
1988: NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989: Hood, Olson, Botstein, Cantor propose using STSs to map the human genome
1990: Proposal to sequence 20 Mb in model organisms by 2005; Lipman, Myers publish the BLAST algorithm
1991: Venter announces a strategy to sequence genes using ESTs and plans to patent the partial cDNAs; Uberbacher develops GRAIL, a gene-finding program
1992: Simon develops BACs; US and French teams publish the first physical maps of human chromosomes; the first genetic maps of the mouse and human genomes are published
1993: Collins is named director of NCHGR; revised plan to complete the sequence of the human genome by 2005
1995: Venter publishes the first sequence of a free-living organism, H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996: The yeast genome (S. cerevisiae) is sequenced
History of the Human Genome Project (continued)
1997: Blattner, Plunkett complete the E. coli sequence; a capillary sequencing machine is introduced
1998: SNP project is initiated; rice genome project is started; Venter creates a new company called Celera and proposes to sequence the human genome within 3 years; C. elegans genome completed
1999: NIH proposes to sequence the mouse genome in 3 years; the first sequence of chromosome 22 is announced
2000: Celera and others publish the Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence the puffer fish; Arabidopsis sequence is completed
2001: Celera publishes the human sequence in Science; the HGP consortium publishes the human sequence in Nature
Public funding vs. Private funding
Public: taxpayers’ money; an international effort.
Private: companies invest money and provide access to their information on a fee basis. Celera also makes some information available free of charge to small research groups.
Both groups published the sequence of the human genome in 2001.
Bioinformatics is Multidisciplinary
Genomics
Drug Design
Computer Science
Molecular Biology
Phylogenetics
Structural Biology
Math
Statistics
Bioinformatics at CSULA
www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
Upper-division standing in Biology or Biochem, plus one course in C/C++ programming (CIS 283); or
Upper-division standing in CS, IS, or CE, plus one course in Molec. Biology/Biochem or Chem/Biol 154L (W’03)
Then: Introduction to Bioinformatics (Chem/Biol 454L), offered in Spring ’03
How is Bioinformatics Used?
Bioinformatics isn’t going to replace lab work anytime soon.
Experimental proof is still the “Gold Standard”.
Bioinformatics is used to help “focus” the experiments of the benchtop scientist.
What’s Left To Do?
Find out what the rest of the genome does: much of it is still of unknown function.
What is left to do?
Sequence the genomes of other organisms.
Analyze genes to predict their function.
Analyze interactions of gene products to create genetic networks.
Once this is finished, then what?
Start making changes.
Modify gene expression patterns to make better crops or better medicines.
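One simple way to represent a genetic network is as an undirected graph: gene products are nodes and observed interactions are edges. In the sketch below the gene names and interactions are made up purely for illustration, and the function names are hypothetical; it groups products that are linked through any chain of interactions, using breadth-first search:

```python
from collections import defaultdict, deque

# HYPOTHETICAL pairwise interactions between gene products (illustrative only).
INTERACTIONS = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]


def build_network(pairs):
    """Adjacency-list representation of an undirected interaction graph."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups
```

Here `connected_groups(build_network(INTERACTIONS))` yields two groups, {geneA, geneB, geneC} and {geneD, geneE}. Products in the same group might be candidates for a shared pathway, a hypothesis that would then be tested at the bench.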
Increasing levels of complexity
Metabolome (metabolic pathways)
Proteome (proteins)
Transcriptome (RNA)
Genome (DNA)
Primary public domain bioinformatics servers
Public Domain Bioinformatics Facilities
National Center for Biotechnology Information (NCBI), United States: databases, analysis tools
European Bioinformatics Institute (EBI), United Kingdom: databases, analysis tools
GenomeNet (KEGG & DDBJ), Japan: databases, analysis tools
Literature Databases and NCBI
Learning objective: How does one retrieve information on a particular subject?
National Center for Biotechnology Information (NCBI)
Databases outside of NCBI
Retrieval of information
Literature Databases
Medline (PubMed)
OMIM
CSULA Library
Other biological databases
BIOSIS
Agriculture http://www.fao.org
Melvyl (Books at UC Libraries)
NCBI ENTREZ
A search engine that provides access to, and links between, various databases.
Databases linked through ENTREZ: PubMed, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM
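ENTREZ can also be searched from a program. NCBI exposes it through the E-utilities web API; the sketch below builds an ESearch request for a database such as PubMed and reads the matching record IDs from the JSON reply. The endpoint and the `db`, `term`, `retmax` and `retmode` parameters belong to E-utilities; the helper names `esearch_url` and `search_ids` are hypothetical, and NCBI's usage guidelines (request rate, `tool`/`email`/`api_key` parameters) should be checked before running searches at scale:

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urllib.parse.urlencode(params)


def search_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run the search over the network and return the list of matching IDs."""
    with urllib.request.urlopen(esearch_url(db, term, retmax)) as response:
        result = json.load(response)
    return result["esearchresult"]["idlist"]
```

Calling `search_ids("pubmed", "p53 AND apoptosis", retmax=5)` would return up to five PubMed IDs as strings; it needs network access.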
Online Mendelian Inheritance in Man (OMIM)
A catalog of human genes linked to diseases.
Begun by Victor A. McKusick at Johns Hopkins University.
A good place to start when you want to know about a certain disease.
This database is linked to PubMed, the OMIM Morbid Map, and the OMIM Gene Map.
CSULA and other resources
The best way to access articles at Cal State LA is to obtain the exact reference from PubMed, then search the CSULA library database for the article: http://www.calstatela.edu/library/mudir1.htm
Publishers to search through at the CSULA Library site:
ACS
Wiley InterScience
IDEAL
One website also offers free access to journals:
PubMed Central: http://www.pubmedcentral.nih.gov/
How do you keep up to date on your favorite subject?
Set up Cubby, an automatic retrieval system that searches PubMed and deposits the literature citations in your own account (there is no charge). Demonstration of how Cubby works. Cubby requires a login.
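Cubby's internals are not described here; the sketch below only illustrates the general idea behind an automatic retrieval system: rerun a saved PubMed search and report only the citations not seen before. The JSON file store and the function names `new_citations` and `update_store` are hypothetical; the list of PubMed IDs is assumed to come from a saved search (for example, one run through NCBI's E-utilities):

```python
import json
from pathlib import Path


def new_citations(latest_ids, seen_ids):
    """Return IDs from the latest search not reported before, keeping search order."""
    seen = set(seen_ids)
    return [pmid for pmid in latest_ids if pmid not in seen]


def update_store(store: Path, latest_ids):
    """Load previously seen IDs from a JSON file, record the new ones, return them.

    HYPOTHETICAL storage scheme: a JSON list of PubMed IDs in `store`.
    """
    seen = json.loads(store.read_text()) if store.exists() else []
    fresh = new_citations(latest_ids, seen)
    store.write_text(json.dumps(seen + fresh))
    return fresh
```

Run on a schedule, `update_store` returns only the citations added since the previous run, which mirrors the "deposit new citations in your own account" behavior described above.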
Workshop Exercise 1: Retrieve information on a topic from literature databases. Set up a Cubby account for yourself.