IntroBio520 - Nematode bioinformatics. Analysis tools and data

Bioinformatics
BIO520/INF520
Jim Lund
Assigned reading: Ch. 1 & 2
Bioinformatics
Bioinformatics applies principles of information
science (derived from applied math, computer science,
and statistics) to make the vast, diverse, and complex
life sciences data more understandable and useful. It
automates simple but repetitive types of analysis.
Computational biology uses mathematical and
computational approaches to address theoretical and
experimental questions in biology.
BIO520 Topics
• Navigating biological databases.
• Sequence alignment.
• Proteins
– 3D structure visualization, prediction, motif analysis.
• DNA sequence annotation.
– Gene finding in prokaryotes and eukaryotes.
• RNA structure.
• Phylogenetic inference
• Genome/transcriptome/proteome
– Function & Analyses.
Molecular information-DNA
• Raw bacterial DNA sequence
– Coding or not?
– Parse into genes?
– Find regulatory sequences?
– PCR primers, vector engineering?
– 4 bases: ACGT
• 1 kb for a gene
• Mb for a genome
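The "Coding or not?" question above can be illustrated with a minimal open reading frame (ORF) scan. This is only a sketch under simplifying assumptions: it checks the three forward frames only, treats ATG as the sole start codon, and ignores the reverse strand. The function name `find_orfs` and the `min_codons` parameter are hypothetical, chosen for illustration; real gene finders use far richer models.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop (the ATG included).
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


example = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

On a bacterial sequence, long ORFs are a first hint that a region is coding; short ones occur frequently by chance, which is why a length cutoff is used.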
[Figure: Growth of Genbank (1982-2009). Axes: Sequences (millions) and Base pairs (billions). Source: http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html]
Protein Structure Prediction
Proteomics
1978-1998
MALDI-TOF?
ESI-MS?
Metabolic Networks
KEGG, 1998
Regulatory Networks
KEGG
Bioinformatics-what is it?
Acquisition, curation, and analysis of
biological data
Hypothesis
Bioinformatic Data-1978 to 2008
• DNA sequence
• Gene expression
• Protein expression
• Protein Structure
• Genome mapping
• Metabolic networks
• Regulatory networks
• Trait mapping
• Gene function analysis
• Scientific literature
Goals of the HGP, 1998-2003
• Reference Human Genome Sequence
– Draft 2001, Finished in 2003
• Improved Sequence Technology
– $0.25 per finished base
• Human Genome Sequence Variation
• Technology for Functional Genomics
• Comparative Genomics
– Finish Mouse by 2005 (well ahead here)
• ELSI (Ethical, Legal, and Social Implications)
Genome sequences highlight the finiteness of the set of sequences!
What remains to be done?
• Comparative Genomics
• Description of mRNAs, proteins (identity and structure)
• Functional analysis
• Detailed understanding of development, regulation, variation
The Gene for…
Other Reasons to Care
Affymetrix
Genentech
Biologist User Training
• Internet sites
– Range from high quality to unreliable.
• Unread documentation
• Popular program sites with NO documentation
– "Perhaps one day I will get around to writing some documentation" (Help text from a WWW service hit several hundred times per day!)
Dramatic Changes in Information Science
• Information Storage
– Digital: text, numbers, images
• Computerized Data Analysis
• Automated Data Analysis
• Information Distribution
– Internet, cloud, etc.
Moore’s Law
Intel Corporation
Computer Science and bioinformatics
• Operating Systems
• Programming
• Algorithms
– New problems keep turning up!
• Data structure/databases
• Interfaces
• Search and visualization
BIO520 Nuts and Bolts
• Syllabus & Schedule
• Textbook
• Exams (2 + final)
• Grading:
– 12 labs: 10 pts
– Exams: 50 pts
– Final: 50 pts
• Labs on Fridays in Young B-35
– Internet
– Program documentation
http://elegans.uky.edu/520
Textbooks
Required textbook:
• Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum
Supplemental reading (don’t buy):
• Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Ed.
– Baxevanis and Ouellette
Biology background material:
– Genes IX (Lewin)
– Cell Biology (Watson et al, Darnell et al)
– NCBI Bookshelf (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books&itool=toolbar)
Computer Resources
• http://elegans.uky.edu/520
• Locally installed Programs:
– Cn3D, Clustal, TreeView, Chime
• Web based tools:
– Databases
– Software programs
Biological Principles
Evolution by natural selection
DNA->RNA->Protein
StructureFunction