Lecture 1. Introduction to Bioinformatics

Introduction to Bioinformatics
Fall Semester 2005
CSC 487/687 Computing for
Bioinformatics
What is Bioinformatics
Easy Answer
Using computers to solve molecular biology
problems; Intersection of molecular biology and
computer science
Hard Answer
Computational techniques (e.g. algorithms, artificial
intelligence, databases) for management and
analysis of biological data and knowledge
Bioinformatics
 Bioinformatics = Biology + Information
 Biology is becoming an information science
 Computational methods are necessary to
analyze the massive amount of information
coming out of the genome projects
Bioinformatics is Another Revolution
in Biology
Three concepts, which remain
central to Bioinformatics
 Data representation
A complex, dynamic, three-dimensional molecule
→ a simple string of characters
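As a minimal sketch of this idea, a DNA molecule can be held as a plain Python string over a four-letter alphabet. The sequence and the helper name is_valid_dna are made up for illustration and are not part of the lecture.

```python
# Hypothetical example: a DNA fragment reduced to a string of characters.
DNA_ALPHABET = set("ACGT")


def is_valid_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the letters A, C, G, T."""
    return len(seq) > 0 and set(seq.upper()) <= DNA_ALPHABET


fragment = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up sequence, for illustration only

print(is_valid_dna(fragment))  # True
print(len(fragment))           # 24
```

Everything the string drops (shape, motion, chemistry) is exactly what makes the representation easy to store, search and compare.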
Three concepts, which remain
central to Bioinformatics
 The concept of similarity
– Evolution has operated on every sequence
– In biomolecular sequences (DNA, RNA or
amino acid sequences), high sequence
similarity usually implies significant functional or
structural similarity
– The converse is not true: similar function or
structure does not imply similar sequence
– Algorithms for comparing sequences and
finding similar regions are at the heart of
bioinformatics
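One very simple similarity measure is percent identity between two sequences of equal length. The sketch below assumes the sequences are already aligned position by position; real comparison tools also handle gaps and use scoring matrices, which this example leaves out. The function name and test sequences are illustrative assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (pre-aligned)")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two made-up 7-base sequences that agree at 5 of 7 positions.
print(round(percent_identity("GATTACA", "GACTATA"), 1))  # 71.4
```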
Three concepts, which remain
central to Bioinformatics
 Bioinformatics is not a theoretical science; it is
driven by the data, which in turn is driven by the
needs of biology.
 Sequences
 Microarray technologies
 …
GenBank Growth
Moore’s Law
What do you need to know?
 It all depends on your background
Are you a …?
Biologist with some computer knowledge, or
Computer scientist with some biology
background
Few do both well
Background
 Biology for Computer Scientists
 Computer Science for Biologists
Biological Information Flow
Genome → Introns/Exons → Gene Sequence → Protein Sequence →
Protein Structure → Protein Functions → Cellular Pathways
Bioinformatics attempts to model this pathway
Living Things
 Entropy (the tendency to disorder) always
increases
 Living organisms have low entropy
compared with things like soil
 They are relatively orderly…
 The most critical task is to maintain the
distinction between inside and outside
Living Things
 In order to maintain low entropy, living organisms
must expend energy to keep things orderly.
 They figured out how to do this 4 billion years ago
 The functions of life, therefore, are meant to
facilitate the acquisition and orderly expenditure of
energy
Living Things
 The compartments with low entropy are
separated from “the world.”
 Cells are the smallest unit of such
compartments.
 Bacteria are single-cell organisms
 Humans are multi-cell organisms
The “living things” have the following
tasks:
 Gather energy from environment
 Use energy to maintain inside/outside distinction
 Use extra energy to reproduce
 Develop strategies for being successful and
efficient at the above tasks
– Develop ways to move around
– Develop signal transduction capabilities (e.g. vision)
– Develop methods for efficient energy capture (e.g.
digestion)
– Develop ways to reproduce effectively
How to accomplish…?
 Living compartments on earth have
developed three basic technologies
– Ability to separate inside from outside (lipids)
– Ability to build three-dimensional molecules that
assist in the critical functions of life (Protein,
RNA)
– Ability to compress the information about how
(and when) to build these molecules in linear
code (DNA)
Bioinformatics Schematic of a Cell
Lipids
 Made of a hydrophilic (water-loving) molecular
fragment connected to hydrophobic (water-repelling) fragments
 Spontaneously form sheets (lipid membranes) in
which all the hydrophilic ends align on the outside,
and hydrophobic ends align on the inside
 Creates a very stable separation, not easy to pass
through except for water and a few other small
atoms/molecules
What is a Nucleotide?
 Pentose, base, phosphate group
 Pentose: ribose in RNA, deoxyribose in DNA
Base
 Adenine (A),
Cytosine (C),
Guanine (G),
Thymine (T),
Uracil (U).
Nucleic Acid Chain
 Condensation reaction
 Orientation
– From 5’ to 3’
 In DNA or RNA, a nucleic
acid chain is called a “strand”
– DNA: double-stranded
– RNA: a single strand
 The number of bases
– Base pair (bp) in DNA
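Because DNA is double-stranded and each strand is read 5’ to 3’, the partner of a given strand is its reverse complement. The sketch below assumes the standard Watson-Crick pairing (A with T, G with C), which these slides do not spell out; the function name is an illustrative choice.

```python
# Watson-Crick pairing for DNA: A<->T, C<->G (assumed, not stated on the slide).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.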
DNA Structure
RNA Structure and Function
• The major role of RNA is to participate in protein
synthesis
• Messenger RNA (mRNA)
• Transfer RNA (tRNA)
• Ribosomal RNA (rRNA)
mRNA
The Genetic Code
What is a gene?
 A gene includes the entire nucleic acid
sequence necessary for the expression of
its product.
 Such sequence may be divided into
– Regulatory region
– Transcriptional region: exons and introns
 Exons encode a peptide or functional RNA
 Introns are removed after transcription (RNA splicing)
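Intron removal can be sketched as joining the exon segments of a transcript. The transcript, the exon coordinates and the function name splice are hypothetical; in this toy example the exons are written in upper case and the intron in lower case purely for readability.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as half-open (start, end) index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Made-up primary transcript: exon 1, one intron, exon 2.
pre_mrna = "AUGGCC" + "guaaguuuag" + "AUUGUAUGA"
exons = [(0, 6), (16, 25)]  # hypothetical exon coordinates

print(splice(pre_mrna, exons))  # AUGGCCAUUGUAUGA
```

In a real cell the boundaries are recognized from signals in the sequence itself; here they are simply supplied as input.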
Gene
Genome
 The total genetic information of an
organism.
 For most organisms, it is the complete DNA
sequence
 For RNA viruses, the genome is the
complete RNA sequence
Genes and Control
 The human genome has 3,000,000,000 bps divided
into 23 linear segments (chromosomes)
 A gene has an average of 1340 DNA bps, thus
specifying a protein of about ? (how many) amino
acids
 Humans have about 35,000 genes ≈ 40,000,000
DNA bps ≈ 1.3% of total DNA in genome
 Humans have another 2,960,000,000 bps for
control information (e.g. when, where, how long,
etc…)
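The arithmetic on this slide can be checked with a few lines of Python. This is a rough sketch that assumes every base of the average gene is coding and that three bases specify one amino acid; the variable names are illustrative, and the input figures are the ones quoted on the slide.

```python
GENOME_BP = 3_000_000_000   # total base pairs, from the slide
AVG_GENE_BP = 1_340         # average gene length, from the slide
GENE_COUNT = 35_000         # gene count, from the slide
BASES_PER_CODON = 3         # assumption: three bases per amino acid

amino_acids = AVG_GENE_BP // BASES_PER_CODON   # whole codons in an average gene
gene_bp = GENE_COUNT * AVG_GENE_BP             # total base pairs in genes
fraction = gene_bp / GENOME_BP                 # share of the genome

print(amino_acids)          # 446
print(gene_bp)              # 46900000
print(f"{fraction:.1%}")    # 1.6%
```

Under these assumptions an average gene specifies a protein of roughly 446 amino acids, and the exact product of 35,000 and 1340 is 46.9 million bps, which the slide rounds to 40,000,000.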
Gene Expression
 An organism may contain many types of cells,
each with distinct shape and function
 However, they all have the same genome
 The genes in a genome do not have any effect on
cellular functions until they are “expressed”
 Different types of cells express different sets of
genes, thereby exhibiting various shapes and
functions
Gene Expression
 The production of a protein or a functional
RNA from its gene
 Several steps are required
– Transcription
– RNA processing
– Nuclear transport
– Protein synthesis
Gene Expression
Central Dogma
DNA → RNA → Protein
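The DNA, RNA, protein flow can be sketched in a few lines. The example assumes the input is the coding strand, uses the standard genetic code written in the usual TCAG ordering, and stops at the first stop codon; it ignores RNA processing and nuclear transport. The sequence and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up coding sequence
mrna = transcribe(dna)
print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
print(translate(mrna))  # MAIVMGR
```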
Next …
Protein Structure and Function
An Amino Acid
 An amino acid is defined as the molecule
containing an amino group (NH2), a
carboxyl group (COOH) and an R group.
R-CH(NH2)-COOH
 The R group differs among various amino acids.
 In a protein, the R group is also called a side chain.
An Amino Acid
The Twenty Amino Acids of Proteins
Protein
 Peptide ― a chain of amino acids linked
together by peptide bonds.
 Polypeptides ― long peptides
 Oligopeptides ― short peptides (< 10 amino
acids)
 Proteins are made up of one or more
polypeptides with more than 50 amino acids
Protein Structure
 Primary Structure
– Refers to its amino acid sequence
Secondary Structure
 Regular, repeated
patterns of folding of
the protein backbone.
 Two most common
folding patterns
– Alpha helix
– Beta sheet
Tertiary Structure
 The overall folding of the entire polypeptide chain
into a specific 3D shape
Quaternary Structure
 Many proteins are formed from more than one
polypeptide chain
 Describes the way in which the different
subunits are packed together to form the
overall structure of the protein
 Example: the hemoglobin molecule
Quaternary Structure
Evolution
 Mutation ― rare events, sometimes single
base changes, sometimes larger events
 Recombination ― how your genome was
constructed as a mixture of your two parents
 Through Natural Selection
 Homology (similarity due to shared ancestry): different species are
assumed to have common ancestors
 The genetic variation between different
people is …(surprisingly ..)
References
 http://www.biology.arizona.edu/biochemistry/problem_sets/large_molecules/
 http://helixweb.stanford.edu/bmi214/index2004.html
 http://www.web-books.com/MoBio/
 http://www.cs.sunysb.edu/~skiena/549/