Interrelating Different Types of
Genomic Data, from Proteome to Secretome:
‘Oming in on Function
Outline the goal of functional genomics
Describe the studies of H. Antelmann
Explain the “-Ome” and its purpose
Define the “-Omes”
Identify the different computational and
experimental methods for defining the “-Omes”
Explain how “-Omes” are interrelated
Summarize the ultimate goal of genomics
The Goal of Genomics
To complement the genomic sequence by
assigning useful biological information to
every gene
To improve the understanding of how
different biological molecules contained
within the cell combine to make the
organism possible
Additional Goals
To define the three-dimensional
structures of the macromolecules, their
sub-cellular localizations, intermolecular
interactions, and expression levels
Haike Antelmann & Group
Institute for Microbiology and Molecular Biology,
Ernst-Moritz-Arndt-Universität Greifswald,
Greifswald, Germany
Previously the group used computational
methods to predict all exported proteins
They now aim to verify previous predictions by
experimentally characterizing the entire
population of secreted proteins using 2D gel
electrophoresis and mass spectrometry
They showed that about 50% of their original
predictions were accurate
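The comparison described above, computational predictions checked against experimentally observed secreted proteins, amounts to a set overlap. A minimal sketch in Python, assuming each protein is represented by an identifier; the function name and all identifiers below are hypothetical and are not data from the study:

```python
def confirmed_fraction(predicted, observed):
    """Return the fraction of predicted proteins that were also observed."""
    predicted, observed = set(predicted), set(observed)
    if not predicted:
        raise ValueError("no predictions to evaluate")
    return len(predicted & observed) / len(predicted)


# Hypothetical identifiers, for illustration only.
predicted_exported = {"protA", "protB", "protC", "protD"}
observed_secreted = {"protA", "protC", "protE"}

fraction = confirmed_fraction(predicted_exported, observed_secreted)
print(f"{fraction:.0%} of predictions confirmed")
```

Proteins observed but never predicted (here "protE") do not lower this fraction; they would be counted separately as missed predictions.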
The new Lexicon of the “-Ome”
Antelmann coined the term “secretome” for the
population of secreted proteins; it joins a lexicon of
“-omes” that define the varied populations and
subpopulations in the cell
-Omes can be divided into two categories:
Those that represent a population of molecules
Those that define their actions
Provides an inventory or “parts list” of molecules
contained within an organism
Transcriptome: the population of mRNA transcripts
in the cell, weighted by their expression levels
Glycome: the population of carbohydrate molecules
in the cell
Secretome: the population of gene products that are
secreted from the cell
Ribonome: the population of RNA-coding regions of
the genome
-Omes Continued
Transportome: the population of the gene
products that are transported; this includes
the secretome
Functome: the population of gene products
classified by their functions
Translatome: the population of proteins in
the cell, weighted by their expression levels
Foldome: the population of gene products
classified through their tertiary structure
-Omes Continued
Describes the actions of the protein products
Genome: the full complement of genetic information,
both coding and non-coding, in the organism
Proteome: the protein-coding regions of the genome
Physiome: quantitative description of the
physiological dynamics or functions of the whole organism
Metabolome: the quantitative complement of all the
small molecules present in a cell in a specific
physiological state
-Omes Continued
Morphome: the quantitative description of anatomical
structure, biochemical and chemical composition of
an intact organism, including its genome, proteome,
cell, tissue and organ structures
Interactome: list of interactions between all
macromolecules in a cell
Orfeome: the sum total of open reading frames in the
genome, without regard to whether or not they code;
a subset of this is the proteome
Phenome: qualitative identification of the form and
function derived from genes, but lacking a
quantitative, integrative definition
-Omes Continued
Regulome: genome-wide regulatory network
of a cell
Cellome: the entire complement of molecules
and their interactions within a cell
Operome: the characterization of proteins
with unknown biological function
Pseudome: the complement of pseudogenes in
the genome
Unknome: genes of unknown function
Computational Methods
Algorithmic methods for predicting genes, protein
structure, interactions, or localization based on patterns in
individual sequences or structures.
Annotation transfer through homology. (Inferring
structure or function based on sequence and structural
information of homologous proteins.)
“Guilt-by-Association” method based on clustering
where functions or interactions are inferred from
clusters of functional genomic data, such as expression patterns
Annotation Transfer through Homology
In SWISS-PROT, as in most other sequence databases, two classes
of data can be distinguished: the core data and the annotation. For
each sequence entry the core data consists of the sequence data, the
citation information (bibliographical references), and the taxonomic
data (description of the biological source of the protein), while the
annotation consists of the description of the following items:
Functions of the protein
Post-translational modifications, for example
carbohydrates, phosphorylation, acetylation, GPI-anchor, etc.
Domains and sites, for example calcium-binding regions, ATP-binding
sites, zinc fingers, homeobox, kringle, etc.
Secondary structure
Quaternary structure
Similarities to other proteins
Diseases associated with deficiencies in the protein
Sequence conflicts, variants, etc.
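Annotation transfer through homology can be sketched as: measure how similar an unannotated sequence is to annotated database entries, and copy the annotation from the best match if the similarity is high enough. The Python below is an illustration only. It assumes the sequences are already aligned to equal length, uses percent identity as the similarity measure, and picks an arbitrary 40% cutoff; the function names, entry names, and toy sequences are all hypothetical and do not come from SWISS-PROT.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def transfer_annotation(query, annotated, threshold=40.0):
    """Return (name, annotation, identity) for the best hit, or None."""
    best = None
    for name, (sequence, annotation) in annotated.items():
        identity = percent_identity(query, sequence)
        # Keep only hits at or above the cutoff, preferring the highest identity.
        if identity >= threshold and (best is None or identity > best[2]):
            best = (name, annotation, identity)
    return best


# Hypothetical, pre-aligned toy sequences and annotations.
annotated_entries = {
    "entry1": ("MKTAYIAK", "ATP-binding"),
    "entry2": ("MQQQYQQQ", "zinc finger"),
}
hit = transfer_annotation("MKTAYIAR", annotated_entries)
print(hit)
```

Real pipelines would use an alignment tool and a statistically grounded score rather than raw identity on pre-aligned strings; the cutoff here is a placeholder, not a recommended value.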
Guilt-by-Association
Useful for assessing gene function (although not
logically rigorous), because genes already known to be
related do, in fact, tend to cluster together based
on their experimentally determined expression
patterns. The approach is made more systematic
and statistically sound by calculating the
probability that the observed functional
distribution of differentially expressed genes
could have happened by chance.
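One common way to compute such a probability is a hypergeometric tail test: given how many genes in the whole genome carry a functional label, how likely is it that a cluster of a given size contains at least the observed number by chance? The slides do not name a specific test, so the choice of the hypergeometric distribution is an assumption, and the function name and example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, annotated_genes, cluster_size, annotated_in_cluster):
    """P(at least `annotated_in_cluster` annotated genes in a random cluster).

    Models the cluster as a random draw, without replacement, of
    `cluster_size` genes from `total_genes`, of which `annotated_genes`
    carry the functional label of interest.
    """
    max_hits = min(annotated_genes, cluster_size)
    denominator = comb(total_genes, cluster_size)
    tail = sum(
        comb(annotated_genes, k)
        * comb(total_genes - annotated_genes, cluster_size - k)
        for k in range(annotated_in_cluster, max_hits + 1)
    )
    return tail / denominator


# Hypothetical numbers: 6000 genes, 100 with the label,
# and a 50-gene expression cluster containing 10 of them.
p = enrichment_p_value(6000, 100, 50, 10)
print(f"p = {p:.3g}")
```

A small p-value suggests the cluster is enriched for the function beyond what chance would produce; when many clusters and functions are tested, a multiple-testing correction would also be needed.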
Experimental Methods
The most prominent method is two-dimensional electrophoresis to isolate proteins,
followed by mass spectrometry for protein identification
Protein chip systems, capable of high-throughput screening of protein biochemical activity
Transposon insertion is also sometimes used
2-D Electrophoresis
First introduced in 1975. Most commonly used
method for protein separation in proteomics.
Proteins are first separated across a gel
according to their isoelectric point, then
separated in a perpendicular direction on the
basis of their molecular weight.
More generally, it is electrophoresis in which a second,
perpendicular electrophoretic separation is performed on the
separated components resulting from the first
electrophoresis. This technique is usually
performed on polyacrylamide gels.
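The two separations can be pictured as assigning each protein a coordinate on the gel: one axis from its isoelectric point, the other from its molecular weight. The Python sketch below is an idealized illustration. It assumes a linear pH gradient and that migration in the second dimension is roughly linear in the logarithm of molecular weight, a common approximation; the function name, default ranges, and example proteins are hypothetical.

```python
from math import log10


def gel_position(isoelectric_point, mass_kda,
                 ph_range=(3.0, 10.0), mass_range_kda=(10.0, 200.0)):
    """Return (x, y) in [0, 1] for an idealized 2-D gel.

    x: position along the pH gradient (first dimension).
    y: migration distance in the second dimension; 0 is the top of the
       gel (largest proteins), 1 is the bottom (smallest proteins).
    """
    ph_low, ph_high = ph_range
    mass_low, mass_high = mass_range_kda
    x = (isoelectric_point - ph_low) / (ph_high - ph_low)
    y = (log10(mass_high) - log10(mass_kda)) / (log10(mass_high) - log10(mass_low))
    return x, y


# Hypothetical proteins: (name, isoelectric point, mass in kDa).
proteins = [("P1", 4.5, 25.0), ("P2", 4.5, 80.0), ("P3", 8.2, 25.0)]
for name, pi, mass in proteins:
    x, y = gel_position(pi, mass)
    print(f"{name}: x={x:.2f}, y={y:.2f}")
```

P1 and P2 share an isoelectric point and are resolved only by the second dimension, while P1 and P3 share a mass and are resolved only by the first, which is why the two perpendicular separations are combined.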
Mass Spectrometry
In a typical approach, this technique for
measuring and analyzing molecules
involves introducing enough energy into a
target molecule to cause its disintegration.
The resulting fragments are then analyzed,
based on their mass/charge ratios, to
produce a "molecular fingerprint."
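The mass/charge ratio and the idea of a fingerprint can be illustrated with a short Python sketch. It assumes positive-mode ionization in which each charge comes from one added proton, so that m/z = (M + z * m_proton) / z, and it treats a fingerprint match as counting expected peaks found within a tolerance. The function names, the tolerance, and the peak values are hypothetical.

```python
PROTON_MASS = 1.007276  # approximate proton mass in daltons


def mass_to_charge(neutral_mass, charge):
    """m/z of a molecule carrying `charge` extra protons (assumed ion type)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def count_matching_peaks(observed, expected, tolerance=0.01):
    """Count expected m/z values that have an observed peak within tolerance."""
    return sum(
        1 for e in expected
        if any(abs(o - e) <= tolerance for o in observed)
    )


# Hypothetical fragment masses (Da) and observed peaks (m/z).
expected_peaks = [mass_to_charge(m, 1) for m in (500.0, 750.0, 1200.0)]
observed_peaks = [501.008, 751.007, 903.412]
print(count_matching_peaks(observed_peaks, expected_peaks))
```

In practice the expected peaks would come from candidate protein sequences in a database, and the candidate with the best-scoring match would be reported as the identification.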
Transposon Insertion
A composite transposon is a segment of DNA that has insertion
elements at either end but can contain just about
anything in the middle (genes, markers, etc.). These
transposons tend to be very large, and many of
them arose when the inner two insertion elements
of two smaller transposons stopped working and only
the two at the far ends continued to work, so that when
the transposon moves, it takes everything between the
two original transposons with it. Some composite
transposons are used in genetics experiments; Tn5 and
Tn10 are two such composite transposons that carry
genes encoding resistance to certain antibiotics.
Functional Genomics experiments give
rise to very complicated data that is
inherently hard to interpret
This data is often plagued with noise
Both factors can lead to inaccuracies and
conflicting interpretations
Average or combine data to obtain more
accurate results
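Averaging replicate measurements is the simplest way to combine noisy data. A minimal Python sketch, assuming each replicate is a list of measurements for the same genes in the same order; the function name and the numbers are hypothetical. It reports the mean per gene together with the standard error of that mean, which shrinks as more replicates are added.

```python
from math import sqrt
from statistics import mean, stdev


def combine_replicates(replicates):
    """Return a list of (mean, standard_error) pairs, one per gene.

    `replicates` is a list of equal-length lists, one list per experiment.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("replicates must measure the same genes")
    n = len(replicates)
    combined = []
    for values in zip(*replicates):  # all measurements of one gene
        combined.append((mean(values), stdev(values) / sqrt(n)))
    return combined


# Hypothetical expression levels for three genes in three replicates.
replicates = [
    [10.2, 3.1, 7.9],
    [9.8, 2.7, 8.4],
    [10.0, 3.2, 8.0],
]
for gene_mean, gene_error in combine_replicates(replicates):
    print(f"{gene_mean:.2f} +/- {gene_error:.2f}")
```

Averaging only reduces random noise; a systematic bias shared by all replicates would survive it, which is one reason to combine data from different kinds of experiments as well.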
Interrelating Different -Omes
A fundamental approach in genomics is to
establish relationships between the different -omes
By piecing the individual -omes together, researchers hope to
build a full and dynamic view of the complex
processes that support the organism
Example: How do the proteome and regulome
combine to produce the translatome?
Interrelating Different –Omes Cont.
Defining or assigning one ‘ome based on another
Comparing one ‘ome with another to better
understand the processes that shift one
population into its successor
Calculating “missing” information in one ‘ome
based on information in another
Describing the intersection between multiple ‘omes
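When each ‘ome is treated as a population of gene products, several of these relationships reduce to set operations. The Python sketch below assumes each ‘ome is a set of gene identifiers. The containment relations it checks are the ones stated in the definitions earlier (the secretome is part of the transportome, and the proteome is a subset of the orfeome); the identifiers and the "foldome" membership are hypothetical.

```python
# Hypothetical gene identifiers, for illustration only.
orfeome = {"g1", "g2", "g3", "g4", "g5", "g6"}
proteome = {"g1", "g2", "g3", "g4", "g5"}
transportome = {"g1", "g2", "g3"}
secretome = {"g1", "g2"}
foldome_known_structure = {"g2", "g4"}

# Defining one 'ome based on another: containment checks.
print(secretome <= transportome)  # secretome is part of the transportome
print(proteome <= orfeome)        # proteome is a subset of the orfeome

# Comparing one 'ome with another: transported but not secreted.
transported_not_secreted = transportome - secretome

# Calculating "missing" information: secreted gene products
# that have no known structure yet.
secreted_without_structure = secretome - foldome_known_structure

# Describing the intersection between multiple 'omes.
secreted_with_structure = secretome & foldome_known_structure

print(sorted(transported_not_secreted))
print(sorted(secreted_without_structure))
print(sorted(secreted_with_structure))
```

Weighted ‘omes such as the transcriptome or translatome would need a mapping from identifier to expression level rather than a plain set, but the same comparisons apply to their keys.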
Final Thoughts
Ultimate goal of genomics: the clarification of the
functome, but there are many intermediate steps
By viewing the cell in terms of a list of distinct parts,
definition of each part is possible. Determine and
categorize functional information for each gene.
Computational and experimental techniques are
valuable and complementary
Genomic approaches often yield noisy, inaccurate data
that must be analyzed further for accuracy
Cambridge Healthtech Institute;
EMBL-EBI: European Bioinformatics Institute;
Greenbaum, Dov; Luscombe, Nicholas; Jansen, Ronald;
Qian, Jiang; Gerstein, Mark. “Interrelating Different
Types of Genomic Data, from Proteome to Secretome:
‘Oming in on Function”
Apweiler, Rolf, et al. “Protein Sequence Annotation in the
Genome Era: The Annotation Concept of SWISS-PROT
+ TrEMBL.” Intelligent Systems in Molecular Biology,