Intronic antisense transcription
in the human genome
Sergio Verjovski-Almeida
Department of Biochemistry
Institute of Chemistry
University of São Paulo
SVA-IQUSP
The genomic era
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR,
Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al.
Science 269: 496-512, Jul 28, 1995
The minimal gene complement of Mycoplasma genitalium
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA,
Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al.
Science 270: 397-403, Oct 20, 1995
The Sequence of the Human Genome
J. Craig Venter, Mark D. Adams, Eugene W. Myers et al.,
Science 291: 1304-1351, Feb 16, 2001
Initial sequencing and analysis of the human genome
Lander ES, Linton LM, Birren B, Nusbaum C, et al.,
Nature 409: 860-921, Feb 15, 2001
Initial sequencing and analysis of the human genome
International Human Genome Sequencing Consortium
Nature 409, 860-921(15 February 2001)
The last quarter of a century has been marked by a relentless drive to
decipher first genes and then entire genomes, spawning the field of
genomics. The fruits of this work already include the genome
sequences of 599 viruses and viroids, 205 naturally occurring
plasmids, 185 organelles, 31 eubacteria, seven archaea, one
fungus, two animals and one plant.
The sequence of the human genome is of interest in several
respects. It is the largest genome to be extensively sequenced so far,
being 25 times as large as any previously sequenced genome
and eight times as large as the sum of all such genomes. It is the
first vertebrate genome to be extensively sequenced. And, uniquely, it
is the genome of our own species.
Initial sequencing and analysis of the human genome
International Human Genome Sequencing Consortium
Nature 409, 860-921(15 February 2001)
We thank Compaq Computer Corporation's High Performance Technical
Computing Group for providing a Compaq Biocluster (a 27 node
configuration of AlphaServer ES40s, containing 108 CPUs, 216 GB
RAM, serving as compute nodes and a file server with one terabyte of
secondary storage) to assist in the annotation and analysis. Compaq
provided the systems and implementation services to set up and manage
the cluster for continuous use by members of the sequencing consortium.
Platform Computing Ltd. provided its LSF scheduling and load-sharing
software without license fee.
Generating ESTs with low-stringency RT-PCR and non-degenerate primers
The gene and its protein in higher eukaryotes
Genes are ~3% of the human genome
[Figure labels: Processing; cDNA Cloning; Random Reverse Transcription; Poly-dT Reverse Transcription (poly-dT primed RT); Low-stringency RT-PCR]
Microarray Lab (Dec. 2000) - CAGE
IQ-USP
. GenIII spotter and scanner (Amersham Pharmacia)
. Microarray system program (Amersham Pharmacia)
. BioRobotics system (clone replication and rearray)
. Clone stocks
. Clean Room 10,000 (humidity and temperature control)
Bioinformatics Lab (Sept. 2000)
. ES40 Workstation, 1 node, 4 processors, 8 GB RAM, 900 GB disk
. BlastMachine, Paracel/Celera, 6 nodes, 12 processors, 24 GB RAM
The Paracel BlastMachine runs parallelized versions of NCBI BLAST, PSI-BLAST,
and MegaBLAST. It is an Intel-based cluster consisting of 6 dual-processor nodes.
Each node contains two 933 MHz Intel PIII processors and 2 gigabytes of memory.
The operating system is a proprietary cluster variant of Linux. One dual-processor
node acts as the head node, scheduling and controlling jobs. 10 processors are
available to run sequence search jobs. Communication between nodes and with the
fileserver is via a 100 Mbit/s Ethernet connection.
The BlastMachine accelerates the BLAST algorithms by dividing a job across
multiple processors. It can divide both the database and the query among
groups of processors. The number of pieces a database will be divided into is
determined at the time the database is formatted on the BlastMachine. The
BlastMachine always divides a given database the same way to ensure precise
repeatability of results.
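The division of work described above can be illustrated with a minimal Python sketch. It is not Paracel's implementation: the function names (split_evenly, plan_search), the toy sequence names, and the choice of 5 database pieces and 2 query groups are all assumptions made for illustration. The point it shows is that a database split fixed once at format time always yields the same pieces, and that pairing database pieces with query groups produces independent units of work.

```python
from itertools import product


def split_evenly(items, n_pieces):
    """Split a list into n_pieces contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_pieces)
    chunks, start = [], 0
    for i in range(n_pieces):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def plan_search(db_pieces, queries, n_query_groups):
    """Pair every database piece with every query group; each pair is one unit of work."""
    query_groups = split_evenly(queries, n_query_groups)
    tasks = list(product(range(len(db_pieces)), range(len(query_groups))))
    return tasks, query_groups


# Hypothetical toy data, not taken from the slides.
database = [f"subject_{i}" for i in range(11)]
queries = [f"query_{i}" for i in range(4)]

# The split is a pure function of the database and the piece count, so it is
# decided once (at "format" time) and is identical on every later run.
DB_PIECES = split_evenly(database, 5)

tasks, query_groups = plan_search(DB_PIECES, queries, 2)
print(len(tasks))  # 5 database pieces x 2 query groups = 10 units of work
```

Because the split depends only on the database and the piece count, repeating it gives identical pieces, which is the repeatability property the text mentions.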
The BlastMachine has its own queueing and job control system. This system is
accessed from a host computer using the client program pb. The system attempts
to maximize total throughput; different search jobs can and will run simultaneously.
The actual number of processors assigned to any given job varies dynamically
with machine load.
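How the number of processors per job might vary with load can be sketched as follows. The even-share policy and the names allocate and SEARCH_PROCESSORS are assumptions for illustration only; the text does not say which policy the BlastMachine scheduler actually uses, only that 10 processors are available for search jobs.

```python
def allocate(n_processors, job_ids):
    """Share processors as evenly as possible among active jobs (an assumed policy)."""
    if not job_ids:
        return {}
    base, extra = divmod(n_processors, len(job_ids))
    # Earlier jobs receive the leftover processors, one each.
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(job_ids)}


SEARCH_PROCESSORS = 10  # processors available for search jobs, per the description above

print(allocate(SEARCH_PROCESSORS, ["job_a"]))
print(allocate(SEARCH_PROCESSORS, ["job_a", "job_b", "job_c"]))
```

With a single job, all 10 processors go to it; when two more jobs arrive, the same job drops to 4 processors and the others receive 3 each.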
The FAPESP-LICR
Human Cancer Genome Project
A Program for Human Gene Discovery
and Complete Sequence Compilation
Total (all groups together)

Category               Sequences     Share
Mitochondria              62,114     5.34%
rRNA                      44,891     3.86%
Bacteria                  53,384     4.59%
Known Human Genes        203,569    17.50%
Unigene Contigs          263,930    22.69%
ESTs                          32     0.00%
DNA                          126     0.01%

Remaining counts (their pairing with the labels below is not preserved in the slide text):
94,967 (8.16%); 7,914 (0.68%); 11,080 (0.95%); 75,624 (6.50%); 344,658 (29.63%); 991 (0.09%)
Remaining labels: Non-unigene ESTs; Paralogs; Non-Human Protein; Repeats; Nomatches; Human Protein

Total number of sequences: 1,163,280
Last update: Fri May 18 06:00:01 EST 2001
E. D. Neto et al., Proc. Natl. Acad. Sci. USA 97: 3491-3496, 2000
A. Camargo et al., Proc. Natl. Acad. Sci. USA 98: 12103-12108, 2001
H. Brentani et al., Proc. Natl. Acad. Sci. USA 100: 13418-13423, 2003
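The counts and percentages on the classification slide can be cross-checked with a few lines of Python. The numbers are copied from the slide in the order they appear; the variable names are illustrative. The check confirms that the thirteen category counts sum to the stated total and that each reported share matches the count divided by the total, rounded to two decimals.

```python
# Counts and reported percentages, in the order they appear on the slide.
counts = [62114, 44891, 53384, 203569, 263930, 94967, 7914,
          11080, 32, 126, 75624, 344658, 991]
reported = [5.34, 3.86, 4.59, 17.50, 22.69, 8.16, 0.68,
            0.95, 0.00, 0.01, 6.50, 29.63, 0.09]
TOTAL = 1163280  # "Total number of sequences" on the slide

# Recompute each share as a percentage of the total.
shares = [round(100 * c / TOTAL, 2) for c in counts]

# Collect any category whose recomputed share differs from the reported one.
mismatches = [(c, s, r) for c, s, r in zip(counts, shares, reported)
              if abs(s - r) > 1e-9]

print(sum(counts) == TOTAL)
print(mismatches)
```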
Are the novel transcripts expressed in prostate?
[Figure, three panels, each showing ESTs against RefSeq and genomic sequence tracks: Intronic ESTs; Intergenic ESTs; Exonic ESTs]
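The three EST classes named on this slide (exonic, intronic, intergenic) can be expressed as a small classification rule. The sketch below is an assumption for illustration, not the method used in the talk: the function classify_est, the half-open coordinate convention, and the toy gene coordinates are hypothetical. It also ignores strand and the possibility of several overlapping genes, both of which a real analysis would need to handle.

```python
def classify_est(est_start, est_end, gene_start, gene_end, exons):
    """Classify an EST alignment relative to one annotated gene.

    Coordinates are half-open intervals [start, end) on the same genomic sequence.
    exons is a list of (start, end) pairs for the annotated (e.g. RefSeq) transcript.
    """
    # No overlap with the gene span at all: the EST lies between genes.
    if est_end <= gene_start or est_start >= gene_end:
        return "intergenic"
    # Any overlap with an annotated exon makes the EST exonic.
    if any(est_start < e_end and e_start < est_end for e_start, e_end in exons):
        return "exonic"
    # Inside the gene span but touching no exon: the EST maps to an intron.
    return "intronic"


# Hypothetical gene spanning 1000-5000 with three exons.
exons = [(1000, 1200), (3000, 3300), (4800, 5000)]
print(classify_est(1100, 1400, 1000, 5000, exons))  # overlaps the first exon
print(classify_est(2000, 2400, 1000, 5000, exons))  # lies between exons 1 and 2
print(classify_est(6000, 6400, 1000, 5000, exons))  # lies outside the gene
```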
Central Dogma
[Diagram labels: genomic DNA; Transcription; sense pre-mRNA; Processing; mature sense mRNA (AAAAAA-); Translation; Protein]
New
[Added diagram labels: antisense RNA; sense RNA; regulation]
Paradigm
oncogenesis = accumulation of mutations in protein-coding
genes that act as oncogenes/tumor suppressors
Revised paradigm
oncogenesis = mutations in protein-coding genes AND
alterations in regulatory non-coding antisense RNAs
The genomic revolution
Automated DNA sequencing
. Sanger dideoxy method
. Clone and sequence 500 bp fragments
Complete sequencing of the Human Genome (2001)
Sequencing of 5,340,464 human Expressed Sequence Tags (ESTs)
Future (Aug 2006): “454” pyrosequencing
. whole genomes in 4 hours
. no cloning, sequence 100 bp fragments
High-throughput methods for measuring gene expression
. cDNA and oligo microarrays
. SAGE, MPSS
[email protected]
http://verjo2.iq.usp.br