Transcript OGP

Other Genome Projects
BIOL 473
Summer 2003
Why Other Genomes?
• Proof of principle
• Refinement and advancement of technology
• “Relatively simple” data management
• Models of human disease
• Easy/inexpensive to culture/grow
• Many mutant strains/lines already identified
Importance of Mutant Organisms
in Identification of Gene Function
Mutant → Molecular Defect → Gene Function
Vertebrates
Rodent Genome Projects
• ~100 years of genetic research to support genomic
findings
• Hundreds of mutant strains, well-characterized
genealogies of common strains (esp. mice)
• Evolutionary position relative to human:
– Close: similar development, physiology & disease
– Divergent: conserved blocks of sequence suggest
essential function
Rodent/Human Genome Comparison
• Extensive conservation of nucleotide sequences
– Protein-coding regions (genes)
– Small, noncoding intergenic regions (SNCIRs)
• Suggests unknown but important function, perhaps regional
control of expression of multiple genes
• Extensive conservation of gene order (synteny)
– Mouse chromosome 11 (MMU11) syntenic with human chromosome 5 (HSA5)
at 1 Mb IL region: perfect correspondence of order, orientation,
and spacing of 23 different genes
– Supports common ancestry
– Suggests segmental rearrangement of chromosomes during
evolution
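
The synteny claim above (same gene order and orientation in two species) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the function name `is_collinear`, the tuple layout, and the placeholder gene names and coordinates are hypothetical, not data from the MMU11/HSA5 region. The check ignores spacing and only compares order and strand, also accepting a region that is inverted as a whole.

```python
from typing import List, Tuple

# Hypothetical record layout: (ortholog id, start position in bp, strand)
Gene = Tuple[str, int, str]

FLIP = {"+": "-", "-": "+"}


def is_collinear(region_a: List[Gene], region_b: List[Gene]) -> bool:
    """Return True if both regions carry the same orthologs in the same
    order and orientation, or if one region is an exact inversion of the other."""
    a = sorted(region_a, key=lambda g: g[1])  # order genes along chromosome A
    b = sorted(region_b, key=lambda g: g[1])  # order genes along chromosome B
    ids_a, ids_b = [g[0] for g in a], [g[0] for g in b]
    strands_a, strands_b = [g[2] for g in a], [g[2] for g in b]

    # Case 1: identical order and orientation
    if ids_a == ids_b and strands_a == strands_b:
        return True

    # Case 2: whole region inverted (order reversed, every strand flipped)
    inverted_strands = [FLIP[s] for s in reversed(strands_b)]
    return ids_a == ids_b[::-1] and strands_a == inverted_strands


# Placeholder genes, not real annotations
mouse = [("geneA", 100, "+"), ("geneB", 500, "-"), ("geneC", 900, "+")]
human = [("geneA", 2000, "+"), ("geneB", 2600, "-"), ("geneC", 3100, "+")]
shuffled = [("geneB", 10, "-"), ("geneA", 50, "+"), ("geneC", 90, "+")]

print(is_collinear(mouse, human))     # True
print(is_collinear(mouse, shuffled))  # False
```

A rearranged segment, as in the `shuffled` example, fails the check, which is the kind of break in collinearity that segmental rearrangement would produce.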
Zebrafish (Danio rerio)
• Development rapid and transparent
• Easy to grow
• Dense map of genetic markers
• Many species-specific cell biology tools
– Including human gene transfer
– Including RNAi
– Including organogenesis pathways
• Significant synteny with human and mouse
• >90% similar set of genes with human
Pufferfish
Fugu rubripes
• Roughly the same gene content as humans in 1/8 the DNA
• Lacks many repeats
• Very small introns (many genes with the same exon/intron structure)
• 400 million years (MY) of gene sequence and order conservation
• Control regions easy to detect: closer to genes, with less nonconserved
intergenic sequence
Tetraodon nigroviridis
• 21 chromosomes, all smaller than human chromosome 21
– Microchromosomes are gene dense
• Important for understanding
– Unknown mechanisms of gene expression control
– Chromosomal expansion
– Function and persistence of “junk DNA”
Other Verts
• Salmon, sticklebacks, cichlids, and other commercial fish
• Cats and Dogs
– Common diseases with humans
– Important models of morphological variation
– Important models of behavioral variation
• Chimpanzee
– Mechanisms of pathogen resistance, incl. HIV susceptibility
– Genetic changes crucial for evolution of Homo sapiens
• Agrispecies (cattle, horses; true for crops as well)
– Whole genome sequencing prohibitively expensive
– Partial genome sequencing and SNPs enhance decades of selective
breeding data
• First nutria genome report appeared July 2002!!
– Kass & Doucet: Molecular Phylogeny of the Louisiana Nutria.
Proc. LA Acad. Sci. 63:10-24.
The Founders of Nutria Genomics
Why nutria?
Invertebrates
Why?
• Proof-of-principle: sequencing multicellular organisms
• Provide understanding of complex
organismal functions
• Support decades of genetic research (esp
with Drosophila)
Invertebrate Genome Projects
Genomic Surprises in
Drosophila melanogaster & Caenorhabditis elegans
• Gene expression anomalies
– Ce: leader sequence trans-splicing
– Ce: polycistronic transcripts
– Dm: high variance in transcript length
– Dm: some distant regulatory sequences & long introns
• 50% more genes in Ce, despite complexity of Dm (# cells, # cell types, morphogenesis)
• Large gene families
– Ce: steroid hormone-receptor gene family
– Dm: olfactory receptor gene family
• High conservation of major regulatory and biochemical pathways
– Some lost to parasitism in Ce
– Some novel to Dm due to complete morphogenesis
• RNAi highly effective in Ce: 90% gene knockout in 2/5 chromosomes
• Models for human disease: 50-60% of human disease genes have Ce & Dm orthologs
• Models for drug development
– Prozac resistance in Ce; EtOH tolerance in Dm
– No presumption that the trait is the same, but molecular interactions between gene products
are conserved even when they affect distinct processes
Plants
Arabidopsis thaliana
• First plant genome sequenced in its entirety
• 115 Mb: about the same size as D. melanogaster but 2X the
genes (25,500)
– Two rounds of whole genome duplication
– Extensive chromosome reshuffling
– Considerable gene loss after duplication
– 1500 tandem arrays of repeated genes (2-3 copies each)
– Only 11,000 gene families: minimum for complex
multicellularity
• 800 nuclear genes of plastid descent
– Likely ongoing process
– Plastid-targeting signal lost; now function in cytoplasm
• 10% genome is novel miniature repeats (MITEs,
MULEs)
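
The figures on this slide (115 Mb, 25,500 genes, 11,000 gene families, and "2X" the gene count of D. melanogaster) support a few back-of-the-envelope numbers. The sketch below only rearranges those slide values; the function and constant names are made up for illustration, and the fly gene count is merely what the "2X" statement implies, not an independent estimate.

```python
# All inputs are taken from the slide; names are illustrative only.
GENOME_SIZE_BP = 115_000_000  # 115 Mb
GENE_COUNT = 25_500
GENE_FAMILIES = 11_000


def genes_per_mb(gene_count: int, genome_size_bp: int) -> float:
    """Average gene density in genes per megabase."""
    return gene_count / (genome_size_bp / 1_000_000)


def bp_per_gene(gene_count: int, genome_size_bp: int) -> float:
    """Average genome span per gene (gene plus flanking sequence), in bp."""
    return genome_size_bp / gene_count


def genes_per_family(gene_count: int, families: int) -> float:
    """Average number of genes per gene family."""
    return gene_count / families


print(round(genes_per_mb(GENE_COUNT, GENOME_SIZE_BP), 1))     # 221.7
print(round(bp_per_gene(GENE_COUNT, GENOME_SIZE_BP)))         # 4510
print(round(genes_per_family(GENE_COUNT, GENE_FAMILIES), 2))  # 2.32

# Gene count for D. melanogaster implied by the slide's "2X" comparison
print(GENE_COUNT / 2)  # 12750.0
```

An average of a little over two genes per family is consistent with the slide's points about genome duplication followed by gene loss and about tandem arrays of 2-3 copies.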
Classes of Arabidopsis genes
absent/underrepresented in animals:
• Enzymes for cell wall biosynthesis
• Transcellular transport proteins
– Minerals, organics, metabolites, toxins, macromolecules
• Photosynthesis enzymes (rubisco, ETSs)
• Mediators of tropisms (turgor pressure, light, gravity, sessility)
• Enzymes and cytochromes for secondary metabolites
• Many R genes (pathogen resistance); interspersed, not clustered
Classes of animal genes
absent/underrepresented in Arabidopsis:
• Ras G-protein family
• Tyrosine kinase receptors
• Nuclear steroid receptors
Other Plant Projects & Why?
• Projects underway for 50 different species
• Rice and Maize: small genomes, economically important
– Many commercial crop plants are polyploid, and genomes are too large to be
feasibly sequenced in their entirety
– Must rely on comparative genomics to support hybridization data
– Rice and Arabidopsis show extensive but complex synteny
• Focus on QTLs rather than Mendelian (single-locus) traits
– Resistance, flowering time, tolerance, sugar content, etc.
• Domesticated/wild relationships: maize vs. teosinte
• Mutation/morphology relationships: Brassica oleracea
– Cabbage, kale, Brussels sprouts, broccoli, cauliflower, kohlrabi
• Support of classical genetics
– Sweet pea, snapdragon
• Support of forestry (Poplar: small genome, easy to grow)
– Lumber improvement (lignins, enzymes)
– Biomass-biofuel improvement
– Biospheric carbon fixation
• Parent/Ecotype crop comparisons by comparative genomics