Human Genome Structure and Organization

Bert Gold, Ph.D., F.A.C.M.G.
Genetic Variation
Phenotype
Expression of the genotype (modified by the environment).
The structural or functional nature of an individual.
Includes:
appearance, physical features, organ structure
biochemical, physiologic nature
Genotype
Genetic status, the alleles an individual carries.
Learning Objectives
• Recap and update the status of the public and private Human Genome Projects
• Provide reminders of the background needed for genetic disease association and linkage studies
Definitions
• Penetrance -
The probability that an individual who is ‘at risk’ for the disorder (i.e., carries the gene) develops (expresses)
the condition. May be age dependent.
• Expression - The characteristics of a trait or disease that
are outwardly expressed. E.g., myotonic dystrophy: myotonia,
cataracts, narcolepsy, frontal balding, infertility.
• Ascertainment – The method used in gathering genetic
data. Study conclusions differ depending on how affected
individuals entered the study.
• Phenocopy – Individuals whose phenotype, under the
influence of non-genetic agents, has become like the one
normally caused by a specific genotype in the absence of non-genetic agents.
• Pleiotropy - The ability of an allele to produce more than
one effect, i.e., to manifest its expression in the structure and/or
function of more than one organ system or tissue.
• Recurrence Risk – Likelihood that a relative of a
proband for a rare disease will have the same disease.
Penetrance and Expressivity
• Penetrance: Proportion of carriers who express a trait
– Complete: P=1.0 or 100%
– Incomplete (“reduced”): P<1.0 or < 100%
• Expressivity: Severity of the phenotype
– Expressivity may vary
• Between families (interfamilial) or
• Within families (intrafamilial)
• TRY NOT TO CONFUSE “VARIABLE
EXPRESSIVITY” WITH “INCOMPLETE
PENETRANCE”
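Penetrance, as defined above, is a simple proportion of carriers. The sketch below is a minimal illustration, assuming you already know which individuals carry the gene; the age bands and counts are hypothetical and only show how an age-dependent penetrance might be tabulated. It deliberately says nothing about expressivity, which concerns how severe the phenotype is among those who do express it.

```python
# Minimal sketch: penetrance = affected carriers / all carriers.
# The age bands and counts below are hypothetical illustration data.

def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Return the proportion of gene carriers who express the trait."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected carriers must lie between 0 and the total")
    return affected_carriers / total_carriers


def classify(p: float) -> str:
    """Complete penetrance is P = 1.0; anything lower is incomplete (reduced)."""
    return "complete" if p == 1.0 else "incomplete (reduced)"


# Hypothetical carriers grouped by age band: (affected, total)
BY_AGE = {"<30": (3, 20), "30-50": (9, 20), ">50": (18, 20)}

if __name__ == "__main__":
    for band, (affected, total) in BY_AGE.items():
        p = penetrance(affected, total)
        print(f"{band}: P = {p:.2f} ({classify(p)})")
```

Rising values across age bands are what "may be age dependent" looks like in data; a single pooled figure would hide that.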
Chromosomes, Genes and Proteins
Genes are on Chromosomes
Genes may encode proteins or RNA
Non-coding RNA ‘genes’
• tRNAs (497 genes counted; 821 when genes and pseudogenes are both counted)
– tRNAs found are consistent with wobble base pairing
– Codon bias only roughly correlated with tRNA
distribution
• rRNAs
• small nucleolar RNAs (snoRNAs)
• snRNAs (spliceosome constituents)
• 7SL RNA
• telomerase RNA
• Xist transcript
• Vault RNA
Some chromosomes are richer in genes than others
[Figure: number of nucleotides in exons (y-axis, 0 to 3,500,000) for each chromosome (x-axis, autosomes and X)]
HOXA, HOXB, HOXC and HOXD are in
regions with a particularly low density
of repeats; this is believed to result
from the presence of cis-acting
elements in this vicinity.
Proteins demonstrate patterns and similarity of function
Functionally and structurally similar proteins are organized into families,
e.g., E.C. (Enzyme Commission) classes, SWISS-PROT, TrEMBL
In silico approaches to characterize genes include:
• Pfam, searchable via HMMER
• Other in silico collections include:
– PRINTS
– PROSITE
– SMART
– BLOCKS
• Creation of an Integrated Protein Index (IPI)
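PROSITE stores many motifs as simple patterns, which makes it easy to show how an in silico motif search works. The sketch below converts a subset of the PROSITE pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` repeats, and `<` or `>` terminal anchors) into a Python regular expression. It is an illustration only, not a complete parser; for example, it ignores anchors written inside brackets. The example pattern is the PROSITE N-glycosylation site `N-{P}-[ST]-{P}`; the protein sequence is made up.

```python
import re
from typing import List


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset of) PROSITE pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    regex = ""
    if pattern.startswith("<"):            # anchored at the N-terminus
        regex += "^"
        pattern = pattern[1:]
    anchor_end = pattern.endswith(">")     # anchored at the C-terminus
    if anchor_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        m = re.fullmatch(
            r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            part = "."                           # any amino acid
        elif residue.startswith("{"):
            part = "[^" + residue[1:-1] + "]"    # any amino acid except these
        else:
            part = residue                       # single residue or [..] class
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        regex += part
    if anchor_end:
        regex += "$"
    return regex


def find_sites(pattern: str, sequence: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_sites("N-{P}-[ST]-{P}", "MKNVSANGTA"))  # hypothetical sequence
```

The lookahead in `find_sites` lets adjacent motifs overlap, which a plain `re.finditer` over the pattern would miss.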
How many genes are there?
Estimates from the Public Program
– RefSeq
– Exons
– Introns
– Average Sizes
– Coding Sequences (CDS)
– Alternative splice products (about 3%)
– Creation of an Integrated Gene Index (IGI)
– Genscan to Ensembl to Pfam via GeneWise (31,778)
– Could be as low as 24,500 using overprediction corrections
Estimates from Celera
• 25,086 in Assembly 3
Pre-existing estimates
• W. Gilbert’s back-of-the-envelope calculation
• Reassociation Kinetics
• Estimates from Double Twist using
Promoter Inspector plus
• Unpublished estimates from Human
Genome Sciences
Size of Genes:
• Largest: Dystrophin 2.7 Mb
• Titin:
– 80,780 bp coding
– 178 exons
– largest single exon 17,106 bp
GENE HOMOLOGS, ORTHOLOGS, PARALOGS
• Vacuolar sorting machinery in yeast
• ABC gene superfamily
• Ig gene superfamily
• FGF superfamily
• Intermediate filament superfamily
• PROTEIN FAMILY EXPANSION APPEARS TO BE A PRIMARY EVOLUTIONARY MECHANISM
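Orthologs (the corresponding gene in different species) are often separated from paralogs (duplicated genes within a family) with a reciprocal best hit heuristic: two genes from different genomes are called putative orthologs when each is the other's top-scoring match. The sketch below assumes pairwise similarity scores are already available in both directions (for example, BLAST bit scores); the gene names and scores are hypothetical. Reciprocal best hits is one common heuristic, not a method the slides specify.

```python
from typing import Dict, Set, Tuple

# (query gene, subject gene) -> similarity score, e.g. a bit score
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical scores: human H1 and H2 are paralogs; mouse M1 is closest to H1.
HUMAN_VS_MOUSE: Scores = {("H1", "M1"): 950.0, ("H2", "M1"): 610.0, ("H2", "M2"): 580.0}
MOUSE_VS_HUMAN: Scores = {("M1", "H1"): 948.0, ("M1", "H2"): 605.0, ("M2", "H2"): 575.0}

if __name__ == "__main__":
    print(reciprocal_best_hits(HUMAN_VS_MOUSE, MOUSE_VS_HUMAN))
```

In this toy case H2's best hit is M1, which prefers H1, so the plausible H2-M2 pair is missed. Expanded protein families produce exactly this kind of ambiguity, which is why more careful ortholog calls also use phylogenies.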
The proteome
• Functional categories
• PRINTS
• PROSITE
• Pfam
• InterPro (http://www.ebi.ac.uk/interpro/)
GENE ONTOLOGY
• Standard Vocabulary
• Hierarchy of terms (Directed ACYCLIC Graph)
• Ashburner et al., Nature Genetics 25:25-29 (2000)
• ‘Bushy’ model
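Because Gene Ontology terms form a directed acyclic graph rather than a tree, a term can have several parents, and an annotation to a specific term implies annotation to all of its ancestors (GO's "true path rule"). The sketch below walks a small, hypothetical fragment of such a graph. The term names are placeholders, not real GO identifiers; real work would load the ontology files distributed by the GO Consortium.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy fragment: term -> parent terms (is_a edges).
# "term C" has two parents, which a tree could not represent.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "term A": ["root"],
    "term B": ["root"],
    "term C": ["term A", "term B"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent edges upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations: Dict[str, Iterable[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Add every ancestor of each directly annotated term (true path rule)."""
    result: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        expanded: Set[str] = set()
        for term in terms:
            expanded.add(term)
            expanded |= ancestors(term, parents)
        result[gene] = expanded
    return result


def is_acyclic(parents: Dict[str, List[str]]) -> bool:
    """Depth-first check that no term is its own ancestor."""
    state: Dict[str, int] = {}  # 1 = on current path, 2 = finished

    def visit(term: str) -> bool:
        state[term] = 1
        for parent in parents.get(term, []):
            if state.get(parent) == 1:
                return False  # back edge: a cycle
            if parent not in state and not visit(parent):
                return False
        state[term] = 2
        return True

    return all(term in state or visit(term) for term in parents)


if __name__ == "__main__":
    print(is_acyclic(PARENTS))
    print(propagate({"geneX": ["term C"]}, PARENTS))
```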
Horizontal Transfer controversy
• One of the major conclusions of the public genome effort,
published in the Feb. 15, 2001 issue of Nature, was:
“Hundreds of human genes appear likely to have resulted
from horizontal transfer from bacteria at some point in the
vertebrate lineage. Dozens of genes appear to have
been derived from transposable elements”
• This conclusion has since been widely disputed; the apparent
transfers are now believed to result from:
– Microbial contaminants in the sequence
– Bacterial gene integration into pre-vertebrates
• “The more probable explanation for the existence of
genes shared by humans and prokaryotes, but missing in
nonvertebrates, is a combination of evolutionary rate
variation, the small sample of nonvertebrate genomes,
and gene loss in the nonvertebrate lineages.”
-Salzberg et al., Science
Splice Pattern, 98% GT-AG
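The 98% figure refers to the dinucleotides at the two ends of each intron: most introns begin with GT and end with AG. A minimal way to tabulate this is sketched below, assuming intron sequences have already been extracted as sense-strand DNA strings. The example introns are hypothetical and heavily truncated; the minor GC-AG and AT-AC classes appear only as labels for comparison.

```python
from collections import Counter
from typing import Iterable


def splice_class(intron: str) -> str:
    """Label an intron by its first and last two bases, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to classify")
    return f"{seq[:2]}-{seq[-2:]}"


def splice_summary(introns: Iterable[str]) -> Counter:
    """Count introns per terminal-dinucleotide class."""
    return Counter(splice_class(intron) for intron in introns)


def canonical_fraction(introns: Iterable[str]) -> float:
    """Fraction of introns that follow the canonical GT-AG rule."""
    counts = splice_summary(introns)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no introns given")
    return counts["GT-AG"] / total


# Hypothetical, heavily truncated intron sequences
INTRONS = ["GTAAGTCTTTCAG", "GTGAGTACCCTAG", "GCAAGTTCCTCAG", "ATATCCTTTTAAC"]

if __name__ == "__main__":
    print(splice_summary(INTRONS))
    print(f"GT-AG fraction: {canonical_fraction(INTRONS):.2f}")
```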
Chromatin Structure
• Euchromatin
• Heterochromatin
• Nucleosomes
Chromosome Facts
• Chromosomes replicate during S
phase
• Chromosomes recombine during pachytene
• Recombination is an obligate activity
• Sex chromosomes recombine with each other
(within the pseudoautosomal regions)
Cytogenetics is done by Karyotyping
• Chromosomes are chemically frozen in
metaphase
• Must be carried out on dividing cells
• Microfilament inhibitors
• Microtubule inhibitors
• Membrane lysis
• Pronase, trypsin digest
• Giemsa stain
• G-bands correspond to regions of relatively low
GC content
http://genome.ucsc.edu/goldenPath/mapPlots/
http://genome.ucsc.edu/goldenPath/hgTracks.html
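Since G-bands correspond to relatively GC-poor DNA, a rough in silico analogue is to scan a sequence in windows and report GC content. The sketch below is illustrative only. Real G-bands span megabases, the window size and step are arbitrary parameters chosen here, and ambiguous bases such as N are simply left out of the denominator.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """GC fraction among unambiguous bases (A, C, G, T); N and others are ignored."""
    s = seq.upper()
    called = sum(s.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")
    return (s.count("G") + s.count("C")) / called


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(start, GC fraction) for each full window, advancing by `step` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a GC-rich stretch followed by an AT-rich one
    toy = "GCGCGGCCGC" + "ATATTAATAT"
    for start, gc in windowed_gc(toy, window=10, step=5):
        print(start, round(gc, 2))
```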
Cell Division: Meiosis
– Segregation
• Defined: Alleles are paired; gametes
receive one allele of each pair.
• Exceptions: trisomy and uniparental disomy
– Independent Assortment
• Gene pairs at different loci assort independently
• Exception: linkage
Meiosis Creates Gametes
And provides a basis for genetic
recombination!
Genetic Recombination
• Crossing Over
• Resolution
• Recombinant Chromosomes
– OBLIGATE ACTIVITY
– FEMALE RECOMB. RATES HIGHER THAN
MALE
– INCREASED RATES AT TELOMERES
– PARADOX: SHORT ARMS SHOW MORE THAN
LONG ARMS
– 1 cM is about 1 Mb on long arms, but short arms average 2 cM
per Mb and the Yp-Xp pseudoautosomal region is about 20
cM per Mb.
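Recombination fractions between markers are not additive over long distances, so they are usually converted to map distances with a mapping function. The sketch below uses the Haldane and Kosambi functions (standard textbook choices, not something the slides specify) and then turns centimorgans into an approximate physical distance using the regional rates quoted above (about 1, 2 and 20 cM per Mb). Those averages are coarse, so the Mb figures are only rough.

```python
import math

# Approximate regional rates quoted on the slide (cM per Mb)
CM_PER_MB = {"long arm": 1.0, "short arm": 2.0, "pseudoautosomal": 20.0}


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -1/2 ln(1 - 2r), assuming no interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 1/4 ln((1 + 2r) / (1 - 2r)), allowing interference."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def approx_mb(cm: float, region: str) -> float:
    """Rough physical distance for a genetic distance in the given region."""
    return cm / CM_PER_MB[region]


if __name__ == "__main__":
    r = 0.10
    for name, fn in (("Haldane", haldane_cm), ("Kosambi", kosambi_cm)):
        d = fn(r)
        print(f"{name}: {d:.2f} cM -> ~{approx_mb(d, 'long arm'):.1f} Mb on a long arm")
```

The same genetic distance maps to a much shorter physical stretch in the pseudoautosomal region, which is the paradox the slide highlights.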
Genes
• Units of heredity
• Encode proteins (and some RNAs)
• Human genetics is the study of gene variation in
humans
• ‘Gene’ as a term is used ambiguously to refer
both to the ‘locus’ and the ‘allele’, i.e., an individual has one
locus but two alleles at it.
• Both genome projects sequenced DNA from multiple individuals,
and hence multiple alleles; this has led to some
assembly confusion.
• Ultimately, we want a haploid genome map.
The Human Genome Project
• International public effort commencing in 1990 to
sequence the entire human genome by 2005.
• STS approach chosen in 1991
• Private effort launched in 1998 by Celera using
‘shotgun’ cloning
BAC clones, sequenced into BAC end
reads, and assembled into ‘contigs’
Markerless ‘contigs’ in the Celera
assembly are called ‘Scaffolds’
Markers are BAC ends in the
‘shotgun’
Mate pair reads provided
the core of Celera sequence
Draft human genome sequences
complete by February 2001.
• Published simultaneously in Feb. 2001
– Public Sequence in NATURE (409: 745-964)
– Celera Sequence in SCIENCE (291: 1145-1434)
Greater than 50% of sequence
is repetitive
45% of the human genome is derived
from transposable elements
• Long Interspersed Elements: LINEs (21% of genome)
– LINE1 – some still active; autonomous; consist of two ORFs
(one is a pol)
– LINE2
– LINE3
• Short Interspersed Elements: SINEs (13% of genome)
– Alu – some still active; use L1 enzymes to replicate
– MIR
– Ther2/MIR3
• LTR Retroposons
– Consist of gag and pol
– Protease, reverse transcriptase, RNase H and integrase all encoded
– Reverse transcription occurs in the cytoplasm, using a tRNA to
prime replication
• DNA Transposons
98.5% of sequence is non-coding.
Approximately 1/3 of the human genome
is transcribed (public guess).
Allelism
• Alternate forms of a gene
• Recessive disease: e.g., sickle cell, CFTR
• Dominant disease: e.g., achondroplasia, tuberous sclerosis
• Heterozygote or homozygote
• 1,2 or 1,1
• homogeneity of alleles at a locus
Genetic Markers
• RFLPs
• VNTRs (STRs)
• Microsatellites
• STSs
• SNPs
• “Tools” used to find disease genes
• “Flags” with locations throughout the genome
Polymorphism Information Content
versus Heterozygosity (PIC vs. het)
• Determining heterozygosity from SNP rare
allele frequency
• Information Content in SNPs versus
STRs
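Heterozygosity and PIC can both be computed from allele frequencies. The sketch below uses the usual expected heterozygosity, H = 1 - Σ pᵢ², and the classic PIC formula of Botstein and colleagues, PIC = H - Σ(i<j) 2pᵢ²pⱼ². For a biallelic SNP with rare allele frequency q, H reduces to 2q(1 - q), which can never exceed 0.5; a multiallelic STR can score much higher, which is why one STR typically carries more information than one SNP. Hardy-Weinberg proportions are assumed, and the allele frequencies are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def _check(freqs: Sequence[float]) -> None:
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content: H - sum over pairs i<j of 2 p_i^2 p_j^2."""
    _check(freqs)
    correction = sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs, 2))
    return heterozygosity(freqs) - correction


def snp_heterozygosity(rare_allele_freq: float) -> float:
    """For a biallelic SNP with rare allele frequency q this equals 2q(1 - q)."""
    q = rare_allele_freq
    return heterozygosity([q, 1.0 - q])


if __name__ == "__main__":
    print("SNP, q=0.5:", snp_heterozygosity(0.5), pic([0.5, 0.5]))
    print("SNP, q=0.05:", round(snp_heterozygosity(0.05), 4))
    str_freqs = [0.25, 0.25, 0.25, 0.25]  # hypothetical 4-allele STR
    print("STR:", heterozygosity(str_freqs), pic(str_freqs))
```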
Typology of SNPs
• Type I- Coding, non-synonymous, nonconservative
• Type II- Coding, non-synonymous, conservative
• Type III- Coding, synonymous
• Type IV- Non-coding, 5’-UTR
• Type V- Non-coding, 3’-UTR
• Type VI- Other non-coding
• Type I and Type II SNPs have lower
heterozygosity than other SNPs, presumably as a
result of selective pressure.
– About 25% of type I and type II SNPs have minor allele
frequencies > 15%
– About 60% have minor allele frequencies < 5%
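The six SNP types above can be assigned mechanically once a variant's location and, for coding SNPs, its reference and alternate codons are known. The sketch below translates codons with the standard genetic code and treats a substitution as conservative when both amino acids fall in the same physicochemical class. That four-class grouping is a simplifying assumption made here; published schemes and substitution matrices draw the line differently. The region labels (`CDS`, `5UTR`, `3UTR`) are likewise hypothetical input conventions.

```python
from typing import Optional

# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed, simplified physicochemical classes used to call "conservative"
AA_CLASSES = {
    "nonpolar": "GAVLIMPFW",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}


def aa_class(aa: str) -> str:
    for name, members in AA_CLASSES.items():
        if aa in members:
            return name
    return "stop"


def snp_type(region: str, ref_codon: Optional[str] = None,
             alt_codon: Optional[str] = None) -> str:
    """Return the SNP type (I-VI) for region 'CDS', '5UTR', '3UTR' or anything else."""
    if region == "CDS":
        if ref_codon is None or alt_codon is None:
            raise ValueError("coding SNPs need reference and alternate codons")
        ref_aa = CODON_TABLE[ref_codon.upper()]
        alt_aa = CODON_TABLE[alt_codon.upper()]
        if ref_aa == alt_aa:
            return "III"  # coding, synonymous
        return "II" if aa_class(ref_aa) == aa_class(alt_aa) else "I"
    return {"5UTR": "IV", "3UTR": "V"}.get(region, "VI")


if __name__ == "__main__":
    print(snp_type("CDS", "GAG", "GTG"))  # Glu -> Val, as in the sickle cell allele
    print(snp_type("CDS", "GAA", "GAC"))  # Glu -> Asp
    print(snp_type("CDS", "CTG", "CTA"))  # Leu -> Leu
    print(snp_type("3UTR"))
```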
Mutation
• Occurs more often during male meiosis
• Occurs more often in ‘long genes’
• More easily detected in Dominant
Diseases
– Achondroplasia
– Duchenne Muscular Dystrophy
• May often involve CpG mutating to TpG
Autosomal Recessive Inheritance
• Two mutant copies of a gene are required to be affected
• Carriers have one copy of the mutation and are
unaffected
• 25% of offspring of two carriers will be
affected
• Males and females affected in equal number
• E.g., sickle cell, beta-thalassemia, CF
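The 25% figure follows from segregation: each parent passes on one of its two alleles with equal probability. The sketch below enumerates a Punnett square for any pair of autosomal genotypes written as two-letter strings, using the hypothetical convention that `A` is the normal allele and `a` the mutant one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Offspring genotype probabilities for two autosomal genotypes like 'Aa'."""
    if len(parent1) != 2 or len(parent2) != 2:
        raise ValueError("genotypes must have exactly two alleles")
    # Each parent transmits either allele with probability 1/2 (segregation);
    # sorting puts 'A' before 'a', so 'aA' and 'Aa' count as one genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    carriers = offspring("Aa", "Aa")
    print(carriers)
    print("affected (aa):", carriers.get("aa", 0))
    print("carrier x non-carrier, affected:", offspring("Aa", "AA").get("aa", 0))
```

Exact fractions are used so that the 1/4 : 1/2 : 1/4 ratio prints without rounding noise.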
X-Linked Recessive (Sex-Linked)
• Females rarely affected
• No male to male transmission
• Affected males transmit gene to all
daughters
• E.g., Duchenne muscular dystrophy, hemophilia A
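The X-linked rules above can be checked by enumerating gametes, remembering that a father passes his X to every daughter and his Y to every son. The sketch below is a minimal illustration under the hypothetical convention that `A` is the normal and `a` the mutant X-linked allele; the mother is written as her two X alleles and the father as the single allele on his X.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Tuple


def x_linked_offspring(mother: str, father_x: str) -> Dict[Tuple[str, str], Fraction]:
    """Probabilities of (sex, genotype) among children for an X-linked locus.

    mother: her two X-linked alleles, e.g. 'Aa'
    father_x: the allele on the father's only X, e.g. 'a'
    """
    if len(mother) != 2 or len(father_x) != 1:
        raise ValueError("mother needs two alleles, father one")
    outcomes: Counter = Counter()
    for maternal in mother:  # each maternal X with probability 1/2
        # Father's X goes to a daughter, his Y to a son, each with probability 1/2
        outcomes[("daughter", "".join(sorted(maternal + father_x)))] += 1
        outcomes[("son", maternal)] += 1
    total = sum(outcomes.values())
    return {key: Fraction(n, total) for key, n in outcomes.items()}


if __name__ == "__main__":
    # Affected father x non-carrier mother: every daughter a carrier, no son affected
    print(x_linked_offspring("AA", "a"))
    # Carrier mother x unaffected father: half of sons affected, half of daughters carriers
    print(x_linked_offspring("Aa", "A"))
```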
Autosomal Dominant Inheritance
• Each child at 50% risk
• Does not skip generations
• Often lethal in double dose
• Large genetic load
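The 50% figure is the chance of inheriting the allele from an affected heterozygous parent; the chance of actually being affected also depends on penetrance. The sketch below is a textbook Bayesian calculation, not taken from the slides. It gives the risk for a child of an affected heterozygote whose other parent is a non-carrier, and the probability that a child who is unaffected nonetheless carries the allele. The penetrance values are hypothetical.

```python
def risk_affected(penetrance: float) -> float:
    """P(child affected) = 1/2 (inherits the allele) x penetrance."""
    if not 0 <= penetrance <= 1:
        raise ValueError("penetrance must be between 0 and 1")
    return 0.5 * penetrance


def carrier_given_unaffected(penetrance: float) -> float:
    """P(child carries allele | child unaffected), by Bayes' theorem.

    carrier and unaffected:    1/2 * (1 - P)
    non-carrier (unaffected):  1/2
    """
    if not 0 <= penetrance <= 1:
        raise ValueError("penetrance must be between 0 and 1")
    carrier_unaffected = 0.5 * (1 - penetrance)
    return carrier_unaffected / (carrier_unaffected + 0.5)


if __name__ == "__main__":
    for p in (1.0, 0.8):  # hypothetical penetrance values
        print(f"P={p}: affected risk {risk_affected(p):.2f}, "
              f"carrier if unaffected {carrier_given_unaffected(p):.3f}")
```

With P = 1 an unaffected child cannot be a carrier, which is why a fully penetrant dominant trait does not skip generations; with reduced penetrance an unaffected carrier can pass the allele on, and the trait can appear to skip.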
X-linked Dominant Pedigree
• Example: hypophosphatemic (vitamin D-resistant) rickets
• Distinguished from Autosomal Dominant
by:
– No male-to-male transmission
– All daughters of affected fathers are affected
IMPORTANT NOTE:
Dominant and Recessive refer to the
phenotypic expression of alleles, NOT to
intrinsic characteristics of gene loci.
Inheritance Pattern Complexities
• Pseudodominant Transmission of a Recessive
• Pseudorecessive Transmission of a Dominant
– Misassigned paternity, causal heterogeneity,
incomplete penetrance, germline mosaicism
• Mosaicism
• Mitochondrial Inheritance
• Penetrance and Expressivity
– Semi-dominant, gender-influenced, age-related,
transmission-related, imprinting
• Uniparental Disomy (UPD)
• Environmental effects, phenocopies
Preview of linkage analysis
• Characterizing Human Genetics:
– Long generation time
– Inability to control matings
– Inability to control study population
– Inability to control exposures to environmental conditions
– It is possible to define phenotypes well!
– Can study genetic structures through family history
– Link phenotypes and genetic structures through
statistical methods
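One such statistical method is the LOD score: the log10 ratio of the likelihood of the observed meioses at recombination fraction θ to their likelihood under free recombination (θ = 0.5). The sketch below assumes phase-known meioses that have already been scored as recombinant or non-recombinant; the counts are hypothetical. A LOD of 3 or more is the conventional threshold for declaring linkage.

```python
import math


def lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD(theta) = log10[theta^R (1 - theta)^NR / 0.5^(R + NR)] for phase-known meioses."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    n = recombinants + nonrecombinants
    null = -n * math.log10(0.5)  # subtracting log10(0.5^n)
    if theta == 0:
        # A recombinant is impossible at theta = 0
        return -math.inf if recombinants else null
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + null)


def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01):
    """Grid search for the theta in [0, 0.5) that maximizes the LOD score."""
    grid = [i * step for i in range(int(round(0.5 / step)))]
    return max((lod(recombinants, nonrecombinants, t), t) for t in grid)


if __name__ == "__main__":
    score, theta_hat = max_lod(1, 9)  # hypothetical: 1 recombinant in 10 meioses
    print(f"max LOD {score:.2f} at theta = {theta_hat:.2f}")
```

Ten phase-known meioses with one recombinant fall short of LOD 3 even at the best θ, which illustrates why linkage studies pool many families.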