CFTRlecture10

CFTR – gene cloning
and initial bioinformatic analysis
Riordan, 12 others (*) and Tsui (1989)
Science 245:1066
Carlow IT Bioinformatics
November 2006
* Including Francis Collins, later leader of the
Human Genome Sequencing Project
Cystic fibrosis
• Horrible inherited disease
– Affecting lung, pancreas, sweat-glands
• Abnormally high trans-membrane
electrical potential
– Decreased Cl- ion membrane transport
• Often associated with failure to respond to
cAMP-dependent protein kinase
– no phosphorylation: no function
More symptoms etc.
• Difficult breathing
• Early death (life expectancy: 6 months in 1959, 38 years in 2006)
• More prone to infections (thicker mucus)
• Can do pre-natal diagnosis or sweat test
• "Woe is the child who tastes salty from a
kiss on the brow, for he is cursed, and
soon must die" (German proverb, 1700s)
• We modify AMPs (defensins): can we make
one that stays effective in a high-salt environment?
Genetics & epidemiology
• Located on chromosome 7q31.2; the gene spans about 180 kb
• 1 in 25 Europeans carries a CFTR mutation, so
about 1 in 2500 live births has the disease
• Males and females equally affected
• Life expectancy is higher in males; nobody knows why
• Why so common?
• Cholera toxin requires normal CFTR, so carriers may be partly protected
• Also a possible connexion with typhoid
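The step from a carrier frequency of 1 in 25 to an incidence of 1 in 2500 is plain Mendelian arithmetic. A minimal sketch in Python, assuming random mating and autosomal recessive inheritance (the variable names are illustrative only):

```python
from fractions import Fraction

# Figure from the slide: 1 in 25 Europeans carries a CFTR mutation.
carrier_freq = Fraction(1, 25)

# Assumption: random mating, so the chance that both parents are
# carriers is the product of the two carrier frequencies.
both_parents_carriers = carrier_freq * carrier_freq

# Autosomal recessive: a child of two carriers inherits both
# mutant copies with probability 1/4.
affected_given_carrier_parents = Fraction(1, 4)

incidence = both_parents_carriers * affected_given_carrier_parents
print(f"Expected incidence: 1 in {incidence.denominator}")
```

Using exact fractions rather than floats keeps the result as a clean ratio, which matches the 1:2500 quoted on the slide.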
Mapping
• Genetic association with markers pinpoints
chromosome 7
• Chromosome walking to zero in on the gene
• NO genome sequence in those days
Clone and sequence
• Why bother?
– because we can!
– ? can predict features/functions
– ? Can compare CF v normal to identify mutation
• Working with cDNA not genomic
• Generate cDNA libraries from cells & cell-lines
• Screen for cDNAs that hybridise with known
CFTR fragment
• Eventually (much hard work) got 19 overlapping
cDNA clones
Fig 1: 19 normal clones, 2 CF clones
Fig 3: where expressed (patchy expression profile)
Gene sequence
• Clones span 6.1 kb of RNA
• ORF encodes a protein of 1480 amino acids
– So much bigger than the 300 AA average
• In 1989, far fewer than 1000 human genes had been sequenced
• Bioinformatic analysis possible then:
– Start codon: consensus sequence for translation start around the AUG
– Secondary structure prediction
– Hydropathy plot
– Homology searches (pre-BLAST)
– Glycosylation sites and Ser/Thr kinase sites
Start of ORF
• 5’-AGACCAUGCA-3’ in CFTR
• 5’-(CC)[A/G]CCAUGG(G)-3’ consensus
– Convinced?
– I’m not
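To judge how convincing the match is, the two sequences above can be compared position by position. The sketch below is only an illustration: it anchors both on the AUG, numbers the A of AUG as +1 and the base just upstream as -1 (a common convention, assumed here), and treats the bracketed parts of the consensus as ordinary positions. The function and dictionary names are hypothetical.

```python
# Consensus from the slide, 5'-(CC)[A/G]CCAUGG(G)-3', keyed by position
# relative to the start codon (A of AUG = +1, base before it = -1).
CONSENSUS = {-5: "C", -4: "C", -3: "AG", -2: "C", -1: "C", 4: "G", 5: "G"}

def kozak_matches(site, consensus=CONSENSUS):
    """Return {position: True/False} for each consensus position covered by site."""
    start = site.find("AUG")  # uses the first AUG in the string
    if start < 0:
        raise ValueError("no AUG start codon in site")
    results = {}
    for pos, allowed in consensus.items():
        # Negative positions count back from the A; positive ones count from A = +1.
        idx = start + pos if pos < 0 else start + pos - 1
        if 0 <= idx < len(site):
            results[pos] = site[idx] in allowed
    return results

cftr_site = "AGACCAUGCA"  # sequence as given on the slide
matches = kozak_matches(cftr_site)
for pos in sorted(matches):
    print(f"{pos:+d}: {'match' if matches[pos] else 'mismatch'}")
print(f"{sum(matches.values())} of {len(matches)} flanking positions match")
```

On this count the purine at -3 and the CC at -2/-1 agree with the consensus, while the G at +4 does not, which is one way to see why the slide stays sceptical.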
The sequence 1
Figure annotations: exon splice sites, transcription start, AA count, RNA count, 2 TM domains, predicted kinase sites
The sequence 2
Figure annotations: first ATP-binding fold is underlined; ΔF508 is circled
Protein analysis
The whole protein is two similar halves, each with 6 membrane-spanning
domains (hydrophobic peaks in the hydropathy plot) and one nucleotide-binding
fold (NBF; hydrophilic region), so two NBFs in total, plus a charged R domain
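A hydropathy plot of the kind used to spot membrane-spanning domains is just a sliding-window average of per-residue hydrophobicity. The sketch below assumes the Kyte-Doolittle scale and a 19-residue window (a commonly used setting for transmembrane helices); the slides do not say which scale or window the authors used, and the peptide is an invented toy sequence, not CFTR.

```python
# Kyte-Doolittle hydropathy values (assumed scale; see lead-in).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (one value per window start)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Hypothetical toy peptide: charged flank, hydrophobic stretch, charged flank.
toy = "KDEKRDEKRD" + "LLIVFAVLLIVAGLLIVFAL" + "KDEKRDEKRD"
profile = hydropathy_profile(toy)

# Index of the window with the highest mean hydropathy.
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"Peak window starts at residue {peak + 1}, mean hydropathy {profile[peak]:.2f}")
```

Peaks well above zero that are about one window wide are the candidates for membrane-spanning helices; the charged flanks of the toy peptide give strongly negative values by comparison.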
Fig 6: homology/similarity
Figure annotations: conserved hydrophobic residues; ΔF508; aromatic residue at position 508
Comparing two conserved regions in CFTR and other proteins (multidrug resistance
proteins, transporters etc.): some have two similar regions, some have one.
Structure of the fold
• The two halves have similar structure but low AA
conservation (best is only 27/66 identities)
• Others in the family have much tighter
conservation
• No signal peptide, which suggests that the
first TM domain runs inside to outside (i to o)
• External loops are very short
• …except between TM7 and TM8, where
there is an N-glycosylation site
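Candidate N-glycosylation sites like the one between TM7 and TM8 are usually found by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. A minimal sketch with a regular expression follows; the function name and the test peptide are invented for illustration and are not taken from the CFTR sequence.

```python
import re

# N-linked glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead means overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(protein):
    """Return 1-based positions of the Asn in each N-X-[S/T] sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Hypothetical loop sequence: NGS and NKT qualify, NPT does not (Pro at X).
toy_loop = "AANGSAANPTAANKT"
print(find_sequons(toy_loop))  # prints [3, 13]
```

A sequence match only marks a potential site: it still has to face the outside of the cell to be glycosylated, which is why the position of the loop in the membrane model matters.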
More…
• R domain is encoded by one exon; 69 of its 241 residues are polar,
in alternating regions of +ve and -ve charge
• It also holds most of the predicted kinase phosphorylation sites
• All family members secrete something:
– Chloride (CFTR)
– Pigment (Drosophila white gene)
– Lytic peptide (E. coli hemolysin)
• …so what about the “function unknown” mbpX
gene in liverwort chloroplasts?
More…
• Hypothesis: CFTR is itself the ion channel
• 10/12 of TM domains have >1 +ve AA
– i.e. amphipathic helix
– cf. brain Na+ channel & GABA-R Cl- channel
• Contrast P-glycoprotein
– Closely related but no +ve TM AAs
• Big protein, so maybe other functions as well
Fig 7: a composite model (glycosylation sites marked); colour version from Wikipedia
Conclude
• From very little data, and a very small DB to compare with,
can make predictions about structure and function that have
stood the test of time.
Year   N=bases          N=seqs
1988   23,800,000       20,579
1989   34,762,585       28,791
1990   49,179,285       39,533
2000   11,101,066,288   10,106,023
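To put "very small DB" in numbers, the database sizes listed above can be turned into fold-growth and doubling-time figures. This is a rough sketch: the counts are copied from the slide, the function names are illustrative, and the doubling time assumes smooth exponential growth between the two years, which is a simplification.

```python
import math

# Database sizes from the slide: year -> (bases, sequences).
db_size = {
    1988: (23_800_000, 20_579),
    1989: (34_762_585, 28_791),
    1990: (49_179_285, 39_533),
    2000: (11_101_066_288, 10_106_023),
}

def fold_growth(start_year, end_year, column=0):
    """Ratio of end to start size; column 0 = bases, column 1 = sequences."""
    return db_size[end_year][column] / db_size[start_year][column]

def doubling_time_years(start_year, end_year, column=0):
    """Doubling time, assuming exponential growth between the two years."""
    ratio = fold_growth(start_year, end_year, column)
    return (end_year - start_year) * math.log(2) / math.log(ratio)

print(f"Bases, 1989 to 2000: {fold_growth(1989, 2000):.0f}-fold")
print(f"Sequences, 1989 to 2000: {fold_growth(1989, 2000, column=1):.0f}-fold")
print(f"Doubling time for bases: {doubling_time_years(1989, 2000):.2f} years")
```

On these figures the database available in the year of the CFTR paper was several hundred times smaller than in 2000.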
Postscript
• ΔF508 may be about delivery of the protein
to the membrane
– Functions fine if you trick cells into delivering it!
• By 1995, 300 different mutations had been identified
in the gene
• Last month: 1531 different mutations listed at
– http://www.genet.sickkids.on.ca/cftr/StatisticsPage.html
• With the human genome, SNPs and ESTs it is much easier to
interpret sequence information