Linkage analysis
Jan Hellemans
Finding causal mutations
2 opposing strategies
sequence then select
select then sequence
Sequencing
traditional Sanger sequencing only possible after selection
Massively parallel sequencing possible prior to or after selection
RNA sequencing
exome sequencing
genome sequencing
Finding causal mutations
Selection
positional (prior to sequencing)
linkage analysis
GWAS
structural variations (e.g. microdeletions)
functional (prior to & after sequencing)
candidate genes selected based on known function or involvement
in related disorders
filtering of variants based on functional predictions
overlap (after sequencing)
looking for genes / variants that occur in multiple independent
patients
mostly a combination is used
exome sequencing
Aims
Interpret microsatellite results
Add genotypes to pedigrees
Create pedigree and genotype files
Calculate and interpret LOD-scores
Delineate linkage intervals
Basic principles of linkage analysis
Analyze other types of markers
Association studies
Learn how to work with specific pedigree programs
Starting linkage analysis
Preparations
Clearly define the phenotype
If the definition is not specific enough, you may end up analyzing different
disorders that map to different genomic loci
LOD scores are additive
Find suitable families
larger is better
more patients is better
Collect genomic DNA from as many family members as
possible
Determine the type of inheritance
Calculate the power to prove linkage with the available
material (SLink – not part of this course)
Linkage analysis types
Directed linkage analysis
Evaluate linkage at a specific locus such as a candidate gene
Common approach: evaluate an intragenic, 5’ and 3’ marker
often microsatellites
Genome wide linkage analysis
Screen for linkage for markers spread across the entire genome
Microsatellites: ~400 markers spaced at about 10cM
SNPs: 500k SNP array
Homozygosity mapping
Screen only affected individuals in inbred families
Select homozygous markers (typically SNP markers)
Very efficient technology
Fine mapping
Some linked markers are known, but the borders of the linkage interval
still need to be defined
Exercise – Part 1
2 inbred families with a recessive disorder
Homozygosity mapping based on 500k SNP
arrays identified 2 candidate regions
Chromosome 4
Patient 1 homozygous for
6.052Mb - 14.488Mb
21.008Mb – 37.477Mb
Patient 2 homozygous for
11.186Mb – 37.219Mb
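The regions shared by both patients can be found by intersecting their homozygous stretches. The following is a minimal sketch, assuming the intervals are written as (start, end) tuples in Mb; the function name `overlap` is chosen here for illustration only.

```python
def overlap(a, b):
    """Return the overlap of two (start, end) intervals, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

# Homozygous stretches on chromosome 4 in Mb, copied from the exercise.
patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]

# Keep every pairwise overlap between a patient 1 and a patient 2 stretch.
shared = [
    region
    for p1 in patient1
    for p2 in patient2
    if (region := overlap(p1, p2)) is not None
]
print(shared)  # [(11.186, 14.488), (21.008, 37.219)]
```

Markers just outside a shared stretch can still be useful as flanking markers.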
Task: find microsatellite markers to confirm linkage
[Figure: homozygous regions of patients 1 and 2 along chromosome 4; position axis from 5,000 to 40,000 kb]
Find additional flanking markers
Find physical position of marker in NCBI > UniSTS
NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/
Go to Homo sapiens and to the right chromosome
Maps & options: show
DeCode, Généthon & Marshfield (genetic maps)
Genes
Set region: e.g. 2Mb up- and downstream of your marker
Click ‘Data as table view’
Click on STS behind a marker to see its details
Select markers that
locate to only 1 genomic location
have a PCR product with an extended size range
(a single reported size means the marker is not polymorphic)
http://www.ncbi.nlm.nih.gov/projects/mapview
Exercise – Part 1 > possible solution
Markers in 1st candidate region
D4S3017 (21.078Mb)
D4S3044 (25.189Mb)
D4S1618 (33.857Mb)
D4S3350 (33.857Mb)
D4S2988 (36.889Mb)
Markers in 2nd candidate region
D4S1582 (10.311Mb)
D4S2906 (12.321Mb)
D4S2944 (13.141Mb)
D4S1602 (14.059Mb)
D4S2960 (15.437Mb)
Order primers & analyze them on all family members
Analyzing microsatellite data
Microsatellites > basics
Repeats of short sequences (e.g. 2bp)
NNNNAC(AC)nACNNNN
Number of repeats is variable (unstable sequence)
Number of repeats determines the allele
Number of repeats corresponds to specific length of
PCR product:
allele 1: NNNNACACACACACNNNN (5*AC, 18bp)
allele 2: NNNNACACACACACACNNNN (6*AC, 20bp)
allele 3: NNNNACACACACACACACNNNN (7*AC, 22bp)
...
Determine length to know the allele (sequencer)
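The relation between product length and repeat number can be sketched as follows. This is a toy example that assumes the 8bp of flanking sequence (NNNN on each side) and the 2bp repeat unit from the alleles shown above; real markers have much longer flanks, and `repeat_count` is a name made up for this sketch.

```python
def repeat_count(product_bp, flank_bp=8, unit_bp=2):
    """Number of repeat units in a PCR product of the given length.

    flank_bp is the total non-repeat sequence (both sides together),
    unit_bp the length of one repeat unit; both defaults are assumptions
    taken from the toy alleles, not properties of a real marker.
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % unit_bp:
        raise ValueError(f"{product_bp}bp does not fit a whole number of repeats")
    return repeat_bp // unit_bp

for length in (18, 20, 22):
    print(length, "bp ->", repeat_count(length), "repeats")
```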
Microsatellites > determine size
Use internal size standard (other color)
[Figure: trace with internal size standard; peaks labeled 220bp, 225bp and 230bp]
Microsatellites > heterozygotes
[Figure: heterozygous sample; peaks labeled 220bp, 223bp, 225bp and 230bp]
Microsatellites > stutter peaks
Repeats are difficult to copy: the polymerase slips
Some amplicons have 1 repeat less
a few even lose multiple repeats
Small repeats are more prone to slippage and show
more pronounced stutter peaks
Largest product is the correct one
Distance between peaks = length of a repeat
Microsatellites > stutter peaks
allelic peak
1st stutter peak
2nd stutter peak
Microsatellites > stutter peaks
Allelic peaks are the highest
Stutter peaks are lower
[Figures: example plots with the allelic peaks A1 and A2 marked]
Microsatellites > +A peaks
Taq polymerase tends to add an extra A at the 3’ end
Variable degree of products with or without this extra A
Do not confuse with stutter peaks (only 1bp difference)
allelic peak
allelic peak + A
1st stutter peak
1st stutter peak + A
2nd stutter peak
2nd stutter peak + A
Microsatellites > complex plots (stutter & +A)
[Figure: complex plot with the allelic peaks A1 and A2 marked]
Microsatellites > multiplex
Combine multiple markers in a single analysis ($$$)
Different size range
Multicolor
Commercial kits: e.g. 16 markers / lane
Microsatellite plots examples
Genotyping pedigrees
Genotyping pedigrees
Screen one or multiple markers for some or all family
members
For every marker:
Make a list of all occurring allele sizes
Due to technical variation in sizing, the same allele can have a slightly
different size in different measurements (-0.4bp to +0.4bp). Give all
alleles within this range the same allele number
Add the allele numbers to the pedigree at the corresponding
individual/marker combination
Find the right phase
Advanced software like GeneMapper can generate
tables with allele numbers for every sample / marker
Advanced pedigree programs like Progeny can store
genotype information for family members
Verify inheritance
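Assigning allele numbers to measured sizes can be sketched as below. The binning rule is an assumption made for illustration: sizes are sorted, and a new allele starts once a size lies more than 2 x 0.4bp above the first size in the current bin. Software such as GeneMapper uses its own binning, so treat this as a hand-calculation aid only.

```python
def assign_allele_numbers(sizes, tolerance=0.4):
    """Map each measured size (bp) to an allele number, smallest allele = 1.

    Sizes that differ by no more than 2 * tolerance from the first size
    in a bin are treated as the same allele (assumed rule, see lead-in).
    """
    numbers = {}
    bin_start = None
    allele = 0
    for size in sorted(set(sizes)):
        if bin_start is None or size - bin_start > 2 * tolerance:
            allele += 1          # open a new bin for a new allele
            bin_start = size
        numbers[size] = allele
    return numbers

# Hypothetical measurements for one marker across several samples.
observed = [220.1, 219.9, 222.2, 224.0, 221.9, 220.3]
calls = assign_allele_numbers(observed)
for size in sorted(calls):
    print(f"{size:6.1f} bp -> allele {calls[size]}")
```

With dinucleotide repeats the true alleles are 2bp apart, so a 0.8bp window keeps neighboring alleles separate.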
Exercise – Part 2
Genotype 3 markers in all available individuals of 2
families
Pedigrees & microsatellite plots in
ExercisePart2-GenotypingData.pdf
Add allele numbers for the 3 markers to the pedigree
Interpret the genotyped pedigrees: linked?
Family 1
Family 2
Exercise – Part 2 > Conclusions
D4S1582
Mendelian error: cannot be interpreted
D4S2944
Linked
D4S3017
Not linked: unaffected individuals with the same genotype as a patient
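A Mendelian error such as the one seen for D4S1582 can be detected with a simple trio check: a child must carry one allele from the father and one from the mother. The sketch below assumes genotypes are pairs of allele numbers with 0 meaning unknown; `is_mendelian` is a hypothetical helper, not part of any program named in these slides.

```python
def is_mendelian(child, father, mother):
    """True if the child's genotype is compatible with both parents.

    Each genotype is a pair of allele numbers; 0 marks an unknown allele,
    in which case the trio is treated as uninformative (returns True).
    """
    if 0 in child or 0 in father or 0 in mother:
        return True
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

print(is_mendelian((1, 2), (1, 3), (2, 4)))  # True
print(is_mendelian((1, 1), (1, 3), (2, 4)))  # False: mother has no allele 1
```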
Calculate LOD scores
EasyLinkage
EasyLinkage = UI for linkage analysis
http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
Bioinformatics. 2005 Feb 1;21(3):405-7
PMID: 15347576
Bioinformatics. 2005 Sep 1;21(17):3565-7
PMID: 16014370
Interface for many linkage analysis programs
Input
Pedigree file (linkage format)
Genotype file(s)
Marker information (already provided for popular markers)
Settings
Pedigree file
Naming requirements for EasyLinkage:
p_xxx.pro e.g. p_SMMD.pro
Format:
Tab delimited text file
1 individual per row
Columns:
1 family ID
2 person ID
3 father ID
4 mother ID
5 sex (1=male, 2=female, 0=unknown)
6 affection status (1=unaffected, 2=affected, 0=unknown)
7 DNA availability (optional, relevant for power calculations)
8 liability class (to be provided if multiple liability classes are used)
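A pedigree file in this layout could be generated as sketched below. The trio is invented for illustration, the optional columns 7 and 8 are left out, and founders are given father and mother ID 0 as is usual in linkage format.

```python
import csv
import io

FIELDS = ["family", "person", "father", "mother", "sex", "affected"]

def pedigree_text(rows):
    """Render pedigree rows as tab-delimited text, one individual per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[field] for field in FIELDS])
    return buf.getvalue()

# Hypothetical trio: unaffected parents (founders) and an affected son.
trio = [
    {"family": 1, "person": 1, "father": 0, "mother": 0, "sex": 1, "affected": 1},
    {"family": 1, "person": 2, "father": 0, "mother": 0, "sex": 2, "affected": 1},
    {"family": 1, "person": 3, "father": 1, "mother": 2, "sex": 1, "affected": 2},
]
text = pedigree_text(trio)
print(text, end="")
# To save it under the naming scheme above, e.g. for a study called DEMO:
# open("p_DEMO.pro", "w").write(text)
```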
Genotype files
Person IDs have to match exactly with those provided in
the pedigree file
Naming requirements for EasyLinkage:
MarkerName_xxx.abi e.g. D1S1609_SMMD.abi
Format:
Tab delimited text file
1 individual per row
Columns (for microsatellite based analysis):
1 marker (same as in file name and matching a marker in an
available marker set)
2 custom information (content doesn’t matter, but column must be
present)
3 individual ID (match person ID in pedigree file)
4 & 5 genotypes for 2 alleles (unknown=0)
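The genotype file can be produced in the same way. This sketch assumes a dictionary from person ID to a pair of allele numbers; the genotypes, the study name DEMO and the placeholder text in column 2 are made up.

```python
def genotype_text(marker, genotypes, custom="x"):
    """Render one marker's genotypes as tab-delimited text.

    Columns: marker, custom information, individual ID, allele 1, allele 2.
    Unknown alleles are written as 0.
    """
    lines = []
    for person, (allele1, allele2) in genotypes.items():
        lines.append("\t".join([marker, custom, str(person), str(allele1), str(allele2)]))
    return "\n".join(lines) + "\n"

marker = "D4S2944"
# Hypothetical genotypes; person IDs must match the pedigree file.
genotypes = {1: (1, 2), 2: (1, 3), 3: (1, 1), 4: (0, 0)}
geno_text = genotype_text(marker, genotypes)
file_name = f"{marker}_DEMO.abi"  # MarkerName_xxx.abi
print(file_name)
print(geno_text, end="")
```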
Marker information
Contains information on the chromosome and position of
every marker
Already available for a number of commercial SNP arrays and for the microsatellite markers from
Généthon
Marshfield
DeCode
Custom marker sets can be created (see manual)
EasyLinkage settings
Choose a program:
FastLink: Parametric, single-point
SuperLink: Parametric, single-/multipoint
SPLink: Nonparametric, single-point
Genehunter: Nonpara-/parametric, single-/multipoint
Genehunter Plus: Nonpara-/parametric, single-/multipoint
Genehunter MOD: Nonpara-/parametric, single-/multipoint
Genehunter Imprinting: Nonpara-/parametric, single-/multipoint
GeneHunter TwoLocus: Parametric, two-locus, single-/multipoint
Merlin: Nonpara-/parametric, single-/multipoint
SimWalk: Nonparametric, single-/multipoint
Allegro: Nonpara-/parametric, single-/multipoint & simulation, single-/multipoint
PedCheck: Mendelian error check
FastSLink: Simulation, single-/multipoint
EasyLinkage settings
Parametric <-> non-parametric
Single point <-> multipoint
Frequency of the disease allele
Penetrance vectors (wt/wt, wt/mt, mt/mt)
Standard dominant: 0 1 1
Standard recessive: 0 0 1
Reduced penetrance: replace 1 by penetrance (e.g. 0.9)
Phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)
Example: 0.01 0.9 0.99
1% chance to show a similar phenotype despite a normal genotype
90% chance to show the phenotype when 1 mutant allele (dominant
with incomplete penetrance)
99% likelihood to present with the phenotype if both alleles are mutant
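The replacement rules above can be written down as a small helper. This is only a sketch of the rule as stated on the slide; `penetrance_vector` is a hypothetical name and the result still has to be typed into EasyLinkage by hand.

```python
def penetrance_vector(mode, penetrance=1.0, phenocopy=0.0):
    """Return (wt/wt, wt/mt, mt/mt) probabilities of being affected.

    Starts from the standard vector for the inheritance mode, then
    replaces every 1 by the penetrance and every 0 by the phenocopy rate.
    """
    standard = {"dominant": (0, 1, 1), "recessive": (0, 0, 1)}
    if mode not in standard:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    return tuple(penetrance if at_risk else phenocopy for at_risk in standard[mode])

print(penetrance_vector("recessive"))             # (0.0, 0.0, 1.0)
print(penetrance_vector("dominant", 0.9, 0.01))   # (0.01, 0.9, 0.9)
```

The slide example 0.01 0.9 0.99 goes one step further and gives the homozygous mutant genotype a higher penetrance than the heterozygote, which this simple rule does not do.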
Evaluate calculated LOD-scores
Maximum LOD-scores can be seen in EasyLinkage
Details about LOD-scores at different recombination
fractions can be found in text files generated by
EasyLinkage; process these in Excel (generate graphs, ...)
Standard rules for LOD-scores
>3 significant linkage
2<LOD<3 suggestive linkage
-2<LOD<2 uninformative
<-2 significant absence of linkage
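To make the thresholds concrete, the sketch below computes a two-point LOD score for the simplest textbook case: phase-known, fully informative meioses, where the likelihood at recombination fraction theta is theta^k (1 - theta)^(n - k) for k recombinants in n meioses, compared with theta = 0.5. This is an assumption-laden simplification; programs such as SuperLink evaluate full pedigree likelihoods. The function names are invented for this example.

```python
import math

def lod_score(theta, recombinants, meioses):
    """log10 of L(theta) / L(0.5) for phase-known, informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0:
        # Any recombinant is impossible at theta = 0.
        return float("-inf") if recombinants else meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1 - theta)
            - meioses * math.log10(0.5))

def classify(lod):
    """Apply the standard rules listed above."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"
    if lod < -2:
        return "significant absence of linkage"
    return "uninformative"

# LOD scores are additive: sum the scores of independent families
# at the same theta (the counts below are hypothetical).
families = [(0, 6), (1, 8)]  # (recombinants, meioses) per family
for theta in (0.05, 0.1, 0.2, 0.3, 0.4):
    total = sum(lod_score(theta, k, n) for k, n in families)
    print(f"theta={theta:.2f}  LOD={total:6.2f}  {classify(total)}")
```

Ten informative meioses without any recombinant give a LOD of about 3.01 at theta = 0, which is why small families rarely reach significance on their own.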
Interpreting LOD plots
[Figures: example plots of LOD score (y-axis, -5 to 5) against recombination fraction (x-axis, 0 to 0.5)]
Exercise – Part 3
Generate one pedigree file containing all family
members of both families (use global IDs)
Generate a genotype file for each of the tested markers
Run SuperLink analysis with the right settings
Evaluate results
Exercise – Part 3 > Results
Strengthen the evidence
Analyze more family members
Analyze more families
Analyze flanking markers
Look for more informative markers that result in higher LOD-scores
A series of flanking markers allows for multipoint linkage analysis
A series of linked markers gives more confidence (subjective)
Flanking markers can also be used to fine-map the linkage interval
Determine the linkage interval
[Figure: rows of ordered markers scored NL, ? or L; the candidate region is marked]
Exercise 2: find the linkage interval
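One way to read such a row of marker results is sketched below. It assumes that L means linked, NL not linked and ? uninformative, and that the interval runs from the nearest NL marker on the left of the linked block to the nearest NL marker on the right, so that uninformative markers at the edges stay inside the candidate region. These readings and the marker names M1 to M7 are assumptions for illustration.

```python
def linkage_interval(markers):
    """Return the (left, right) bordering NL markers of the linked block.

    markers: list of (name, status) tuples in chromosomal order, with
    status 'L', 'NL' or '?'. A border is None if no NL marker was typed
    on that side; the whole result is None if no marker is linked.
    Assumes the linked markers form one block without NL markers inside.
    """
    linked = [i for i, (_, status) in enumerate(markers) if status == "L"]
    if not linked:
        return None
    left = next((i for i in range(linked[0] - 1, -1, -1)
                 if markers[i][1] == "NL"), None)
    right = next((i for i in range(linked[-1] + 1, len(markers))
                  if markers[i][1] == "NL"), None)
    return (markers[left][0] if left is not None else None,
            markers[right][0] if right is not None else None)

row = [("M1", "NL"), ("M2", "?"), ("M3", "L"), ("M4", "L"),
       ("M5", "?"), ("M6", "NL"), ("M7", "NL")]
print(linkage_interval(row))  # ('M1', 'M6')
```

When several families or branches are scored, the shared candidate region is the overlap of their individual intervals.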
Post linkage
Create a list of all the genes within the linkage interval
NCBI map viewer
UCSC (also for non-coding RNAs)
Evaluate known gene functions for relevance to the
investigated phenotype
Sequence genes
Start with those that seem the most relevant to the disorder
Start with the coding regions
Screen the entire region with capture sequencing
Finding a mutation and proving its causality is the
ultimate proof