The Fosse Way Cohort

UK Biobank: how big is “big”?
Paul Burton
Dept of Health Sciences
Dept of Genetics
University of Leicester
PHOEBE
Structure of talk
➢ What is UK Biobank?
➢ How big is “big”?
➢ Expected event rates in UK Biobank
➢ Biobank harmonisation
What is UK Biobank?
Basic design features
➢ A prospective cohort study
➢ 500,000 adults across UK
➢ Middle aged (40-69 years)
➢ A population-based biobank
  • Not disease or exposure based
➢ “Broad spectrum” not “fully representative”
➢ Individuals not families
➢ MRC, Wellcome Trust, DH, Scottish Executive
  • £61M
Basic design features
➢ Longitudinal health tracking
➢ Nested case-control studies
➢ Long time-horizon
➢ Owned by the Nation
➢ Central Administration – Manchester
  • PI: Prof Rory Collins - Oxford
➢ 6 collaborating groups (RCCs) of university scientists
Justification for UK Biobank
➢ Primary justifications
  • Roles that can best be fulfilled by a new large cohort study of the type represented by UK Biobank
➢ Secondary justifications
  • Roles that could be provided by other types of study, but given that UK Biobank is to go ahead anyway these additional roles can be taken on at relatively low marginal cost
A platform for research in biomedical science
➢ Studies of the joint effects of genes and environment/life-style
➢ Genotype-based studies***
➢ The genetics of disease progression
➢ Direct association of genes with disease***
➢ Universal controls
➢ Family-based studies
How big is “big”?
With Anna Hansell, Imperial College
The statistical power of case-control studies
➢ Contemporary pre-eminence of genetic association studies rather than genetic linkage studies
➢ Covers both stand-alone case-control studies, and nested case-control studies in large cohorts
➢ Sample size is determining in both settings
Power calculations
➢ Work with the least powerful setting
  • Binary disease, binary genotype, binary environmental exposure
➢ Logistic regression; interactions = departure from a multiplicative model
➢ Complexity (arbitrary but realistic)
  • Frailty variance: 100 fold
  • Symmetrical genotyping error rate: 1%
  • Symmetrical environmental error rate: 20%
  • Disease to non-diseased error rate: 80%
  • Non-diseased to diseased error rate: 0.2%
  • Critical P-value: 10^-4
  • Power: 80%
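To illustrate why the error rates above matter, here is a minimal sketch of how a symmetrical misclassification rate pulls an observed odds ratio towards 1. It assumes that "symmetrical error rate" means the same probability of misclassifying exposed as unexposed and vice versa, independently of disease status; that reading, and the function names, are assumptions made for illustration and are not taken from the slides.

```python
def observed_prevalence(true_prev, error_rate):
    """Exposure prevalence as measured, given a symmetrical misclassification rate."""
    return true_prev * (1 - error_rate) + (1 - true_prev) * error_rate


def observed_odds_ratio(true_or, control_prev, error_rate):
    """Odds ratio seen in the data when exposure is misclassified symmetrically.

    Assumption: misclassification is unrelated to case-control status.
    """
    # True exposure prevalence among cases implied by the true odds ratio.
    odds_cases = true_or * control_prev / (1 - control_prev)
    case_prev = odds_cases / (1 + odds_cases)
    # Apply the same measurement error to cases and controls.
    p1 = observed_prevalence(case_prev, error_rate)
    p0 = observed_prevalence(control_prev, error_rate)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# True OR 2.0, exposure prevalence 0.2 in controls.
print(round(observed_odds_ratio(2.0, 0.2, 0.20), 3))  # 20% error rate
print(round(observed_odds_ratio(2.0, 0.2, 0.01), 3))  # 1% error rate
```

With a 20% error rate the true odds ratio of 2.0 is observed as roughly 1.42, whereas a 1% error rate leaves it close to 2.0, which is one reason realistic measurement error inflates the sample sizes required.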
Summarise power using minimum detectable odds ratios (MDORs) calculated by ‘iterative simulation’
➢ Estimate minimum ORs detectable with 80% power at stated level of statistical significance under specified scenario
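The idea of an MDOR can be sketched with a much simpler, idealised calculation than the iterative simulation described above. The sketch below uses a normal approximation to the log odds ratio from an unmatched 2x2 case-control table and bisects on the odds ratio until power reaches the target. It ignores frailty, misclassification and interactions, and the 1:1 control-to-case ratio in the example is an assumption; it is not the method used for the results in this talk.

```python
from math import log, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def power_main_effect(odds_ratio, n_cases, n_controls, prevalence, alpha):
    """Approximate power of a two-sided Wald test on the log odds ratio."""
    p0 = prevalence  # exposure prevalence in controls
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)  # exposure prevalence in cases
    se = sqrt(1 / (n_cases * p1) + 1 / (n_cases * (1 - p1))
              + 1 / (n_controls * p0) + 1 / (n_controls * (1 - p0)))
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    # The contribution of the opposite tail is negligible and is ignored.
    return _NORMAL.cdf(abs(log(odds_ratio)) / se - z_crit)


def mdor(n_cases, n_controls, prevalence, alpha=1e-4, power=0.8, tol=1e-4):
    """Smallest odds ratio above 1 detectable with the requested power (bisection)."""
    lo, hi = 1.0 + 1e-9, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_main_effect(mid, n_cases, n_controls, prevalence, alpha) >= power:
            hi = mid
        else:
            lo = mid
    return hi


# Genotype prevalence 0.1, p = 10^-4, one control per case (assumed ratio).
for n in (5_000, 10_000, 20_000):
    print(n, round(mdor(n, n, 0.1), 2))
```

Under these idealised assumptions the MDOR for 5,000 cases comes out at about 1.35, smaller than the values reported in the talk, which is the expected direction once realistic complexity is added.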
Summary of results
➢ 80% power for genotype frequency = 0.1
  • Genetic main effect ≈ 1.5, p=10^-4 → 5,000 cases
  • Genetic main effect ≈ 1.3, p=10^-4 → 10,000 cases
  • Genetic main effect ≈ 1.2, p=10^-4 → 20,000 cases
  • Genetic main effect ≈ 1.4, p=10^-7 → 10,000 cases
  • Genetic main effect ≈ 1.3, p=10^-7 → 20,000 cases
  • G:E interaction with environmental exposure prevalence = 0.2 ≈ 2.0, p=10^-4 → 20,000 cases
Expected event rates in UK Biobank
With Anna Hansell, Imperial College
Taking account of
➢ Age range at recruitment 40-69 years
➢ Recruitment over 5 years
➢ All cause mortality
➢ Disease incidence (“healthy cohort effect”)
➢ Migration overseas
➢ Withdrawal from the study
Time (yrs) to achieve the stated number of cases

| Condition | 1,000 cases | 2,500 cases | 5,000 cases | 10,000 cases | 20,000 cases |
|---|---|---|---|---|---|
| Bladder cancer | 11 | 19 | 31 | - | - |
| Breast cancer (F) | 4 | 6 | 10 | 17 | 40 |
| Colorectal cancer | 5 | 9 | 14 | 22 | 42 |
| Prostate cancer (M) | 6 | 9 | 14 | 22 | 41 |
| Lung cancer | 7 | 12 | 19 | 34 | - |
| Non-Hodgkin's lymphoma | 11 | 22 | - | - | - |
| Ovarian cancer (F) | 12 | 26 | - | - | - |
| Stomach cancer | 16 | 29 | - | - | - |
| Stroke | 5 | 8 | 12 | 18 | 28 |
| MI and coronary death | 2 | 4 | 5 | 8 | 13 |
| Diabetes mellitus | 2 | 3 | 4 | 6 | 10 |
| COPD | 4 | 6 | 8 | 13 | 23 |
| Hip fracture | 7 | 11 | 15 | 21 | 31 |
| Rheumatoid arthritis | 7 | 14 | 27 | - | - |
| Alzheimer’s disease | 7 | 10 | 13 | 18 | 23 |
| Parkinson’s disease | 6 | 10 | 15 | 23 | 37 |
Smaller sample sizes
Expected cases by cohort size

| Condition | 500K | 250K | 100K |
|---|---|---|---|
| Bladder cancer | 7500 | 3800 | 1500 |
| Breast cancer: females only | 21800 | 10900 | 4400 |
| Colorectal cancer | 21800 | 10900 | 4400 |
| Prostate cancer: males only | 22000 | 11000 | 4400 |
| Lung cancer | 12600 | 6300 | 2500 |
| Non-Hodgkin's lymphoma | 4900 | 2400 | 1000 |
| Ovarian cancer: females only | 3800 | 1900 | 800 |
| Stomach cancer | 3900 | 1900 | 800 |
| Stroke | 34700 | 17400 | 6900 |
| MI and coronary death | 59200 | 29600 | 11800 |
| Diabetes mellitus | 89300 | 44600 | 17900 |
| COPD | 35500 | 17700 | 7100 |
| Hip fracture | 37900 | 18900 | 7600 |
| Rheumatoid arthritis | 6800 | 3400 | 1400 |
| Alzheimer’s disease | 73900 | 36900 | 14800 |
| Parkinson’s disease | 24900 | 12500 | 5000 |
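The 250K and 100K figures are close to what simple proportional scaling of the 500K figures would give. The short check below makes that explicit for a few rows copied from the slide; the assumption that expected cases scale linearly with cohort size (same age structure, follow-up and event rates) is an illustration of the pattern, not a statement of how the figures were derived.

```python
# (500K, 250K, 100K) case counts for a few conditions, copied from the slide.
ROWS = {
    "Bladder cancer": (7500, 3800, 1500),
    "Stomach cancer": (3900, 1900, 800),
    "Stroke": (34700, 17400, 6900),
    "Diabetes mellitus": (89300, 44600, 17900),
}


def scale_cases(cases_500k, cohort_size):
    """Scale the 500,000-recruit figure in proportion to cohort size."""
    return cases_500k * cohort_size / 500_000


for name, (c500, c250, c100) in ROWS.items():
    print(f"{name}: 250K scaled {scale_cases(c500, 250_000):.0f} vs slide {c250}; "
          f"100K scaled {scale_cases(c500, 100_000):.0f} vs slide {c100}")
```

For these rows the scaled values agree with the slide to within the rounding of the published figures (nearest 100).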
Conclusions
➢ Having taken account of realistic bioclinical complexity, a cohort-based biobank such as UK Biobank needs to be very large if it is to provide a stand-alone infrastructure
➢ Anything much less than 500,000 recruits will severely curtail the number of diseases that can be studied using that biobank alone
➢ The value of any biobank will be greatly augmented if it proves possible to set up a coherent and scientifically harmonised international network of biobanks and large cohort studies
Harmonising biobanks internationally
Why harmonise?
➢ Investigate less common (but not rare!!!) conditions
  • UKBB: Ca stomach 2,500 cases in 29 years
    6 UKBB equivalents: → 10,000 cases in 20 years
➢ Investigate smaller ORs
  • GME 1.5 → 1.2 requires 5,000 → 20,000
    4 UKBB equivalents
➢ Analysis based on subsets – homogeneous classes of phenotype, or e.g. by sex
Why harmonise?
➢ Earlier analyses
  • UKBB: Alzheimer’s disease, 10,000 cases in 18 yrs
    5 UKBB equivalents ⇒ 9 years
➢ Events at younger ages
➢ Broad range of environmental exposures
➢ Aim for 4-6 UKBB equivalents
  • 2M – 3M recruits
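The arithmetic behind "UKBB equivalents" can be sketched as follows: if k comparable cohorts are pooled, each needs to contribute only 1/k of the target number of cases, and the time to reach that smaller count can be read off the single-cohort figures. The sketch uses the Alzheimer's disease times from the event-rate slide and linear interpolation between tabulated points; the interpolation, and the assumption that the pooled cohorts are identical and recruit at the same time, are simplifications introduced here.

```python
# (cases, years) for Alzheimer's disease in one UK Biobank-sized cohort,
# as read from the event-rate slide.
ALZHEIMERS = [(1_000, 7), (2_500, 10), (5_000, 13), (10_000, 18), (20_000, 23)]


def years_to_reach(curve, cases):
    """Linearly interpolate the years needed to accrue `cases` in one cohort."""
    for (c0, y0), (c1, y1) in zip(curve, curve[1:]):
        if c0 <= cases <= c1:
            return y0 + (cases - c0) / (c1 - c0) * (y1 - y0)
    raise ValueError("case count outside the tabulated range")


def pooled_years(curve, target_cases, n_cohorts):
    """Years for n identical cohorts, pooled, to reach target_cases in total."""
    return years_to_reach(curve, target_cases / n_cohorts)


print(pooled_years(ALZHEIMERS, 10_000, 1))  # a single cohort
print(pooled_years(ALZHEIMERS, 10_000, 5))  # five UKBB equivalents
```

With five equivalents each cohort needs 2,000 cases, which the interpolation places at about 9 years, in line with the figure quoted above.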
International biobank harmonisation programs
➢ Public Population Project in Genomics (P3G)
  • Tom Hudson, Bartha Knoppers, Leena Peltonen …
➢ Population Biobanks
  • FP6 Co-ordination Action (PHOEBE – Promoting Harmonization Of Epidemiological Biobanks in Europe)
  • Camilla Stoltenberg, Leena Peltonen, Paul Burton …
➢ Human Genome Epidemiology Network (HuGENet)
  • Muin Khoury, Julian Little …
Extra slides
Examples of some polymorphisms or haplotypes that have shown consistent association with complex disease

| Disease | Gene | Polymorphism | Approximate frequency of the disease-associated allele | Approximate odds ratio for disease-associated allele | Ref |
|---|---|---|---|---|---|
| Thrombophilia | F5 | Leiden Arg506Gln | 0.03 | 4 | 12 |
| Crohn’s disease | CARD15 | 3 SNPs | 0.06 (composite) | 4.6 | 67 |
| Alzheimer’s disease | APOE | ε2/3/4 | 0.15 | 3.3 | 13,68 |
| Osteoporotic fractures | COL1A1 | Sp1 restriction site | 0.19 | 1.3 | 69,70 |
| Type 2 diabetes | KCNJ11 | Glu23Lys | 0.36 | 1.23 | 71 |
| Type 1 diabetes | CTLA4 | Thr17Ala | 0.36 | 1.27 | 72,73 |
| Graves’ disease | CTLA4 | Thr17Ala | 0.36 | 1.6 | 74 |
| Type 1 diabetes | INS | 5’ VNTR | 0.67 | 1.2 | 75 |
| Bladder cancer | GSTM1 | Null (gene deletion) | 0.70 | 1.28 | 76 |
| Type 2 diabetes | PPARG | Pro12Ala | 0.85 | 1.23 | 11 |

Hattersley AT, McCarthy MI. Lancet 2005;366:1315-1323
Genetic main effects

| Prevalence of genotypic exposure | 1,000 cases | 2,500 cases | 5,000 cases | 10,000 cases | 20,000 cases |
|---|---|---|---|---|---|
| 0.05 | 2.64 | 1.99 | 1.67 | 1.48 | 1.32 |
| 0.1 | 2.08 | 1.65 | 1.46 | 1.31 | 1.22 |
| 0.2 | 1.79 | 1.47 | 1.32 | 1.24 | 1.16 |
| 0.33 | 1.68 | 1.39 | 1.28 | 1.19 | 1.14 |
| 0.5 | 1.65 | 1.39 | 1.27 | 1.19 | 1.13 |
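One way to use these figures is to look up the smallest tabulated number of cases at which a given odds ratio becomes detectable. The sketch below does this for a genotype prevalence of 0.1, with values as read from the slide; the helper name `cases_needed` is hypothetical and the lookup is restricted to the tabulated case counts, with no interpolation.

```python
# Minimum detectable odds ratios for genotype prevalence 0.1, keyed by number of cases.
MDOR_PREV_0_1 = {1_000: 2.08, 2_500: 1.65, 5_000: 1.46, 10_000: 1.31, 20_000: 1.22}


def cases_needed(mdor_by_cases, target_or):
    """Smallest tabulated case count whose MDOR is at or below target_or."""
    for n_cases in sorted(mdor_by_cases):
        if mdor_by_cases[n_cases] <= target_or:
            return n_cases
    return None  # not detectable within the tabulated range


print(cases_needed(MDOR_PREV_0_1, 1.5))
print(cases_needed(MDOR_PREV_0_1, 1.25))
print(cases_needed(MDOR_PREV_0_1, 1.1))
```

An odds ratio of 1.5 first becomes detectable at 5,000 cases, which matches the summary given earlier in the talk, while an odds ratio of 1.1 is out of reach even at 20,000 cases.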
Whole genome scan
➢ Genetic main effect, p<10^-7

| Prevalence of genotypic exposure | 2,500 cases | 5,000 cases | 10,000 cases | 20,000 cases |
|---|---|---|---|---|
| 0.05 | 2.30 | 1.86 | 1.65 | 1.41 |
| 0.1 | 1.87 | 1.60 | 1.42 | 1.27 |
| 0.2 | 1.63 | 1.44 | 1.30 | 1.21 |
| 0.33 | 1.54 | 1.39 | 1.24 | 1.18 |
| 0.5 | 1.52 | 1.35 | 1.24 | 1.16 |
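Gene-environment interaction
➢ 20,000 cases

| Prevalence of environmental exposure | Genotype prevalence 0.1 | Genotype prevalence 0.2 | Genotype prevalence 0.33 | Genotype prevalence 0.5 |
|---|---|---|---|---|
| 0.1 | 2.47 | 1.97 | 1.79 | 1.79 |
| 0.2 | 1.94 | 1.67 | 1.58 | 1.61 |
| 0.33 | 1.82 | 1.58 | 1.47 | 1.46 |
| 0.5 | 1.72 | 1.54 | 1.45 | 1.44 |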
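Gene-environment interaction
➢ 10,000 cases

| Prevalence of environmental exposure | Genotype prevalence 0.1 | Genotype prevalence 0.2 | Genotype prevalence 0.33 | Genotype prevalence 0.5 |
|---|---|---|---|---|
| 0.1 | 3.03 | 2.32 | 2.15 | 2.16 |
| 0.2 | 2.36 | 1.95 | 1.80 | 1.86 |
| 0.33 | 2.11 | 1.87 | 1.68 | 1.70 |
| 0.5 | 2.05 | 1.78 | 1.64 | 1.70 |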
Rarer genotypes
➢ Genetic main effects

| Number of cases | Genotype prevalence | Environmental exposure prevalence | Symmetrical genotyping error rate | Symmetrical environmental error rate | Disease to non-diseased error rate | Non-diseased to diseased error rate | Critical P-value | Minimum detectable* OR for genetic effect |
|---|---|---|---|---|---|---|---|---|
| 10000 | 0.01 | 0.2 | 1% | 20% | 80%¹ | 0.2%¹ | 10^-4 | 2.37 |
| 20000 | 0.01 | 0.2 | 1% | 20% | 80%¹ | 0.2%¹ | 10^-4 | 2.09 |
| 10000 | 0.002 | 0.2 | 1% | 20% | 80%¹ | 0.2%¹ | 10^-4 | [[8.2]] |
| 20000 | 0.002 | 0.2 | 1% | 20% | 80%¹ | 0.2%¹ | 10^-4 | [[5.1]] |
Proposed assessment visit model

| Component | Time (min) |
|---|---|
| Welcome | 5 |
| Consent | 5 |
| Blood/Urine | 10 |
| Touchscreen Questionnaire | 25 |
| Interviewer Questionnaire | 5 |
| Physical Measures | 15 |
| Exit | 5 |
| TOTAL | 70 |
Taking account of
➢ Age range at recruitment 40-69 years
➢ Recruitment over 5 years
➢ All cause mortality
➢ Disease incidence (“healthy cohort effect”)
➢ Migration overseas
➢ Comprehensive withdrawal (max 1/500 p.a.)
➢ Partial withdrawal (c.f. 1958 Birth Cohort)
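How these factors combine can be sketched with a deliberately crude year-by-year model: recruits enter evenly over the recruitment period, and each year the at-risk pool loses incident cases, deaths and withdrawals. The cohort size, the 5-year recruitment period and the 1/500 p.a. withdrawal rate come from the slides; the incidence and mortality rates are hypothetical placeholders, and holding them constant ignores ageing, the healthy cohort effect and migration, so the output is illustrative only.

```python
def years_to_reach(target_cases, cohort_size=500_000, recruit_years=5,
                   incidence=0.001, mortality=0.01, withdrawal=1 / 500,
                   max_years=50):
    """First year (counted from the start of recruitment) by which cumulative
    incident cases reach target_cases, or None if not reached in max_years.

    incidence and mortality are hypothetical constant annual rates.
    """
    at_risk = 0.0
    cases = 0.0
    for year in range(1, max_years + 1):
        if year <= recruit_years:
            # Recruitment is spread evenly over the recruitment period.
            at_risk += cohort_size / recruit_years
        new_cases = at_risk * incidence
        cases += new_cases
        if cases >= target_cases:
            return year
        # Incident cases, deaths and withdrawals leave the at-risk pool.
        at_risk = (at_risk - new_cases) * (1 - mortality) * (1 - withdrawal)
    return None


for target in (1_000, 2_500, 5_000):
    print(target, years_to_reach(target))
```

Changing `withdrawal`, `mortality` or `recruit_years` shows how each factor delays the point at which a given number of cases becomes available.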
Necessary to contact subjects
Time (yrs) to achieve the stated number of cases

| Condition | 1,000 cases | 2,500 cases | 5,000 cases | 10,000 cases | 20,000 cases |
|---|---|---|---|---|---|
| Bladder cancer | 12 | 21 | 39 | - | - |
| Breast cancer (F) | 4 | 7 | 11 | 19 | - |
| Colorectal cancer | 6 | 10 | 15 | 25 | - |
| Prostate cancer (M) | 6 | 10 | 15 | 24 | - |
| Lung cancer | 7 | 13 | 22 | - | - |
| Non-Hodgkin's lymphoma | 12 | 26 | - | - | - |
| Ovarian cancer (F) | 13 | 33 | - | - | - |
| Stomach cancer | 17 | 36 | - | - | - |
| Stroke | 5 | 8 | 12 | 19 | 33 |
| MI and coronary death | 2 | 4 | 5 | 8 | 14 |
| Diabetes mellitus | 2 | 3 | 5 | 7 | 11 |
| COPD | 4 | 6 | 9 | 14 | 27 |
| Hip fracture | 7 | 12 | 16 | 24 | 36 |
| Rheumatoid arthritis | 7 | 15 | 36 | - | - |
| Alzheimer’s disease | 7 | 11 | 14 | 19 | 26 |
| Parkinson’s disease | 7 | 11 | 17 | 26 | - |
Issues that are often ignored in standard power calculations
➢ Multiple testing/low prior probability of association*
➢ Interactions*
➢ Unobserved frailty
➢ Misclassification*
  • Genotype
  • Environmental determinant
  • Case-control status
➢ Subgroup analyses*
➢ Population substructure
Harmonisation
➢ Prospective
➢ Retrospective
  • Description
  • Comparison
  • Harmonised synthesis
| Study | Sample size | Recruitment | Age at recruitment | Web site |
|---|---|---|---|---|
| Cohort studies | | | | |
| EPIC Europe | >500,000 | 1993-1997 | 45-74 | http://www.iarc.fr/epic/centers/iarc.html |
| ProtecT Study | 120,000 | 1999-2006 | 50-69 | http://www.epi.bris.ac.uk/protect/index.htm |
| Kadoorie Study, China | 500,000 | Ongoing | 35-74 | http://www.ctsu.ox.ac.uk |
| Prospective Study, Mexico | 200,000 | Ongoing | >40 | http://www.ctsu.ox.ac.uk/projects/mexicoblood.shtml |
| UK Biobank | 500,000 | 2006-2010 | 40-69 | http://www.ukbiobank.ac.uk/ |
| Birth Cohorts | | | | |
| Mother and Child Cohort Study (Norway) | 100,000 babies (A) | 2001-2005 | At birth | http://www.fhi.no/ |
| Danish National Birth Cohort | 100,000 babies (B) | 1997-2002 | At birth | http://www.serum.dk/sw9314.asp |
| Twin Cohorts | | | | |
| GenomEUtwin | >600,000 twin pairs | Various | Various | http://www.genomeutwin.org/ |
| Total populations | | | | |
| Decode Genetics | >100,000 | Iceland | Various | http://www.decode.com/ |
| Estonian Genome Project | >100,000 (C) | Estonia | Various | http://www.geenivaramu.ee |
| Western Australian Genome Project | ~80,000 (D) | Australia | Various | http://www.genepi.com.au/wagp |