Locomotor Experience Applied Post

New Statistical Opportunities
with Big Data and Fast Computing
吴尚武
Department of Biostatistics, University of Florida, USA
December 27, 2012
National Cheng Kung University
1
Three Helpful Websites
http://www.wikipedia.org/
http://mathworld.wolfram.com/
http://www.ted.com/
Ken Robinson says schools kill creativity
Bill Gates on energy: Innovating to zero!
Hans Rosling: Stats that reshape your worldview
Outline
Examples of big data
New technologies in data collection and
experiment design
Personal thoughts about the statistics field
3
Part I – Big Data Provides
Many Opportunities
4
Next Generation Sequencing Data
(200 GB per sample; Shyr, 2011)
5
Next Generation Sequencing Data
(Shyr, 2011)
6
One of Life’s Great Mysteries
1 cell
1 cell type
6 feet of DNA
23 pairs of chromosomes
Six billion base pairs of DNA
~ 34,000 genes
100,000,000,000,000 cells
Greater than 220 cell types
18.9 billion miles of DNA
23 pairs of chromosomes / cell
Six billion base pairs of DNA / cell
~ 34,000 genes / cell
7
Massive Amount of Data
Credit Card Purchases
Phone Call Records
Online Social Networks
Scientific Data (Sensors, Instruments,
Devices)
Medical Data
Advanced Multimodality Image Guided Operating Suite
8
IBM Ad (Kunming Airport)
9
IBM Ad (Tianjin Airport)
10
How Should Doctors Be Valued?
Background:
Chinese doctors in crisis (Lancet, May 12, 2012;
The Economist, July 21, 2012)
Improvement in patient QALY?
11
Statistics is about extracting information from
(massive) data that are noisy (subject to errors and
variability) and uncertain; it is a principled way to
reason about data.
Some 7,000 unmanned aerial vehicles are now deployed in Iraq
and Afghanistan. Their video output this year would take one
person four decades to watch. (The Economist, Nov 25, 2010)
12
Big Data in Astronomy
(Feigelson and Babu, 2012)
Least squares method (1805)
• $Y_{4\times 1} = X_{4\times 2}\, b_{2\times 1} + e$
Sloan Digital Sky Survey (SDSS)
• A billion stars surveyed, spectra taken of a million objects; 800 new planets discovered.
• 200 GB/Night; ~ 50 TB (10^6 GB) today
Large Synoptic Survey Telescope (LSST)
• Photograph half of the sky every 3 nights.
• 5-10 TB/Night; 20 billion rows of data
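The least squares model at the top of this slide, with a 4x1 response Y, a 4x2 design matrix X and a 2x1 coefficient vector b, can be sketched in a few lines. The numbers below are hypothetical and chosen only for illustration (an intercept column plus one predictor); they are not taken from the talk.

```python
import numpy as np

# Hypothetical data: 4 observations, design matrix X is 4x2
# (a column of ones for the intercept, plus one predictor).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.1, 2.9, 5.2, 6.8])

# Least squares estimate of b via NumPy's solver.
b_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Same estimate from the normal equations (X'X) b = X'Y.
b_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Residuals e = Y - X b_hat.
residuals = Y - X @ b_hat

print("b_hat:", b_hat)
print("residuals:", residuals)
```

With survey-scale data the same estimate would typically be computed in chunks or with a distributed solver, but the underlying equations are unchanged.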
13
Part II – Modern Computing
Power Offers Huge
Opportunities
14
Research Spending Per New Drug
(Matthew Harper, Forbes 2-10-2012)
At $12B per drug, inventing medicines = unsustainable business!
Less than 1 in 10 medicines that enter clinical trials succeed!
Pharma in crisis of drug innovation and drug development costs!
15
Adaptive Treatment Selection
(Wu, Wang and Yang, 2010, Biometrika 97:405-418)
Given effect sizes (0.1, 0.6, 0.6, 0.9), a two-stage drop-the-loser
design with N=80 may achieve the same power as a standard design
using N=116 (45% more).
16
2006 FIFA World Cup
Match Schedule
17
New Double Elimination Tournament
(Wu and Yang, 2008, Mathematical Scientist 33:79-92)
Probability that Team One wins the tournament improves from 36% to 49% when P1j=0.8
18
NPF QII Form
19
The Acoustic Startle Response
http://scienceblogs.com
http://www.aic.cuhk.edu.
Netter 2009, Koch 1998, Davis 1982
Methods
Stimulation: 120 ms, locochrome, 105 dB
Positions: supine; standing on firm ground; standing on foam pillow
Recording: orbicularis oculi (reference); 24 muscle pairs
http://scienceblogs.com
http://content.answcdn.com
http://armymedical.tpub.com
Methods
9 able-bodied control subjects and 9 individuals with iSCI
15 stimulations (5 blocks x 1 stim per position)
Positions (stand, foam, supine): randomized in each block
ISI: randomized, 2-8 min
[Figure: timeline of the stimulation protocol, showing the randomized position order within blocks, single stimulations in standing ("1 stim standing"), and a 48 hr interval]
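The protocol above (15 stimulations as 5 blocks of 1 stimulation per position, position order randomized within each block, inter-stimulus interval randomized between 2 and 8 minutes) can be sketched as a small schedule generator. This is an illustration only: the function name `make_schedule`, the seed, and the choice of a uniform distribution for the interval are assumptions, since the slide says only that the interval was randomized.

```python
import random

POSITIONS = ["supine", "stand", "foam"]

def make_schedule(n_blocks=5, isi_range_min=(2.0, 8.0), seed=None):
    """Return a list of (block, position, isi_minutes) tuples.

    Each block contains every position exactly once, in random order.
    The inter-stimulus interval (ISI) is drawn uniformly from
    isi_range_min; the uniform choice is an assumption.
    """
    rng = random.Random(seed)
    schedule = []
    for block in range(1, n_blocks + 1):
        order = POSITIONS[:]      # copy so the template list is untouched
        rng.shuffle(order)        # randomize positions within the block
        for position in order:
            isi = round(rng.uniform(*isi_range_min), 1)
            schedule.append((block, position, isi))
    return schedule

schedule = make_schedule(seed=2012)
for block, position, isi in schedule:
    print(f"block {block}: {position:<6} wait {isi} min before stimulation")
```

Randomizing within blocks keeps the three positions balanced over time, so a drift in responsiveness across the session does not favor one position.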
Count 10% of responses wrong!
23
Count 30% of responses wrong!
24
Privacy-preserving Data Collection and Analysis
(PDCA)
Background: Researchers are interested in objectively
monitoring patients’ activities
iPhone + GPS + Data Encryption
Neither site/investigators nor DMC knows true Y
[Diagram: data flow among three parties]
Patient / Device: X, Y, B, Ai
Hospital / Inv: X, Z, A
DMC: W, B (AX, AY)
Arrow labels: Z=YB, Ai; W=[AX, AZ]; B
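One possible reading of this diagram is a matrix-masking scheme: the device hides Y by sending Z = YB, the hospital applies its own mask A to form W = [AX, AZ], and the DMC, knowing B, recovers AX and AY without ever seeing Y itself. The sketch below follows that reading. The directions of the arrows, the matrix sizes, the choice of a random orthogonal A and the simulated data are all assumptions made for illustration, not details given on the slide.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, q = 8, 2, 2  # hypothetical sizes: n patients, p covariates, q outcomes

# Simulated covariates X (n x p) and outcomes Y (n x q).
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([[1.0, -0.5],
                 [2.0, 0.3]])
Y = X @ beta + 0.1 * rng.normal(size=(n, q))

# Patient / device: mask the outcomes with a random q x q matrix B.
B = rng.normal(size=(q, q)) + 3.0 * np.eye(q)
Z = Y @ B

# Hospital / investigators: see X and Z (never Y) and apply their own
# mask A. Assumption: A is a random orthogonal matrix, built via QR.
A, _ = np.linalg.qr(rng.normal(size=(n, n)))
W = np.hstack([A @ X, A @ Z])

# DMC: receives W and B, then undoes B to obtain (AX, AY).
AX, AZ = W[:, :p], W[:, p:]
AY = AZ @ np.linalg.inv(B)

# With an orthogonal A, least squares on the masked data reproduces
# the estimates that the raw data would give.
beta_masked = np.linalg.lstsq(AX, AY, rcond=None)[0]
beta_raw = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(beta_masked, beta_raw))
```

The orthogonality of A is what makes the regression estimates invariant here, since (AX)'(AX) = X'X and (AX)'(AY) = X'Y.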
25
Part III.a – Two Essential Functions
Allow Statistics to Grow
Tremendously
26
What is statistics?
Seeing the large from the small
From the complex to the simple
27
What is Statistics?
“Statistics is a broad mathematical discipline which studies ways to
collect, summarize and draw conclusions from data.”
– Wikipedia, 2005
“Statistics is the study of the collection, organization, analysis,
interpretation, and presentation of data… There is also a discipline
called mathematical statistics that studies statistics mathematically”
– Wikipedia, 2012
“Experiment Trumps Analysis”
– Don Rubin, 2012
Statistics is not just a branch
of mathematics.
Applied statistics seeks to
solve problems.
28
Population of Potential Outcomes
All current and future patients meeting inclusion and exclusion criteria
Primary Outcome
Subject No.   Treatment   Control
1             YT1         YC1
2             YT2         YC2
3             YT3         YC3
4             YT4         YC4
5             YT5         YC5
6             YT6         YC6
7             YT7         YC7
8             YT8         YC8
9             YT9         YC9
...           ...         ...
N             YTN         YCN
              T           C
Problems:
1. For each patient, we only observe one of the outcomes.
2. We cannot conduct the experiment on all patients (units).
29
Samples of Outcomes
Primary Outcome
Subject No.   Treatment   Control
1             YT1         YC1
2             YT2         YC2
3             YT3         YC3
4             YT4         YC4
5             YT5         YC5
6             YT6         YC6
7             YT7         YC7
8             YT8         YC8
9             YT9         YC9
...           ...         ...
N             YTN         YCN
              YT          YC
1. Sample patients from selected hospitals
2. For each patient, randomly select one outcome [Random Assignment]
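The two steps on this slide (sample patients, then randomly reveal one of each patient's two potential outcomes) can be sketched with simulated data. Everything numeric here is an assumption made for illustration: the population size, the normal outcome distributions, and the treatment effect of about 1.0.

```python
import numpy as np

rng = np.random.default_rng(2012)

# Hypothetical population: every patient has BOTH potential outcomes,
# although in practice only one of them can ever be observed.
N = 100_000
y_control = rng.normal(loc=0.0, scale=1.0, size=N)
y_treated = y_control + 1.0 + rng.normal(scale=0.5, size=N)
true_effect = (y_treated - y_control).mean()

# Step 1: sample n patients from the population.
n = 2_000
sample = rng.choice(N, size=n, replace=False)

# Step 2: random assignment, half to treatment and half to control.
assigned_treatment = np.zeros(n, dtype=bool)
assigned_treatment[rng.permutation(n)[: n // 2]] = True

# Only the outcome under the assigned arm is observed.
observed = np.where(assigned_treatment, y_treated[sample], y_control[sample])

# Difference in observed means estimates the population effect.
estimate = observed[assigned_treatment].mean() - observed[~assigned_treatment].mean()
print(f"true effect {true_effect:.3f}, estimate {estimate:.3f}")
```

Because assignment is random, the two arms are comparable on average, which is why the simple difference in means targets the population effect even though no patient contributes both outcomes.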
30
Potential Improvements
over Simple Randomization
Blocking/Stratification
Covariate Adaptive Randomization
Sequential Monitoring
31
From Sample to Population
(一叶知秋, 以小明大: one falling leaf heralds autumn; the small reveals the large)
From the fall of a single leaf, one knows that autumn has arrived. The
color of a leaf tells the season.
32
Central Limit Theorem
Let pT and pC be the proportions of successes in the treatment and
control groups, respectively. Denote by rT and rC the corresponding
sample proportions based on n patients.
Central Limit Theorem (AN = asymptotically normal):
$$(r_T - r_C) \sim AN\left(p_T - p_C,\ \frac{\sigma^2}{n}\right).$$

33
Part III.b – Statistical Applications
(Solving Problems) Are
Important and Tough
34
“Baseball, Shakespeare,
and Modern Statistical
Theory”
Bradley Efron, 2006
Development of statistics
None can be left out
35
LEAPS Enrollment
36
Sex Bias in Graduate Admissions:
Data from Berkeley
44.3% admission for Men vs 34.6% for Women applicants.
Bickel et al. (1975), Science, 187:398-404.
37
Women applicants were more ambitious: they applied to departments
that were harder to get into.
38
There is evidence of bias
in favor of women
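The reversal described on these slides (a lower overall admission rate for women, yet no disadvantage once departments are compared one by one) can be reproduced with a toy example. The counts below are hypothetical, invented purely to show the mechanism; they are not the Berkeley figures from Bickel et al. (1975).

```python
# Hypothetical counts, invented for illustration only.
# Each entry is (applicants, admitted).
applications = {
    "easier department": {"men": (800, 480), "women": (200, 130)},
    "harder department": {"men": (200, 40), "women": (800, 200)},
}

def rate(applied, admitted):
    """Admission rate as a fraction."""
    return admitted / applied

# Admission rates within each department.
by_department = {
    name: {sex: rate(*counts) for sex, counts in dept.items()}
    for name, dept in applications.items()
}

# Pooled admission rates, ignoring department.
overall = {}
for sex in ("men", "women"):
    applied = sum(dept[sex][0] for dept in applications.values())
    admitted = sum(dept[sex][1] for dept in applications.values())
    overall[sex] = rate(applied, admitted)

for name, rates in by_department.items():
    print(f"{name}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"overall: men {overall['men']:.0%}, women {overall['women']:.0%}")
```

In this toy data women have the higher rate in each department but the lower rate overall, because most of them applied to the department that admits fewer applicants.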
39
Applications Are Important
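“Does mathematical statistics count as a branch of mathematics? The
author's answer is yes, for one reason only: the data whose collection and
analysis mathematical statistics studies are abstract, stripped of any practical meaning.”
-- 陈希孺 (Chen Xiru), introduction to 《数理统计学简史》 (A Brief History of Mathematical Statistics)
Growing statistics requires attention to the practical meaning of the data.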
40
Thank You!
41
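How to grow statistics?
Talent is the foundation
Employment is the key
Institutions are the safeguard
Allocation of research funding
Promotion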
42
A Theory Of Inference
Statistics is a theory of learning from experience,
especially experience that arrives a little bit at a time.
Example: in a clinical trial, no one patient’s response is conclusive,
but information can be accrued across patients.
In the first four decades of the 20th century, an enormously
ambitious and successful intellectual effort produced a theory of
inference that cuts across individual scientific disciplines.
It is probably the most important contribution of mathematics to
science in the 20th century.
43