Slide 1 - University of South Carolina

Gaining Market Share for
Nonparametric Statistics
Michael J. Schell
Moffitt Cancer Center
University of South Florida
Web of Science
• Source of count data for this talk
• Words/phrases found in title or abstract
• Mainly title only references before 1991
• The number of articles has increased over the years, thus the need for benchmarking
But is the Market Itself
Expanding?
Non-Linear Regression
Methods
Article Counts and Growth Rate of Regression Sub-Fields

Sub-Field         1990-94   2005-07*       GR
Non-linear           1469       2494      3.4
Wavelets             1025       6114     11.9
Linear               4360       8281      3.8
Logistic             4291     16,728      7.8
Mixed models          750       2817      7.5

Data mining            11       2979      542
Bioinformatics         14       4194      599

* Estimated 5-year rate obtained by doubling the count
GR = Growth Rate
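The GR column can be reproduced from the two count columns. The sketch below assumes the formula GR = (2 x 2005-07 count) / (1990-94 count); the slide does not state this formula, but it follows from the footnote on doubling and agrees with every tabulated value. The function name `growth_rate` is illustrative only.

```python
# Article counts copied from the slide: (1990-94 count, 2005-07 count).
counts = {
    "Non-linear": (1469, 2494),
    "Wavelets": (1025, 6114),
    "Linear": (4360, 8281),
    "Logistic": (4291, 16728),
    "Mixed models": (750, 2817),
    "Data mining": (11, 2979),
    "Bioinformatics": (14, 4194),
}


def growth_rate(count_1990_94, count_2005_07):
    """Assumed formula: doubled 3-year count divided by the 1990-94 count."""
    return 2 * count_2005_07 / count_1990_94


for field, (early, late) in counts.items():
    print(f"{field:15s} {growth_rate(early, late):7.1f}")
```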
How Many Discoveries Have Been Lost by
Ignoring Modern Statistical Methods?
Rand R. Wilcox, American Psychologist, 1998
Arbitrarily small departures from normality result in low
power; even when distributions are normal,
heteroscedasticity can seriously lower the power of
standard ANOVA and regression methods.
… most quantitative articles tend to be too technical for
applied researchers.
If the goal is to avoid low power, the worst method is the
ANOVA F test.
…the Theil-Sen estimator deserves consideration as well.
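For readers unfamiliar with the Theil-Sen estimator, a minimal sketch follows. It takes the slope as the median of all pairwise slopes; the intercept is taken here as the median of y - slope * x, which is one common convention and an assumption of this sketch. The toy data are hypothetical and are not from the talk.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) using the median of pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with tied x values
    ]
    slope = median(slopes)
    # Assumed convention: intercept is the median offset y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Hypothetical toy data: y = 2x + 1, with one gross outlier in the last point.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 9, 11, 60]
slope, intercept = theil_sen(x, y)
print(slope, intercept)
```

The single outlier changes only 5 of the 15 pairwise slopes, so the median slope stays on the line followed by the other five points.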
British Medical Journal articles by
Doug Altman
The scandal of poor medical research, 1994
Why are errors so common? Put simply, much
poor research arises because researchers feel
compelled for career reasons to carry out
research that they are ill equipped to perform,
and nobody stops them.
Statistics and ethics in medical research. The
misuse of statistics is unethical, 1980
Marketing of Pharmaceuticals
1) Must have produced the drug and
shown its efficacy
2) Need to produce the drug in mass
quantities
3) Marketing
Marketing of Statistical Ideas
1) Must have derived the statistic and
demonstrated its efficacy
2) Need to have available software
3) Need to disseminate the idea
Key Principle
In an environment where
ideas are not marketed, first
on the market wins
First-on-the-market winners
T-test, 1905
ANOVA
Kolmogorov-Smirnov test, 1937
Duncan’s test, 1950
Kaplan-Meier curves, 1958
Cox regression, 1972
Hodges and Lehmann, 1961
4th Berkeley Symposium
Chernoff and Savage (1958) proved that the
asymptotic relative efficiency (ARE) of the normal
scores test relative to the t-test is at least 1
“The above results suggest that on the basis
of power, at least for large samples, both
the Wilcoxon and normal scores tests are
preferable to the t-test for general use.”
First Simulation on Robustness of t-test
CA Boneau, 1960
320 citations
Conclusion: t-test is fine, exponential
distribution simulation was done wrong
Highest citation count on any subsequent
simulation study (39 thru 2000) = 96
Textbook Placement
Basic Practice of Statistics, 4th Ed. 2006 David S.
Moore (728 pages)
Non-parametric tests don’t make the book; they
appear in the virtual appendix.
Statistics: A Biomedical Introduction, 1977
Hollander and Wolfe
T-test in Chapter 5; Wilcoxon in Chapter 13
Biostatistics, 2nd Ed. van Belle, Fisher, et al., 2004
T-test in Chapter 5; Wilcoxon in Chapter 8
One-Way Layout for Books of Psalms

Book     N     Mean    SD     Skew   Kurt   Range        Median
1        41    15.0    9.3    1.9    4.6    5-50         12
2        31    15.0    8.0    1.1    0.9    5-36         12
3        17    21.1    16.7   2.3    5.4    7-72         18
4        17    18.9    13.2   1.2    0.5    5-48         15
5        44    15.9    26.1   5.6    34.5   2-43, 176    9
Total    150
Results
• ANOVA: p = .7015
• ANOVA on logged data: p = .0586
• Kruskal-Wallis: p = .0458
• Normal scores: p = .0378
• AD sum for data: 14 = 2.2 + 1.0 + 2.0 + 0.9 + 7.9
• AD sum for log data: 1.9 = 0.3 + 0.3 + 0.5 + 0.2 + 0.6
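One reason the rank-based results differ so much from plain ANOVA is that ranks ignore how extreme a value is (Book 5's range includes 176). The sketch below computes the Kruskal-Wallis H statistic from midranks, without the tie correction, on hypothetical toy data; it does not use the Psalms data, which are not reproduced in the slides. A p-value would normally come from a chi-square approximation with (number of groups - 1) degrees of freedom, which is omitted here.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical toy data; the second set replaces 9 with an extreme value.
plain = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
with_outlier = [[1, 2, 3], [4, 5, 6], [7, 8, 900]]
print(kruskal_wallis_h(plain), kruskal_wallis_h(with_outlier))
```

Both calls give the same H, because the extreme value keeps the same rank; an F statistic computed on the raw values would change.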
Deciding Between ANOVA and KW
on Principle
• If one is convinced that the metric of the
values is what one wants, then ANOVA is
fine
• ANOVA – political kin is the monarchy
• KW – political kin is democracy
• Power assessed as P(X < Y)
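The quantity P(X < Y) in the last bullet can be estimated directly from two samples by comparing every pair. A minimal sketch, assuming ties count as one half (a common convention, not stated on the slide); the data are hypothetical.

```python
def prob_x_less_than_y(xs, ys):
    """Estimate P(X < Y) over all (x, y) pairs; ties count as one half."""
    wins = sum(
        1.0 if x < y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )
    return wins / (len(xs) * len(ys))


# Hypothetical samples; the 800 could be any value above 7 without
# changing the estimate, since only the ordering of pairs matters.
xs = [1, 3, 5, 7]
ys = [2, 4, 6, 800]
print(prob_x_less_than_y(xs, ys))
```

Every pair gets one vote regardless of how far apart the two values are, which is the "democratic" reading of the comparison.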
Cancer Research
It has been my experience as a statistician
in cancer research that we are:
1) rarely sure of the metric for the data,
2) typically interested in answering the
democratic question
Thus, nonparametric analysis has
predominated in my applied articles
Ethical Considerations
Applied statistical work is very important in
decision-making
Educators have an ethical responsibility to
properly train their “tool user” students in
best practices
“Tool user” statisticians have an ethical
responsibility to seek best practice
information