Introduction

Why Statistics?
Statistical literacy is a necessary precondition for an
educated citizenship in a technological democracy.
Understanding risks and asking critical questions can also
shape the emotional climate in a society so that hopes and
anxieties are no longer as easily manipulated from outside
and citizens can develop a better-informed and more
relaxed attitude toward their health.
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M. and
Woloshin, S. 2007 “Helping doctors and patients to make sense of health
statistics” Psychological Science in the Public Interest, 8, 53–96.
Saturday, 26 March 2016
4:02 AM
Can You Trust Papers?
This is what happened when psychologists tried to
replicate 100 previously published findings (BPS)
After some high-profile and at times acrimonious failures
to replicate past landmark findings, psychology as a
discipline and scientific community has led the way in
trying to find out more about why some scientific findings
reproduce and others don't, including instituting
reporting practices to improve the reliability of future
results.
Can You Trust Papers or Texts?
While 97 per cent of the original results showed a
statistically significant effect, this was reproduced in only
36 per cent of the replications.
However: Survey that revealed widespread iffy research practices in
psychology was itself iffy
Can you trust texts? Neuroscience still haunted by Phineas Gage
What the textbooks don't tell you about psychology's most famous case study
Can You Trust Papers?
Researchers have found that more than half of the public
datasets provided with scientific papers are incomplete,
which prevents reproducibility tests and follow-up studies.
Public Data Archiving in Ecology and Evolution: How Well Are We Doing?
Dominique G. Roche, Loeske E. B. Kruuk, Robert Lanfear, Sandra A. Binning. PLOS
Biology, 2015 13(11) e1002295 DOI: 10.1371/journal.pbio.1002295
Simple errors limit scientific scrutiny - ScienceDaily - 11 November 2015
Are you statistically challenged?
1. What percentage of drivers are better than average?
Calculated Risks: How to Know When Numbers Deceive You
By Gerd Gigerenzer, Simon & Schuster, New York.
Around 63%, when “average” is determined by number of
accidents. This is so because the distribution of accidents
is asymmetrical: bad drivers account for more accidents
than good ones, so most drivers have fewer than the
average number of accidents.
mean/median/mode - normality
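The gap between mean and median in a skewed distribution can be sketched in a few lines of Python. The accident counts below are invented purely for illustration (they are not Gigerenzer's data) and were chosen so that the share of better-than-average drivers comes out at 63%.

```python
from statistics import mean, median

# Hypothetical accident counts for 100 drivers (illustrative only):
# most drivers have few accidents, a small minority has many.
accidents = [0] * 40 + [1] * 23 + [2] * 20 + [5] * 12 + [10] * 5

average = mean(accidents)    # pulled upward by the few bad drivers
middle = median(accidents)   # the typical driver

# "Better than average" here means fewer accidents than the mean.
better = sum(1 for count in accidents if count < average)
share_better = better / len(accidents)

print(f"mean = {average}, median = {middle}")
print(f"drivers better than average: {share_better:.0%}")
```

Because the mean sits above the median, more than half of the drivers fall on the good side of the mean. With a symmetrical (for example normal) distribution the two would coincide and the share would be close to 50%.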
2. If men with high cholesterol have a 50% higher risk
of heart attack than men with normal cholesterol, should
you panic if your cholesterol level is high?
Probably not. The 50% sounds frightening only because it is
given in relative terms: 6 out of 100 men with high
cholesterol will have a heart attack in 10 years, versus
4 out of 100 men with normal levels. In absolute terms,
the increased risk is only 2 out of 100, or 2 percentage
points. Look at it this way: even in the high cholesterol
category, 94% of the men won’t have heart attacks.
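The difference between the relative and the absolute framing can be checked with a short calculation. This is a minimal sketch using only the figures quoted in the answer; the variable names are illustrative.

```python
from fractions import Fraction

# Heart attacks per 100 men over 10 years, as quoted in the answer.
risk_high = Fraction(6, 100)     # men with high cholesterol
risk_normal = Fraction(4, 100)   # men with normal cholesterol

# Relative increase: the headline "50% higher risk".
relative_increase = (risk_high - risk_normal) / risk_normal

# Absolute increase: 2 extra heart attacks per 100 men.
absolute_increase = risk_high - risk_normal

# Share of high-cholesterol men who have no heart attack.
no_attack_high = 1 - risk_high

print(f"relative increase: {float(relative_increase):.0%}")
print(f"absolute increase: {float(absolute_increase):.0%} (percentage points)")
print(f"no heart attack, high cholesterol: {float(no_attack_high):.0%}")
```

The same pair of risks produces both numbers; only the denominator changes.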
3. HIV tests are 99.9 percent accurate. You test positive
for HIV, although you have no known risk factors. What
is the likelihood that you have AIDS, if 0.01 percent of
men with no known risk behaviour are infected?
Estimate your risk.
Fifty-fifty. Most people assume the probability is much
higher, an illustration of the “illusion of certainty.” The
correct answer is clear if the problem is framed in
frequencies: Take 10,000 men with no known risk factors. 1
of these men has AIDS; he will almost certainly test
positive. Of the remaining 9,999 men, 1 will also test
positive (a false-positive rate of about 0.01 percent). Thus,
the likelihood that you have AIDS given a positive test is
1 out of 2. A positive AIDS test, although cause for
concern, is far from a death sentence.
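The frequency argument is Bayes' rule in disguise, and it can be sketched as follows. Two readings are assumptions rather than statements from the slide: "99.9 percent accurate" is taken as the sensitivity, and the false-positive rate of 1 in 10,000 is inferred from the answer's "1 of the remaining 9,999 men".

```python
from fractions import Fraction

prevalence = Fraction(1, 10_000)            # 0.01% of low-risk men infected (from the question)
sensitivity = Fraction(999, 1_000)          # ASSUMPTION: "99.9 percent accurate" read as sensitivity
false_positive_rate = Fraction(1, 10_000)   # ASSUMPTION: implied by "1 of the remaining 9,999"

# Share of all low-risk men who are infected and test positive.
true_positives = prevalence * sensitivity

# Share of all low-risk men who are not infected but test positive anyway.
false_positives = (1 - prevalence) * false_positive_rate

# Probability of infection given a positive test.
ppv = true_positives / (true_positives + false_positives)

print(f"P(infected | positive test) = {float(ppv):.3f}")
```

The result depends heavily on the assumed false-positive rate. If 99.9 percent were instead read as the specificity (a false-positive rate of 0.1 percent), the same calculation would give roughly 1 in 11 rather than 1 in 2.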
4. The blood found under the fingernails of a murdered
woman matches the defendant’s blood type, which only 17.3
percent of the population shares. The blood on the defendant’s
shoes matches the victim’s type, which only 15.7 percent of
the population shares. An expert witness at trial testified
that multiplying these two probabilities together gives a
joint probability of 2.7 percent (17.3% × 15.7%) that these
two matches would occur by chance, and that there was,
therefore, a 97.3 percent chance that the defendant is the
murderer. What is the flaw in the expert’s reasoning?
This is an example of the “prosecutor’s fallacy”,
namely the erroneous assumption that the random match
probability equals the probability of guilt. The actual
probability that the defendant is the murderer based solely
on these two matches is very small. Frequency analysis again
shows why: Assume that any of the 100,000 men in the city
where the murder took place could be the murderer. One of
them, the murderer, will show both matches with practical
certainty. Of the remaining 99,999 residents, we can expect
that about 2,700 (2.7%) show the same matches. Thus, the
probability that a man with both matches is the murderer is
about 1 in 2,700, less than one-tenth of 1 percent.
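The frequency analysis can be reproduced with a short sketch. It follows the answer's own simplifications, which are assumptions rather than facts about the case: the two matches are treated as independent, and each of the 100,000 men is considered equally likely to be the murderer before the blood evidence is seen.

```python
population = 100_000   # men in the city, as in the answer
p_nails = 0.173        # share of the population with the defendant's blood type
p_shoes = 0.157        # share of the population with the victim's blood type

# Random match probability: both matches occurring by chance,
# treating them as independent, as the expert witness did.
p_both = p_nails * p_shoes

# Expected number of innocent men who would show both matches.
innocent_matches = (population - 1) * p_both

# One murderer among all the men showing both matches.
p_guilty = 1 / (1 + innocent_matches)

print(f"random match probability: {p_both:.1%}")
print(f"expected innocent matches: {innocent_matches:.0f}")
print(f"P(murderer | both matches): {p_guilty:.4%}")
```

The random match probability (about 2.7%) and the probability of guilt (well under 0.1%) are different quantities, which is exactly the distinction the expert witness missed.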
5. In his 1995 argument to the court to exclude evidence
that O.J. Simpson had battered his wife, Alan
Dershowitz successfully argued that the evidence was
irrelevant because, although there were between 2.5 and 4
million incidents of abuse of domestic partners, there were
only 1,432 homicides. Thus, he argued, “an infinitesimal
percentage – certainly fewer than 1 of 2,500 – of men who
slap or beat their domestic partners go on to murder
them.” Dershowitz’s argument is outrageously incorrect:
the likelihood that a batterer murdered his partner, given
that she has been murdered, is 8 out of 9, or around 90%.
What is missing from Dershowitz’s analysis?
Either Dershowitz was confused, or he purposely
hoodwinked the court, in much the same way that the
tobacco industry seeks to obscure the risks of smoking. His
analysis omits a key element: what proportion of battered
women are killed each year by someone other than their
partners? The answer is around 0.005%. Now, think of
100,000 battered women. 40 will be murdered this year by
their partners. 5 will be murdered by someone else. Thus,
40 of the 45 battered women who are murdered will be killed
by their batterers; in only 1 of 9 cases is the murderer
someone other than the batterer.
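The two competing probabilities can be computed side by side. This is a minimal sketch using the per-100,000 figures from the answer; the variable names are illustrative.

```python
from fractions import Fraction

battered_women = 100_000
killed_by_partner = 40   # per year, from the answer
killed_by_other = 5      # per year, from the answer

# Dershowitz's figure: P(murdered by partner | battered).
dershowitz_rate = Fraction(killed_by_partner, battered_women)

# The relevant figure: P(murdered by partner | battered AND murdered).
p_partner_given_murdered = Fraction(
    killed_by_partner, killed_by_partner + killed_by_other
)

print(f"P(murdered by partner | battered) = {dershowitz_rate}")
print(f"P(partner did it | battered and murdered) = {p_partner_given_murdered}")
```

Dershowitz's 1 in 2,500 conditions only on the battering. Once the murder itself is part of the evidence, the comparison is between the 40 and the 5, which gives 8 out of 9.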
Just what will you need to know in your future careers?
For a questionnaire investigating this, see
Statistical needs of non-specialist young workers
P.K. Holmes 1995 Royal Statistical Society, Statistical
Education Project 16 - 19
For a useful summary, see
Reinventing Business Statistics: Statistical Literacy for
Managers Milo Schield 2013 (see Appendix A).
How To Open An Excel Worksheet in SPSS
Data preparation in Excel.
Each column is a variable.
The data type and width for each variable are
determined by the data type (text, numeric
etc.) and width in the Excel file.
Blank Cells
For numeric variables, blank cells are converted
to the system-missing value, indicated by a
period.
Variable Names
The name must begin with a letter. The remaining characters
can be any letter, any digit, a period, or the symbols
@, #, _ or $.
Variable names can be defined with any mixture of uppercase
and lowercase characters, and case is preserved for display
purposes.
Variable names cannot end with a period.
The length of the name cannot exceed 64 characters.
Blanks and special characters (for example !, ?, ' and *)
cannot be used.
Reserved keywords such as
ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT,
OR, TO, WITH
cannot be used as variable names.
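Column headings can be checked against these rules before the worksheet is opened in SPSS. The helper below is a hypothetical Python sketch, not part of SPSS. It encodes only the rules listed above and makes two assumptions: letters are restricted to plain ASCII, and reserved keywords are rejected regardless of case.

```python
import re

# Reserved keywords listed above.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then any mix of letters, digits, period, @, #, _ or $.
# ASSUMPTION: only ASCII letters are accepted in this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def is_valid_variable_name(name: str) -> bool:
    """Check a proposed variable name against the rules listed above."""
    if len(name) > 64:
        return False                 # too long
    if name.endswith("."):
        return False                 # cannot end with a period
    if name.upper() in RESERVED:
        return False                 # ASSUMPTION: keywords matched case-insensitively
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["JobCat", "score_1", "1st_item", "total.", "my var", "AND"]:
    print(candidate, is_valid_variable_name(candidate))
```

Running this over the first row of the Excel sheet flags headings that would need renaming before import.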
Data Entry
For simplicity enter your data into Excel with
the first cell in each column containing the
variable name.
It is essential that all column widths are
stretched to accommodate their data.
You may perform all subsidiary calculations,
such as collating of responses to form a score
and any negative coding, within Excel.
It is important that you “proof” your data.
For instance, check the minimum and maximum value in each
column. You can do this either in Excel (not forgetting to
delete any extra rows you add for these checks) or in SPSS
(not forgetting to make the corrections to the Excel version
as well).
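The minimum and maximum check can also be scripted. The sketch below is one possible approach in plain Python, not an SPSS or Excel feature. The rows, the column names and the typing slip are hypothetical, and blank cells are represented as None.

```python
def column_ranges(rows):
    """Return {column: (minimum, maximum)} over the non-missing values."""
    ranges = {}
    for row in rows:
        for column, value in row.items():
            if value is None:   # blank cell, treated as missing
                continue
            low, high = ranges.get(column, (value, value))
            ranges[column] = (min(low, value), max(high, value))
    return ranges


# Hypothetical data: sex coded 0/1, jobcat coded 1 to 7.
rows = [
    {"sex": 0, "jobcat": 3, "age": 34},
    {"sex": 1, "jobcat": 7, "age": None},
    {"sex": 11, "jobcat": 1, "age": 29},   # 11 is a typing slip for 1
]

ranges = column_ranges(rows)
for column, (low, high) in ranges.items():
    print(f"{column}: min = {low}, max = {high}")
```

A maximum of 11 in a column that should only contain 0 or 1 points straight at the row that needs correcting.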
Loading Into SPSS
In SPSS, choose File > Open > Data, then select Excel as
the file type.
In SPSS
Note that sex has been recoded as 0/1, as preferred by SPSS.
Use a single column for this binary variable.
As a general rule use as few columns as possible, so JOBCAT has
potentially 7 responses but is stored in a single column.
SPSS Tips
Perhaps the simplest way to ease yourself into using
SPSS syntax is to click on the Paste button instead
of the OK button after you have set up your analysis.
This will paste the code that SPSS uses to run your
analysis into a syntax window. A syntax file is
nothing more than a text file; hence, you can type
code and comments into it, and you can cut-and-paste
in it as you would in any text editor.
Select Paste.
What action does this code perform?
To run the code that you have pasted, simply highlight it
and click on the right-pointing arrow (green triangle) at
the top. Your results will be displayed in the output
window just the same as if you had used the point-and-click
interface. Another option is to right-click on the
highlighted commands and choose “Run All”.
Activate
Alternatively, to activate the analysis, return via the
drop-down menus to your intended command. Your previous
selection will be intact, so you can now select OK as usual.
Either way, you have preserved your syntax for use on future
occasions, for instance to run a full analysis after a pilot
study.
The EXPORT command exports output from an open
Viewer document to an external file format, such as
Word. By default, the contents of the designated
Viewer document are exported, but a different
Viewer document can be specified by name. The
target file/format may be selected.
It may be activated by “right clicking” within the
output shown by the statistics viewer and selecting
Export.
Select Export, then choose the Destination.
Now you should go and try for yourself.
Each week a cluster is booked to follow this session.
This will enable you to come and go as you please.
Obviously other timetabled sessions for this module
take precedence.
For the complex and important topics of power and sample
size, refer to the module webpage.