Choosing the right Statistical Test

NUIT, Research Support
• Our remit is to support researchers, including support in:
– Statistical Computing
– Numerical Analysis (Condor, HPC, cloud computing, etc.)
– Internet Tools (website development, online collaboration with other researchers within and outside NU (Sakai), wikis, blogs, mobile applications)
• Training in the above
Statistical Computing Support / Advice
• Staff
• Postgraduate Students (Master and PhD)
• Website: http://www.ncl.ac.uk/itservice/dataanalysis/
If you need help get in touch:
Email: [email protected] also copy to
[email protected]
Based in Claremont Tower
Stats News Mailing List
• To subscribe go to:
• https://lists.ncl.ac.uk/wws/info/statsnews
• Community of Users
Research Data Management (RDM)
• What do you understand by research data?
• Why do you think managing your research data is
important?
• RDM is becoming very important and includes:
– Creating, Processing, Analysing, Preserving,
Giving Access, and Re-using your research data
• http://datalib.edina.ac.uk/mantra/
• RDM at NCL
http://www.ncl.ac.uk/res/research/gov-ethics/rdm/
Other Statistical Software Training
• Dr Colin Gillespie:
http://www.ncl.ac.uk/maths/rcourse/
• Mr David McGeeney: Practical Statistics
Text Analytics
• Anyone currently involved with text analysis? Software?
• Text Analytics
– Open ended questions
– Social media
– Documents
Introduction: Why use SPSS, Minitab, SAS or Excel?
• Some Definitions
• Availability of Basic Statistics
• Original Intent of the Software
• Ease of Use
• Industrial vs Academic Usage
• Frequency of New Releases
• Operating Platform (Systems)
• Graphics Output
• Advanced Statistics
• Availability of Help Interpreting Output
• Availability of Technical Support
• Online Getting Started Tutorial / Online Forum
• Onscreen Statistics Wizard
• Design of Experiment
• Prospective and Retrospective Power Analysis
• Interchanging Files with Other Software
• Getting Output (results of analysis) into Other Software
• Programming Language
• Automatic Updating of Output after adding more Data
• Pivot Tables Availability and Interactivity
• Text Analytics
• ANOVA, Regression and Time Series Analysis
• Popularity and Peer Pressure
• There are many other statistical packages, e.g. STATA, PRISM, STATISTICA, CLUSTAN, GENSTAT, Mathematica, GLIM, SPLUS, JMP, R, etc.
Some Definitions
• SPSS: Statistical Package for the Social Sciences (IBM SPSS Statistics these days)
• SAS: Statistical Analysis System (just SAS these days)
• Minitab
• Excel
• SPSS, SAS, and Minitab are statistical packages, while Excel is a spreadsheet
Descriptive (Summary) Statistics Options
[Screenshots: descriptive (summary) statistics options in Minitab, SPSS, Excel and SAS EG]
Advanced Statistics: Model Building
[Screenshots: model building in Excel, SPSS, Minitab and SAS]
Fertility: average number of kids.
Infant mortality: deaths per 1000 live births.
Bar Charts
[Figure: bar charts of house style (CONDO, RANCH, SPLIT, TWOSTORY) by percent within all data, as produced by SAS, SPSS, Minitab and Excel]
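The "percent within all data" plotted in these bar charts can be worked out by hand. A minimal sketch in Python, assuming a made-up sample of house styles (the values below are illustrative only and are not the data behind the slide):

```python
from collections import Counter

# Hypothetical sample of house styles (illustrative only, not the slide's data).
styles = ["RANCH", "CONDO", "RANCH", "TWOSTORY", "SPLIT",
          "RANCH", "CONDO", "TWOSTORY", "RANCH", "SPLIT"]

def percent_by_category(values):
    """Return {category: percent of all observations}, sorted by category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100.0 * n / total for category, n in sorted(counts.items())}

percents = percent_by_category(styles)
for category, pct in percents.items():
    # Crude text bar chart: one '#' per 5 percentage points.
    print(f"{category:<9} {pct:5.1f}% {'#' * round(pct / 5)}")
```

Each package does the same counting behind its chart dialog; only the drawing differs.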
How Easy to Use?
• SPSS and Minitab are relatively easy to use
• SAS is a bit more difficult to use, but is made easier via Enterprise Guide (EG)
• Excel is very easy to use!
• In general, most software is easy to use once you learn how to use it! Getting Started Guides are available for SPSS, Minitab, Excel and SAS.
Online Tutorial / Online Forum
• Demonstrate the Getting Started Tutorials for all four packages via RAS or Common Desktop and ask for the views of the audience
• Which is the best?
• Online Tutorial available in the Help menu
• Does the software have an active online forum? e.g. sasprofessionals.net (SAS); Assess (SPSS)
Help to Interpret Output
• SPSS provides an assistant to help you interpret some of the output, via Case Studies
• Minitab provides an assistant too, via StatGuide
• SAS: by examples (similar to Case Studies)
• Excel
• Demonstrate with an example
Technical Support
• SPSS: good, but only via a representative
• Minitab: good, deals directly with anybody
• SAS: very good, deals directly with anybody
• Excel: good, deals directly with anybody
• User groups, if available, can be very useful
Statistics Wizard
• SPSS has a statistics wizard that asks you some questions and then suggests which statistical test you can do (demonstrate). But it is best to be very clear about what you want!
• SAS (via EG)
• Minitab and Excel don’t
Design of Experiment
A very important aspect of statistics: it helps you to find the optimum conditions for your experiment, for example what temperature, catalyst and pressure are needed for maximum yield of a chemical reaction.
• Not available in SPSS
• Available in Minitab
• Available in SAS
• Not available in Excel
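To make the chemical-reaction example concrete, here is a minimal Python sketch of a full factorial design, the simplest kind of designed experiment. The factor names and levels are assumptions chosen for illustration; a real design from Minitab or SAS would usually also randomise the run order and may use a fractional design to save runs.

```python
from itertools import product

# Hypothetical factors and levels for a chemical-yield experiment.
factors = {
    "temperature_C": [60, 80],
    "catalyst": ["A", "B"],
    "pressure_bar": [1, 5],
}

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

runs = full_factorial(factors)
for run_number, run in enumerate(runs, start=1):
    print(run_number, run)
```

Two levels for each of three factors gives 2 x 2 x 2 = 8 runs; the yield measured at each run is what the package then analyses.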
Prospective and Retrospective Power Analysis
Power analysis helps you to make a decision on sample size / power
• Only retrospective in SPSS
• Both in SAS
• Both in Minitab
• Not available in Excel, except maybe through an add-in module
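As a rough illustration of both directions, the sketch below uses the normal approximation for a two-sided comparison of two group means: prospectively it gives the sample size per group for a target power, retrospectively the power achieved by a given sample size. The function names and the default alpha and power are assumptions for this example; packages such as Minitab and SAS use the t distribution, so their answers are typically slightly larger.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Prospective: approximate n per group, two-sided two-sample test of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d) ** 2,
    where d is the standardised effect size (difference in means / SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

def achieved_power(effect_size, n_per_group, alpha=0.05):
    """Retrospective: approximate power for a given n per group.

    Ignores the negligible probability of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

n = sample_size_per_group(0.5)   # medium effect, 5% significance, 80% power
print(n, round(achieved_power(0.5, n), 3))
```

Smaller effects need far larger samples: halving the effect size roughly quadruples the required n.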
Reading Data Files from other software
• SPSS is okay
• Minitab: good
• SAS: good
• Excel: very good (most statistical software can read (open) an Excel file)
• The best way to store data is as a flat (plain text) file, e.g. created with Notepad.
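A flat file here means plain text with one record per line, which every package above can read. The Python sketch below writes and re-reads comma-separated text with the standard csv module; the column names and values are hypothetical, and an in-memory buffer stands in for a real file (opening a file with `open(path, "w", newline="")` works the same way).

```python
import csv
import io

# Hypothetical records (illustrative only).
records = [
    {"id": 1, "group": "control", "score": 12.5},
    {"id": 2, "group": "treated", "score": 15.0},
]

def to_csv_text(records):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def from_csv_text(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))

text = to_csv_text(records)
print(text)
print(from_csv_text(text))
```

Note that a flat file keeps no types or labels: numbers come back as text and must be converted, which is the price of a format that any software can open.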
Getting Output (tables and graphs) into other software
• They are all good at this, particularly getting output into a Word document or PowerPoint presentation via Copy and Paste.
• SPSS output files can be exported as html, pdf, rtf, doc
• SAS output can be listing (local to SAS) or html (or pdf and rtf via EG)
• Minitab output can be txt, html, pdf, rtf
• Excel output can also be copied and pasted in a variety of formats
Programming Language
• SPSS, Minitab and SAS all have very powerful programming languages. As you point-and-click, a programme is being built behind the scenes. Demonstrate
• Excel also has a programming language, e.g. Visual Basic (anyone here used it before?)
Automatic Updating of Graphs and Tables after adding more data
• SPSS does not have this function; you simply have to reproduce the output again (re-run the syntax)!
• With Minitab, you can update the graph and table
• SAS does not have this function
• Excel has this function
New Releases
• IBM SPSS Statistics produces new versions too often, almost every year! Currently version 23 on campus, but 24 is out!
• Minitab Ltd produces new releases at just the right pace; currently version 17 on campus
• SAS Institute Inc. produces new releases about every couple of years; the latest version is 9.4. 9.1/9.2/9.3 for SAS Base and 7 for SAS EG
• Microsoft produces new releases about every couple of years, currently on Windows 10.
Usage In Industry Vs Academia
• SPSS and Minitab heavily used in
Academia, used in Industry but not a lot
• SAS not heavily used in Academia, heavily
used in Industry (most clinical trials use
SAS)
• Excel heavily used in both Academia and
Industry
Operating Systems
• SPSS runs on Windows, Macintosh, and
Unix
• Minitab runs mainly on Windows and
Macintosh (10Xtra last version on Mac)
• SAS runs on Windows, Macintosh, Unix,
Linux
• Excel runs mainly on Windows and
Macintosh
Pivot Tables
Very good for displaying information online. Can be very interactive
• Minitab: static, not interactive
• Excel: interactive
• SPSS: interactive
• SAS: interactive
Demonstrate if possible!
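For anyone who has not met one, a pivot table is a cross-tabulation of one field against another. A minimal Python sketch of the counting involved, assuming made-up (faculty, package) records that are illustrative only:

```python
from collections import Counter

# Hypothetical records (illustrative only): (faculty, package) pairs.
records = [
    ("Science", "SAS"), ("Science", "SPSS"), ("Medicine", "SAS"),
    ("Medicine", "SAS"), ("Arts", "SPSS"), ("Arts", "Excel"),
]

def pivot_counts(records):
    """Cross-tabulate (row, column) pairs into a nested dict of counts."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # A Counter returns 0 for combinations that never occur.
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    return rows, cols, table

rows, cols, table = pivot_counts(records)
print(f"{'':<10}" + "".join(f"{col:>7}" for col in cols))
for row in rows:
    print(f"{row:<10}" + "".join(f"{table[row][col]:>7}" for col in cols))
```

The interactive versions in Excel, SPSS and SAS let you swap the row and column fields by dragging; this sketch fixes them in code.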
Text Analytics
• SPSS
– SPSS Modeler (different from IBM SPSS
Statistics)
• SAS
– SAS Enterprise Miner (different from SAS Base and Enterprise Guide)
• Minitab
– ??
• Excel
– With extension
Statistical Modelling
• “A statistical model is a class of mathematical model, which embodies a set of assumptions concerning the generation of some sample data, and similar data from a larger population. A statistical model represents, often in considerably idealized form, the data-generating process”
• There are three purposes for a statistical model:
– Predictions
– Extraction of information
– Description of stochastic structures
• Some common models:
– Regression Model
• Simple Linear Regression
• Multiple Linear Regression
• Logistic Regression (Binary and Multinomial)
– Ordinal
– Nonlinear
SAS, SPSS and Minitab are good for statistical modelling
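The simplest of these models, simple linear regression, can be fitted by hand to show what the packages compute. The Python sketch below uses ordinary least squares on made-up data (the x and y values are illustrative only); the function names are assumptions for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_new, slope, intercept):
    """Use the fitted model for prediction, the first purpose listed above."""
    return intercept + slope * x_new

# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2), round(predict(6, slope, intercept), 2))
```

A statistical package adds what this sketch leaves out: standard errors, p-values, residual diagnostics and checks of the model assumptions.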
ANOVA
Product (Software)   One-Way   Two-Way   MANOVA   GLM   Post-hoc Tests
Minitab              Yes       Yes       Yes      Yes   Yes
R                    Yes       Yes       Yes      Yes   Yes
SAS                  Yes       Yes       Yes      Yes   Yes
SPSS                 Yes       Yes       Yes      Yes   Yes
Excel: Yes, Add on, Add on, Add on
Latin Squares Analysis: Yes, Yes, Yes
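For reference, the one-way ANOVA that all of these packages offer reduces to a ratio of between-group to within-group variability. A minimal Python sketch of the F statistic, assuming small made-up groups (illustrative only); it stops short of the p-value, which needs the F distribution that the packages supply.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data (illustrative only): three groups of three observations.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), df_b, df_w)
```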
Regression
Product (Software)   OLS   WLS   2SLS   NLLS   Logistic   GLM   LAD   Stepwise
Minitab              Yes   Yes   No     Yes    Yes        Yes   No    Yes
R                    Yes   Yes   Yes    Yes    Yes        Yes   Yes   Yes
SAS                  Yes   Yes   Yes    Yes    Yes        Yes   Yes   Yes
Excel                Yes
SPSS                 Yes   Yes   Yes    Yes    Yes        Yes   No    Yes
Ordinary Least Squares (OLS); Weighted Least Squares (WLS); Two-Stage Least Squares (2SLS); Non-Linear Least Squares (NLLS); General Linear Model (GLM); Least Absolute Deviation regression (LAD)
Time Series Analysis
Product   ARIMA   GARCH   Unit root test   Cointegration test   VAR   Multivariate GARCH
Minitab   Yes     No      No               No                   No
R         Yes     Yes     Yes              Yes                  Yes
SAS       Yes     Yes     Yes              Yes                  Yes
Excel     No      No      No               No                   No    No
SPSS      Yes     Yes     No               No                   No    No
Multivariate GARCH (Minitab, R or SAS): Yes
Compare Groups for Significant Differences
Data
• Categorical Data (Nominal or Ordinal): Crosstabulations with Chi-Square Test (S10)
• Continuous Data Divided into Groups
  – One Group (variable) Compared to a Known Value (Gold Standard): One Sample T Test (S9)
  – Two Groups (variables)
    • One Continuous Variable Divided into Two Unrelated Groups (NT)
      – Normal Data and Equal Variance: Parametric Independent Samples T Test (S7)
      – Not Normal Data or Unequal Variance: Nonparametric Two Independent Samples Test (S8)
    • Two Continuous Variables that Represent Related Data, e.g. pre-test and post-test (NT)
      – Normal Data: Parametric Paired Samples T Test (S5)
      – Not Normal Data: Nonparametric Two-Related Samples Test (S6)
  – Three or more Groups
    • One Continuous Variable Divided into One Grouping Variable having 3 Categories (NT)
      – Normal Data and Equal Variance: Parametric One-Way ANOVA (S2)
      – Not Normal Data or Unequal Variance: Nonparametric Tests for Several Independent Samples (S3)
    • One Continuous Variable Divided into Two or more Grouping Variables: General Linear Model Univariate Analysis (S4)
NT = Normality Test and Variance Test (S1)
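The chart above can also be read as a chain of questions. The Python sketch below encodes one reading of its branches as a function; the function name, parameter names and the exact branch order are assumptions for illustration, and the labels S1 to S10 simply repeat the chart's own references.

```python
def choose_test(data_type, groups=1, related=False, grouping_variables=1,
                normal=True, equal_variance=True):
    """Suggest a test for comparing groups, following the chart above.

    data_type: "categorical" or "continuous"
    groups: number of groups compared (1 = one group against a known value)
    related: True for paired data such as pre-test and post-test
    grouping_variables: number of grouping variables (three or more groups)
    normal, equal_variance: outcomes of the normality and variance tests (S1)
    """
    if data_type == "categorical":
        return "Crosstabulations with Chi-Square Test (S10)"
    if data_type != "continuous":
        raise ValueError("data_type must be 'categorical' or 'continuous'")
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups == 1:
        return "One Sample T Test (S9)"
    if groups == 2:
        if related:
            if normal:
                return "Parametric Paired Samples T Test (S5)"
            return "Nonparametric Two-Related Samples Test (S6)"
        if normal and equal_variance:
            return "Parametric Independent Samples T Test (S7)"
        return "Nonparametric Two Independent Samples Test (S8)"
    # Three or more groups.
    if grouping_variables >= 2:
        return "General Linear Model Univariate Analysis (S4)"
    if normal and equal_variance:
        return "Parametric One-Way ANOVA (S2)"
    return "Nonparametric Tests for Several Independent Samples (S3)"

print(choose_test("continuous", groups=2, related=True, normal=False))
print(choose_test("continuous", groups=3))
```

As with the SPSS statistics wizard, the answer is only as good as the inputs: run the normality and variance tests (S1) first and be clear about whether the groups are related.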
Now Some Demonstrations
• Most of these packages can open more than
one file in a session
• SPSS
• Minitab
• SAS
• Excel
Big Data Analytics
• IBM SPSS Modeler
• SAS Enterprise Miner
• R
[Screenshot: text mining workspace showing the Project Panel, Text Mining Tools Palette, Diagram Workspace with connected nodes, Help Panel and Properties Panel]
END
• You will use many different software packages depending on your needs!
• Questions
Summary
• DOE: Minitab or SAS
• Power / sample size calculation: Minitab or SAS
• Best way to store your data: Notepad (plain text)
• Automatic update of output: Minitab or Excel
• SAS is very popular in industry
• Pivot tables: SAS, SPSS or Excel
• Modelling: SAS, SPSS or Minitab
• Summary statistics: SAS, SPSS or Minitab