Data Preparation for Analytics Using SAS

Data Preparation for Analytics
Using SAS
Gerhard Svolba, Ph.D.
Reviewed by
Madera Ebby, Ph.D.
What is the purpose of this book?
- Introduces the reader to data preparation
- Explains why data preparation is not only important but a must prior to data analysis
- Leads from the data preparation process to data analytics
The Analysis Path: From raw data to results that can be implemented
- Data Sources: different sources; relational models, star schemas
- Data Preparation: data merges, denormalization; derived variables; transpositions, aggregations
- Analytic Modeling: modeling, parameter estimation, tuning; predictions, classifications, or clustering
- Results and Actions: usage of results; profiling; interpretations
The Analysis Path: From raw data to
results that can be implemented
Good Results
Clever Modeling
Adequate Preparation
Data availability
Four Dimensions for Analytic Data Preparation
- Business and process knowledge
- Analytical knowledge
- Documentation and maintenance
- Efficient SAS coding
Business question: How did students who met the provincial standard in grade 3 perform in grade 6?
- The question generates many other questions
- Answering it means working with people in other departments, such as IT, to carry out the data analytic process
Why is this author qualified or not qualified to address this topic?
- He is an experienced SAS user, as shown by the many macros in the book
- He addresses issues by presenting examples from different backgrounds
What are the strengths or weaknesses of this book?
- The book is clearly written and easy to read
- It gives the reader many examples of code, input, and output
Would you recommend this book? If so, who would you recommend it to and for what purpose?
- Those who prepare data marts for statistics, data mining, or time series analyses
- Those who provide the data used in creating data marts (IT and data warehousing)
- Both new and experienced SAS users who perform data analyses using data marts
- Those who prepare data in relational databases with SQL
Does the book achieve its purpose?
Absolutely! It enables one to:
- Understand the business environment in which data preparation occurs
- Extract and structure your data
- Create derived variables from different tables
- Program SAS in an efficient way
What is the best tip or technique addressed in this book?
I learnt many new techniques from this book. For example, examining the mean math scores by board (board_mident):
/* One output row per board with six statistics for Math_score */
proc means data=datalib.boards noprint nway;
   class board_mident;
   var Math_score;
   output out=datalib.aggr_static(drop=_type_ _freq_)
          mean= sum= n= std= min= max= / autoname;
run;
- To run the analysis by board_mident, we use a CLASS statement. A BY statement could also be used, but the data would first have to be sorted by board_mident.
- NWAY suppresses the grand total mean and all other totals, so that the output data set contains only the rows for the 5 boards, which are the analysis subjects.
- NOPRINT suppresses the printed output (the procedure's listing, not the log), which can otherwise run to thousands of descriptive measures.
- In the OUTPUT statement we specify the statistics to be calculated. The AUTONAME option creates the new variable names in the form VARIABLENAME_STATISTIC, for example Math_score_Mean.
- If we want to calculate different statistics for different input variables, we can specify this in the OUTPUT statement, e.g. SUM(VARIABLE)=sum_variable.
- In the OUTPUT statement we drop the _TYPE_ and _FREQ_ variables, although we could keep _FREQ_ and omit N from the statistics list.
Source: Chapter 18, Multiple Interval-Scaled Observations per Subject, page 183.
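The two alternatives mentioned in the notes above, a BY statement in place of CLASS and different statistics for different variables, could look roughly like the sketch below. It is not taken from the book: the data set names work.boards_sorted and work.aggr_by and the second variable Reading_score are hypothetical and stand in for whatever the real data contains.

```sas
/* Sketch only: work.boards_sorted, work.aggr_by, and Reading_score
   are hypothetical names used for illustration. */

/* A BY statement requires the input to be sorted by the BY variable. */
proc sort data=datalib.boards out=work.boards_sorted;
   by board_mident;
run;

proc means data=work.boards_sorted noprint;
   by board_mident;                  /* one output row per board, so NWAY is not needed */
   var Math_score Reading_score;
   /* Keep _FREQ_ (renamed) in place of N and request a
      different statistic for each analysis variable. */
   output out=work.aggr_by(drop=_type_ rename=(_freq_=n_obs))
          mean(Math_score)=math_mean
          max(Reading_score)=reading_max;
run;
```

One caveat when keeping _FREQ_ in place of N: _FREQ_ counts all observations in the group, while N counts only nonmissing values of the analysis variable, so the two differ when scores are missing.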
Are there other books (or sources of information) available with similar content?
- Yes, but they tend to present bits and pieces of information
- E.g. resources on the internet
- The Little SAS Book by Delwiche and Slaughter
If so, how does this book compare?
- It is a comprehensive, well-illustrated presentation of the material
What will your SAS log look like?