Transcript Lecture 10

Quick Data Summaries in SAS
• Start by bringing in data
– Use a permanent data set for these examples
• Proc Summary
– Produces summaries relatively easily
– Designed to produce an output data set that
can be manipulated further ***This is a critical
difference from Proc Tabulate***
– Data must be pre-sorted by any BY-group variables
– Results are not printed automatically; use Proc Print
to view them
• Basic Summary Syntax:
Proc sort;
  By var1 var2;
Run;
Proc summary;
  By var1 var2;
  Var variable3;
  Output out=new_table mean=mean_name n=n_name …;
Run;
Proc print data=new_table;
Run;
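A minimal runnable version of the template above, assuming a small made-up data set. The names heights, site, trt, height, height_stats, mean_height and n_height are hypothetical stand-ins for var1, var2, variable3, new_table, mean_name and n_name. DATA= is added to each step so the input is stated explicitly.

```sas
/* Hypothetical example data: plant heights by site and treatment */
data heights;
  input site $ trt $ height;
  datalines;
A ctl 10
A ctl 12
A fert 15
A fert 17
B ctl 8
B ctl 9
B fert 14
B fert 16
;
run;

/* Proc Summary needs the data sorted by the BY variables */
proc sort data=heights;
  by site trt;
run;

/* One output row per site/trt combination */
proc summary data=heights;
  by site trt;
  var height;
  output out=height_stats mean=mean_height n=n_height;
run;

proc print data=height_stats;
run;
```

The printed table should have one row per site/treatment combination (four rows here), along with the automatic _TYPE_ and _FREQ_ columns.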
Statistics available in Proc Summary
• Mean (MEAN), n (N), standard deviation (STD),
variance (VAR), coefficient of variation (CV), sum (SUM)
• Minimum (MIN), maximum (MAX), range (RANGE), number
of missing observations (NMISS), median (MEDIAN)
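To request several of these statistics at once, list one keyword=name pair per statistic on the OUTPUT statement. This sketch uses hypothetical data (heights2, height) with one missing value, so N (count of nonmissing values) and NMISS (count of missing values) differ. With no BY statement, Proc Summary returns a single row for the whole data set. All output variable names are illustrative.

```sas
/* Hypothetical data: the single period is read as a missing value */
data heights2;
  input height @@;
  datalines;
10 12 . 15 17 8
;
run;

/* No BY statement: one row summarizing the whole data set */
proc summary data=heights2;
  var height;
  output out=all_stats
    n=n_height nmiss=nmiss_height mean=mean_height std=sd_height
    min=min_height max=max_height range=range_height
    median=median_height;
run;

proc print data=all_stats;
run;
```

If you would rather not choose names yourself, the AUTONAME option (for example, output out=all_stats mean= std= / autoname;) builds them from the variable and statistic, such as height_Mean.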
Some Quirks of Proc Summary
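• Whenever you use proc summary, it adds two new
variables to the output data set: _type_ and _freq_
(note the underscores at the beginning and end of the
variable names)
– _freq_ gives the number of observations summarized
in each output row
– _type_ identifies which combination of CLASS variables
a row summarizes; with only BY variables (as in the
syntax above) it is always 0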
• You can ignore these variables in virtually all cases
• You need to remember which data set is “active”, or
specify the data set that Proc Summary will operate on
with the DATA= option
– By default, the active data set is the most recently
created data set
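A sketch of both quirks, using hypothetical data set names. After the data set other is created, it becomes the active (_LAST_) data set, so DATA= is used to point Proc Summary at site_heights instead. The DROP= data set option on OUT= removes _type_ and _freq_ when you do not want them in the output.

```sas
/* Hypothetical data to summarize */
data site_heights;
  input site $ height;
  datalines;
A 10
A 12
B 8
B 9
;
run;

proc sort data=site_heights;
  by site;
run;

/* Creating another data set makes it the active (_LAST_) one */
data other;
  x = 1;
run;

/* A bare PROC SUMMARY would now read work.other; DATA= overrides that.
   DROP= on the output data set removes _TYPE_ and _FREQ_. */
proc summary data=site_heights;
  by site;
  var height;
  output out=site_means(drop=_type_ _freq_) mean=mean_height;
run;

proc print data=site_means;
run;
```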
Shannon’s Diversity Index
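$H = -\sum_{i=1}^{S} p_i \ln(p_i)$
• where p_i is the proportion of individuals belonging
to species i and S is the number of species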
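As a worked example with hypothetical counts of 5, 3 and 2 individuals for three species, p = 0.5, 0.3 and 0.2, so H = -(0.5 ln 0.5 + 0.3 ln 0.3 + 0.2 ln 0.2) ≈ 1.03. The sketch below is one way to compute the same value in SAS, assuming a hypothetical data set counts with one row per species: Proc Summary supplies the total count, and a DATA step accumulates -p_i ln(p_i). SAS's LOG function is the natural logarithm.

```sas
/* Hypothetical species counts for a single sample */
data counts;
  input species $ count;
  datalines;
sp1 5
sp2 3
sp3 2
;
run;

/* Total number of individuals across all species */
proc summary data=counts;
  var count;
  output out=total(drop=_type_ _freq_) sum=total_count;
run;

/* p_i = count_i / total; H accumulates -p_i * ln(p_i) */
data shannon;
  if _n_ = 1 then set total;   /* total_count is retained on every iteration */
  set counts end=last;
  p = count / total_count;
  if p > 0 then h + (-p * log(p));   /* species with zero count add nothing */
  if last then output;               /* keep only the final sum */
  keep h;
run;

proc print data=shannon;
run;
```

To compute H for several samples, the same pattern can be extended with a BY statement on a sample identifier, resetting h at the start of each group.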