PROC SORT - People Server at UNCW

• Chapter 4 concerns various SAS procedures
(PROCs). Every PROC operates on:
– the most recently created dataset
– all the observations
– all the appropriate variables
unless you tell SAS otherwise by
– specifying the DATA= option
– using the WHERE statement or the BY statement
– using the VAR statement
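As a minimal sketch of overriding all three defaults at once: the dataset name padgett is a hypothetical stand-in (the slides never name the dataset), while marsh, flower, and lleaves are the variable names used in the later examples.

```sas
/* padgett is a hypothetical dataset name, used only for illustration */
PROC PRINT DATA=padgett;     /* DATA= picks the dataset instead of the most recent one */
  WHERE marsh = "si";        /* only observations from one marsh, not all of them */
  VAR flower lleaves;        /* only these variables, in this order */
RUN;
```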
• All PROCs begin with the keyword PROC,
followed by the name of the PROC, followed by
additional options or statements required by the
specific PROC.
• Some PROCs we’ve already seen are:
– PROC CONTENTS;
– PROC PRINT;
– PROC SORT; BY sortingvariablename;
– PROC MEANS;
– PROC FREQ; TABLES list_of_variables;
• The BY statement is required with PROC
SORT, optional with others... when used
with other PROCs it tells SAS to perform
separate analyses for each of the values of
the BY variables, instead of keeping all the
observations together in one group.
• Let’s do an example with Dr. Padgett’s
dataset...suppose you wanted to analyze the
flowering and the number of live leaves of the
plants separately for each marsh ...
PROC SORT ; BY marsh;
PROC FREQ; TABLES flower*lleaves;
BY marsh;
• NOTE that I have SORTed the data prior to
doing the PROC FREQ BY marsh... when
you do a PROC BY a variable, SAS assumes
that the dataset is SORTed BY that variable... so
if it’s not already sorted, use PROC SORT to
do the sorting...
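The same sort-then-BY pattern applies to other PROCs. A hedged sketch with PROC MEANS, assuming a hypothetical dataset name padgett and assuming lleaves is stored as a numeric variable:

```sas
/* padgett is a hypothetical dataset name; lleaves is assumed to be numeric */
PROC SORT DATA=padgett;
  BY marsh;                  /* sort first so the BY groups are contiguous */
RUN;

PROC MEANS DATA=padgett;
  BY marsh;                  /* one set of summary statistics per marsh */
  VAR lleaves;
RUN;
```

If the PROC SORT step is skipped and the data are not already in marsh order, SAS stops the PROC MEANS step with an error saying the BY variables are not properly sorted.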
• NOTE also that you may use TITLE,
FOOTNOTE, and LABEL statements to
enhance your output from any PROC... the
syntax is TITLEn '...'; and
FOOTNOTEn '...'; where n runs from 1 to 10, so up to
10 titles and 10 footnotes are allowable. The LABEL
statement allows you to give your
variables labels of up to 256 characters...
If you want your labels to be used
throughout a dataset, use the LABEL statement
in the DATA step - otherwise use it within
the PROC step...
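A short sketch of the three statements used together in a PROC step. The dataset name padgett and all of the title, footnote, and label text are made up for illustration; only the variable names come from the slides.

```sas
/* padgett and all quoted text below are illustrative assumptions */
TITLE1 'Marsh plant data';
TITLE2 'Flowering status and live leaves';
FOOTNOTE1 'Labels here apply to this PROC step only';

PROC PRINT DATA=padgett LABEL;   /* PROC PRINT needs the LABEL option to show labels */
  LABEL lleaves = 'Number of live leaves'
        flower  = 'Plant is flowering';
  VAR marsh flower lleaves;
RUN;

TITLE;       /* a bare TITLE or FOOTNOTE statement clears the ones set earlier */
FOOTNOTE;
```

Titles and footnotes stay in effect for later PROCs until they are changed or cleared, which is why the sketch resets them at the end.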
• The WHERE statement may be used with
PROCs to perform the procedure only on certain
observations in the dataset, those satisfying the
conditions given in the WHERE statement... for
example:
– in Dr. Padgett’s data, try to PRINT only those
flowering plants from the Shell Island marsh:
PROC PRINT;
WHERE flower="yes" AND marsh="si";
• Note the similarity with the subsetting IF
statements we saw earlier. The same
comparison, logical, and arithmetic symbols
may be used to construct the condition. See the
examples on page 102...
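To make the comparison concrete, here is a sketch of the same condition written both ways. The dataset names padgett and si_flowering are hypothetical.

```sas
/* padgett and si_flowering are hypothetical dataset names */

/* Subsetting IF: lives in a DATA step and creates a new, smaller dataset */
DATA si_flowering;
  SET padgett;
  IF flower = "yes" AND marsh = "si";
RUN;

/* WHERE: filters inside the PROC itself, so no extra dataset is needed */
PROC PRINT DATA=padgett;
  WHERE flower = "yes" AND marsh = "si";
RUN;
```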
• PROC SORT is one of the most useful PROCs in SAS.
Besides being required to sort data prior to doing other
PROCs BY the sorting variable, SORTing is useful in
its own right...
• You may specify as many sorting variables as you like,
but be careful to note how the multiple sorts are done...
try a SORT of Dr. Padgett’s data first by marsh and
then by plant height. What happens if you reverse the
order of the sorting variables?
• Note that you may save the SORTed data in its own
dataset using the OUT= option... otherwise the
original dataset is overwritten with the SORTed data...
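A sketch of the two sort orders, each saved with OUT= so the original stays untouched. The dataset names and the variable name height are hypothetical; the slides do not give the actual name of the plant height variable.

```sas
/* padgett, the OUT= dataset names, and height are hypothetical names */

/* Sorted by marsh first; height only orders plants within each marsh */
PROC SORT DATA=padgett OUT=by_marsh_height;
  BY marsh height;
RUN;

/* Reversed: sorted by height first; marsh only breaks ties in height */
PROC SORT DATA=padgett OUT=by_height_marsh;
  BY height marsh;
RUN;
```

Putting the keyword DESCENDING in front of a BY variable sorts that variable from largest to smallest instead.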
• PROC PRINT is the most widely used PROC and it has
great flexibility
• Some of the options and statements used with PROC PRINT are:
– LABEL option - to print with labels previously assigned
– NOOBS option - to not print the observation number
– ID statement - to identify observations by another variable instead of OBS
– SUM statement - to sum up particular variables; this one is helpful
for financial printouts...
– VAR statement - to print specific variables in a certain order
– BY statement - to print the data in sections defined by the BY
variable used in a previous PROC SORT
• Example: Get a printout of Dr. Padgett’s data that shows
only the total mass of the plants, grouped by the
number of live leaves, with the mass summed within each group and
the plant number used to identify the observations.
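One possible way to set this up, offered as a sketch and not the only answer. The names padgett, by_leaves, plantnum, and totmass are hypothetical; substitute the actual dataset and variable names.

```sas
/* padgett, by_leaves, plantnum, and totmass are hypothetical names */
PROC SORT DATA=padgett OUT=by_leaves;
  BY lleaves;                /* BY groups in PROC PRINT need sorted data */
RUN;

PROC PRINT DATA=by_leaves;
  BY lleaves;                /* one section per number of live leaves */
  ID plantnum;               /* plant number replaces the OBS column */
  VAR totmass;               /* print only the total mass */
  SUM totmass;               /* subtotal per BY group plus a grand total */
RUN;
```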
• For Monday:
– Make sure you understand the material in
Chapters 2 and 3 in preparation for the midterm
- ask questions if you have them…
– Read Chapter 4 up through 4.4 - we’ll discuss
FORMATs next time along with writing our
own formats with PROC FORMAT