Some useful but not well known SAS Features

Download Report

Transcript Some useful but not well known SAS Features

Copy and paste almost anything
Arthur Tabachneck
Myqna, Inc.
Nate Derby
Stakana Analytics
John King
Randy Herbison
Ouachita Clinical Data Services, Inc.
Westat
Ben Powell
Genworth Financial
Richard DeVenezia
Independent Consultant
2012OrlandoFlorida
April 22-25, 2012
Suppose you found a table on the web that you
needed to have in a SAS dataset
Table A. Total tax revenue as percentage of GDP
1975
1985 1990 1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
The table might be in the form of:
an html page
a pdf document
a spreadsheet
a word document
a page from a wiki
or any other form that you can
copy to your system's clipboard
and, of course, it could come from
any source .. not just the web
things that can complicate copying and pasting:
some columns may not have variable names
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:
variable names may take up more than one row
Table A. Total tax revenue as percentage of GDP
Row 1
1975 1985 1990 1995 2000 2004 2005 2006
Row 2
Row 3
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:
tables may contain one or more blank rows
Table A. Total tax revenue as percentage of GDP
1975
1985
1990 1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:
rows may have some missing values
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:
data may contain sub or superscript values
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want to name or rename some variables
Table A. Total tax revenue as percentage of GDP
Country
1975
1985
1990 1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want to add a variable name prefix/suffix
Table A. Total tax revenue as percentage of GDP
Country
Revenue
1975
1975
Revenue
1985
1985
Revenue
1990
1990
Revenue Revenue
2000
1995
1995
2000
Revenue
2004
Revenue Revenue
2006
2004 2005
2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you might want to add a variable label
Table A. Total tax revenue as percentage of GDP
1975
1985
1990 1995
2000
2004 2005
2006
2007
2007
Guesstimate
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you might want to specify missing values
Table A. Total tax revenue as percentage of GDP
1975
1985
1990 1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want to multiply a variable by a constant
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
.384
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want to specify formats or informats
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5%
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
you may want to specify which row(s) should be
used to guess formats and informats
Table A. Total tax revenue as percentage of GDP
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want some data converted to upper case
Table A. Total tax revenue as percentage of GDP
1975 1985 1990 1995 2000 2004 2005 2006
2007
Provisional
KOREA
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
NEW ZEALAND
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
AUSTRIA
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
BELGIUM
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
CZECH REPUBLIC
DENMARK1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may not want all of the data
Table A. Total tax revenue as percentage of GDP
Country
Revenue
1975
1975
Revenue
1985
1985
Revenue
1990
1990
Revenue Revenue
2000
1995
1995
2000
Revenue
2004
Revenue Revenue
2006
2004 2005
2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
plus, there might be things you want to change:
you may want to drop one or more columns
Table A. Total tax revenue as percentage of GDP
Country
1975
1985
1990
1995
2000
2004 2005
2006
2007
Provisional
Korea
15.1
16.4
18.9
19.4
23.6
24.6
25.5
26.8
28.7
New Zealand
28.5
31.1
37.4
36.6
33.6
35.3
37.5
36.7
36.0
Austria
36.7
40.9
39.6
41.2
42.6
42.8
42.1
41.7
41.9
Belgium
39.5
44.4
42.0
43.6
44.9
44.8
44.8
44.5
n.a.
37.5
35.3
37.8
37.5
36.9
36.4
48.8
49.4
49.0
50.7
49.1
48.9
Czech Republic
Denmark1
38.4
46.1
46.5
Source: http://www.oecd.org/dataoecd/48/27/41498733.pdf
things that can complicate copying and pasting:
you might want merged cells to apply to more than one variable
things that can complicate copying and pasting:
so that you get a table that looks like:
things that can complicate copying and pasting:
the table might need to be transposed – Have:
things that can complicate copying and pasting:
Need: what you really want is
things that can complicate copying and pasting:
the table might copy in the following form – Have:
ProjNum
1
2
3
4
ClaimNum
4
19
12
9
PreAdv
1
1
(continued)
things that can complicate copying and pasting:
Need: what you really want is
things that can complicate copying and pasting:
the data might not even be in tabular form
Note: Ensure that use of these methods does not violate a site's terms of use
things that can complicate copying and pasting:
and may even be a bit more complex
Note: Ensure that use of these methods does not violate a site's terms of use
Would you know how to get such data into SAS
quickly, painlessly and totally accurately?
"Excuse me. Is this the Society for Asking Stupid Questions?"
other than asking some very careful
person to re-enter all of the data
Of course you could use your mouse to highlight
and copy everything to your system's clipboard
and, if only a simple extract was needed,
and SAS/Access to PC-Files was licensed,
you might be able to paste it into an Excel file
and then use proc import
but what if one doesn't have
SAS/Access to PC-Files?
or if the task requires some features
not currently offered with proc import?
proc import currently doesn’t provide a way to:
access the system clipboard
assign or rename variable names
account for merged cells in variable names
import multiple row variable names
add a prefix or suffix to variable names
parse a structured document
assign missing values
specify variable labels
indicate which data rows to select
specify the rows to use to guess (in)formats
specify formats and/or informats
transpose data
change a variable's unit of measurement
upcase any variable's values
drop one or more columns
but, for tables that you can paste,
the code presented in this paper includes all of
those options and capabilities
our Truth in Advertising commitment
WARNING: The code/method presented in this paper:
are NOT substitutes for proc import
may not work on all systems
should NOT be used if such use
violates any copyright or terms of
agreement
is NOT production quality
IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. The authors shall not be liable whatsoever for any damages
arising out of the use of this documentation or code, including any direct,
indirect, or consequential damages. In addition, the authors will provide no
support for the materials contained herein.
copy and paste almost anything
This presentation, the code and a paper on the
topic are all available on the past presentation’s
page for September 16, 2011 at: www.torsas.ca
copy and paste almost anything
things that can complicate copying and pasting:
what you actually copy is extremely
software and system dependent
our proposed solution
first, declare needed macro variables
clipbrd method lets you
%let spaces="
";
%let hrows=2;
account
for
multiple
"paste"
data
from
your
account
for
merged
%let var_share=3~2 5~4;
row in
variable
names
system's
clipboard,
but
add
and recode
cells
variable
names
%let var_renames=;
translates
tabs
into
variable
names
%let var_prefix=;
%let var_suffix=;
add a spaces
prefix or suffix
specify
formats
%let var_labels=;
specify
desired
to
any
variable
name
account
for
rows
between
specify
missing
%let first_data_row=3;
and
informats
variable
labels
and
%let var_missing=10~n.a.;
datavalues
and variable
for anynames
specify
which
rows
%let var_formats=2-5~best12.;
specify
one
variable
variables
%let var_informats=2-5~best12.;
or
a
range
of variables
should
be considered
different
systems
%let var_units=;
ifguessing
needed,
indicate
in
formats
translate
tabs
into
%let var_drop=;
drop
any
columns
amount
data
should
and
informats
different
numbers
of
upcase
any
variables
%let var_upcase=;
thatbeyou
don’t
want
multiplied by
%let guessingrows=3-3;
spaces
%let outfile=revenue;
our proposed solution
first, declare needed macro variables
%let spaces="
";
%letor
hrows=2;
any combination you might need
%lete.g.,
var_share=3~2 5~4;
%let var_renames=;
%let%let
var_prefix=;
var_informats=
%let var_suffix=;
1~$20.
%let var_labels=;
2-5~best12.
%let first_data_row=3;
6~comma8.
%let var_missing=10~n.a.;
7~percent8.
%let var_formats=2-5~best12.;
8~anydtdte21.
%let var_informats=2-5~best12.;
9~trailsgn8.
%let ;var_units=;
%let var_upcase=;
%let guessingrows=3-3;
%let outfile=revenue;
our proposed solution
first, declare needed macro variables
%let spaces="
";
%letor
hrows=2;
any combination you might need
%lete.g.,
var_share=3~2 5~4;
%let var_renames=;
Note: there are additional macro variables
%let%let
var_prefix=;
var_informats=
to address
structured layouts and files that
%let var_suffix=;
1~$20.
need to be transposed
%let var_labels=;
2-5~best12.
%let first_data_row=3;
6~comma8.
%let transpose=YES;
%let var_missing=10~n.a.;
7~percent8.
%let columns=5;
%let var_formats=2-5~best12.;
8~anydtdte21.
%let rows=80;
%let var_informats=2-5~best12.;
9~negative8.
%look_for=view all editions and formats;
%let ;var_units=;
%skip_lines=1;
%let var_upcase=;
%let guessingrows=3-3;
%let outfile=revenue;
our proposed solution
in the code, instructions are shown as comments. e.g.
%let var_prefix=; *specify any string that you want
appended before any variable name. A ~ must be used
to separate variable number(s) and variable prefixes,
and either a space or separate line to represent
additional entries. If you want the same prefix used for
a range of variables, specify the range as #-#. E.g., if
variables 2 thru 4 are named 1996, 1997 and 1998, and
you want them to be named Revenue_1996,
Revenue_1997 and Revenue_1998 you would specify:
%let var_prefix=2-4~Revenue_;
create macros for code that is used repeatedly
%macro expandr (type,string);
f i=1; hold_rec="";
do while (scan("&string.",i," ") ne "");
if scan(scan(scan("&string.",i," "),1,"~"),2,"-") ne ""
then do;
start=scan(scan(scan("&string.",i," "),1,"~"),1,"-");
end=scan(scan(scan("&string.",i," "),1,"~"),2,"-");
end;
else do;
start=scan(scan("&string.",i," "),1,"~");
end=scan(scan("&string.",i," "),1,"~");
end;
do j=start to end;
hold_rec=catx(" ",hold_rec,cat(strip(j)||"~"||
strip(scan(scan("&string.",i," "),2,"~"))));
end;
i+1;
end;
call symput(&type.,strip(hold_rec));
%mend expandr;
create macros for code that is used repeatedly
f%macro expandr (type,string);
%macro filarray (type,string);
if scan("&string.",i," ") ne "" then
&type(scan(scan("&string.",i," "),1,"~"))=
scan(scan("&string.",i," "),2,"~");
%mend
filarray; ",hold_formt,cat(strip(j)||"~"||
hold_formt=catx("
strip(scan(scan("&string.",i," "),2,"~"))));
end;
i+1;
end;
call symput(&type.,strip(hold_formt));
%mend expandr;
create datastep to count number of variables
f%flipfile
data _null_;
length hold_rec $32767;
infile clippy;
input;
_infile_=tranwrd(_infile_, &spaces., '09'x);
var_count=countc(_infile_,,"H")+1;
call symput('var_count',strip(put(var_count,8.)))
%expandr("var_formats",&var_formats)
%expandr("var_informats",&var_informats)
%expandr(("var_missing",&var_missing.)
%expandr("var_units",&var_units)
%expandr("var_prefix",&var_prefix)
%expandr("var_suffix",&var_suffix)
%expandr("var_upcase",&var_upcase.)
%expandr("var_drop",&var_drop.)
%expandr("var_labels",&var_labels.)
%expandr("var_share",&var_share.)
stop;
run;
declare arrays and variables in main datastep
data _null_;
fd file revised lrecl=32767;
infile clippy end=eof;
array headers(%sysfunc(max(&hrows.,1))) $32767.;
array varnames(&var_count.) $32.;
array formats(&var_count.) $32.;
array informats(&var_count.) $32.;
array renames(&var_count.) $32.;
array prefix(&var_count.) $32.;
array suffix(&var_count.) $32.;
array labels(&var_count.) $32.;
array miss(&var_count.) $255.;
array upcases(&var_count.) $3.;
array drops(&var_count.) $3.;
array units(&var_count.) $32.;
r array share(&var_count.) $32.;
array varlens(&var_count.);
array vartypes(&var_count.);
length hold_rec temp ivartype fvartype var_units
var_names var_labels var_drop $32767;
length missval $255;
retain headers renames varnames vartypes varlens
formats informats units prefix suffix labels drops
miss upcases share grows_start grows_end;
parse the header row(s) and macro variables
input;
fd
_infile_=tranwrd(_infile_,
&spaces., '09'x);
data _null_;
if _n_ le &hrows. then headers(_n_)=tranwrd(tranwrd(
tranwrd(_infile_, '%', 'percent'),'-','_to_'),'–','_to_');
if _n_ eq &hrows. or (_n_ eq 1 and &hrows eq 0) then do;
grows_start=scan("&guessingrows.",1,'-');
if missing(grows_start) then grows_start=&first_data_row.;
grows_end=scan("&guessingrows.",2,'-');
if missing(grows_end) then grows_end=999999;
do i=1 to &var_count.;
%filarray(renames,&var_renames.)
%filarray(prefix,&var_prefix.)
%filarray(suffix,&var_suffix.)
%filarray(units,&var_units.)
r %filarray(formats,&var_formats.)
%filarray(informats,&var_informats.)
le
%filarray(upcases,&var_upcase.)
%filarray(drops,&var_drop.)
%filarray(labels,&var_labels.)
%filarray(miss,&var_missing.)
%filarray(share,&var_share.)
if &hrows. eq 0 then varnames(i)=cat("Col"||strip(i));
obtain and assign variable names
fd
data _null_;
else do;
varnames(i)="";
do j=1 to &hrows.;
if j eq 1 and share(i) ne "" then do;
if strip(scan(headers(j),share(i),,"HM")) ne "" then
varnames(i)=strip(scan(headers(j),share(i),,"HM"));
end;
else do;
if strip(scan(headers(j),i,,"HM")) ne "" then do;
% if strip(varnames(i)) ne "" then varnames(i)=
strip(varnames(i))||"_"||strip(scan(headers(j),i,,"HM"));
else varnames(i)=strip(scan(headers(j),i,,"HM"));
end;
r end;
leif
c j eq &hrows. and varnames(i) eq "" then
varnames(i)=cat("Col"||strip(i));
end;
end;
create macro variables and write file (part 1 of 4)
f;%;
if renames(i) ne "" then varnames(i)=renames(i);
%;
f; ;%;
if prefix(i) ne "" then varnames(i)=
; strip(prefix(i))||strip(varnames(i));
; ;;if;suffix(i) ne "" then varnames(i)=
;g
strip(varnames(i))||strip(suffix(i));
%;
; ;if strip(labels(i)) eq "" then labels(i)=strip(varnames(i));
r;
;else labels(i)=tranwrd(strip(labels(i)), '^', ' ');
r; ;
" varnames(i)=tranwrd(strip(varnames(i)),'%', 'percent');
;
varnames(i)=tranwrd(strip(varnames(i)),'-','_to_');
;;%;
");
varnames(i)=tranwrd(strip(varnames(i)),'–','_to_');
%;
1;
varnames(i)=tranwrd(strip(varnames(i)),'#',
'number');
;
%;
;
; varnames(i)=tranwrd(strip(varnames(i)),
' ', '_');
i
;
varnames(i)=compress(varnames(i),,'kn');
;
if anydigit(substr(varnames(i),1,1)) then
varnames(i)=cat("_",strip(varnames(i)));
var_names=catx(" ",var_names,strip(varnames(i)));
var_labels=cat(strip(var_labels)||"label "||
strip(varnames(i))||"="||quote(strip(labels(i)))||";");
create macro variables and write file (part 2 of 4)
f;%; if units(i) ne "" then var_units=
%;
",var_units,strip(varnames(i))||"="||
f; ;%; catx("
strip(varnames(i))||"*"||strip(units(i))||";");
;
; ;; ; if drops(i) eq “YES” then var_drop=
; g catx((" ",var_drop,strip(varnames(i)));
%;
; ; end;
r; if var_drop ne “” then var_drop=
;
r; ; “(drop=)||strip(var_drop)||”)”;
" call symput('varnames',var_names);
;
call
;;%;
"); symput('varlabls',var_labels);
%;
call
1; symput('varunits',var_units);
;
%;
call
symput('vardrop',var_drop);
;
; end;
i
;
;if
_n_ ge &first_data_row. then do;
if countc(_infile_,,"H")+1 eq &var_count. then do;
do i=1 to &var_count.;
temp=strip(scan(_infile_,i,,"HM"));
if upcase(upcases(i)) eq "YES" then temp=
upcase(temp);
if strip(temp) ne "" then do;
if miss(i) ne "" then do;
k=1;
create macro variables and write file (part 3 of 4)
do while (scan(miss(i),k,"` ") ne "");
f;%;missval=tranwrd(strip(scan(miss(i),k,"` ")),'^',' ');
%;temp=tranwrd(strip(temp),strip(missval), '');
f; ;%;
k+1;
;
; ;;end;
;
end;
;g
%;
if grows_start LE _n_ and grows_end GE _n_ then do;
;;
r; call missing(vartype);
;;
= input(temp, ?? best12.);
r; in_test
" if not missing(in_test) then vartype=0;
; do;
else
;;%;
");
%;
in_test
= input(temp, ?? anydtdte21.);
1;
; not missing(in_test) then vartype=2;
if
%;
;
; else
do;
;
; if index(temp,"$") then in_test =
input(temp, ?? dollar21.);
if not missing(in_test) then vartype=4;
else do;
if index(temp,",") then in_test =
input(temp, ?? comma21.);
if not missing(in_test) then vartype=5;
i
create macro variables and write file (part 4 of 4)
else do;
f;%;
if index(temp,"%") then in_test =
%;
input(temp, ?? percent21.);
f; ;%;
if not missing(in_test) then vartype=3;
;
else vartype=1;
; ;; ;
end;
;g
%;
end;
;;
end;
r;
;
r; ; end;
missing(vartypes(i)) then vartypes(i)=vartype;
"
; if
else if vartype ne vartypes(i) then vartypes(i)=1;
;
");
%;;%;1;if missing(varlens(i)) or length(temp)
; gt varlens(i) then varlens(i)=length(temp);
%;
; ;end;
;
;end;
if i eq 1 then hold_rec=strip(temp);
else hold_rec=cat(strip(hold_rec),"09"x,strip(temp));
end;
put hold_rec;
end;
i
assign formats and informats
if eof then do;
ivartype=""; fvartype="";
f;%;
do i=1 to &var_count.;
%;
f; ;%;
if vartypes(i)=1 then do;
; itempvar=cat("$",strip(put(varlens(i),3.)),".");
; ;; ;ftempvar=itempvar;
;g
end;
g
%;
;;
else if vartypes(i)=2 then do;
r;
;; itempvar="anydtdte21."; ftempvar="date9.";
r;
" ;end;
else if vartypes(i)=3 then do;
;%;
%;
; ");1;");1; itempvar="percent."; ftempvar="percent8.2";
end;
;
%;
;;
; else
if vartypes(i)=4 then do;
;
; itempvar="dollar.";
; ;
; ftempvar=cat("dollar",strip(put(varlens(i),3.)),".");
end;
else if vartypes(i)=5 then do;
itempvar="comma.";
ftempvar=cat("comma",strip(put(varlens(i),3.)),".");
end;
else do;
itempvar="best12."; ftempvar="best12.";
end;
i
i
bring in user specs and create macro variables
f;%;
%;
f; ;%;i
if strip(informats(i)) ne "" then
;
;
; ;; ;
itempvar=strip(informats(i));
;g ;
if strip(formats(i)) ne "" then ftempvar=
g;
%;
;;
strip(formats(i));
r;
;; r; ivartype=catx(" ",ivartype,"informat",
r;
" ; ;
varnames(i),itempvar,";");
;
fvartype=catx(" ",fvartype,"format",
;
");
%;%;
;
; 1;");
varnames(i),ftempvar,";");
1;
;
;
%;
; ;;;; end;
i
i
;;
;; ;
call symput('informt',ivartype);
; call symput('formt',fvartype);
end;
end;
run;
create the data step that pastes the data
f;%;
%;
f; ;%;i
=
;
;
; ;; ; ;data &outfile. &vardrop.;
;g ; infile revised lrecl=32767 dsd delimiter="09"x;
g;
%;
;;
&informt.;
r;
;; r; &formt.;
r;
" ; ;
&varlabls.;
;
;%;
%;
; input &varnames.;
; ");1;");
;;;1; &varunits.;
%;
; ;;; run;
i
i
;;
;; ;
;r
proc delete data=work.form_varnames;
run;
filename clippy clear;
filename revised clear;
combined, the code really isn't very intimidating
and to complete the task
highlight the table
click copy
enter desired macro variable settings
run the code
however, it may not be THAT simple in all cases!
not all tables are directly copyable
(i.e., without losing critical table metadata)
a useful free set of tools you might find helpful
Adobe Acrobat Reader 6
http://www.oldapps.com/adobe_reader.php
Adobe Acrobat 5 TAPS Plugin
http://www.pdfhacks.com/TAPS/
any version of Adobe Acrobat
(other than Reader) may be needed
to convert a table into a PDF file in
order to take advantage of TAPS
how TAPS can be helpful (an example)
copy the first four lines from the table at
http://www.oecd.org/dataoecd/48/27/41498733.pdf
how TAPS can be helpful (an example)
on your monitor the table will appear as:
how TAPS can be helpful (an example)
but, if you paste it into notepad, it will appear as:
the heading for
column #10 appears
thereiniscolumn
no indication
#1 on the
nd
rd rows
that 2column
and 3#1's
heading is missing
there is no indication that
column #2 is missing
how TAPS can be helpful (an example)
but if you open it with Adobe 6 Reader with TAPS:
how TAPS can be helpful (an example)
drag a rectangle around the data you want
Table
right click on Text-Flow, click on Table, then copy
how TAPS can be helpful (an example)
now, if you paste it into notepad, it will appear as:
how TAPS can be helpful (an example)
then, using the following settings:
filename clippy clipbrd;
variable names are to
filename revised temp;
be combined
the variable
name from the
%let hrows=3;
three rows
for the firstfirst
column
%let spaces="
";
data
2ndbegin
thru on
10throw 4
shouldthe
bethe
"Country"
%let first_data_row=4;
nd thru
th
%let var_renames=1~Country;
variablesthe
should
all 10
use
2
%let var_labels=;
a percent8.1
format
variables
should
all
nd
th
%let var_prefix=2-10~Revenue_;
the
2 thru 10by the
be
preceded
%let var_suffix=;
variables
should all
string "Revenue_"
%let var_share=;
%let var_formats=2-10~percent8.1; use a best12. informat
%let var_informats=2-10~best12.;
the 2nd thru 10th
%let var_units=2-10~.01;
variables should all
%let var_drop=;
be multiplied by .01
%let var_upcase-;
%let var_missing=;
%let guessingrows=;
%let outfile=revenue;
how TAPS can be helpful (an example)
you will obtain the following SAS file:
not all browsers were created equally
the method described in this paper appears to
work better with Google Chrome
than with Microsoft Internet Explorer
and better with a TAPS enhanced version of
Adobe Acrobat Reader 6
than with newer versions of Acrobat Reader
sometimes a table will copy as a single column
e.g., copy the four column table at
http://www.thelawyer.com/directory/uk-200-table-top-100/
sometimes a table will copy as a single column
you'll find that entire columns get highlighted as
you drag your mouse from left to right
although it appears as a 101 row 4 column table
it actually copies as a 404 row 1 column table
adding the following macro variable assignments
will transpose the data as it is being "pasted"
%let columns=4;
%let rows=100;
importing non-tabular data
irrelevant
listing
id is
for example, what if you wanted
to
copy
a
page
titles
appear
each
entry's
firstwithout
record
any identifier
from a search at: http://www.worldcat.org/
author is always preceded
by type
the word
"by"without
appears
any identifier
Language is always
preceded by the string
"Language:"
Publisher
is always
Record
always
ends
preceded
by
the
string
with
the
string
"Publisher:"
"View all editions and formats"
Note: This site's terms of use explicitly prohibits the use of
programming routines to "mine" or harvest material amounts of Data
our solution: two extra macro variables
Record always ends
with the string "View all
editions and formats"
options NOQUOTELENMAX;
options datestyle=mdy;
filename clippy clipbrd;
filename revised temp;
%let spaces="
";
%let look_for=View all editions and formats;
%let skip_lines=1;
There
is
always
%let var_renames=2~Author;
one blank line
+ the same macro variables used previously between
records
a datastep to describe the form's layout
data form_varnames;
informat varname $50.;
input varname &;
cards;
1~Title~1~1~0
2~by~1~1~1
3~Type~1~1~0
4~Language:~1~1~1
5~Publisher:~1~1~1
;
Variable number
Variable
name
Whether
or
not
the field
or
field
header
has a header in the data
Field's
within a
(i.e.,position
0=no, 1=yes)
row (in right to left order)
Number of rows that
must be read to read
the entire field
and to complete the task
highlight the form
click copy
enter macro variable settings and field definitions
run the code
benefits
tables can be pasted from any source that
you can copy to your system's clipboard
should work on any operating system
has a number of useful import features
avoid extra datasteps
modifiable (new features can be added)
only requires base SAS
limitations
extremely software dependent
requires you to know your data
may require some additional software
unsupported - can't complain to
anyone if it doesn't work correctly
Author Contact Information
Your comments and questions are valued and encouraged.
Contact the authors:
Arthur Tabachneck, Ph.D.
Myqna, Inc.
Toronto, Ontario
e-mail: [email protected]
Richard A. DeVenezia
9949 East Steuben Road
Remsen, NY 13438
http://www.devenezia.com/contact.php
Nate Derby
Stakana Analytics
815 First Ave., Suite 287
Seattle, WA 98104-1404
e-mail: [email protected]
John King,
Ouachita Clinical Data Services, Inc.
Mount Ida, AR
e-mail: [email protected]
Ben Powell
Genworth Financial
e-mail: [email protected]
Randy Herbison,
Senior Systems Analyst Westat
1650 Research Boulevard
Rockville, MD 20850
e-mail: [email protected]