ASA-TEF workshops on SDA

Using the SDA on the Web
Ed Nelson, CSU Fresno
Social Science Research and Instructional Council
Survey Documentation and Analysis (SDA) Program
• Written at UC Berkeley
• Used by ICPSR and others, where it is referred to as DAS (Data Analysis System)
• Data files must be converted to SDA format before use. ICPSR has converted a number of data sets in its topical archives into SDA format and is converting more.
Sources of Data at ICPSR (http://www.icpsr.umich.edu)
• ICPSR topical archives
– National Archive of Computerized Data on Aging (NACDA)
– National Archive of Criminal Justice Data (NACJD)
– International Archive of Education Data
– Substance Abuse and Mental Health Data Archive (SAMHDA)
• General Social Survey
• National Election Study
General Procedure
• Select study
• Open window to browse codebook
• Select what you want to do
• Click on START
What Can You Do?
• Browse codebook
• Subset data
• Download data and documentation
• Run statistical procedures
Statistical Procedures
• Frequencies
• Crosstabs
• Comparison of means
• Comparison of correlations
What Else Can You Do?
• Recode (temporarily)
• Use control variables
• Use filter variables
• Use weight variable
Documentation and Data
• Codebook (ASCII/PDF)
• SPSS/SAS/Stata syntax
• Data file
Using Statistical Programs
• Specify variables
• Select display options (e.g., statistics, text to display)
• Select action (run, clear)
Frequencies Program -- Specify Variables
• Row variable (required)
• Filter variables
• Weight variable
Frequencies Program -- Select Statistics
• Percents
• Central tendency -- mean, median, mode
• Variability -- standard deviation, variance
• Coefficient of variation
• Standard error of the mean
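As a rough illustration of what these statistics measure, here is a minimal Python sketch on made-up responses. The data values and the function name `frequency_summary` are hypothetical, and this is not SDA's own code. It assumes unweighted cases and the sample (n - 1) form of the variance; SDA's exact conventions may differ.

```python
import math
import statistics
from collections import Counter

# Hypothetical coded responses on a 1-5 scale; not real survey data.
responses = [1, 1, 2, 2, 2, 3, 4, 5]

def frequency_summary(values):
    """Return percents and summary statistics for a list of numeric codes."""
    n = len(values)
    counts = Counter(values)
    percents = {code: 100.0 * k / n for code, k in sorted(counts.items())}
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "percents": percents,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "std_dev": sd,
        "coef_variation": sd / mean,          # spread relative to the mean
        "std_error_mean": sd / math.sqrt(n),  # assumes simple random sampling
    }

summary = frequency_summary(responses)
```

The coefficient of variation and the standard error are both derived from the standard deviation, which is why the sketch computes it once and reuses it.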
Example: Monitoring the Future
• Explores values, behavior, and lifestyles of American youth
• Focus on drug use
• 1975 to present
• Investigators: Jerald G. Bachman, Lloyd D. Johnston, and Patrick M. O’Malley, University of Michigan, Institute for Social Research
Monitoring the Future -- Study Design
• Self-administered questionnaire
• 8th, 10th, and 12th graders
• Multistage area probability sample
• Students randomly assigned to one of six questionnaires
• Core questions -- demographics and drug use
Select Study -- 1998 Monitoring the Future
• ICPSR study number 2751
• 12th graders
• Year: 1998
Monitoring the Future -- Variables of Interest
• Demographics: V150 (sex), V151 (race), V163 (father’s educational level), V164 (mother’s educational level)
• Religious variables: V169 (attend religious services), V170 (importance of religion)
• Educational aspirations: V183 (attend four-year college)
• Recreation: V194 (number of times go out per week), V195 (number of dates per week)
• Drug use: V103 to V108 (alcohol), V112 to V114 (marijuana), V124 to V126 (cocaine)

Monitoring the Future -- Frequencies
• Alcohol use (V107 -- number of times drank alcohol enough to feel pretty high)
• Importance of religion in life (V170)
Crosstabs Program -- Specify Variables
• Dependent variable -- row variable (required)
• Independent variable -- column variable (required)
• Control variables
• Filter variables
• Weight variable
Crosstabs Program -- Select Statistics
• Percents -- column (vertical), row (horizontal), total
• Chi-square (Pearson’s, likelihood ratio)
• Eta
• Gamma
• Tau-b and Tau-c
• Somers’ d
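To make the percentaging and chi-square options concrete, here is a minimal Python sketch that builds a two-way table from made-up cases, computes column percents, and computes Pearson's chi-square. The data and the function names (`crosstab`, `column_percents`, `pearson_chi_square`) are hypothetical, and this is an illustration of the textbook definitions rather than SDA's implementation. It assumes unweighted cases and no empty rows or columns.

```python
from collections import Counter

# Hypothetical (row, column) pairs: row = dependent, column = independent.
cases = [("low", "a"), ("low", "a"), ("high", "a"),
         ("low", "b"), ("high", "b"), ("high", "b")]

def crosstab(pairs):
    """Count the cases falling in each (row, column) cell."""
    return Counter(pairs)

def column_percents(table):
    """Percent of each column's cases in each row (sums to 100 down a column)."""
    col_totals = Counter()
    for (_, col), k in table.items():
        col_totals[col] += k
    return {(row, col): 100.0 * k / col_totals[col]
            for (row, col), k in table.items()}

def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(table[(r, c)] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return chi2

table = crosstab(cases)
percents = column_percents(table)
chi2 = pearson_chi_square(table)
```

Percentaging down the columns of the independent variable is the usual choice when the dependent variable is in the rows, because it lets you compare categories of the independent variable directly.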
Monitoring the Future -- Crosstabs (Bivariate)
• Row (dependent) variable -- V107, number of times drank alcohol enough to feel pretty high
• Column (independent) variable -- V170, importance of religion
Recoding (temporarily)
• Let’s start by recoding the number of times the respondent drank alcohol enough to feel pretty high into two categories -- few or none (1-2) and half or more (3-5)
• V107 (r: 1-2 “few or none”; 3-5 “half or more”)
– Semicolon separates recodes
– Assigns new values of 1, 2, etc. to the recoded groups, in order
– Value labels can be inserted within quotes
• Missing data -- anything not recoded is treated as missing data
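The recode rules above can be sketched in a few lines of Python. This is only an illustration of the logic described on the slide, not SDA's parser: the function names `parse_recode` and `recode` are hypothetical, the sketch handles only integer values and ranges with straight double quotes around labels, and it would break on a label that contains a semicolon.

```python
import re

def parse_recode(spec):
    """Parse a spec like '1-2 "few or none"; 3-5 "half or more"' into rules."""
    rules = []
    # Semicolons separate recode groups; groups get new codes 1, 2, ... in order.
    for new_code, part in enumerate(spec.split(";"), start=1):
        m = re.fullmatch(r'\s*(\d+)(?:-(\d+))?\s*(?:"([^"]*)")?\s*', part)
        if m is None:
            raise ValueError(f"cannot parse recode group: {part!r}")
        low = int(m.group(1))
        high = int(m.group(2)) if m.group(2) else low
        rules.append((low, high, new_code, m.group(3)))
    return rules

def recode(value, rules):
    """Return the new code for value, or None (missing) if no rule covers it."""
    for low, high, new_code, _label in rules:
        if low <= value <= high:
            return new_code
    return None

rules = parse_recode('1-2 "few or none"; 3-5 "half or more"')
recoded = [recode(v, rules) for v in [1, 2, 3, 5, 9]]
```

In this sketch the original value 9 is covered by no rule, so it comes back as `None`, which mirrors the slide's point that anything not recoded is treated as missing.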
Monitoring the Future -- Crosstabs (Multivariate)
• Now that we have run the two-variable crosstab, let’s add a control variable.
• We’ll add the variable sex (V150) as the control variable.
Comparison of Means Program -- Specify Variables
• Dependent variable (required)
• Row (independent) variable (required)
• Column (control) variable
• Control (additional) variable
• Filter variables
• Weight variable
Comparison of Means Program -- Select Statistics
• Mean of dependent variable
• Difference from overall mean
• Standard deviation
• Number of cases, weighted number of cases
• Standard errors and confidence intervals
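A minimal Python sketch of what a comparison of means reports for each category of the row variable follows. The data and the function name `compare_means` are hypothetical, and this is not SDA's code. It assumes unweighted cases and a simple-random-sample standard error; the complex-sample standard errors that SDA offers require design information that is not modeled here.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (group, score) pairs; not real survey data.
cases = [(1, 2.0), (1, 4.0), (2, 6.0), (2, 8.0), (2, 10.0)]

def compare_means(pairs):
    """Summarize the dependent variable within each category of the row variable."""
    groups = defaultdict(list)
    for group, score in pairs:
        groups[group].append(score)
    overall_mean = statistics.mean([score for _, score in pairs])
    table = {}
    for group, scores in sorted(groups.items()):
        n = len(scores)
        mean = statistics.mean(scores)
        # The sample standard deviation needs at least two cases.
        sd = statistics.stdev(scores) if n > 1 else None
        table[group] = {
            "n": n,
            "mean": mean,
            "diff_from_overall": mean - overall_mean,
            "std_dev": sd,
            # Simple-random-sample standard error of the group mean.
            "std_error": sd / math.sqrt(n) if sd is not None else None,
        }
    return overall_mean, table

overall_mean, means_table = compare_means(cases)
```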
Comparison of Means Program -- Select Statistics (Advanced)
• Complex samples
– Standard errors
– Design effect
– Rho statistic
• ANOVA
Monitoring the Future -- Comparison of Means
• Compute the mean use of marijuana over the respondent’s lifetime by the number of times the respondent goes out in a week
• Dependent variable is V112 (use of marijuana over one’s lifetime)
• Row (independent) variable is V194 (number of times goes out in a week)
• Column (control) variable is V150 (sex)
Filter Variables
• Filter variables can also be used to select particular cases
• Variable name (____; ____; ____)
– Where ____ stands for a range of values or a particular value
– E.g., sex (1)
– E.g., age (65-89)
• Using more than one filter variable
– E.g., sex (1), age (65-89) selects all those who are coded 1 on sex and are age 65 to 89
– Joins the two variables with an AND
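The filter logic described above can be sketched in Python as follows. The records, the function names `value_matches` and `apply_filters`, and the dictionary representation of a filter are all hypothetical choices for illustration; the sketch does not parse SDA's filter syntax. It assumes a case with a missing value on a filter variable is excluded.

```python
# Hypothetical records; variable names and codes are illustrative only.
records = [
    {"sex": 1, "age": 70},
    {"sex": 2, "age": 70},
    {"sex": 1, "age": 40},
    {"sex": 1, "age": 89},
]

def value_matches(value, allowed):
    """True if value equals a listed code or falls inside a listed (low, high) range."""
    for item in allowed:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:
                return True
        elif value == item:
            return True
    return False

def apply_filters(rows, filters):
    """Keep rows that pass every filter; separate filter variables are joined with AND."""
    return [
        row for row in rows
        if all(row.get(name) is not None and value_matches(row[name], allowed)
               for name, allowed in filters.items())
    ]

# Equivalent in spirit to the slide's example: sex (1), age (65-89)
selected = apply_filters(records, {"sex": [1], "age": [(65, 89)]})
```

Within one variable, any listed value or range is enough to pass; across variables, every filter must pass, which is the AND behavior the slide describes.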
Subsetting Data Sets
• Select the files you want to construct
– Data file (ASCII)
– Codebook (ASCII)
– Data definitions for SPSS, Stata, or SAS
• Select the cases to include (leave blank if you want all the cases)
• Select the variables to include
