
GAP Toolkit 5
Training in basic drug abuse data management
and analysis
Training session 5
Coding closed questions
Objectives
• To establish a set of practical coding rules for closed questions
• To explain the importance of assigning numbers to
characteristics
• To construct a framework for recording missing values
• To introduce identification numbers as a method of ensuring the
anonymity of respondents, while maintaining a link between files
and questionnaires
Components of a data file
• Cases or observations
• Variables
• Values
Coding
• The identification of the possible values of a variable
and the assignment of numbers to those values
• The numbers, representing the values, are stored in a
data file
Closed questions/categorical variables
• A limited number of values
• The values are mutually exclusive
• The values are collectively exhaustive
• Code by assigning a number to each value
Example
• Coding gender
• Possible values: male; female
• Coding scheme: 1 = Male; 2 = Female
Why numbers?
• Efficient use of computers
• Quicker to enter
• Not subject to spelling mistakes
Why numbers?
• Some statisticians define measurement as necessarily
resulting in numbers
• “To measure a property means to assign numbers to
units as a way of representing that property.”
(D. S. Moore, Statistics: Concepts and Controversies, 2nd ed. (New York, W. H. Freeman
Press, 1985)).
Pre-code
• Coding takes place before the questionnaire is
delivered
• The possible responses to a question are anticipated
• The coding appears on the questionnaire
Coding rules
• Codes must be:
– Mutually exclusive
– Collectively exhaustive
– Consistent across variables
(J. Fielding, “Coding and managing data”, Researching Social Life, N. Gilbert, ed.
(London, Sage Publications, 1993) and D. De Vaus, Surveys in Social Research
(London, Routledge, 2002)).
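The first two rules can be checked mechanically. The sketch below is one possible check, assuming the coding scheme is held as a mapping from value to code. The function name check_scheme, its messages and the sample responses are hypothetical and are not part of SPSS or the toolkit. Consistency across variables is not checked here.

```python
def check_scheme(scheme: dict, observed: list) -> list:
    """Return a list of problems found in a coding scheme (empty if none)."""
    problems = []

    # Each code should stand for exactly one value.
    codes = list(scheme.values())
    if len(set(codes)) != len(codes):
        problems.append("two values share a code")

    # Collectively exhaustive: every observed response needs a code.
    uncoded = sorted(set(observed) - set(scheme))
    if uncoded:
        problems.append("no code for: " + ", ".join(uncoded))

    return problems


# Scheme from the gender example; the responses are made up for illustration.
gender_scheme = {"Male": 1, "Female": 2}
print(check_scheme(gender_scheme, ["Male", "Female", "Female"]))

# A deliberately faulty scheme: both values share code 1.
print(check_scheme({"Male": 1, "Female": 1}, ["Male"]))
```

An empty list means no problem was found for the responses supplied. It does not prove that the scheme anticipates every possible response.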
Continuous variables
• Do not generally require coding as:
– They are already numerical
– There is a potentially infinite number of categories
Coding in SPSS
• The Values column in Variable View is used to
implement coding in SPSS
• Numbers are allocated to each of the categories of a
variable
Example: coding Drug
• In data file Ex1.sav, a
variable called Drug was
defined as a string variable
and a number of drugs were
entered
Case summaries (a)
Case   Drug
1      Heroin
2      Alcohol
3      Hashish
4      Bhang
5      Heroin
6      Hashish
Total  N = 6
(a) Limited to first 100 cases.
Coding Drug
• Decide on a set of numeric labels for the different
categories, in this case drugs:
– 1 = Heroin
– 2 = Alcohol
– 3 = Hashish
– 4 = Bhang
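Outside SPSS, the same scheme can be applied with a simple lookup. This is a minimal sketch, assuming the six drug names from the case summary. The names DRUG_CODES and code_drug are hypothetical.

```python
# Coding scheme from the slide: 1 = Heroin, 2 = Alcohol, 3 = Hashish, 4 = Bhang.
DRUG_CODES = {"Heroin": 1, "Alcohol": 2, "Hashish": 3, "Bhang": 4}


def code_drug(name: str) -> int:
    """Return the numeric code for a drug name, ignoring case and stray spaces."""
    try:
        return DRUG_CODES[name.strip().title()]
    except KeyError:
        # An unanticipated value means the scheme is not exhaustive.
        raise ValueError(f"no code defined for {name!r}") from None


# The string variable Drug, as listed in the case summary.
drug = ["Heroin", "Alcohol", "Hashish", "Bhang", "Heroin", "Hashish"]

# The numeric variable Drug2 holds the codes.
drug2 = [code_drug(d) for d in drug]
print(drug2)
```

Raising an error for an unknown name, rather than silently assigning a code, makes gaps in the scheme visible at data entry.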
Coding Drug
• Create a new variable Drug2:
type = numeric; width = 2; decimals = 0;
label = Drug Coded
• Click on the Values column and then on the three dots
that appear to the right of the Values box to generate
the following dialogue box:
[Screenshot: Value Labels dialogue box, with the callout "Click to register code"]
Frequency count for Drug Coded:
Drug Coded
Valid     Frequency  Percentage  Valid percentage  Cumulative percentage
Heroin            2        33.3              33.3                   33.3
Alcohol           1        16.7              16.7                   50.0
Hashish           2        33.3              33.3                   83.3
Bhang             1        16.7              16.7                  100.0
Total             6       100.0             100.0
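The columns of a frequency count can also be computed by hand. The sketch below is an illustration only, not the SPSS procedure. It assumes the six coded cases from the case summary, and frequency_table is a hypothetical helper.

```python
from collections import Counter

# Value labels for the codes, as defined for Drug Coded.
DRUG_LABELS = {1: "Heroin", 2: "Alcohol", 3: "Hashish", 4: "Bhang"}

# The six cases from the case summary, already coded.
drug2 = [1, 2, 3, 4, 1, 3]


def frequency_table(codes, labels):
    """Return rows of (label, frequency, percentage, cumulative percentage)."""
    counts = Counter(codes)
    total = len(codes)
    rows = []
    running = 0
    for code in sorted(labels):
        n = counts.get(code, 0)
        running += n
        rows.append((labels[code], n,
                     round(100 * n / total, 1),
                     round(100 * running / total, 1)))
    return rows


for label, n, pct, cum in frequency_table(drug2, DRUG_LABELS):
    print(f"{label:<8} {n:>3} {pct:>6.1f} {cum:>6.1f}")
```

With no missing values, the percentage and the valid percentage are the same, so only one of them is computed here.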
Note
• Coding data does not change the level of measurement
• The level of measurement is a guide to the selection of
appropriate statistics
SPSS
• Value labels can be assigned to numeric variables and
string variables of eight or fewer characters
• By default, SPSS sets all numeric variables to Scale
variables
Exercise: coding
ID        Drug     Age  Condition
DAP1-007  Mandrax   27  Recovered
DAP1-008  Mandrax   21  Relapsed
DAP1-009  Alcohol   45  Recovered
DAP1-010  Hashish   52  Relapsed
DAP1-011  Mandrax   22  Relapsed
DAP1-012  Alcohol   28  Relapsed
Frequency count of Drug
Drug
Valid     Frequency  Percentage  Valid percentage  Cumulative percentage
Alcohol           3        25.0              25.0                   25.0
Bhang             1         8.3               8.3                   33.3
Hashish           3        25.0              25.0                   58.3
Heroin            2        16.7              16.7                   75.0
Mandrax           3        25.0              25.0                  100.0
Total            12       100.0             100.0
Frequency count of Condition
Condition Coded
Valid      Frequency  Percentage  Valid percentage  Cumulative percentage
Recovered          5        41.7              41.7                   41.7
Relapsed           7        58.3              58.3                  100.0
Total             12       100.0             100.0
Missing values
Missing values: causes
• The question is not applicable
• The respondent does not know
• The respondent refuses to answer
• No response is marked on the questionnaire (i.e., truly
missing and there is no clue why)
(De Vaus, 2002)
Coding missing values
• Use codes outside of the range of common values:
– e.g., 9, 99, -99, 999
• If possible, retain the same codes for the various
missing options for all variables
• The default missing value in SPSS is displayed as a full stop (.)
and is called the “system-missing value”
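A missing-value code only works if the analysis knows to exclude it. The sketch below illustrates what goes wrong otherwise. It uses the six ages from the coding exercise plus one made-up case with no recorded age. The code 999 and the name AGE_NO_RESPONSE are assumptions for this example.

```python
# Hypothetical missing-value code, chosen outside any plausible age.
AGE_NO_RESPONSE = 999

# Ages of DAP1-007 to DAP1-012, plus one illustrative case with no response.
ages = [27, 21, 45, 52, 22, 28, AGE_NO_RESPONSE]

# Treating 999 as a real age distorts the mean.
naive_mean = sum(ages) / len(ages)

# Excluding the declared missing code gives the mean of the valid cases.
valid_ages = [a for a in ages if a != AGE_NO_RESPONSE]
valid_mean = sum(valid_ages) / len(valid_ages)

print(f"mean including the missing code: {naive_mean:.1f}")
print(f"mean of valid cases only:        {valid_mean:.1f}")
```

Declaring the code as missing in the variable definition has the same effect in SPSS: the case is kept in the file but left out of the calculation.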
SPSS: missing values
• Part of the variable definition
• Variable View: Missing column
– Click on the Missing cell in the row defining the variable
– Click on the three dots that appear to the right of the
Missing cell to open the Missing Values dialogue box
Exercise
• Three additional observations are obtained for Ex1.sav:
– DAP1-013; Alcohol; 39; -----------
– DAP1-014; Hashish; --; Recovered
– DAP1-015; ---------; 16; Relapsed
• Code necessary missing values for the variables
• Run a frequency count on Drug and Condition,
comparing percentage and valid percentage
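One way to check the Condition part of the exercise by hand is sketched below. The counts come from the frequency count of Condition (5 recovered, 7 relapsed) plus the three new observations. The numeric codes 1, 2 and 9 are assumptions, since no scheme for Condition is given, and percentages is a hypothetical helper.

```python
from collections import Counter

# Hypothetical codes: 1 = Recovered, 2 = Relapsed, 9 = missing.
RECOVERED, RELAPSED, MISSING = 1, 2, 9

# 12 original cases (5 recovered, 7 relapsed) plus three new observations:
# one recovered, one relapsed and one with no condition recorded.
condition = [RECOVERED] * 6 + [RELAPSED] * 8 + [MISSING]

counts = Counter(condition)
n_total = len(condition)               # all cases, missing included
n_valid = n_total - counts[MISSING]    # cases with a recorded condition


def percentages(code):
    """Return (percentage of all cases, percentage of valid cases)."""
    return (round(100 * counts[code] / n_total, 1),
            round(100 * counts[code] / n_valid, 1))


for code, label in [(RECOVERED, "Recovered"), (RELAPSED, "Relapsed")]:
    pct, valid_pct = percentages(code)
    print(f"{label:<10} {counts[code]:>2} {pct:>6.1f} {valid_pct:>6.1f}")
```

The percentage is based on all 15 cases, while the valid percentage is based only on the 14 cases with a recorded condition, so the two columns differ as soon as a value is missing.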
Identification numbers
ID numbers: purpose
• An ID number:
– Ensures anonymity
– Links a row in the data file to a physical questionnaire
ID numbers: characteristics
• A unique identifier
• Sometimes contains information in a compound form
Example
• DAP1-001, DAP1-002, … :
– DAP is short for Drug Assessment Programme
– 001, 002 are consecutive numbers that uniquely identify each
questionnaire or respondent
– There can be at most 999 respondents, as the three-digit number
allows only 999 unique ID numbers (001 to 999)
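The compound structure of such an ID can be built and taken apart programmatically. This is a minimal sketch, assuming the prefix DAP1 and a three-digit running number as in the example. The helpers make_id and parse_id are hypothetical.

```python
import re

# Prefix, hyphen, then exactly three digits, e.g. DAP1-007.
ID_PATTERN = re.compile(r"^([A-Z]+\d*)-(\d{3})$")


def make_id(number: int, prefix: str = "DAP1") -> str:
    """Build an ID such as DAP1-007; only 001 to 999 fit in three digits."""
    if not 1 <= number <= 999:
        raise ValueError("the running number must be between 1 and 999")
    return f"{prefix}-{number:03d}"


def parse_id(identifier: str):
    """Split an ID into its prefix and running number."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a valid ID: {identifier!r}")
    return match.group(1), int(match.group(2))


print(make_id(7))
print(parse_id("DAP1-012"))
```

Zero-padding keeps every ID the same length, so the IDs sort in the same order as the questionnaires were numbered.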
Summary
• Coding closed questions
• Value labels
• Frequency counts
• Missing values
• ID numbers