The Edit
Anders Norberg,
Statistics Sweden (SCB)
Work Session on Statistical Data Editing
Ljubljana, Slovenia, 9-11 May 2011
The environment of SELEKT
Input, throughput, output, use
Input
-Respondent (u) has one or several sampled units
-Sampled unit (k)
-Observed unit (l)
-Background variables: Industry, Gender, Occupation (example values: B, M, 2)
-Measurement variable (j): 2 = Wage
-Observed value y_jkl
Throughput
-Coding
-Editing
-Imputation
-Estimation
Output
-Sum of wages by Industry (A, B, C, D, E, F-Z)
-Sum of wages by Occupation (1, 2, 3, 4, Sum) and Gender (Men, Women, Sum)
Use
-Decision making
-Information
The environment of SELEKT
Input, throughput, output, use
[Same diagram as the previous slide, with the label "Suspicion" added.]
SELEKT 1.1
[Flow diagram; boxes as labelled on the slide]
-Raw + edited past (cold) survey data
-Survey specific cold adapter (SAS code): data preparation
-SAS data set
-PRE-SELEKT: parameter specifications, analysis of cold data
-Edits
-SNOWDON-X analysis
-CLAN estimation software
-Table of parameters of edits
-Table of estimates
-Input (hot) survey data
-Survey specific hot adapter (SAS code): data preparation
-SAS data set
-AUTOSELEKT: score calculation & record flagging
-Records to FOLLOW-UP
-Records to IMPUTATION
-Accepted records
-Process data and reports
Glossary of Terms on
Statistical Data Editing (1)
“EDIT RULE SPECIFICATION
CHECK RULE SPECIFICATION
A set of check rules that should be applied
in the given editing task.”
Glossary of Terms on
Statistical Data Editing (2)
“CHECKING RULE
A logical condition or a restriction to the
value of a data item or a data group which must
be met if the data is to be considered correct. In
various connections other terms are used, e.g.
edit rule.”
Recommended Practices for
Editing and Imputation in Cross-sectional Business Surveys
“EDIT
A logical condition or a restriction to the
value of a data item or a data group which must
be met if the data is to be considered correct.
Also known as edit rule or checking rule.”
Example 1
if Occupation = ‘Doctor’ and
not (29000 < Salary < 71000)
then Errcode_A01 = ‘Flag’
Example 1 The test variable
Salary
Example 1 The edit group
Occupation = ‘Doctor’
Example 1 The acceptance region
29000 < Salary < 71000
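As a rough illustration, Example 1 can be rewritten in Python so that the three components are visible as separate names. This is only a sketch: the `Record` class and the function name `check_a01` are hypothetical and not part of SELEKT; the bounds are those of the slide.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical observed unit with one background and one measurement variable."""
    occupation: str
    salary: float


def check_a01(record):
    """Return 'Flag' if the record is in the edit group but outside the acceptance region."""
    in_edit_group = record.occupation == "Doctor"         # edit group
    test_variable = record.salary                         # test variable
    in_acceptance_region = 29000 < test_variable < 71000  # acceptance region
    if in_edit_group and not in_acceptance_region:
        return "Flag"
    return None
```

A record outside the edit group is never flagged by this edit, whatever its salary.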
Example 2
if
Occupation = ‘Doctor’ and
not (29000 < Salary < 71000)
or Occupation = ‘Nurse’ and
not (23300 < Salary < 43800)
then Errcode_A02 = ‘Flag’
Example 2 The test variable
Salary
Example 2 The edit groups
Occupation = ‘Doctor’
Occupation = ‘Nurse’
Example 2 The acceptance regions
29000 < Salary < 71000 (Doctor)
23300 < Salary < 43800 (Nurse)
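Example 2 can also be read as data rather than as code: one edit, one test variable, and one acceptance region per edit group. The sketch below assumes that reading. The dictionary layout and the function `apply_edit` are hypothetical illustrations, not the SELEKT implementation.

```python
# Hypothetical representation of edit A02: one test variable and one
# acceptance region (lower, upper) per edit group. Bounds are from Example 2.
EDIT_A02 = {
    "group_variable": "Occupation",
    "test_variable": "Salary",
    "regions": {
        "Doctor": (29000, 71000),
        "Nurse": (23300, 43800),
    },
}


def apply_edit(edit, record):
    """Return 'Flag' if the record falls in an edit group but outside its acceptance region."""
    bounds = edit["regions"].get(record[edit["group_variable"]])
    if bounds is None:
        return None  # the record belongs to no edit group of this edit
    lower, upper = bounds
    value = record[edit["test_variable"]]
    return None if lower < value < upper else "Flag"
```

For instance, `apply_edit(EDIT_A02, {"Occupation": "Nurse", "Salary": 50000})` returns `"Flag"`, while the same salary for a doctor is accepted.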
Edits
EDIT
Edit identification
Type of edit
Active
Section
Internal error message
External error message
Instruction for data review
Un-edited test variable
Error flag
EDIT GROUP AND ACCEPTANCE REGION
Edit identification
Edit group
Acceptance region
Edits
EDIT GROUP AND ACCEPTANCE REGION
EDIT
Edit identification
Type of edit
Active
Section
Internal error message
External error message
Instruction for data review
1
Edit identification
Edit group
Acceptance region
EDIT PRACTICAL SUPPORT
2
Un-edited test variable
Error flag
3
Edit identification
Standard edit rule
Edited test variable
Suspicion probability value produced by the SELEKT system
IMPACT ON STATISTICS
LINK
Edit identification
Survey variable
4
5
FLAGGING EDITS, VARIABLES AND UNITS
Survey variable
Potential impact on statistics
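The two tables EDIT and EDIT GROUP AND ACCEPTANCE REGION, linked by the edit identification, could be sketched as two record types. The field names below follow the slide; the Python types, the helper `regions_for`, and all field values other than those taken from Example 2 are placeholders assumed for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """One row of the EDIT table (fields as listed on the slide)."""
    edit_id: str
    type_of_edit: str
    active: bool
    section: str
    internal_error_message: str
    external_error_message: str
    instruction_for_data_review: str
    unedited_test_variable: str
    error_flag: str


@dataclass
class EditGroupAcceptanceRegion:
    """One row of the EDIT GROUP AND ACCEPTANCE REGION table."""
    edit_id: str
    edit_group: str
    acceptance_region: str


def regions_for(edit, rows):
    """Return the rows linked to the given edit through the edit identification."""
    return [row for row in rows if row.edit_id == edit.edit_id]


# Example 2 stored as metadata; descriptive texts are placeholder values.
a02 = Edit("A02", "placeholder type", True, "placeholder section",
           "placeholder internal message", "placeholder external message",
           "placeholder instruction", "Salary", "Errcode_A02")
rows = [
    EditGroupAcceptanceRegion("A02", "Occupation = 'Doctor'", "29000 < Salary < 71000"),
    EditGroupAcceptanceRegion("A02", "Occupation = 'Nurse'", "23300 < Salary < 43800"),
]
```

In this layout one edit can own any number of edit groups, each with exactly one acceptance region.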
My questions (1)
• Can most edits be described as
consisting of the components
– test variable
– edit group
– acceptance region?
• What types of edits cannot?
My questions (2)
If the edits can be described this way,
what arguments are there for saying that
– one edit has only one edit group and one
acceptance region
– one edit can be composed of many edit
groups with one acceptance region each?
My questions (3)
Can you give me examples of
• similar modeling of edits
• metadata storage for edits
• edit script generator using a standard
metadata storage for edits
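To make the last item concrete, here is a purely illustrative sketch of what an edit script generator might look like: it takes stored edit groups and acceptance regions and produces rule text in the notation of Examples 1 and 2. The function `generate_edit_script` and its input format are assumptions, not an existing tool.

```python
def generate_edit_script(error_flag, regions):
    """Build rule text from (edit_group, acceptance_region) condition strings.

    Hypothetical generator: each pair becomes one clause of the form
    "(edit group and not (acceptance region))", and clauses are joined by "or".
    """
    clauses = [f"({group} and not ({region}))" for group, region in regions]
    condition = "\n   or ".join(clauses)
    return f"if {condition}\nthen {error_flag} = 'Flag'"


script = generate_edit_script(
    "Errcode_A02",
    [
        ("Occupation = 'Doctor'", "29000 < Salary < 71000"),
        ("Occupation = 'Nurse'", "23300 < Salary < 43800"),
    ],
)
print(script)
```

The explicit parentheses around each clause are a design choice of the sketch, so that the generated text does not rely on the precedence of `and` over `or`.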