National Statistical Office (Thailand) - United Nations Statistics Division


Data Processing of the 2010 Population and Housing Census
National Statistical Office, Thailand
15-19 September 2008, Bangkok, Thailand
DATA CAPTURING
CONTENT
- Hardware & Software of ICR System
- TELEform / ABBYY Functions
- Step of ICR System in NSO
- Specific questionnaires for ICR System
ICR for The Population Census 2000
NSO first used the ICR system in 2000 to process the Population Census
questionnaires, scanning the forms of 16 million households (16 million
forms). Processing the raw data took only 8 months, compared with
18 months using the key-in data entry system.
TELEform Hardware & Software System in 2000
TELEform Hardware System
- NetServer for TELEform Server (1)
- NetServer for Database Server (1)
- Reader Modules Workstations (21)
- Verifier Modules Workstations (55)
- Scanner Control Workstations (6)
- Scanner Fujitsu M4099D (6)
TELEform Software System
TELEform 6.2 Elite Enterprise Edition Components:
- TELEform Designer
- TELEform Reader
- TELEform Verifier
ICR System in 2003
The ICR system at NSO (Thailand) can be divided into two parts:
- TELEform Software System
- ABBYY Software System
ICR for The Agricultural Census 2003
NSO hired ABBYY software to process about 25% of the Agricultural
Census 2003 questionnaires, which covered 5.8 million households in
total (24 million forms).
TELEform Hardware & Software System in 2003
TELEform Hardware System
- NetServer for TELEform Server (1)
- NetServer for Database Server (1)
- Reader Modules Workstations (21)
- Verifier Modules Workstations (30)
- Scanner Control Workstations (6)
- Scanner Fujitsu M4099D (6)
TELEform Software System
TELEform 6.2 Elite Enterprise Edition Components:
- TELEform Designer
- TELEform Reader
- TELEform Verifier
ABBYY Hardware & Software System in 2003
ABBYY Hardware System
- IBM Server X Series 225 (1)
- Correction Station (1)
- Verifier Modules Workstations (25)
- Scanner Control Workstations (4)
- Scanner Fujitsu M4099D (4)
- Storageflex LT707 (1)
ABBYY Software System
ABBYY FormReader 6.0 Enterprise Edition Components:
- Form Design
- Administration Station
- Recognition Station
- Correction Station
TELEform & ABBYY Functions
TELEform / ABBYY Designer Function
To create a form template by fixing the positions of the field boxes on the questionnaire.
TELEform Reader / ABBYY Administration Function
- Evaluate the questionnaires
- Export the correctly read questionnaires to a data file
- Send the unclear questionnaires to the TELEform/ABBYY Verifier
function for correction; the corrected questionnaires are then
transferred to a data file
- Store scanned images
TELEform / ABBYY Verifier Function
- To correct questionnaires that contain mismarked or illegible fields
- The corrected questionnaires are automatically exported to a data file
Functions Speed
The scanner supports paper sizes from A7 to A3.
- Simplex: 90 sheets / minute (A4 portrait)
- Duplex: 180 images / minute (A4)
NSO questionnaires are mostly printed on A3 paper (297 x 420 mm).
Function    Estimated Speed (sheets/minute)
Scanner     45
Reader      17
Verifier    5
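The speed table can be combined with the workstation counts to see which stage is likely to limit throughput. The sketch below is only a rough illustration. It assumes the estimated speeds are per unit, uses the unit counts from the 2003 TELEform configuration described earlier, and introduces a hypothetical `verify_share` parameter for the fraction of sheets sent to the Verifier, which the slides do not state.

```python
# Estimated speeds (sheets/minute) from the table above, assumed to be per unit.
SPEED = {"scanner": 45, "reader": 17, "verifier": 5}

# Unit counts from the 2003 TELEform configuration described earlier.
UNITS = {"scanner": 6, "reader": 21, "verifier": 30}


def stage_capacity(speed, units):
    """Combined sheets/minute per stage if all units of a stage run in parallel."""
    return {stage: speed[stage] * units[stage] for stage in speed}


def bottleneck(capacity, verify_share):
    """Return (stage, sheets/minute) for the stage that limits overall intake.

    verify_share is a hypothetical fraction (0 < share <= 1) of sheets that
    are routed to the Verifier; every sheet passes the Scanner and Reader.
    """
    if not 0 < verify_share <= 1:
        raise ValueError("verify_share must be in (0, 1]")
    sustainable = {
        "scanner": capacity["scanner"],
        "reader": capacity["reader"],
        # Verifiers see only part of the flow, so they can sustain a higher intake.
        "verifier": capacity["verifier"] / verify_share,
    }
    stage = min(sustainable, key=sustainable.get)
    return stage, sustainable[stage]


capacity = stage_capacity(SPEED, UNITS)
print(capacity)
print(bottleneck(capacity, verify_share=0.5))
```

Changing `verify_share` shows how sensitive the overall rate is to the quality of the filled-in forms: the more sheets need manual verification, the sooner the Verifier stage becomes the limiting one.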
Step of ICR System in NSO
Scan and Forms Distribution:
The questionnaires are scanned by Block / Village, producing
multi-page image files.
Forms Evaluation:
The questionnaire images are evaluated. Questionnaires that are read
correctly skip the Verifier workstations and are exported directly to
the database server.
Step of ICR System in NSO (cont’)
Forms Verification:
Unclear questionnaires must be reviewed and corrected at the
Verifier workstations before being transferred to the database server.
Data Export:
- Link a data file from the database server to the IBM mainframe
system
- Store scanned image files on CD
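The routing between Forms Evaluation and Forms Verification can be sketched as follows. This is a minimal illustration, not how TELEform or ABBYY decide internally: the `EvaluatedForm` type, the per-field confidence scores and the 0.9 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EvaluatedForm:
    form_id: str
    values: dict       # field name -> recognized value
    confidence: dict   # field name -> recognition confidence, 0.0 to 1.0


def is_unclear(form, threshold=0.9):
    """Treat a form as unclear if any field is below the assumed threshold."""
    return any(score < threshold for score in form.confidence.values())


def route_forms(forms, threshold=0.9):
    """Split evaluated forms into a direct-export list and a verifier queue."""
    direct_export, verifier_queue = [], []
    for form in forms:
        if is_unclear(form, threshold):
            verifier_queue.append(form)   # needs manual review first
        else:
            direct_export.append(form)    # goes straight to the database server
    return direct_export, verifier_queue


forms = [
    EvaluatedForm("A001", {"age": "34"}, {"age": 0.98}),
    EvaluatedForm("A002", {"age": "7?"}, {"age": 0.41}),
]
direct_export, verifier_queue = route_forms(forms)
print([f.form_id for f in direct_export], [f.form_id for f in verifier_queue])
```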
Scan & Forms Distribution
[Diagram: Questionnaire -> Scan -> Image File]
Forms Verification and Data Export
[Diagram: Verify -> Export data for processing; Image files -> Storage -> CD]
Input – Output of ICR System
[Diagram: Questionnaire -> ICR -> ASCII files and image files]
ICR Linkage System
[Diagram: questionnaires pass through the TELEform and ABBYY lines to the servers and storage, then transfer to the mainframe for processing (editing & reporting). Components shown:]
- TELEform Software: Scanners (6 units), PC controllers (6 units), Readers (21 units), Verifications (30 units)
- ABBYY Software: Scanners (4 units), PC controllers (4 units), Verifications (25 units), Correction station (1 unit)
- Servers: HP Server, IBM Server, COMPAQ Server (functions labelled: Database, Software, Backup Data, Administration, Export, Recognition)
- Storage: HD 880 GB; CD
- Mainframe: Processing (Editing & Reporting)
Specific questionnaires for ICR System
- The questionnaires must be designed and printed on quality paper,
with answer field boxes in a specific colour (blue, green, red).
- The questionnaires should be filled in using at least a 2HB pencil.
- Distribution, collection and return of the questionnaires should be
handled with care.
ICR Benefits
- Reduce Cost
- Reduce Time
- Efficient Data Capture
- Increase Data Accuracy
Major Problems Encountered in 2000 Census
- Strict questionnaire design requirements: paper, size, colour,
figures and answer field boxes
- Questionnaires must be filled in with a specified pencil and
handwriting style
- Distribution and return of questionnaires must be handled carefully
DATA EDITING
EDITING & TABULATION
DATA PROCESSING STEP
[Flowchart: Questionnaires -> ICR -> Machine Edit -> Error?
- Error? Yes: Checking Error & List Data (Listing) -> Comparing with questionnaire images -> Validate Data (Manual, Cold deck, Hot deck)
- Error? No: Tabulation & Report (Tab/Report) -> Table Checking -> Accept? (Yes / No)]
Editing Process
- Validity:
Check the characteristics of the message structure.
- Possible code:
Check each field; out-of-range field values are shown with an
asterisk (*) code.
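The possible-code check can be illustrated with a short sketch. The field names and code lists below are hypothetical and are not taken from the census questionnaire; the only behaviour borrowed from the slide is that an out-of-range value is replaced by an asterisk.

```python
# Hypothetical code lists; the real questionnaire codes are not given in the slides.
VALID_CODES = {
    "sex": {"1", "2"},
    "marital_status": {"1", "2", "3", "4", "5"},
}


def mark_out_of_range(record, valid_codes):
    """Return a copy of the record with out-of-range values replaced by '*'."""
    checked = {}
    for name, value in record.items():
        allowed = valid_codes.get(name)
        if allowed is not None and value not in allowed:
            checked[name] = "*"      # flag for the error listing
        else:
            checked[name] = value    # valid, or a field with no code list
    return checked


record = {"sex": "3", "marital_status": "2", "age": "40"}
print(mark_out_of_range(record, VALID_CODES))
```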
Editing Process (cont’)
- Consistency:
Check for inconsistent values within a record and across
records. Messages show the related condition codes. All errors
are printed on continuous paper forms to be reviewed and
validated by subject-matter staff until no error messages remain.
- Imputation:
Automatic editing programs.
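A within-record consistency rule and a simple automatic imputation can be sketched as below. Both are illustrations under stated assumptions: the rule, its condition code `C01` and the field names are hypothetical, and the imputation shown is a basic sequential hot deck (one of the validation methods named in the processing flow), not NSO's actual PL/I editing program.

```python
def consistency_errors(record):
    """Return condition codes for within-record rules that the record violates."""
    errors = []
    # Hypothetical rule C01: a child under 13 should not be coded as married ("2").
    if (record["age"] is not None and record["age"] < 13
            and record["marital_status"] == "2"):
        errors.append("C01")
    return errors


def hot_deck_impute(records, field, group_by):
    """Sequential hot deck: fill a missing value (None) from the most recent
    record in the same group that has a value; leave it if no donor exists yet."""
    last_donor = {}
    imputed = []
    for record in records:
        record = dict(record)  # copy so the input list is not modified
        group = record[group_by]
        if record[field] is not None:
            last_donor[group] = record[field]
        elif group in last_donor:
            record[field] = last_donor[group]
        imputed.append(record)
    return imputed


records = [
    {"village": "V1", "age": 35, "marital_status": "2"},
    {"village": "V1", "age": 9, "marital_status": "2"},
    {"village": "V1", "age": 41, "marital_status": None},
    {"village": "V2", "age": 50, "marital_status": None},
]
print([consistency_errors(r) for r in records])
print(hot_deck_impute(records, "marital_status", "village"))
```

In this sketch the last record stays unfilled because its group has no donor yet; in practice such cases would fall back to manual validation or a cold deck value.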
Tabulation Process
- Tabulation:
Summary data reports, produced once the data have been
completely cleaned, for subject-matter staff to analyze the
results.
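As a small illustration of the tabulation step, the sketch below counts cleaned records in a two-way table. The field names and sample records are made up for the example, and NSO's own tabulation runs on the tools listed in the following slides rather than on this code.

```python
from collections import Counter


def cross_tabulate(records, row_field, col_field):
    """Count cleaned records for each (row value, column value) pair."""
    return Counter((r[row_field], r[col_field]) for r in records)


# Made-up cleaned records for illustration only.
cleaned = [
    {"region": "Central", "sex": "M"},
    {"region": "Central", "sex": "F"},
    {"region": "North", "sex": "F"},
    {"region": "Central", "sex": "F"},
]
table = cross_tabulate(cleaned, "region", "sex")
for (region, sex), count in sorted(table.items()):
    print(region, sex, count)
```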
Mainframe : IBM Multiprise 2000 Model 206
1.1 Operating System
- OS/390 v.2 release 8
1.2 Compiler
- PL/I
1.3 Statistical Program
- Base SAS
1.4 Application Development Tools
- Performance Reporter for OS/390
Personal Computer (PC)
2.1 Operating System
- Windows XP
2.2 Package
- MS Office 2003
- MS Studio v.6 (Visual FoxPro)
- SPSS
- CSPro 3.3
THANK YOU
FOR YOUR ATTENTION