Data Management and Exploratory Data Analysis
Yongyuth Chaiyapong Ph.D. (Mathematical Statistics)
Department of Statistics
Faculty of Science
Chiang Mai University
24 Nov 2007
Data Management and
Exploratory Data Analysis
1
Data Management
• Data: Characteristics or Attributes of Interest
That Are Observed from Population or Sampled
Units
• Manage: to Handle, Direct, Govern, or Control in
Action or Use
Accomplishments
• Post Graduate Study and Research
• Fact or Conclusion
• Immediate and Long Term Utilization
• Reliability of Conclusion
• Error
• Cost
• Efficiency
Research and Statistics
• Deductive and Inductive Methods
• Statistics Science
• Science of Data
• Inference about Population(s) of Interest
• Conclusions
Statistical Inference
• Inductive Method
• Error and Reliability
• Quality of Research Outcomes
Error
• Sources of Error
• Error Measurement
• Error Control
Types of Error
• Sampling Error
• Non-sampling Error
Sources of Sampling Error
– Sample Size
– Population Size
– Variance of Population
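The three sources listed above all appear in the textbook standard error of a sample mean. A minimal sketch, assuming simple random sampling without replacement (the slides do not name a design); the function name and its arguments are illustrative only:

```python
import math

def standard_error_of_mean(pop_variance, n, N=None):
    """Standard error of the sample mean under simple random sampling.

    pop_variance: population variance S^2 (divisor N - 1)
    n: sample size
    N: population size; None treats the population as effectively infinite
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if N is not None and n > N:
        raise ValueError("sample size cannot exceed population size")
    # Finite population correction: shrinks the error as n approaches N.
    fpc = 1.0 if N is None else 1.0 - n / N
    return math.sqrt(pop_variance / n * fpc)

print(standard_error_of_mean(100.0, 25))           # very large population
print(standard_error_of_mean(100.0, 25, N=100))    # a quarter of the population sampled
print(standard_error_of_mean(100.0, 100, N=100))   # full census
```

A larger sample or a smaller population variance lowers the error; population size matters only when the sample is a sizeable fraction of it.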
Statistical Methodology
• Point Estimation
• Interval Estimation
• Hypothesis Testing
• Statistical Modeling
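As an illustration of the first two items, the sketch below computes a point estimate of a population mean and an interval estimate around it. It assumes a normal approximation (a t critical value would suit small samples better), and the function name and measurements are made up for the example:

```python
import math
import statistics

def mean_with_interval(values, confidence=0.95):
    """Return (point estimate, lower limit, upper limit) for the population mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    # Normal critical value; an assumption made to keep the sketch short.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, estimate - z * std_error, estimate + z * std_error

sample = [4.9, 5.1, 5.0, 5.2, 4.8]  # made-up measurements
estimate, lower, upper = mean_with_interval(sample)
print(f"point estimate {estimate:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

The same interval can serve as a rough two-sided test: a hypothesized mean that falls outside it would be rejected at the matching significance level.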
Measurement of Sampling Error
– Variance
– Type I and Type II Errors
– Coverage Probability
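Coverage probability can be checked by simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval contains the true mean. A minimal sketch, assuming normal data with known standard deviation; all parameter values are arbitrary:

```python
import math
import random
import statistics

def estimate_coverage(true_mean=10.0, sigma=2.0, n=30, reps=2000,
                      confidence=0.95, seed=2007):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        # The interval covers the true mean when the sample mean is close enough.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / reps

coverage = estimate_coverage()
print(f"estimated coverage: {coverage:.3f}")
```

The estimate should land near the nominal 0.95. Its complement is the Type I error rate of the matching two-sided test of the true mean.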
Sampling Error Control
– Research Design
– Statistical Methodology
– Design of Data Collection
– Data Analysis
Non-sampling Error
• Data Collection
• Concepts and Definitions
• Data Entry and Verification
• Editing and Updating
• Data Processing
Measurement of Non-sampling Error
• Difficult to Detect
• No Measurement
• No Statistical Theory
Non-sampling Error Control
• Quality Assurance
• Quality Control
Biomedical Research Cycle
[Figure: cycle diagram. Source: Ida Sim, UCSF]
Stages: Design Study, Activate Study, Conduct Study, Utilize Results
Links between stages: Protocol & Funding; Approval & Preparation; Findings for patient care & policy; New Ideas for clinical research and for basic research; Clinical Care
Activities noted around the cycle:
• trial simulators
• trial costing
• protocol authoring
• data mgmt plan
• IRB approval
• CRF design
• GCP compliance
• data acquisition & management
• data processing & analysis
• reporting
Quality Assurance
• Describes the Systems and Processes
• Established to Ensure that the Data are Collected in
Compliance with the Standard Requirement
Quality Control (QC)
• The Operational Techniques and Activities
Undertaken to Verify that the Requirements for
Quality of the Research have been Fulfilled
• No errors, Inconsistencies, or Omissions
Research Data Management Principles
What is Research Data Management?
A process that begins with conception and design of a research project, continues through data capture and analysis to publication, data archiving and data sharing with the broader scientific community.
[Figure: research data life cycle. Stages shown:]
• Project Initiation
• Data Design
• Data Acquisition and Quality Control
• Data Manipulation and Quality Assurance
• Data Analysis and Interpretation
• Data Access and Archiving
• Publication
Functions of Data Management
in Biomedical Research (Clinical Trial)
[Figure: data management functions plotted against time]
Study phases: Protocol Development, Site Development, Study Initiation, Analysis, Results
Clinic/Site Staff: GCP Compliance, Data Collection, Data Correction, Data Closure
Data Management Staff: Database Design, Data Entry, Data QC, Clinic Data, Data Cleanup
Other labels: Time, Research, Management
Research Data Management Principles
• Data Acquisition
• Data Storage
• Database Validation, Programming and Standards
• Data Entry and Data Processing
• Safety Data Management and Reporting
• Measuring Data Quality
• Assuring Data Quality
• Database Closure
Data Management
• Data Design
• Data Acquisition
• Data Entry
• Data Quality
Source Document & CRF
• Source documents: hospital data collection form, medical records, laboratory results
• CRF(s) (2° Data Source)
Standard Data Management Flow
[Figure: two parallel flows]
Paper flow: Data Collection (paper CRF) -> Data Entry -> Data Validation -> Data Clarification (paper DCF) -> Database Lock -> Data Analysis & Report
Electronic flow: Data Collection (eCRF, over the Internet to a web server) -> Data Validation -> Data Clarification (eDCF) -> Database Lock -> Data Analysis & Report (eSubmission)
Data Validation
Standards in Data Validation:
• Making sure that the raw data were accurately entered into a computer-readable file.
• Checking that character variables contain only valid values.
• Checking that numeric values are within predetermined ranges.
• Checking for and eliminating duplicate data entries.
• Checking if there are missing values for variables where complete data are necessary.
• Checking for uniqueness of certain values, such as subject IDs.
• Checking for invalid date values and invalid date sequences.
• Verifying that complex multi-file (or cross-panel) rules have been followed. For example, if an adverse event of type X occurs, other data such as concomitant medications or procedures might be expected.
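Several of the checks above can be automated with a few lines of code. The sketch below is one possible arrangement, not a standard tool: the field names, the valid codes, and the age range are hypothetical and would come from the study protocol in practice. It covers missing values, duplicate subject IDs, valid character values, numeric ranges, and date validity and sequence; cross-panel rules are left out.

```python
from collections import Counter
from datetime import date

# Hypothetical rules for an illustrative data set.
VALID_SEX = {"M", "F"}
AGE_RANGE = (18, 90)
REQUIRED = ("subject_id", "sex", "age", "enrolled", "visit")

def parse_date(text):
    """Return a date for an ISO 'YYYY-MM-DD' string, or None if it is invalid."""
    try:
        return date.fromisoformat(text)
    except (TypeError, ValueError):
        return None

def validate(records):
    """Return a list of (row number, message) pairs, one per problem found."""
    problems = []
    id_counts = Counter(rec.get("subject_id") for rec in records)
    for row, rec in enumerate(records, start=1):
        # Missing values where complete data are necessary.
        missing = [f for f in REQUIRED if rec.get(f) in (None, "")]
        problems += [(row, f"missing {f}") for f in missing]
        # Uniqueness of subject IDs.
        sid = rec.get("subject_id")
        if sid and id_counts[sid] > 1:
            problems.append((row, f"duplicate subject_id {sid}"))
        # Character variable restricted to valid values.
        sex = rec.get("sex")
        if sex and sex not in VALID_SEX:
            problems.append((row, f"invalid sex {sex!r}"))
        # Numeric value within a predetermined range.
        age = rec.get("age")
        if isinstance(age, (int, float)) and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
            problems.append((row, f"age {age} out of range"))
        # Invalid dates and invalid date sequences.
        enrolled, visit = parse_date(rec.get("enrolled")), parse_date(rec.get("visit"))
        for name, parsed in (("enrolled", enrolled), ("visit", visit)):
            if rec.get(name) and parsed is None:
                problems.append((row, f"invalid {name} date {rec[name]!r}"))
        if enrolled and visit and visit < enrolled:
            problems.append((row, "visit precedes enrollment"))
    return problems

records = [  # made-up entries
    {"subject_id": "S001", "sex": "F", "age": 34, "enrolled": "2007-01-10", "visit": "2007-02-14"},
    {"subject_id": "S002", "sex": "X", "age": 130, "enrolled": "2007-03-01", "visit": "2007-02-01"},
    {"subject_id": "S002", "sex": "M", "age": None, "enrolled": "2007-02-30", "visit": "2007-04-01"},
]
problems = validate(records)
for row, message in problems:
    print(f"row {row}: {message}")
```

Returning the problems as a list, rather than stopping at the first one, lets every discrepancy in a batch be sent back to the site for clarification at once.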
Common Types of Data Problem/Error
Data type
Size / Out-of-range values
Missing data
Coding error
Spelling error / Illegibility
Inconsistency
Errors in conduct of protocol