Mirror Outlier Detection in Foreign Trade Data
Markos Fragkakis
NTTS 2009
Introduction
Foreign Trade (FT) data
Improving FT data quality is essential
Quality can be assessed along several dimensions (e.g. accuracy, timeliness, clarity)
We focus on accuracy, using outlier detection
Methods for outlier detection (e.g. threshold-based, model-based)
Presentation of the Mirror Outlier Detection (MOD) application
Methodology
Univariate detection in time series (value,
quantity, supplementary quantity)
Median Absolute Deviation
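$$M_1 = \operatorname{Median}(x_j), \qquad M_2 = \operatorname{Median}(|x_j - M_1|)$$
$$T_i = \frac{x_i - M_1}{M_2}, \qquad x_i \text{ is an outlier if } |T_i| > c \iff |x_i - M_1| > c\,M_2$$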
Robust
◦ median, not mean
◦ non-parametric
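The MAD rule above can be sketched in Java, the language MOD is written in. This is a minimal illustration, not the MOD code: the class and method names are hypothetical, M1 is assumed to be the series median, and the threshold c is left to the caller because the slides do not give a value.

```java
import java.util.Arrays;

/**
 * Sketch of MAD-based univariate outlier detection for one time series.
 * Hypothetical names; M1 is assumed to be the series median and c is caller-supplied.
 */
final class MadOutlierDetector {

    /** Median of the values; works on a sorted copy so the input is not modified. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty series");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Signed scores T_i = (x_i - M1) / M2, with M2 = Median(|x_j - M1|). */
    static double[] scores(double[] x) {
        double m1 = median(x);
        double[] absDev = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            absDev[j] = Math.abs(x[j] - m1);
        }
        double m2 = median(absDev);
        double[] t = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            if (m2 == 0.0) {
                // Degenerate case: most values equal M1, so any other value is infinitely far out.
                t[i] = (x[i] == m1) ? 0.0 : Math.copySign(Double.POSITIVE_INFINITY, x[i] - m1);
            } else {
                t[i] = (x[i] - m1) / m2;
            }
        }
        return t;
    }

    /** Flags x_i as an outlier when |T_i| > c. */
    static boolean[] outliers(double[] x, double c) {
        double[] t = scores(x);
        boolean[] flags = new boolean[t.length];
        for (int i = 0; i < t.length; i++) {
            flags[i] = Math.abs(t[i]) > c;
        }
        return flags;
    }
}
```

For the series {100, 102, 98, 101, 99, 500}, M1 = 100.5 and M2 = 1.5, so the last value scores T ≈ 266 and is the only one flagged with c = 5. The score keeps its sign, which is what a same-sign versus opposite-sign comparison against the mirror flow needs.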
Mirror Outlier Detection
Characterization of outliers according to the mirror flow (the same trade flow as declared by the partner country).
Possible outlier types:
◦ Green: outlier also appears in the mirror series (same sign)
◦ Red: outlier does not appear in the mirror series
◦ Violet: outlier appears in the mirror series with the opposite sign
◦ Black: mirror series not present
◦ Pink: mirror series not present (confidential data)
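One way to encode this colour scheme is sketched below. It assumes each outlier carries a signed MAD score T_i, and that "appears in the mirror" means the mirror series' score for the same period also exceeds c. The class, the enum and the parameters are hypothetical, not taken from MOD.

```java
/**
 * Sketch of mirror characterisation for an outlier already detected in a reporter's series.
 * Hypothetical API: mirrorScore is the mirror series' signed score for the same period,
 * or null when no mirror series exists; mirrorConfidential says whether it is withheld.
 */
final class MirrorClassifier {

    enum Type { GREEN, RED, VIOLET, BLACK, PINK }

    static Type classify(double reporterScore, Double mirrorScore,
                         boolean mirrorConfidential, double c) {
        if (Math.abs(reporterScore) <= c) {
            throw new IllegalArgumentException("reporter value is not an outlier");
        }
        if (mirrorScore == null) {
            // No mirror series: PINK if it is withheld for confidentiality, otherwise BLACK.
            return mirrorConfidential ? Type.PINK : Type.BLACK;
        }
        if (Math.abs(mirrorScore) <= c) {
            return Type.RED; // the outlier does not appear in the mirror series
        }
        // Outlier in both series: compare the direction of the two deviations.
        return Math.signum(reporterScore) == Math.signum(mirrorScore) ? Type.GREEN : Type.VIOLET;
    }
}
```

Treating "not present" as a null score keeps the Black/Pink distinction separate from the numeric test, so the confidentiality case is never mistaken for a Red outlier.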
Additional functionalities
Outlier classification (errors in a dimension, e.g. product or period, rather than in the observed values)
◦ Swapping of observations between series
◦ Copying of observations
◦ Time delay (hidden green outlier)
Outlier detection in short series (caused by product code changes)
Reporting:
◦ Detected outliers per country (sent by e-mail)
◦ Summary reporting
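A time delay can hide a Green outlier: the flow is recorded in one period by the reporter and in a neighbouring period by the partner, so at the same period it looks Red. A minimal sketch of a check for this case follows, assuming signed MAD scores for both series and a caller-chosen maximum lag. The class and parameters are hypothetical; the slides do not describe MOD's actual rule.

```java
import java.util.OptionalInt;

/** Hypothetical check for a "hidden green" outlier caused by a time delay. */
final class TimeDelayCheck {

    /**
     * Looks for a same-sign mirror outlier within maxLag periods of the reporter outlier at t.
     * Offsets are tried nearest first (-1, +1, -2, +2, ...); the first hit is returned.
     */
    static OptionalInt findDelay(double[] reporterScores, double[] mirrorScores,
                                 int t, int maxLag, double c) {
        if (Math.abs(reporterScores[t]) <= c) {
            throw new IllegalArgumentException("no reporter outlier at period " + t);
        }
        double sign = Math.signum(reporterScores[t]);
        for (int k = 1; k <= maxLag; k++) {
            for (int offset : new int[] {-k, k}) {
                int s = t + offset;
                if (s < 0 || s >= mirrorScores.length) {
                    continue; // shifted period falls outside the mirror series
                }
                if (Math.abs(mirrorScores[s]) > c && Math.signum(mirrorScores[s]) == sign) {
                    return OptionalInt.of(offset);
                }
            }
        }
        return OptionalInt.empty();
    }
}
```

With a reporter outlier at period 2 and a same-sign mirror outlier at period 3, findDelay returns an offset of 1, so the outlier could be reported as Green with a one-period delay rather than as Red.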
Example of detected outlier
Example of error due to swap
Error due to time delay
Technical Information
MOD-DB has an RDBMS repository for storing outlier data (Oracle and MySQL are supported).
Implemented in Java (for portability and maintainability)
Command-line interface
Performance issues
◦ Large data volumes cause a bottleneck in the DB
◦ Storage is a concern (several GB per month)
Architecture
Proposal for a new platform
Use a multidimensional viewer
Enable OLAP functions (slice, dice, roll-up, drill-down)
Create dynamic charts from the data
Estimated variables (indices computed from raw outlier data)
Use data mining to extract inferences from the data
◦ Log-linear models
Pinpoint poor-quality data involving high values
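The roll-up and slice operations proposed above can be illustrated on a flat list of detected outliers. The sketch below (Java 16+ for the record type) rolls outliers up from (country, product, period) to country, and slices on one period before counting by mirror type, similar to the per-country summaries listed under reporting. The record fields and sample values are hypothetical; given the data volumes mentioned earlier, such aggregations would more likely be pushed into the database or an OLAP engine than done in memory.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory illustration of roll-up and slice over outlier records (hypothetical schema). */
final class OutlierRollup {

    /** One detected outlier, as it might be read from the outlier repository. */
    record Outlier(String country, String product, String period, String mirrorType) {}

    /** Roll-up: aggregate away product and period, keeping one outlier count per country. */
    static TreeMap<String, Long> countByCountry(List<Outlier> outliers) {
        return outliers.stream()
                .collect(Collectors.groupingBy(Outlier::country, TreeMap::new, Collectors.counting()));
    }

    /** Slice on one period, then count outliers per country and mirror type. */
    static TreeMap<String, TreeMap<String, Long>> countByCountryAndType(
            List<Outlier> outliers, String period) {
        return outliers.stream()
                .filter(o -> o.period().equals(period))
                .collect(Collectors.groupingBy(Outlier::country, TreeMap::new,
                        Collectors.groupingBy(Outlier::mirrorType, TreeMap::new, Collectors.counting())));
    }
}
```

Using TreeMap keeps countries and mirror types in sorted order, which makes the resulting summary tables stable from one run to the next.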
Conclusions
Use of the mirror flow for outlier characterisation
New features
Improving quality
Enables building a new platform for data exploration
Expansion of MOD to other FT data (outside the EU) and to other domains
Questions
Thank you for your attention